Hi, guys! The code for the multi-scale implementation and data augmentation has been upd

Multi-Scale Implementation about caffe-vdsr HOT 10 CLOSED

huangzehao commented on August 16, 2024 1

Multi-Scale Implementation

from caffe-vdsr.

Comments (10)

jiaxue-ai commented on August 16, 2024

Thanks for the update. I have some naive questions regarding your solver.prototxt settings:

Why did you set base_lr: 0.0001? Have you tried a higher learning rate?
Do you keep this learning rate fixed through the whole experiment?


jiaxue-ai commented on August 16, 2024

Another concern is the training time. In the original paper, the authors say it took them 4 hours to finish the experiment on a Titan Z.
I have run your code for over 1 hour and it has only reached iteration 10000. Does that mean it needs about 93 more hours to reach your max_iter: 935840? My GPU is also a Titan Z.
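The estimate above can be reproduced with a few lines of Python. This is only a sketch: the function name `remaining_hours` is made up for illustration, and it assumes the throughput stays constant at the rate observed so far.

```python
def remaining_hours(done_iters, elapsed_hours, max_iter):
    """Estimate the hours left, assuming constant throughput."""
    iters_per_hour = done_iters / elapsed_hours
    return (max_iter - done_iters) / iters_per_hour


# Numbers quoted in this comment: 10000 iterations in about 1 hour,
# with max_iter: 935840 from the solver.
print(round(remaining_hours(10000, 1.0, 935840), 1))  # 92.6
```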


huangzehao commented on August 16, 2024

(1) I use Adam instead of SGD. With Adam, 0.0001 or 0.001 is a common choice, and you don't need to decrease the learning rate.
(2) About the training time, there are some differences between the original paper and this implementation.
First, the authors used MatConvNet and this implementation uses Caffe, so there may be some speed differences.
Second, the authors didn't mention how many GPUs they used for training.
Third, it is impossible to train 80 epochs (9960 iterations with batch size 64) in 4 hours with a single Titan Z. You can try with SGD and measure the time.
Thanks.


jiaxue-ai commented on August 16, 2024

Why do you use Adam instead of SGD here? Does it achieve better performance, or train faster?


huangzehao commented on August 16, 2024

(1) First, if you use SGD with a high learning rate like the original paper, you need to set clip_gradient, and I could not get good performance with a simple choice of the clip_gradient value. Tuning the value of clip_gradient is time-consuming and meaningless.
(2) Second, Adam converges faster than SGD at the beginning.
Thanks.
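For readers unfamiliar with what a clipping threshold does, here is a minimal sketch of norm-based gradient clipping in plain Python. It assumes the rule "rescale all gradients when their global L2 norm exceeds the threshold", which as far as I know is how Caffe's `clip_gradients` solver option behaves; it is not necessarily the clipping scheme of the original paper. The function name `clip_by_global_norm` and the flat list of floats are illustrative assumptions only.

```python
import math


def clip_by_global_norm(grads, threshold):
    """Rescale gradients so that their global L2 norm is at most threshold.

    grads is a flat list of floats standing in for all parameter gradients.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grads]
    # Norm is already within the threshold: leave the gradients unchanged.
    return list(grads)


clipped = clip_by_global_norm([3.0, 4.0], 1.0)  # norm 5.0 is scaled down to 1.0
```

With a high SGD learning rate, a threshold that is too large lets the updates blow up and one that is too small slows training, which is why the value needs tuning.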


jiaxue-ai commented on August 16, 2024

got it, thanks


jiaxue-ai commented on August 16, 2024

My experiment has reached iteration 85000 now. In my run, the model saturates at around iteration 20000 (I only experimented with factor 4). In your experiment, when does the model saturate? Did you run all the way to max_iter: 935840?


huangzehao commented on August 16, 2024

Hi,
(1) Since the number of iterations depends on the number of samples, I recommend converting the iteration number to an epoch number: epoch = iteration * batch_size / sample_number.
In my multi-scale experiments, the total sample number is about 748000, so I set max_iter = 80 (epochs) * 748000 / 64 (batch_size) = 935000.
If you only experiment with factor 4, the sample number should be about 748000 / 3, i.e. about 250000, and you should set max_iter = 80 (epochs) * 250000 / 64 (batch_size) = 312500.
(2) In my experiments, the model saturates at about 20-30 epochs (you can check the training log for details). The final model I uploaded was trained for about 50 epochs. I didn't do a single-scale experiment with Adam, though. I recommend testing the PSNR (not the test loss) of the trained model to check whether the model has saturated.
(3) The total time for my experiment (80 epochs) is about 31 hours with a single Titan X (Old).
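The conversion in (1) can be written as a small Python sketch. The helper names are made up for illustration, and the sample counts are the approximate figures quoted above, not exact dataset sizes.

```python
def iterations_for_epochs(epochs, num_samples, batch_size):
    """Number of iterations needed to pass over the data `epochs` times."""
    return epochs * num_samples // batch_size


def epochs_for_iterations(iterations, num_samples, batch_size):
    """Number of epochs covered by a given iteration count."""
    return iterations * batch_size / num_samples


print(iterations_for_epochs(80, 748000, 64))     # 935000 (multi-scale)
print(iterations_for_epochs(80, 250000, 64))     # 312500 (factor 4 only)
print(epochs_for_iterations(20000, 250000, 64))  # 5.12
```

The last line shows that 20000 iterations on roughly 250000 samples is only about 5 epochs, assuming the batch size is still 64.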

My English is not very good, but I hope this helps. Thanks.


jiaxue-ai commented on August 16, 2024

that's really helpful, thanks


huangzehao commented on August 16, 2024

I have uploaded the log of my experiments. You can check it for more details. @mrxue1993
