Comments (10)

jiaxue-ai commented on August 16, 2024

Thanks for the update. I have some naive questions regarding your solver.prototxt settings:

  1. Why did you set base_lr: 0.0001? Have you tried a higher learning rate?
  2. You used this learning rate throughout the whole experiment, is that correct?

from caffe-vdsr.

jiaxue-ai commented on August 16, 2024

Another concern is the training time. In the original paper, the authors said it took them 4 hours to finish the experiment on a Titan Z.
I ran your code for over 1 hour and it only reached iteration 10000, which means it needs about 93 more hours to reach your max_iter: 935840? My GPU is also a Titan Z.
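The estimate in this comment can be reproduced with a short sketch. It assumes the iteration rate stays constant at the observed 10000 iterations per hour; the function name is hypothetical and not part of the repository.

```python
def estimated_total_hours(max_iter, iters_done, hours_elapsed):
    """Extrapolate total training time, assuming a constant iteration rate."""
    iters_per_hour = iters_done / hours_elapsed
    return max_iter / iters_per_hour


# Figures quoted in the comment: 10000 iterations after about 1 hour.
hours_elapsed = 1.0
total = estimated_total_hours(max_iter=935840, iters_done=10000,
                              hours_elapsed=hours_elapsed)
remaining = total - hours_elapsed
print(f"total ~{total:.1f} h, remaining ~{remaining:.1f} h")
```

With these inputs the extrapolated total is about 93.6 hours, which matches the "about 93 hours" figure above to within rounding.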

huangzehao commented on August 16, 2024

(1) I use Adam instead of SGD. A base_lr of 0.0001 or 0.001 is a common choice for Adam, and you don't need to decrease the learning rate.
(2) About the training time, there are some differences between the original paper and this implementation.
First, the authors used MatConvNet and this implementation uses Caffe, so there may be some speed differences.
Second, the authors didn't mention how many GPUs they used for training.
Third, it is impossible to train 80 epochs (9960 iterations with batch size 64) in 4 hours with a single Titan Z. You can try with SGD and measure the time.
Thanks.

jiaxue-ai commented on August 16, 2024

Why do you use Adam instead of SGD here? Does it achieve better performance, or faster training?

huangzehao commented on August 16, 2024

(1) First, if you use SGD with a high learning rate like the original paper, you need to set clip_gradient, and I could not achieve good performance with a simple setting of the clip_gradient value. Tuning the value of clip_gradient is time-consuming and meaningless.
(2) Second, Adam converges faster than SGD at the beginning.
Thanks.
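For readers unfamiliar with what a clipping threshold does, here is a minimal sketch of clipping by global L2 norm. It illustrates the general technique only: the function name is hypothetical, the code is not taken from this repository, and whether Caffe's solver option behaves exactly this way is an assumption to check against the Caffe source. The original paper describes its own clipping scheme, which is not reproduced here.

```python
import math


def clip_by_global_norm(grads, threshold):
    """Rescale all gradients together if their joint L2 norm exceeds threshold.

    grads is a flat list of floats standing in for every parameter gradient.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grads]
    return list(grads)


# The norm of [3, 4] is 5, so a threshold of 1 scales each entry by 1/5.
clipped = clip_by_global_norm([3.0, 4.0], threshold=1.0)
# Gradients already inside the threshold are returned unchanged.
untouched = clip_by_global_norm([0.3, 0.4], threshold=1.0)
```

The threshold interacts with the learning rate: a value that is too small slows training, while one that is too large does not prevent divergence. That is the tuning cost the comment above refers to.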

jiaxue-ai commented on August 16, 2024

got it, thanks

jiaxue-ai commented on August 16, 2024

My experiment has reached iteration 85000 now. In my run, the model saturates at around iteration 20000 (I only experimented with factor 4). In your experiment, when does the model saturate? Did you train all the way to max_iter: 935840?

huangzehao commented on August 16, 2024

Hi,
(1) Since the number of iterations depends on the number of samples, I recommend converting the iteration number to an epoch number: epoch = iteration * batch_size / sample_number.
In my multi-scale experiments, the total sample number is about 748000, so I set max_iter = 80 (epochs) * 748000 / 64 (batch_size) = 935000.
If you only experiment with factor 4, the sample number should be about 748000 / 3 ≈ 250000, and you should set max_iter = 80 (epochs) * 250000 / 64 (batch_size) = 312500.
(2) In my experiments, the model saturates at about 20-30 epochs (you can check the training log for details). The final model I uploaded is from about epoch 50. But I didn't run a single-scale experiment with Adam. I also recommend testing the PSNR (not the test loss) of the trained model to check whether it has saturated.
(3) The total time for my experiment (80 epochs) is about 31 hours on a single Titan X (old).

My English is not very good; I hope this helps. Thanks.
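The conversion in point (1) can be sketched as two small helpers. The function names are hypothetical, and the sample counts are the approximate figures quoted in the comment, not exact dataset sizes.

```python
def iters_for_epochs(epochs, num_samples, batch_size):
    """Iterations needed to see num_samples examples `epochs` times."""
    return round(epochs * num_samples / batch_size)


def epochs_from_iters(iteration, batch_size, num_samples):
    """Epochs completed after `iteration` mini-batches of size batch_size."""
    return iteration * batch_size / num_samples


multi_scale_iters = iters_for_epochs(80, 748000, 64)   # multi-scale training
single_scale_iters = iters_for_epochs(80, 250000, 64)  # factor 4 only
print(multi_scale_iters, single_scale_iters)  # prints "935000 312500"

# Under the approximate 250000-sample figure, 20000 iterations of a
# factor-4 run correspond to roughly 5 epochs.
print(epochs_from_iters(20000, 64, 250000))
```

Comparing runs by epoch rather than by iteration keeps the numbers meaningful when the dataset size or batch size differs between experiments.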

jiaxue-ai commented on August 16, 2024

that's really helpful, thanks

huangzehao commented on August 16, 2024

I have uploaded the log of my experiments. You can check it for more details. @mrxue1993
