Hi, Thank you for your great work! I am currently trying to run your

How to adjust learning rate for larger batch size about imagenet21k

alibaba-miil commented on July 18, 2024

How to adjust learning rate for larger batch size

from imagenet21k.

Comments (2)

mrT23 commented on July 18, 2024

The answer is a bit tricky.
Some initial assumptions:

1. Transformers are harder to train than conv nets, and are more sensitive to hyperparameters (including lr).
2. There is a major difference in the optimal learning rate when you train a network from scratch, compared to when you fine-tune it from a previous pretraining.
3. If you don't have a lot (!) of GPU resources, it is hard to do hyperparameter tuning and searching on ImageNet-21K.

Given 1-3, the scheme I suggested in the article is to initially train a net on ImageNet-1K, and then transfer it to ImageNet-21K for 80 epochs. Since this is transfer learning, I used regular Adam with lr=3e-4. I believe this scheme is quite robust: on TResNet-M I trained with a large batch (4168), while with ViT the batch was smaller (648).

You can try other learning rates; I think a reasonable range for the Adam optimizer is 2e-4 to 5e-4.
The DeiT paper talks about training from scratch, which is different. Notice that they suggest 1000 epochs!
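
For anyone who wants to probe the suggested range rather than use a single value, here is a minimal sketch of a small learning-rate sweep. Only the numbers (lr=3e-4, the 2e-4 to 5e-4 range, 80 epochs) come from the comment above; the helper name `candidate_lrs`, the log-scale spacing, and the choice of four candidates are assumptions made for illustration, not part of the original training recipe.

```python
import math

# Values quoted in the comment above; everything else is illustrative.
BASE_LR = 3e-4            # Adam lr used for the ImageNet-1K -> ImageNet-21K transfer
LR_RANGE = (2e-4, 5e-4)   # range described as reasonable for Adam
EPOCHS = 80               # number of transfer epochs on ImageNet-21K


def candidate_lrs(low, high, n):
    """Return n learning rates spaced evenly on a log scale from low to high.

    Hypothetical helper: the log spacing is an assumption, not the author's method.
    """
    if n < 2:
        raise ValueError("need at least two candidates")
    step = (math.log(high) - math.log(low)) / (n - 1)
    return [math.exp(math.log(low) + i * step) for i in range(n)]


lrs = candidate_lrs(*LR_RANGE, n=4)
for lr in lrs:
    print(f"adam lr={lr:.2e}, epochs={EPOCHS}")
```

Assuming a PyTorch setup, each value could then be passed as the `lr` argument of `torch.optim.Adam(model.parameters(), lr=lr)`. Given the cost of a full ImageNet-21K run, a sweep like this is probably only practical on a subset of the data or for a few epochs.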


Yuzz1020 commented on July 18, 2024

Thank you for the fast response! I'll try with the lr you suggested.

