I have also tried the parameters from the paper (batch size 2048, lr=3e-8, etc.). The fine-tuning still explodes (the loss quickly drops toward 0 and then becomes NaN):
[12-07 18:37:04] (nstream_imagenet/main.py, line 174)=> [ep0 it 3/626] L: 0.6937 Acc: 0.00 lr: 3.1e-05~3.8e-04 Remain: 3:26:47
[12-07 18:40:10] (nstream_imagenet/main.py, line 174)=> [ep0 it313/626] L: 0.0078 Acc: 0.00 lr: 5.5e-04~6.7e-03 Remain: 0:04:24
[12-07 18:43:23] (nstream_imagenet/main.py, line 174)=> [ep0 it625/626] L: 0.0059 Acc: 9.72 lr: 1.1e-03~1.3e-02 Remain: 0:00:00
[12-07 18:44:04] (nstream_imagenet/main.py, line 84)=> [ep0/300] Max (Last) Acc: 8.97 (8.97 o 50000.0) EMA: 0.13 (0.01 o 50000.0) Ep cost: 500.25s, Ev cost: 23.38, Remain: 1 day, 17:32:55, Finish @ 12-09 05:16
[12-07 18:44:06] (nstream_imagenet/main.py, line 60)=> [loader_train.sampler.set_epoch(1)]
[12-07 18:44:13] (nstream_imagenet/main.py, line 174)=> [ep1 it 3/626] L: 0.0059 Acc: 15.62 lr: 1.1e-03~1.3e-02 Remain: 0:18:02
[12-07 18:47:18] (nstream_imagenet/main.py, line 174)=> [ep1 it313/626] L: 0.0055 Acc: 21.09 lr: 1.6e-03~1.9e-02 Remain: 0:03:11
[12-07 18:50:15] (nstream_imagenet/main.py, line 174)=> [ep1 it625/626] L: 0.0056 Acc: 23.61 lr: 2.1e-03~2.6e-02 Remain: 0:00:00
[12-07 18:50:15] (nstream_imagenet/main.py, line 84)=> [ep1/300] Max (Last) Acc: 8.97 (8.97 o 50000.0) EMA: 0.13 (0.01 o 50000.0) Ep cost: 370.16s, Ev cost: -, Remain: 1 day, 6:38:28, Finish @ 12-08 18:28
[12-07 18:50:17] (nstream_imagenet/main.py, line 60)=> [loader_train.sampler.set_epoch(2)]
[12-07 18:50:28] (nstream_imagenet/main.py, line 174)=> [ep2 it 3/626] L: 0.0055 Acc: 23.44 lr: 2.1e-03~2.6e-02 Remain: 0:29:35
[12-07 18:53:36] (nstream_imagenet/main.py, line 174)=> [ep2 it313/626] L: 0.0071 Acc: 13.28 lr: 2.6e-03~3.2e-02 Remain: 0:03:18
[12-07 18:56:33] (nstream_imagenet/main.py, line 174)=> [ep2 it625/626] L: 0.0069 Acc: 5.56 lr: 3.2e-03~3.9e-02 Remain: 0:00:00
[12-07 18:56:33] (nstream_imagenet/main.py, line 84)=> [ep2/300] Max (Last) Acc: 8.97 (8.97 o 50000.0) EMA: 0.13 (0.01 o 50000.0) Ep cost: 376.92s, Ev cost: -, Remain: 1 day, 7:05:45, Finish @ 12-08 19:02
[12-07 18:56:34] (nstream_imagenet/main.py, line 60)=> [loader_train.sampler.set_epoch(3)]
[12-07 18:56:48] (nstream_imagenet/main.py, line 174)=> [ep3 it 3/626] L: 0.0077 Acc: 0.78 lr: 3.2e-03~3.9e-02 Remain: 0:34:59
[12-07 18:59:55] (nstream_imagenet/main.py, line 174)=> [ep3 it313/626] L: 62.9384 Acc: 0.00 lr: 3.7e-03~4.5e-02 Remain: 0:03:20
[12-07 19:02:52] (nstream_imagenet/main.py, line 174)=> [ep3 it625/626] L: 317.5974 Acc: 0.00 lr: 4.2e-03~5.1e-02 Remain: 0:00:00
[12-07 19:02:52] (nstream_imagenet/main.py, line 84)=> [ep3/300] Max (Last) Acc: 8.97 (8.97 o 50000.0) EMA: 0.13 (0.01 o 50000.0) Ep cost: 378.86s, Ev cost: -, Remain: 1 day, 7:09:03, Finish @ 12-08 19:11
[12-07 19:03:08] (nstream_imagenet/main.py, line 174)=> [ep4 it 3/626] L: 267.8481 Acc: 0.00 lr: 4.2e-03~5.1e-02 Remain: 0:38:13
[12-07 19:06:16] (nstream_imagenet/main.py, line 174)=> [ep4 it313/626] L: 352016.5938 Acc: 0.00 lr: 4.7e-03~5.8e-02 Remain: 0:03:21
[12-07 19:09:15] (nstream_imagenet/main.py, line 174)=> [ep4 it625/626] L: 3266225152.0000 Acc: 0.00 lr: 5.3e-03~6.4e-02 Remain: 0:00:00
[12-07 19:09:15] (nstream_imagenet/main.py, line 84)=> [ep4/300] Max (Last) Acc: 8.97 (8.97 o 50000.0) EMA: 0.13 (0.01 o 50000.0) Ep cost: 382.58s, Ev cost: -, Remain: 1 day, 7:21:01, Finish @ 12-08 19:30
[12-07 19:09:31] (nstream_imagenet/main.py, line 174)=> [ep5 it 3/626] L: 3494824192.0000 Acc: 0.00 lr: 5.3e-03~6.4e-02 Remain: 0:38:32
[12-07 19:12:40] (nstream_imagenet/main.py, line 174)=> [ep5 it313/626] L: nan Acc: 1.56 lr: 5.3e-03~6.4e-02 Remain: 0:03:22
[12-07 19:15:39] (nstream_imagenet/main.py, line 174)=> [ep5 it625/626] L: nan Acc: 0.00 lr: 5.3e-03~6.4e-02 Remain: 0:00:00
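A generic way to localize this kind of blow-up (an illustrative PyTorch sketch, not code from this repo) is to log the global gradient norm every step and stop at the first non-finite loss or gradient:

```python
# Debugging sketch (generic PyTorch, not SparK's code): a steadily rising
# gradient norm usually precedes the NaN by many iterations.
import math
import torch

@torch.no_grad()
def grad_norm(model):
    # global L2 norm over all parameter gradients
    sq = sum((p.grad.float().norm() ** 2) for p in model.parameters()
             if p.grad is not None)
    return float(sq) ** 0.5

def check_step(model, loss, step):
    if not math.isfinite(loss.item()):
        raise RuntimeError(f"non-finite loss at step {step}")
    g = grad_norm(model)
    if not math.isfinite(g):
        raise RuntimeError(f"non-finite grad norm at step {step}")
    return g  # log this each iteration, e.g. alongside L and Acc
```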
Hi @ds2268, the 800-epoch pre-training looks normal. The fine-tuning loss before the explosion (~5e-3, close to zero) is also as expected, since we use BCE loss instead of CE. (P.S.: we never observed any loss explosion in any of our fine-tuning experiments.)
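For intuition: the very first log line above shows L: 0.6937, which is essentially ln 2, exactly what per-element BCE gives for near-zero logits at initialization. A quick check (generic PyTorch, not the actual training code):

```python
# Why "close to zero" is healthy for BCE: it averages over all 1000 class
# logits per sample, so a well-trained model lands in the 1e-3 range.
import torch
import torch.nn.functional as F

target = F.one_hot(torch.randint(0, 1000, (8,)), 1000).float()

init_logits = torch.zeros(8, 1000)  # near-zero logits at init
print(F.binary_cross_entropy_with_logits(init_logits, target))  # ln 2 = 0.6931

good_logits = target * 12.0 - 6.0   # +6 for the true class, -6 elsewhere
print(F.binary_cross_entropy_with_logits(good_logits, target))  # ~2.5e-3
```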
Have you used mixed precision? I also found that the default batch size should be 2048; maybe you can try that as well.
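For reference, a mixed-precision step in PyTorch usually looks like the sketch below (generic torch.cuda.amp code, not this repo's loop), so grepping the fine-tuning code for autocast or GradScaler is a quick way to check:

```python
# Generic AMP training step (illustrative only). If nothing like this
# (or apex.amp) appears in the fine-tuning code, it runs in full fp32.
import torch

scaler = torch.cuda.amp.GradScaler()

def amp_step(model, images, targets, criterion, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skips the update if grads contain inf/NaN
    scaler.update()
    return loss.detach()
```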
I have tried the batch-size-2048 config from the paper, with no success. I don't think the downstream ImageNet fine-tuning uses mixed precision; I could only find the apex libs in the downstream mmdet code.
Could you try running with timm==0.5.4?
I am already running with:
timm 0.5.44
torch 1.12.0
torchvision 0.13.1
Looks like the ResNet-50 issue is related to #27.
Honestly, I have no idea what the problem with the fine-tuning code is (yes, #27 looks similar). Maybe you can try again with base_lr < 0.002; I will run this too.
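For context, the lr: a~b range printed in the logs presumably comes from layer-wise learning-rate decay, so lowering base_lr scales the whole range down. A generic sketch of such parameter groups for ResNet-50 (the stage grouping and decay value are illustrative, not necessarily the repo's exact scheme):

```python
# Layer-wise lr decay sketch (assumed scheme, not SparK's exact code):
# earlier stages get geometrically smaller rates than the head, which is
# why the training log prints a min~max lr range.
import torch
from torchvision.models import resnet50

def layerwise_groups(model, base_lr=2e-3, decay=0.7, weight_decay=0.05):
    stages = [torch.nn.Sequential(model.conv1, model.bn1),
              model.layer1, model.layer2, model.layer3, model.layer4,
              model.fc]
    n = len(stages)
    return [{"params": s.parameters(),
             "lr": base_lr * decay ** (n - 1 - i),
             "weight_decay": weight_decay}
            for i, s in enumerate(stages)]

opt = torch.optim.AdamW(layerwise_groups(resnet50()))
```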
@keyu-tian, I have now pre-trained a ConvNeXt-S model (800 epochs) and performed ImageNet fine-tuning.
It is not finished yet (140 / 200 epochs), but fine-tuning appears to work for ConvNeXt-S. The reported result for ConvNeXt-S is 84.1; I will probably not reach it by epoch 200, but that is likely just due to pre-training for only 800 epochs.
So the problem really does seem specific to ResNet-50 fine-tuning stability.
@ds2268 thanks for your verification. So it is likely LAMB or BCE that is causing the problem.
I currently don't have enough GPUs or time to debug further; you could start with ConvNeXt, try a smaller fine-tuning learning rate for ResNet-50, or try ResNet-101.
P.S.: it is always recommended to use the default hyperparameters in downstream_imagenet/args.py, not those from the paper (which may be outdated) or elsewhere.
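One way to isolate the suspect is to toggle the optimizer and the loss independently and see which combination still explodes. A hedged diagnostic sketch (it assumes the Lamb class exported by timm 0.5.x; names and values here are illustrative):

```python
# Diagnostic sketch (not SparK's code): run the four (optimizer, loss)
# combinations and note which ones diverge on ResNet-50.
import torch
from timm.optim import Lamb  # shipped with timm 0.5.x

def build(model, use_lamb=True, use_bce=True, lr=2e-3, wd=0.05):
    opt = (Lamb(model.parameters(), lr=lr, weight_decay=wd) if use_lamb
           else torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=wd))
    # note: BCEWithLogitsLoss expects one-hot float targets,
    # CrossEntropyLoss expects integer class indices
    crit = (torch.nn.BCEWithLogitsLoss() if use_bce
            else torch.nn.CrossEntropyLoss())
    return opt, crit
```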