I was training a face recognition model with SAM (the backbone is a ResNet, and the loss is ArcFace). First, I load a pretrained model into the backbone and train the classifier with the backbone frozen. Finally, I train the whole model with SAM. But something weird happens:
- When I freeze the BN running statistics, as you recommend (which should be correct), the first loss is larger than the second loss, the features produced by the backbone blow up (to around 10^9) by the second iteration, and the model does not converge.
- When I let the BN running statistics update (which should be wrong), the first loss is smaller than the second loss and the backbone features stay in a normal range, but the model still does not converge.
Hope to get your reply, thanks.
from sam.
Most likely, the BN freezing won't make a significant difference, so I would advise you not to focus on it until you fix the convergence issue. I would expect the two losses to be of similar magnitude, but I don't see a problem if one is slightly larger than the other.
Does your model converge with a standard optimizer? Have you tried different hyperparameters?
My model converges with a standard optimizer, but not with SAM.
I tested a few more times, and the loss became a little more normal. Here is my take: when we set `model.eval()`, BN uses `running_mean` and `running_var` to normalize the input instead of the statistics of the current batch, which makes the output of the second forward pass differ from the first one. So I changed the process as follows:
```python
# first forward/backward: compute the gradient and perturb the weights
loss_fn(model(input), label).backward()
optimizer.first_step(zero_grad=True)

# second forward/backward: snapshot the BN running statistics first
bn_bak = save_bn_running(model)  # save running_mean and running_var
loss_fn(model(input), label).backward()
optimizer.second_step(zero_grad=True)
reset_bn_running(model, bn_bak)  # restore the snapshot
```
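To see why the mode matters here, a minimal check (assuming PyTorch) shows that a BN layer produces different outputs for the same batch in `train()` versus `eval()` mode, as long as its running estimates have not converged to the batch statistics:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
# a batch whose statistics differ from BN's initial running estimates
x = torch.randn(8, 4) * 3 + 5

bn.train()
out_train = bn(x)   # normalizes with the current batch statistics

bn.eval()
out_eval = bn(x)    # normalizes with running_mean / running_var

# The two outputs disagree while the running estimates lag the batch stats.
print(torch.allclose(out_train, out_eval))
```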
Before the second forward pass, I save the `running_mean` and `running_var` of every BN layer and keep the model in `train` mode, so BN uses the current batch statistics, consistent with the first forward pass. After the second backward pass, I restore the saved `running_mean` and `running_var` so the BN statistics are not modified twice by the same batch.
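The `save_bn_running` / `reset_bn_running` helpers are not shown above; here is a minimal sketch of what they might look like (the function names come from the snippet, the implementation is my own assumption):

```python
import torch
from torch.nn.modules.batchnorm import _BatchNorm


def save_bn_running(model):
    """Snapshot running_mean/running_var of every BN layer in the model."""
    backup = {}
    for name, module in model.named_modules():
        if isinstance(module, _BatchNorm) and module.track_running_stats:
            backup[name] = (module.running_mean.clone(),
                            module.running_var.clone(),
                            module.num_batches_tracked.clone())
    return backup


def reset_bn_running(model, backup):
    """Restore the snapshot taken by save_bn_running."""
    for name, module in model.named_modules():
        if name in backup:
            mean, var, n = backup[name]
            module.running_mean.copy_(mean)
            module.running_var.copy_(var)
            module.num_batches_tracked.copy_(n)
```

`_BatchNorm` is the common base class of `BatchNorm1d/2d/3d`, so the same helpers cover all of them.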
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.