Comments (6)
Thanks for the suggestion! We didn't have a lot of prior experience with SSL so we chose to match the defaults of the original SimCLR/MoCo papers. Do you know of any papers that demonstrate that no weight decay works better? I'm surprised Google/FAIR didn't find this during their hyperparameter tuning.
from torchgeo.
I don't think it is mentioned in the SimCLR paper but it is in the code here: https://github.com/google-research/simclr/blob/383d4143fd8cf7879ae10f1046a9baeb753ff438/tf2/model.py#L40-L42
BYOL does the same: https://github.com/google-deepmind/deepmind-research/blob/f5de0ede8430809180254ee957abf36ed62579ef/byol/byol_experiment.py#L191-L195
But I just noticed that you are not using the LARS optimizer, and in SimCLR they only did this for LARS. For the other optimizers they didn't use weight decay in the optimizer at all, but I am not sure if they benchmarked their code with these settings.
Yeah, PyTorch doesn't have a LARS optimizer. Let me do some digging and figure out where I found these weight decay values.
Okay, finally had time to look into this.
SimCLR
> I don't think it is mentioned in the SimCLR paper
Weight decay is mentioned in:
> For the other optimizers they didn't use weight decay at all
You are correct that weight decay is not used in the optimizer, although it is used in the loss function.
MoCo
Weight decay is mentioned in:
It isn't mentioned in MoCo v2, although the code for v2 is largely the same as v1. The value of weight decay for v3 is not mentioned in the paper, just that it was used.
In the code base, weight decay is used with SGD in v1/v2 and with both LARS and AdamW in v3.
If you want to submit a PR that removes weight decay from our SimCLR optimizer and adds it to our loss function, I would be happy to accept it. I'm a little afraid to remove it entirely though.
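For reference, here is a minimal sketch of what moving weight decay from the optimizer into the loss could look like in PyTorch. It is not torchgeo's trainer code: the `l2_penalty` helper, the toy `nn.Linear` model, the learning rate, and the `weight_decay` value are all assumptions made for illustration. It only checks that, for plain SGD, an explicit penalty of `0.5 * weight_decay * sum(p**2)` gives the same update as the optimizer's `weight_decay` argument.

```python
import torch
from torch import nn


def l2_penalty(model: nn.Module, weight_decay: float) -> torch.Tensor:
    """Hypothetical helper: 0.5 * weight_decay * sum of squared parameters."""
    squared = sum(p.pow(2).sum() for p in model.parameters() if p.requires_grad)
    return 0.5 * weight_decay * squared


torch.manual_seed(0)
weight_decay = 1e-2  # placeholder value, not a recommended setting
criterion = nn.MSELoss()
x, y = torch.randn(8, 4), torch.randn(8, 2)

# Variant A: weight decay is applied by the optimizer.
model_a = nn.Linear(4, 2)
opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1, weight_decay=weight_decay)

# Variant B: same initial weights, no optimizer weight decay,
# and the L2 term is added to the loss instead.
model_b = nn.Linear(4, 2)
model_b.load_state_dict(model_a.state_dict())
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.1, weight_decay=0.0)

opt_a.zero_grad()
criterion(model_a(x), y).backward()
opt_a.step()

opt_b.zero_grad()
(criterion(model_b(x), y) + l2_penalty(model_b, weight_decay)).backward()
opt_b.step()

# Largest absolute difference between the two sets of updated parameters.
max_diff = max(
    (pa - pb).abs().max().item()
    for pa, pb in zip(model_a.parameters(), model_b.parameters())
)
print(f"max parameter difference after one step: {max_diff:.2e}")
```

The two variants only coincide for optimizers that implement weight decay by adding `weight_decay * p` to the gradient, such as plain SGD. With decoupled weight decay (for example AdamW) the optimizer argument and a loss penalty behave differently, so the choice would need to be benchmarked rather than assumed equivalent.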
I think this issue can be closed. If users want to reproduce the original MoCo/SimCLR papers, they can use our current defaults. If they want to try to improve performance, they can use `weight_decay=0`.