Comments (6)

adamjstewart avatar adamjstewart commented on May 26, 2024

Thanks for the suggestion! We didn't have a lot of prior experience with SSL so we chose to match the defaults of the original SimCLR/MoCo papers. Do you know of any papers that demonstrate that no weight decay works better? I'm surprised Google/FAIR didn't find this during their hyperparameter tuning.

from torchgeo.

guarin avatar guarin commented on May 26, 2024

I don't think it is mentioned in the SimCLR paper but it is in the code here: https://github.com/google-research/simclr/blob/383d4143fd8cf7879ae10f1046a9baeb753ff438/tf2/model.py#L40-L42

BYOL does the same: https://github.com/google-deepmind/deepmind-research/blob/f5de0ede8430809180254ee957abf36ed62579ef/byol/byol_experiment.py#L191-L195

But I just noticed that you are not using the LARS optimizer, and in SimCLR they only did this for LARS. For the other optimizers they didn't use weight decay at all, but I am not sure if they benchmarked their code with these settings.

adamjstewart avatar adamjstewart commented on May 26, 2024

Yeah, PyTorch doesn't have a LARS optimizer. Let me do some digging and figure out where I found these weight decay values.

adamjstewart avatar adamjstewart commented on May 26, 2024

Okay, finally had time to look into this.

SimCLR

> I don't think it is mentioned in the SimCLR paper

Weight decay is mentioned in:

> For the other optimizers they didn't use weight decay at all

You are correct that weight decay is not used in the optimizer, although it is used in the loss function.

MoCo

Weight decay is mentioned in:

It isn't mentioned in MoCo v2, although the code for v2 is largely the same as v1. The value of weight decay for v3 is not mentioned in the paper, just that it was used.

In the code base, weight decay is used with SGD in v1/v2, and with both LARS and AdamW in v3.

adamjstewart avatar adamjstewart commented on May 26, 2024

If you want to submit a PR that removes weight decay from our SimCLR optimizer and adds it to our loss function, I would be happy to accept it. I'm a little afraid to remove it entirely though.
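For anyone weighing that change, here is a minimal sketch of why moving weight decay from the optimizer into the loss leaves a plain SGD update unchanged. It is pure Python and not torchgeo or SimCLR code: the function names, weights, and gradients are hypothetical, and it assumes the penalty has the form (weight_decay / 2) * sum(w ** 2) and that the optimizer applies weight decay by adding weight_decay * w to the gradient.

```python
def sgd_step_decay_in_optimizer(w, grad, lr, weight_decay):
    """One plain SGD step where the optimizer adds weight_decay * w to the gradient."""
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]


def l2_penalty_grad(w, weight_decay):
    """Gradient of the assumed penalty (weight_decay / 2) * sum(w_i ** 2)."""
    return [weight_decay * wi for wi in w]


def sgd_step_decay_in_loss(w, grad, lr, weight_decay):
    """One plain SGD step where the penalty is part of the loss (optimizer decay is 0)."""
    penalty = l2_penalty_grad(w, weight_decay)
    total_grad = [gi + pi for gi, pi in zip(grad, penalty)]
    return [wi - lr * gi for wi, gi in zip(w, total_grad)]


# Hypothetical weights and task-loss gradients, for illustration only.
w = [0.5, -1.0, 2.0]
grad = [0.1, 0.2, -0.3]

a = sgd_step_decay_in_optimizer(w, grad, lr=0.1, weight_decay=1e-4)
b = sgd_step_decay_in_loss(w, grad, lr=0.1, weight_decay=1e-4)

# Both placements give the same update for plain SGD.
print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))
```

This only holds for plain SGD. With adaptive optimizers, an L2 term in the loss is rescaled along with the rest of the gradient, so it generally does not match decoupled weight decay as in AdamW, and the two placements can behave differently.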

adamjstewart avatar adamjstewart commented on May 26, 2024

I think this issue can be closed. If users want to reproduce the original MoCo/SimCLR papers, they can use our current defaults. If they want to try to improve performance, they can use weight_decay=0.
