JRC1995 commented on June 9, 2024

I mostly just implemented this and tried a few combinations on some NLI tasks. Nothing exhaustive, just looking at how it runs, how the lr changes, and so on. So I don't have a tested combination to recommend. For research, if you aren't focusing on optimizers, it is probably better to just go with Adam or AdamW because they are the standard choices. For application purposes, it's hard to say. Overall, I think:

(1) RAdam is becoming more popular, but it may be equivalent to a heuristic warmup method: https://arxiv.org/abs/1910.04209v1 . Adam with a properly tuned warmup may still be as good or better.
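
To make point (1) a bit more concrete, here is a rough sketch that puts RAdam's rectification factor next to a plain linear warmup factor, so the two can be compared step by step. The rectification formula follows the RAdam paper. The function names and `warmup_steps=2000` are my own illustrative choices, not anything from this repo or from the linked paper.

```python
import math


def radam_rectification(step, beta2=0.999):
    """Rectification factor r_t from the RAdam paper.

    Returns None while rho_t <= 4, where RAdam skips the adaptive
    term and takes a momentum-only step instead.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        return None
    num = (rho_t - 4.0) * (rho_t - 2.0) * rho_inf
    den = (rho_inf - 4.0) * (rho_inf - 2.0) * rho_t
    return math.sqrt(num / den)


def linear_warmup(step, warmup_steps=2000):
    """Hypothetical linear warmup multiplier for the learning rate."""
    return min(1.0, step / warmup_steps)


if __name__ == "__main__":
    # Print both multipliers side by side for a few steps.
    for step in (1, 10, 100, 1000, 10000):
        print(step, radam_rectification(step), linear_warmup(step))
```

Both factors start small and approach 1 as training goes on, which is the sense in which the rectification behaves like a built-in warmup schedule. How closely they match depends on `beta2` and on the warmup length you pick.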

(2) Lookahead is probably a decent technique to use. It is a general technique that can be wrapped around almost any optimizer, and it was published at NeurIPS. It may make computation heavier, though, so that is a trade-off to keep in mind. You can add it on top of whichever optimizer you use.
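
As a rough illustration of point (2), here is a minimal pure-Python sketch of the Lookahead idea wrapped around plain SGD on a toy quadratic. It is not the implementation in this repo. The function names, the toy objective, and the hyperparameter values (`lr`, `k`, `alpha`, `outer_steps`) are all assumptions chosen for the example.

```python
def sgd_step(weights, grads, lr):
    """One plain SGD update; stands in for any inner optimizer."""
    return [w - lr * g for w, g in zip(weights, grads)]


def lookahead_minimize(grad_fn, w0, lr=0.1, k=5, alpha=0.5, outer_steps=20):
    """Lookahead: k fast steps, then move the slow weights toward them."""
    slow = list(w0)
    for _ in range(outer_steps):
        fast = list(slow)  # fast weights restart from the slow weights
        for _ in range(k):
            fast = sgd_step(fast, grad_fn(fast), lr)
        # Slow weights interpolate toward where the fast weights ended up.
        slow = [s + alpha * (f - s) for s, f in zip(slow, fast)]
    return slow


def quadratic_grad(w):
    """Gradient of the toy objective (w0 - 3)^2 + (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)]


if __name__ == "__main__":
    print(lookahead_minimize(quadratic_grad, [0.0, 0.0]))
```

The extra cost in this sketch is a second copy of the weights plus one interpolation every `k` inner steps. Any inner update rule could be dropped in where `sgd_step` is called.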

(3) I think AMSGrad usually gives mixed results in day-to-day use. Nostalgic Adam and PAdam both seem to demonstrate better results in their respective papers, so you can try the newer ones that show improvement over the older ones. If you are short on time, I would recommend sticking with good ol' Adam, or better AdamW, rather than experimenting with all of them. If you do want to experiment, Nostalgic Adam or PAdam are alternatives to consider, and QHAdam may be OK too. The repo allows "combining" techniques from different papers, but I don't have any results for those combinations, so you will have to experiment with them yourself or just stick with the standard stuff.

(4) Other methods I didn't mention may not be as well established in the literature. Again, you can experiment with them, but I don't really have anything to add beyond what the papers in the links already show.

from demonrangeroptimizer.
