Comments (1)
I mostly implemented this and tried a few combinations on some NLI tasks. Nothing exhaustive, just looking at how it runs, how the lr changes, and so on, so I don't have a specific combination to recommend. For research, if you aren't focusing on optimizers, it is probably better to just go with Adam or AdamW because of how standard they are. For application purposes, it's hard to say. I think overall:
(1) RAdam is becoming more popular, but it may be equivalent to a heuristic-based warmup method: https://arxiv.org/abs/1910.04209v1 . Adam with a properly tuned warmup may still be as good or better.
(2) Lookahead is probably a decent technique to use. It is a general technique that can be applied to almost any optimizer, and it was published at NeurIPS. It may make computation heavier, though, so that is a trade-off to keep in mind. This is something you can add on top of whichever optimizer you use.
(3) I think AMSGrad usually has mixed results in day-to-day use. Nostalgic Adam and PAdam both seem to demonstrate better results in their respective papers, so you can just try the newer one that shows improvement over the older one. If you are short on time, I would recommend sticking with good old Adam, or better AdamW, rather than experimenting with all of them. If you do have time, Nostalgic Adam or PAdam can be alternatives to consider as well. QHAdam may be OK too. The repo allows "combining" different techniques from different papers, but I don't have any results for those combinations, so you have to experiment with them yourself or just stick with the standard options.
(4) Other methods I didn't mention may not be as well established in the literature. Again, you can experiment with them, but I don't really have anything to add beyond what the papers in the links already show.
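To make point (2) a bit more concrete, here is a minimal sketch of the Lookahead idea in plain Python. It is not this repo's implementation: the `Lookahead` class, the `sgd_step` helper, and the toy quadratic objective are all hypothetical names made up for illustration, and the values `k=5` and `alpha=0.5` are just assumed defaults. The inner optimizer takes `k` "fast" steps, then the "slow" weights move a fraction `alpha` toward the fast weights and the fast weights are reset to them.

```python
from typing import Callable, List

Params = List[float]


def sgd_step(params: Params, grads: Params, lr: float) -> Params:
    """Plain SGD update, used here as a stand-in for any inner optimizer."""
    return [p - lr * g for p, g in zip(params, grads)]


class Lookahead:
    """Hypothetical wrapper: syncs slow weights with fast weights every k steps."""

    def __init__(
        self,
        params: Params,
        inner_step: Callable[[Params, Params], Params],
        k: int = 5,
        alpha: float = 0.5,
    ) -> None:
        self.fast = list(params)
        # Extra copy of the weights: this is the main memory overhead.
        self.slow = list(params)
        self.inner_step = inner_step
        self.k = k
        self.alpha = alpha
        self.steps = 0

    def step(self, grads: Params) -> Params:
        # Fast weights follow the wrapped optimizer on every step.
        self.fast = self.inner_step(self.fast, grads)
        self.steps += 1
        if self.steps % self.k == 0:
            # Every k steps: interpolate slow weights toward fast weights,
            # then restart the fast weights from the new slow weights.
            self.slow = [
                s + self.alpha * (f - s) for s, f in zip(self.slow, self.fast)
            ]
            self.fast = list(self.slow)
        return self.fast


def minimize_quadratic(num_steps: int = 200, lr: float = 0.1) -> Params:
    """Toy check on f(x) = sum(x_i^2), whose gradient is 2 * x."""
    opt = Lookahead([3.0, -2.0], lambda p, g: sgd_step(p, g, lr), k=5, alpha=0.5)
    params = opt.fast
    for _ in range(num_steps):
        grads = [2.0 * p for p in params]
        params = opt.step(grads)
    return params
```

Because the wrapper only needs a function that maps parameters and gradients to new parameters, the same structure works with any inner update rule. The extra cost in this sketch is one additional copy of the weights plus an interpolation every `k` steps.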
from demonrangeroptimizer.