Unofficial implementation of 'Ranger - Synergistic combination of RAdam + LookAhead for the best of both' using MXNet Gluon.
Original Medium article: https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
Official implementation: https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer
Import the required modules
from mxnet import gluon
import ranger_optimizer
Create the optimizer object
optimizer = ranger_optimizer.Ranger(learning_rate=0.001, wd=0)
Create the trainer object ('net' is your already-initialized network)
trainer = gluon.Trainer(net.collect_params(), optimizer)
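A minimal end-to-end sketch of a training loop with Ranger. The network, data, and loss below are placeholders chosen purely for illustration; only 'ranger_optimizer.Ranger' comes from this repository:
from mxnet import autograd, gluon, nd
import ranger_optimizer

net = gluon.nn.Dense(1)                # toy network, illustration only
net.initialize()
x = nd.random.normal(shape=(32, 10))   # dummy inputs
y = nd.random.normal(shape=(32, 1))    # dummy targets

loss_fn = gluon.loss.L2Loss()
optimizer = ranger_optimizer.Ranger(learning_rate=0.001, wd=0)
trainer = gluon.Trainer(net.collect_params(), optimizer)

for epoch in range(3):
    with autograd.record():
        loss = loss_fn(net(x), y)
    loss.backward()
    trainer.step(batch_size=32)        # one Ranger update per call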
learning_rate : float, default 0.001
The initial learning rate. If not None, it overrides the learning
rate of lr_scheduler; if None, the learning rate from lr_scheduler
is used; if both are None, the learning rate falls back to 0.001.
wd : float, default 0
Weight decay value.
alpha : float, default 0.5
LookAhead interpolation coefficient: every k steps the 'slow learner'
moves toward the 'fast learner' by alpha * (fast - slow) (see the
LookAhead sketch after this list).
k : int, default 6
Synchronize the 'slow learner' with the 'fast learner' every k
optimizer updates.
beta1 : float, default 0.9
Exponential decay rate for the first moment estimates.
beta2 : float, default 0.999
Exponential decay rate for the second moment estimates.
epsilon : float, default 1e-8
Small value to avoid division by 0.
use_gc : bool, default True
Whether to apply gradient centralization, i.e., subtract from each
gradient its mean (see the gradient centralization sketch after this list).
gc_conv_only : bool, default False
Whether to apply gradient centralization only to convolution layers
(True) or to all layers (False).
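To make alpha and k concrete, here is a minimal sketch of the LookAhead update rule in plain Python. It is a simplified illustration, not this repository's code:
import numpy as np

alpha, k = 0.5, 6
slow = np.zeros(4)             # 'slow learner' weights
fast = slow.copy()             # 'fast learner' weights, updated by RAdam

for step in range(1, 19):
    fast -= 0.1 * np.ones(4)   # stand-in for one RAdam step
    if step % k == 0:
        # every k steps, move the slow weights toward the fast weights
        slow += alpha * (fast - slow)
        # and restart the fast learner from the slow weights
        fast = slow.copy()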
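And a sketch of gradient centralization, which subtracts from each gradient its mean over all axes except the output dimension. Again a simplified illustration rather than this repository's code; the ndim thresholds follow the convention of the official PyTorch Ranger and are an assumption here:
import numpy as np

def centralize(grad, conv_only=False):
    # apply only to conv weights (ndim > 3) when conv_only is True,
    # otherwise to any multi-dimensional weight (ndim > 1)
    min_dim = 3 if conv_only else 1
    if grad.ndim > min_dim:
        axes = tuple(range(1, grad.ndim))  # all axes but the output one
        grad = grad - grad.mean(axis=axes, keepdims=True)
    return grad

g = np.random.randn(8, 3, 3, 3)            # gradient of a conv kernel
print(centralize(g).mean(axis=(1, 2, 3)))  # ~0 per output channel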