The official implementation (TensorFlow) of MobileNet v2 uses RMSprop as the optimizer. However, I failed to reproduce the result with RMSprop in PyTorch. The loss does not seem to decrease at all with RMSprop.
Any suggestions are welcome!
The training code is based on https://github.com/pytorch/examples/tree/master/imagenet with minimal modifications (only the optimizer is changed).
I ran three experiments with different optimizers:
- SGD (./sgd.sh)
- RMSprop (./rmsprop.sh)
- RMSprop closer to TensorFlow (./rmsprop_tf.sh), thanks to @vincentqb's advice here.
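As a rough illustration of what "closer to TensorFlow" can mean, here is a minimal scalar sketch of one RMSprop step without momentum. It assumes the two differences usually cited between the frameworks: PyTorch adds epsilon outside the square root and starts the squared-gradient average at zero, while TensorFlow 1.x adds epsilon inside the square root and starts the average at one. The function `rmsprop_step`, its parameters, and all numeric values (including `eps=1.0`) are hypothetical and are not taken from the scripts above.

```python
import math


def rmsprop_step(param, grad, sq_avg, lr=0.01, alpha=0.9, eps=1e-8,
                 eps_inside_sqrt=False):
    """One RMSprop update on a scalar parameter (no momentum, not centered).

    Returns the updated parameter and the updated squared-gradient average.
    """
    # Exponential moving average of the squared gradient.
    sq_avg = alpha * sq_avg + (1.0 - alpha) * grad * grad
    if eps_inside_sqrt:
        # Assumed TensorFlow 1.x style: epsilon is added before the sqrt.
        denom = math.sqrt(sq_avg + eps)
    else:
        # PyTorch style: epsilon is added after the sqrt.
        denom = math.sqrt(sq_avg) + eps
    return param - lr * grad / denom, sq_avg


# Same small gradient, first step from param = 1.0 (illustrative values only).
grad = 1e-4

# PyTorch-style: average starts at 0, small epsilon outside the sqrt.
p_torch, _ = rmsprop_step(1.0, grad, sq_avg=0.0, eps=1e-8,
                          eps_inside_sqrt=False)

# Assumed TF-style: average starts at 1, large epsilon inside the sqrt.
p_tf, _ = rmsprop_step(1.0, grad, sq_avg=1.0, eps=1.0,
                       eps_inside_sqrt=True)

step_torch = 1.0 - p_torch
step_tf = 1.0 - p_tf
print(f"PyTorch-style first step: {step_torch:.3e}")
print(f"TF-style first step:      {step_tf:.3e}")
```

Under these assumptions the PyTorch-style first step is several orders of magnitude larger than the TF-style one for the same learning rate, so hyperparameters copied from a TensorFlow config may not transfer directly.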
The training logs are uploaded to Google Drive.
I also tried a smaller learning rate and a larger batch size with RMSprop, but the loss still does not decrease.
- https://github.com/marvis/pytorch-mobilenet also reports an RMSprop result on MobileNet v1 (it does not work at all).