Comments (2)
Are you sure you are converging within 1-2 epochs with Adam?
In any case, this technique should work with Adam as well, since the learning rate schedule simply scales the final update that Adam computes. The improvements won't be as dramatic as with plain SGD, but small improvements will still be there due to the ensemble effect.
If you are converging that fast, you can set the number of epochs to 10 or 20 instead of 300 and take a snapshot every 2-4 epochs.
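For reference, the snapshot technique relies on a cyclic, cosine-annealed learning rate. The sketch below follows the schedule from the Snapshot Ensembles paper (Huang et al., 2017), but applies it per epoch rather than per iteration (an assumption made for simplicity). It shows how a short 20-epoch run with 5 snapshots gives one snapshot every 4 epochs. The function names and the values of `total_epochs`, `num_snapshots` and `base_lr` are hypothetical; with Adam, `base_lr` would play the role of Adam's step size.

```python
import math
from typing import List


def snapshot_lr(epoch: int, total_epochs: int, num_snapshots: int, base_lr: float) -> float:
    """Cyclic cosine-annealed learning rate for a 0-indexed epoch.

    Hypothetical helper following the Snapshot Ensembles schedule, applied per epoch:
    lr = base_lr / 2 * (cos(pi * (epoch mod L) / L) + 1), with L = ceil(T / M).
    """
    cycle_len = math.ceil(total_epochs / num_snapshots)  # epochs per cycle (L)
    position = epoch % cycle_len  # how far into the current cycle we are
    return base_lr / 2 * (math.cos(math.pi * position / cycle_len) + 1)


def snapshot_epochs(total_epochs: int, num_snapshots: int) -> List[int]:
    """0-indexed epochs at whose end a snapshot is saved (the end of each cycle)."""
    cycle_len = math.ceil(total_epochs / num_snapshots)
    return [e for e in range(total_epochs) if (e + 1) % cycle_len == 0]


if __name__ == "__main__":
    # Hypothetical short run: 20 epochs, 5 snapshots, Adam-style base step size.
    total, snaps, lr0 = 20, 5, 1e-3
    for epoch in range(total):
        print(epoch, round(snapshot_lr(epoch, total, snaps, lr0), 6))
    print("snapshots after epochs:", snapshot_epochs(total, snaps))
```

The learning rate restarts at `base_lr` at the start of every cycle and decays towards zero by its end, which is where each snapshot is taken.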
I personally have stopped using this technique, since the relative performance improvement is not worth the linear increase in evaluation time.
Since DenseNet FCN already has a large inference time, I would not suggest using this technique for production systems. For research, it is fine.
To answer your questions in the order you stated above:
- Yes, it can be used with Adam.
- Of course. It was originally meant to help SGD in the first place.
- I've only tried it on CIFAR-10. There, the paper's default settings worked best.
- Yes, you can initialize the model with pretrained weights. However, the improvements won't be large, even with ensembled predictions.
- It does not affect training time. However, it increases prediction time linearly if you use ensemble averaging over all snapshots (by a factor equal to the number of snapshots).
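The linear growth in prediction time comes from running every snapshot on each input. Below is a minimal sketch of plain (unweighted) ensemble averaging, assuming each snapshot is exposed as a hypothetical callable that returns class probabilities. In Keras, such a callable could wrap a model loaded with one snapshot's saved weights and its `predict` method; that wiring is left out here.

```python
from typing import Callable, List, Sequence

Probs = List[float]


def ensemble_predict(snapshots: Sequence[Callable[[object], Probs]], x: object) -> Probs:
    """Unweighted average of the class probabilities from every snapshot.

    Every snapshot runs its own forward pass on x, so one ensembled
    prediction costs len(snapshots) times as much as a single model.
    """
    if not snapshots:
        raise ValueError("need at least one snapshot")
    outputs = [predict(x) for predict in snapshots]  # one forward pass per snapshot
    n = len(outputs)
    return [sum(out[c] for out in outputs) / n for c in range(len(outputs[0]))]


if __name__ == "__main__":
    calls = []

    def make_snapshot(probs: Probs) -> Callable[[object], Probs]:
        # Hypothetical stand-in for a trained snapshot model.
        def predict(x: object) -> Probs:
            calls.append(x)  # record each forward pass
            return probs
        return predict

    snapshots = [make_snapshot([0.6, 0.4]), make_snapshot([0.2, 0.8])]
    print(ensemble_predict(snapshots, "image"))  # approximately [0.4, 0.6]
    print(len(calls))  # 2: one forward pass per snapshot
```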
from snapshot-ensembles.
Do note that for point 4, initializing from a model trained on the smaller dataset can give a bad starting point for training on the full dataset when the distribution of the full dataset differs from that of the smaller dataset.
To partially avoid this, try to sample your smaller dataset in the same proportions as the large dataset. That said, with semantic segmentation and such large quantities of data, I think the effect of a poor initialization on the full-set training is going to be very small.
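One way to sample the smaller dataset in the same proportions is stratified subsampling. The helper below is hypothetical and assumes every sample has a single stratification key. For semantic segmentation, choosing that key is itself an assumption you would have to make (for example, the dominant class or scene type of each image).

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def stratified_subsample(
    samples: Sequence[T], keys: Sequence[Hashable], fraction: float, seed: int = 0
) -> List[Tuple[T, Hashable]]:
    """Draw roughly `fraction` of the data while keeping each key's share."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sample, key in zip(samples, keys):
        groups[key].append(sample)
    subset = []
    for key, group in groups.items():
        k = max(1, round(len(group) * fraction))  # keep at least one sample per key
        subset.extend((s, key) for s in rng.sample(group, k))
    rng.shuffle(subset)  # avoid a subset ordered by key
    return subset


if __name__ == "__main__":
    images = list(range(100))  # hypothetical image ids
    keys = ["road"] * 80 + ["indoor"] * 20  # hypothetical per-image keys
    small = stratified_subsample(images, keys, fraction=0.1)
    print(Counter(key for _, key in small))  # Counter({'road': 8, 'indoor': 2})
```

Keeping at least one sample per key slightly over-represents very rare keys; that is a design choice of this sketch, not a requirement.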
Related Issues (16)
- Show Example with Sequential or Functional API
- Quick Refresher
- Alpha Zero
- May I use this checkback with keras 2.1.5 using python 2.7
- model accuracy
- this result is worse
- Update per batch, not per epoch
- how to predict the ground truth image with weighted ensemble?
- Getting Error
- Jupyter notebook error
- AttributeError: 'list' object has no attribute 'set_model'
- How to do ensemble prediction?
- How well does this work in practice?
- About the performance of W-16-4
- load a weight file containing 30 layers into a model with 33 layers.