Comments (5)
Totally true… I guess the batch size of 100 gives "good enough" statistics for this problem, so I forgot to add it in.
Will try to update with a version that stores population statistics and properly uses them at test time.
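For reference, the usual scheme for this is to track an exponential moving average of the batch statistics during training and freeze it for test time. A minimal NumPy sketch (hypothetical names, not the repo's actual code):

```python
import numpy as np

class BatchNorm1D:
    """Minimal batch norm that tracks population statistics via an
    exponential moving average (EMA) for use at test time."""

    def __init__(self, dim, momentum=0.95, eps=1e-5):
        self.gamma = np.ones(dim)       # learned scale
        self.beta = np.zeros(dim)       # learned shift
        self.pop_mean = np.zeros(dim)   # EMA of batch means
        self.pop_var = np.ones(dim)     # EMA of batch variances
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # update population statistics from the current batch
            self.pop_mean = self.momentum * self.pop_mean + (1 - self.momentum) * mean
            self.pop_var = self.momentum * self.pop_var + (1 - self.momentum) * var
        else:
            # test time: use the frozen population statistics
            mean, var = self.pop_mean, self.pop_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta
```

In TF this would be done with non-trainable variables plus an assign op run alongside the train step; the NumPy version above just shows the bookkeeping.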
from bnlstm.
TF-Slim has a batch norm with population statistics you could check out for recurrent batch norm, but I like your implementation style more, since it's elegant and in pure TF.
You might also want to play around with the random-permutation MNIST task, since it's only an extra line of code :)
I've tried running with population statistics a bit now, with really poor results on sequential MNIST. Same results when using slim.batch_norm.
The model seems to be dependent on the batch normalization.
To test, I tried using local batch statistics but increasing the batch size from 100 to 1000. That works better than full population statistics, but much worse than batch statistics with a batch of 100.
The graphs in the paper look very much like mine when using local batch statistics; however, they explicitly mention using population statistics for their final results, so I'm not sure what's going on in my code.
I've been trying the same recently, and share similar frustrations.
I think I found out what's going on, and it is not pretty. Basically, in my implementation, and I think also in slim, the population statistics are recorded at one layer and assumed to be the same for each time step of the sequence. But I think in the paper the actual statistics are recorded separately at each time step, so for MNIST there would be 784 sets of statistics. The paper shows that all the statistics converge over time for certain tasks (I guess for text, given the distribution must be time-invariant), but I suspect that for MNIST the statistics over time will not converge.
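If that reading of the paper is right, the change is just to index the population statistics by time step instead of sharing one pair across the sequence. A hypothetical NumPy sketch of the per-step bookkeeping (assumed names, not the repo's code):

```python
import numpy as np

def per_step_population_stats(batches, T, momentum=0.95):
    """Accumulate a separate EMA (mean, var) for each of the T time steps,
    instead of one shared pair for the whole sequence."""
    dim = batches[0].shape[-1]
    pop_mean = np.zeros((T, dim))
    pop_var = np.ones((T, dim))
    for x in batches:                               # x: (batch, T, dim)
        for t in range(T):
            m, v = x[:, t].mean(axis=0), x[:, t].var(axis=0)
            # EMA update per time step
            pop_mean[t] = momentum * pop_mean[t] + (1 - momentum) * m
            pop_var[t] = momentum * pop_var[t] + (1 - momentum) * v
    return pop_mean, pop_var
```

For pixel-by-pixel MNIST, T would be 784, giving the 784 sets of statistics mentioned above; at test time each step t normalizes with `pop_mean[t]`, `pop_var[t]`.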
I also got really good results just using a vanilla LSTM but initializing the hidden-to-hidden weights to the exact identity (not 0.95 × identity).
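That initialization is just setting each gate's hidden-to-hidden block to I, so the hidden state is roughly carried through unchanged at the start of training. A minimal sketch under assumed shapes (hypothetical helper, not the repo's initializer):

```python
import numpy as np

def init_lstm_weights(input_dim, hidden_dim, rng):
    """Initialize LSTM weights with each hidden-to-hidden gate block set
    to the exact identity (scale 1.0, not 0.95), so h_t starts out close
    to h_{t-1}; this helps on long sequences like pixel MNIST."""
    # input-to-hidden: small random values for all four gates
    W_x = rng.normal(0.0, 0.1, size=(input_dim, 4 * hidden_dim))
    # hidden-to-hidden: identity block per gate, concatenated column-wise
    W_h = np.concatenate([np.eye(hidden_dim)] * 4, axis=1)
    b = np.zeros(4 * hidden_dim)
    return W_x, W_h, b
```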
@hardmaru Recently I also got worse results on the test set when using the population mean and variance. You said you got good results just using a vanilla LSTM; could you please share your code and explain what's going on?
> I also got really good results just using a vanilla LSTM but initializing the hidden-to-hidden weights to the exact identity (not 0.95 × identity)