olavhn / bnlstm

Batch normalized LSTM for tensorflow

Home Page: http://olavnymoen.com/2016/07/07/rnn-batch-normalization

Python 100.00%

bnlstm's Introduction

Batch normalized LSTM

An implementation of Recurrent Batch Normalization by Cooijmans et al. in TensorFlow together with a reimplementation of their results on sequential MNIST.

A short writeup on Olav Nymoen's blog

Requirements

TensorFlow 0.9
Python 2.x

Acknowledgement

Cooijmans implementation

bnlstm's Issues

some error on tensorflow 1.8.0

1. tf.cond(training, batch_statistics, population_statistics) on line 138:
training is a Python bool, but tf.cond needs a tf.bool tensor, so change it to:
tf.cond(tf.cast(training, tf.bool), batch_statistics, population_statistics)
2. On line 81, i, j, f, o = tf.split(1, 4, hidden) uses the pre-1.0 positional argument order. Pass the arguments by keyword instead:
i, j, f, o = tf.split(axis=1, num_or_size_splits=4, value=hidden)
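
To make the second fix concrete, here is a minimal pure-Python sketch of what splitting along axis 1 into four equal parts does to a [batch, 4 * num_units] pre-activation matrix. The helper split_columns is hypothetical and only mimics the effect of the keyword call above on nested lists; it is not TensorFlow code.

```python
def split_columns(rows, num_splits):
    """Split each row of a 2-D list into num_splits equal column blocks."""
    width = len(rows[0])
    if width % num_splits != 0:
        raise ValueError("row width must be divisible by num_splits")
    size = width // num_splits
    return [[row[k * size:(k + 1) * size] for row in rows]
            for k in range(num_splits)]


# Two examples in the batch and num_units = 2, so each row holds 4 * 2 values.
hidden = [[0, 1, 2, 3, 4, 5, 6, 7],
          [10, 11, 12, 13, 14, 15, 16, 17]]
i, j, f, o = split_columns(hidden, 4)
```

Each of the four results keeps the batch dimension and holds one gate's slice of the columns, which is why the integer axis and the number of splits must not be confused with the tensor being split.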

Applying bnlstm for video dataset

Can anyone please tell me how to use a different dataset, specifically a video dataset (Lena or UCF101), instead of MNIST, and how to get the object detection accuracy?

Unique population statistics for each time step?

Thanks for the implementation. Does this implementation compute population statistics for each time step? I may be missing something, but it seems to me that they are shared across time steps, and I don't think this is what the Recurrent Batch Norm paper does.
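
A minimal sketch of what per-time-step population statistics could look like, in plain Python on a single scalar feature. The class PerStepStats and its parameters are hypothetical and not part of this repository. Reusing the last slot for steps beyond max_steps is an assumption based on my reading of the paper and should be checked against it.

```python
class PerStepStats:
    """Running mean and variance kept separately for each time step."""

    def __init__(self, max_steps, decay=0.9):
        self.decay = decay
        self.mean = [0.0] * max_steps
        self.var = [1.0] * max_steps

    def _index(self, t):
        # Assumption: steps past the longest training sequence reuse the last slot.
        return min(t, len(self.mean) - 1)

    def update(self, t, batch_values):
        """Fold one batch of activations at time step t into the running moments."""
        k = self._index(t)
        n = len(batch_values)
        batch_mean = sum(batch_values) / n
        batch_var = sum((v - batch_mean) ** 2 for v in batch_values) / n
        self.mean[k] = self.decay * self.mean[k] + (1 - self.decay) * batch_mean
        self.var[k] = self.decay * self.var[k] + (1 - self.decay) * batch_var

    def get(self, t):
        k = self._index(t)
        return self.mean[k], self.var[k]


stats = PerStepStats(max_steps=3, decay=0.5)
stats.update(0, [1.0, 3.0])   # batch mean 2.0, batch variance 1.0
stats.update(2, [4.0, 4.0])   # batch mean 4.0, batch variance 0.0
```

With a shared estimate, the two updates above would be blended into one pair of moments; here step 1 is untouched and steps 0 and 2 each keep their own values.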

Update for tensorflow 1.12.0

Just some version problems, and I believe you can also find solutions easily by yourself :D

changes:

tf.scalar_summary should be renamed to tf.summary.scalar
tf.histogram_summary should be renamed to tf.summary.histogram
tf.merge_summary should be renamed to tf.summary.merge
tf.train.SummaryWriter should be renamed to tf.summary.FileWriter
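
As a rough illustration, the renames above can be applied to source text with a plain string replacement. This is only a sketch: the helper apply_renames and the sample line are hypothetical, and it assumes every old name appears with its tf. prefix.

```python
# Old TensorFlow 0.x names mapped to the 1.x names listed above.
RENAMES = {
    "tf.scalar_summary": "tf.summary.scalar",
    "tf.histogram_summary": "tf.summary.histogram",
    "tf.merge_summary": "tf.summary.merge",
    "tf.train.SummaryWriter": "tf.summary.FileWriter",
}


def apply_renames(source):
    """Return source with every old name replaced by its new name."""
    for old, new in RENAMES.items():
        source = source.replace(old, new)
    return source


old_line = 'writer = tf.train.SummaryWriter("logs")'
new_line = apply_renames(old_line)
```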

BN on test set

Nice blog post!

If you see any performance error I might’ve done, I’d love to know!

One comment: When you evaluate the validation/test set, you should use the saved statistics from training. Looking at the code, I think you are calculating the moments as well during validation/test runs.
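
The point of the comment can be sketched in plain Python on a single scalar feature: training batches are normalized with their own moments and update the saved statistics, while validation and test batches are normalized with the saved statistics and leave them unchanged. The class BatchNorm1D and its defaults are hypothetical, not the repository's code.

```python
import math


class BatchNorm1D:
    """Scalar-feature batch norm with saved population statistics."""

    def __init__(self, decay=0.9, epsilon=1e-3):
        self.decay = decay
        self.epsilon = epsilon
        self.pop_mean = 0.0
        self.pop_var = 1.0

    def __call__(self, values, training):
        if training:
            n = len(values)
            mean = sum(values) / n
            var = sum((v - mean) ** 2 for v in values) / n
            # Only training batches touch the saved statistics.
            self.pop_mean = self.decay * self.pop_mean + (1 - self.decay) * mean
            self.pop_var = self.decay * self.pop_var + (1 - self.decay) * var
        else:
            mean, var = self.pop_mean, self.pop_var
        return [(v - mean) / math.sqrt(var + self.epsilon) for v in values]


bn = BatchNorm1D(decay=0.5, epsilon=0.0)
train_out = bn([1.0, 3.0], training=True)    # uses the batch mean and variance
saved = (bn.pop_mean, bn.pop_var)
test_out = bn([1.0, 3.0], training=False)    # uses the saved statistics
```

The same input gives different outputs in the two modes, and the evaluation call does not change the saved moments.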

Terrible results on mnist with bn-lstm, why?

BN in cell update

This is not really an issue of your code, but as you have studied this paper in a more intense way, I think you might be able to help me with this:

Page 4 of the paper says:

In our formulation, we normalize the recurrent term Wh_ht−1 and the input term Wx_xt separately.
Normalizing these terms individually gives the model better control over the relative contribution
of the terms using the γh and γx parameters. We set βh = βx = 0 to avoid unnecessary redundancy,
instead relying on the pre-existing parameter vector b to account for both biases. In order to
leave the LSTM dynamics intact and preserve the gradient flow through ct, we do not apply batch
normalization in the cell update

1. Question
The last part confuses me a bit. In the formulas (and in your code) there are three batch norms. But does the last part say that we should not use BN in term (8) of the paper?
Or is term (7) the cell update they talk about?

2. Question
Additionally, they talk about applying BN to both the input-hidden and the hidden-hidden updates. To me this sounds like two BNs, not three. Or are both BNs in term (6) input-hidden updates, and the BN in term (8) the hidden-hidden update?

Any idea? :)

Thank you in advance! :)

(And sorry in case my questions are unclear/confusing. At least I feel they are...)
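
For reference, the three terms the question refers to can be written out as below. This is reconstructed from memory of the paper, so the equation numbers and symbol names are an assumption and should be checked against the original.

```latex
\begin{align}
(\tilde{f}_t, \tilde{i}_t, \tilde{o}_t, \tilde{g}_t) &= \mathrm{BN}(W_h h_{t-1}; \gamma_h, \beta_h) + \mathrm{BN}(W_x x_t; \gamma_x, \beta_x) + b \tag{6} \\
c_t &= \sigma(\tilde{f}_t) \odot c_{t-1} + \sigma(\tilde{i}_t) \odot \tanh(\tilde{g}_t) \tag{7} \\
h_t &= \sigma(\tilde{o}_t) \odot \tanh\bigl(\mathrm{BN}(c_t; \gamma_c, \beta_c)\bigr) \tag{8}
\end{align}
```

Under this reading, (7) is the cell update and contains no normalization, so the cell state is carried forward unnormalized. The two BNs in (6) would be the hidden-to-hidden and input-to-hidden ones, and the BN in (8) only affects the value fed into the output.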

Something about your update part

Thanks for your implementation.

When I tried your code, I first calculated the validation error after each iteration. Then I calculated the error only once, after all iterations. The accuracy is far lower in the second case.

The number of iterations is set to 1000 in both cases, and I use the same validation data each time.

Does this mean the code still updates the moving moments during testing (i.e. when feeding train=False)?

Tensorflow contrib function

Hi, have you tried submitting a pull request to the TensorFlow contrib layers? That would make your code much more convenient to use. Thanks!

bn_lstm_identity_initializer vs orthogonal_initializer discrepancy

The input to hidden weights are initialized with one SVD meaning i, j, f, o all have orthogonal starting weights. However for the hidden to hidden weights, bn_lstm_identity_initializer makes three SVD function calls so it's not the case that the weights for the previous input j, the forget gate f and the output gate o are orthogonal. Do you think this affects learning? I would guess that it doesn't matter, but it's hard to think it through fully.

Bug report

Please change this part:

train_mean_op = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
train_var_op = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))
def batch_statistics():
    with tf.control_dependencies([train_mean_op, train_var_op]):
        return tf.nn.batch_normalization(x, batch_mean, batch_var, offset, scale, epsilon)

to:

def batch_statistics():
    train_mean_op = tf.assign(pop_mean, pop_mean * decay + batch_mean * (1 - decay))
    train_var_op = tf.assign(pop_var, pop_var * decay + batch_var * (1 - decay))
    with tf.control_dependencies([train_mean_op, train_var_op]):
        return tf.nn.batch_normalization(x, batch_mean, batch_var, offset, scale, epsilon)

This follows a problem discussed on Stack Overflow.
Otherwise, train_mean_op and train_var_op will always be executed regardless of the value of training.

By the way, this implementation does not work well in my case, sad thing lol

Type mismatch error while running test.py for tensorflow 1.13.1

While running test.py I get the following error. Can anyone please tell me how to resolve it? FYI, the TensorFlow version I am using is 1.13.1 (on Windows).

Entire log message
C:\Users\spaul\Downloads\bnlstm-master\bnlstm-master>python test.py
C:\python37\lib\site-packages\numpy\core_init.py:29: UserWarning: loaded more than 1 DLL from .libs:
C:\python37\lib\site-packages\numpy.libs\libopenblas.IPBC74C7KURV7CB2PKT5Z5FNR3SIBV4J.gfortran-win_amd64.dll
C:\python37\lib\site-packages\numpy.libs\libopenblas.TXA6YQSD3GCQQC22GEQ54J2UDCXDXHWN.gfortran-win_amd64.dll
stacklevel=1)
WARNING:tensorflow:From test.py:13: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From C:\python37\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
WARNING:tensorflow:From C:\python37\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:252: _internal_retry.<locals>.wrap.<locals>.wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
WARNING:tensorflow:From C:\python37\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
WARNING:tensorflow:From C:\python37\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting MNIST_data/train-labels-idx1-ubyte.gz
WARNING:tensorflow:From C:\python37\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.one_hot on tensors.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
WARNING:tensorflow:From C:\python37\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From test.py:26: dynamic_rnn (from tensorflow.python.ops.rnn) is deprecated and will be removed in a future version.
Instructions for updating:
Please use keras.layers.RNN(cell), which is equivalent to this API
WARNING:tensorflow:From C:\python37\lib\site-packages\tensorflow\python\ops\tensor_array_ops.py:162: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Traceback (most recent call last):
File "C:\python37\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 511, in _apply_op_helper
preferred_dtype=default_dtype)
File "C:\python37\lib\site-packages\tensorflow\python\framework\ops.py", line 1175, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "C:\python37\lib\site-packages\tensorflow\python\framework\ops.py", line 977, in _TensorTensorConversionFunction
(dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype int32 for Tensor with dtype float32: 'Tensor("rnn/while/BNLSTMCell/add_1:0", shape=(100, 400), dtype=float32)'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "test.py", line 26, in <module>
outputs, state = dynamic_rnn(lstm, x_inp, initial_state=initialState, dtype=tf.float32)
File "C:\python37\lib\site-packages\tensorflow\python\util\deprecation.py", line 324, in new_func
return func(*args, **kwargs)
File "C:\python37\lib\site-packages\tensorflow\python\ops\rnn.py", line 671, in dynamic_rnn
dtype=dtype)
File "C:\python37\lib\site-packages\tensorflow\python\ops\rnn.py", line 879, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "C:\python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3556, in while_loop
return_same_structure)
File "C:\python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3087, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "C:\python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3022, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "C:\python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3525, in <lambda>
body = lambda i, lv: (i + 1, orig_body(*lv))
File "C:\python37\lib\site-packages\tensorflow\python\ops\rnn.py", line 847, in time_step
(output, new_state) = call_cell()
File "C:\python37\lib\site-packages\tensorflow\python\ops\rnn.py", line 833, in <lambda>
call_cell = lambda: cell(input_t, state)
File "C:\Users\spaul\Downloads\bnlstm-master\bnlstm-master\lstm.py", line 81, in __call__
i, j, f, o = tf.split(1, 4, hidden)
File "C:\python37\lib\site-packages\tensorflow\python\ops\array_ops.py", line 1508, in split
axis=axis, num_split=num_or_size_splits, value=value, name=name)
File "C:\python37\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 10742, in split
"Split", split_dim=axis, value=value, num_split=num_split, name=name)
File "C:\python37\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 534, in _apply_op_helper
(prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'split_dim' of 'Split' Op has type float32 that does not match expected type of int32.

Why two separate tf.matmul operations in BNLSTMCell.__call__?

Is there a reason why you're doing two separate tf.matmul instead of one large one (which would likely be faster)? E.g. bn_lstm_identity_initializer could initialize [2 * self.num_units, 4 * self.num_units] instead. I'm thinking it has something to do with wanting to keep the running means and variances separate for the input-to-hidden and hidden-to-hidden connections? Or is it just so that the input can be of a smaller size than the hidden state transitions?
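
A small pure-Python sketch of the trade-off raised here, using hypothetical toy shapes and values. It shows that one product over the concatenated input and stacked weights gives the same sum as two separate products, but only the separate form exposes the input-to-hidden and hidden-to-hidden terms individually, which is what normalizing them separately would need. It does not claim to be the author's reason.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


x = [[1.0, 2.0]]                 # input, shape [1, 2]
h = [[3.0]]                      # previous hidden state, shape [1, 1]
W_x = [[1.0, 0.0], [0.0, 1.0]]   # input-to-hidden weights, shape [2, 2]
W_h = [[2.0, 2.0]]               # hidden-to-hidden weights, shape [1, 2]

# Two separate products: each term is available for its own normalization.
xh = matmul(x, W_x)
hh = matmul(h, W_h)
separate = add(xh, hh)

# One product over the concatenated input and the stacked weights.
fused = matmul([x[0] + h[0]], W_x + W_h)
```

The fused form returns only the sum, so any per-term statistics would have to be computed some other way. Both forms allow the input size to differ from the hidden size, as the toy shapes show.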