batzner / indrnn
TensorFlow implementation of Independently Recurrent Neural Networks
Home Page: https://arxiv.org/abs/1803.04831
License: Apache License 2.0
Looking forward to your reply!
Hi,
First, thanks a lot for this example.
I just noticed that you wrote "I let it run for two days and stopped it after 60,000 training steps". In your example, LEARNING_RATE_DECAY_STEPS=600000, meaning the learning rate only starts to drop after 600,000 training steps. Does this mean that the result shown on your page was obtained before the learning rate was ever decayed?
If this is the case, further dropping the learning rate might improve the performance.
Also, from your results, although the validation error keeps dropping, it drops relatively slowly. So I think setting LEARNING_RATE_DECAY_STEPS=20000 might give you a better result than the one you presented (although it is not the best).
By the way, "I let it run for two days and stopped it after 60,000 training steps" seems much slower than mine. I am not very familiar with TensorFlow, but does it compute input*W for all the time steps together? If not, then I would suggest removing this computation from the IndRNN cell and adding an extra layer that computes it. I think this could improve the efficiency a lot; see the sketch below.
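A minimal sketch of that idea, assuming made-up shapes and constants (this is not the repository's code): project the inputs for all time steps with one matmul before the recurrence, so the per-step cell only has to apply the element-wise recurrent weights.

import tensorflow as tf

# Illustrative constants, not taken from the example code.
TIME_STEPS = 784
INPUT_DIM = 1
NUM_UNITS = 128

inputs = tf.placeholder(tf.float32, [None, TIME_STEPS, INPUT_DIM])  # [batch, time, features]
w_in = tf.get_variable("input_kernel", [INPUT_DIM, NUM_UNITS])

# One big matmul over all time steps instead of one matmul per step inside the cell.
projected = tf.tensordot(inputs, w_in, axes=[[2], [0]])  # [batch, time, num_units]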
Thanks.
In your paper, you mentioned that the action recognition results outperform the state-of-the-art. Could you please release the code for this part?
Thank you very much!
Line 68 in 551f9fe
Hey, I checked your implementation of the paper and noticed that instead of constraining the recurrent weight between 0 and max, where max is pow(2, 1/T), you constrain between -max and max. Since this matrix is applied element wise for every time step, wouldn't negative weights potentially result in outputs oscillating between positive and negative signs? This might explain why your version did not converge as fast as in the paper.
If this is the case, then I think the standard weight initialization might not be optimal, since it is centered around 0, so half of the weights would immediately be truncated to 0. Maybe a uniform distribution between 0 and max would help initial convergence. Just a thought.
Let me know what you think. I feel like this architecture might be very promising because of its simplicity, and I'd love to see more results.
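A hedged sketch of that suggestion (the variable names and the clipping hook are illustrative, not the repository's exact code): initialize the recurrent weights uniformly in (0, recurrent_max) and clip them to stay non-negative after each update.

import tensorflow as tf

TIME_STEPS = 784
NUM_UNITS = 128
recurrent_max = pow(2, 1.0 / TIME_STEPS)

# Uniform in (0, recurrent_max) so no weight starts negative or gets truncated to 0.
recurrent_kernel = tf.get_variable(
    "recurrent_kernel", [NUM_UNITS],
    initializer=tf.random_uniform_initializer(0.0, recurrent_max))

# Run this after each optimizer step to keep the weights in [0, recurrent_max].
clip_op = tf.assign(recurrent_kernel,
                    tf.clip_by_value(recurrent_kernel, 0.0, recurrent_max))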
Sorry to open an issue here. However, I think that you are an expert on this topic, and the IndRNNCell-related additions in TensorFlow 1.10.0 might also have been created by you.
I am trying to apply the relu activation to IndyLSTMCell in the new TensorFlow (1.10). However, the loss becomes NaN after I make that change. The default tanh activation works for this cell. IndyGRUCell has the same problem. For IndRNNCell, both tanh and relu work; however, when I stack it into multiple layers, I do not see any increase in model capacity (in terms of how fast the loss decreases over training epochs).
Can you please give me a hint on how to address this? Any suggestions would be very much appreciated. Thanks!
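For context, a minimal sketch of the kind of setup being described (the shapes, toy loss, and clipping threshold are assumptions; gradient clipping is a common mitigation when a relu recurrence drives the loss to NaN, not a fix confirmed here):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 50, 32])   # [batch, time, features]
targets = tf.placeholder(tf.float32, [None, 128])

# TF 1.10 contrib cell with the activation switched from the default tanh to relu.
cell = tf.contrib.rnn.IndyLSTMCell(128, activation=tf.nn.relu)
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)
loss = tf.losses.mean_squared_error(targets, outputs[:, -1, :])

# Clip gradients, which often helps when relu activations make the loss blow up.
optimizer = tf.train.AdamOptimizer(1e-4)
grads_and_vars = optimizer.compute_gradients(loss)
clipped = [(tf.clip_by_norm(g, 5.0), v) for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)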
Hello!
I noticed that your implementation of IndRNN is the basic version (not the residual version).
Also, in the original paper there should be two batch normalization operations: one after the cell's input and one before the activation. It's not required by the IndRNN structure, but it is 'recommended' in the paper :)
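A single-step sketch of that placement, assuming illustrative names (in a real cell the batch-norm variables would have to be reused across time steps; this is not the repository's code):

import tensorflow as tf

def indrnn_step_with_bn(x_t, h_prev, w_in, u_rec, bias, is_training):
  gate_inputs = tf.matmul(x_t, w_in)                        # input projection
  gate_inputs = tf.layers.batch_normalization(              # BN after the cell's input
      gate_inputs, training=is_training)
  pre_activation = gate_inputs + u_rec * h_prev + bias      # element-wise recurrence
  pre_activation = tf.layers.batch_normalization(           # BN before the activation
      pre_activation, training=is_training)
  return tf.nn.relu(pre_activation)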
There seems to be a problem running the code with TensorFlow v1.4. Can I just replace _LayerRNNCell with RNNCell?
Line 149 in ind_rnn_cell.py:
gate_inputs = math_ops.matmul(inputs, self._input_kernel)
The docstring says inputs is a "2-D tensor of shape [batch, num_units]", but the input kernel is created as
self._input_kernel = self.add_variable("input_kernel", shape=[input_depth, self._num_units], initializer=self._input_initializer)
i.e. with shape [input_depth, self._num_units], so the matmul actually expects inputs of shape [batch, input_depth]. The docstring shape seems to be wrong.
Hi,
Thanks a lot for your work! However, when I run the example code I get a ValueError:
'Variable rnn/multi_rnn_cell/cell_0/ind_rnn_cell/input_kernel already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:...'
Could you help figure out what went wrong?
Thanks again!
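A hedged workaround sketch (not a fix confirmed by the author): this error usually appears when the graph is built twice in the same process, for example in a notebook, so resetting the default graph or building inside a scope with reuse=tf.AUTO_REUSE avoids the clash. Shapes and the cell setup below are illustrative.

import tensorflow as tf
from ind_rnn_cell import IndRNNCell  # the repository's module

tf.reset_default_graph()

inputs = tf.placeholder(tf.float32, [None, 784, 1])
cells = [IndRNNCell(128, recurrent_max_abs=pow(2, 1.0 / 784)) for _ in range(2)]
multi_cell = tf.nn.rnn_cell.MultiRNNCell(cells)

with tf.variable_scope("rnn", reuse=tf.AUTO_REUSE):
  outputs, state = tf.nn.dynamic_rnn(multi_cell, inputs, dtype=tf.float32)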
Happy to review your PRs. May want to address the issue opened by EdeMeijer first.
Hello, could you share some ideas about how to reimplement the idea on the action recognition dataset?
I tried to implement it, but the performance is not good; I hope you can give some instruction.
My code is here:
https://github.com/jren2019/test_ind
I can give you the data that I used for my code. Thanks.
in ind_rnn_cell.py:
def build(self, inputs_shape):
  if inputs_shape[1].value is None:
    raise ValueError("Expected inputs.shape[-1] to be known, saw shape: %s"
                     % inputs_shape)
if inputs_shape[1].value is None:
should be
if inputs_shape[-1].value is None:
because we need to check whether the input depth (the last dimension) is defined.
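A corrected version of the check, as suggested above (a hypothetical patch; only the index changes, the rest is copied from the snippet):

def build(self, inputs_shape):
  # Check the last dimension (the input depth), not dimension 1.
  if inputs_shape[-1].value is None:
    raise ValueError("Expected inputs.shape[-1] to be known, saw shape: %s"
                     % inputs_shape)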
Hello, I found a performance issue in the definition of get_training_set in
batzner/indrnn/blob/master/examples/sequential_mnist.py:
dataset.map is called without num_parallel_calls.
I think it will increase the efficiency of your program if you add this.
The same issue also exists in the other dataset = dataset.map(preprocess_data) calls.
Here is the TensorFlow documentation supporting this.
Looking forward to your reply. Btw, I am very glad to create a PR to fix it if you are too busy.
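A minimal sketch of that change (preprocess_data here is a stand-in, not the function from sequential_mnist.py, and the level of parallelism is just an example):

import tensorflow as tf

def preprocess_data(image):
  # Toy preprocessing step for illustration.
  return tf.cast(image, tf.float32) / 255.0

dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([8, 784], tf.uint8))
# Pass num_parallel_calls so map() runs the preprocessing in parallel.
dataset = dataset.map(preprocess_data, num_parallel_calls=4)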
Hi,
Thanks for the implementation.
I have one comment on "why IndRNN on 5000 time steps does not work in your implementation". I think it is related to the initialization of the recurrent weight.
As shown in the paper, to keep long-term memory, the recurrent weight needs to be around 1. For the adding problem, and for the last IndRNN layer, only the last output is useful, so there is no need to keep short-term memory. Accordingly, the recurrent weights of the last IndRNN layer can be initialized to all 1s, or to a range (1-epsilon, 1+epsilon) where epsilon is small. By the way, for relu, the recurrent weights of the other layers can be initialized in (0, recurr_max) without the negative part, using a uniform distribution to keep all kinds of memory.
In your implementation, only 128 units are used. With the uniform distribution for the last IndRNN layer, the number of units that can keep long-term memory is very small, which makes it very hard to solve tasks with long sequences.
This also applies to other tasks that only require the final output, such as MNIST classification and action recognition.
Could you please give it a try? It works on my end.
Thanks.
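A hedged sketch of the initialization described above (the variable names and epsilon value are assumptions, not the repository's API): the last layer's recurrent weights start close to 1 so its units keep long-term memory, while the other layers are initialized uniformly in (0, recurrent_max).

import tensorflow as tf

TIME_STEPS = 5000
recurrent_max = pow(2, 1.0 / TIME_STEPS)
epsilon = 1e-3

# Lower layers: uniform in (0, recurrent_max) keeps a mix of short- and long-term memory.
lower_layer_init = tf.random_uniform_initializer(0.0, recurrent_max)
# Last layer: close to 1 so units retain information over the whole sequence
# (values above recurrent_max would be clipped by the cell's constraint).
last_layer_init = tf.random_uniform_initializer(1.0 - epsilon, 1.0 + epsilon)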
Hello, I found a performance issue in the definition of build_rnn in examples/sequential_mnist.py: the tf.equal ops in lines 201 and 202 will be created repeatedly during program execution, resulting in reduced efficiency. I think they should be created before the loop in build_rnn.
Looking forward to your reply. Btw, I am very glad to create a PR to fix it if you are too busy.
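An illustrative sketch of the general pattern being suggested (this is not the actual sequential_mnist.py code): build the comparison ops once, outside the Python loop, and only run them inside the loop.

import tensorflow as tf

predictions = tf.placeholder(tf.int64, [None])
labels = tf.placeholder(tf.int64, [None])

# Created once, before the loop.
correct = tf.equal(predictions, labels)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
  for step in range(3):
    # Only runs the existing graph nodes; no new tf.equal ops are created here.
    sess.run(accuracy, feed_dict={predictions: [1, 2, 3], labels: [1, 2, 0]})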
When I use IndRNN in a bidirectional RNN, it raises an error:
ValueError:cannot use '/bidirectional_rnn/fw/fw/while/ind_rnn/cell/Mul_1' as input to 'bidirectional_rnn/fw/fw/while/fw/ind_rnn_cell/clip_by_value' because they are in different while loops.
Snippet of my code:
recurrent_max = pow(2, 1.0 / time_steps)
fw_rnn_cell = IndRNNCell(hidden_size, recurrent_max_abs=recurrent_max)
bw_rnn_cell = IndRNN(hidden_size, recurrent_max_abs=recurrent_max)
bi_states, _ = tf.nn.bidirectional_dynamic_rnn(
    fw_rnn_cell,
    bw_rnn_cell,
    inputs,
    sequence_length=lengths,
    dtype=tf.float32
)
Has anybody encountered this problem? How can it be solved?