Comments (7)
Hi @lukaspj, I am still maintaining this repo (kind of).
It looks like the problem is that input_with_flags should be of shape (batch_size, input_dim + 1), i.e. the flag is appended to the end of the word-embedding dimension, not to the length of the sentence. Additionally, the cell is run a single step at a time, so the input to the cell doesn't include the sequence length. This means we don't want the call to tf.nn.dynamic_rnn within act_step, as that runs a cell over a whole sequence of inputs; here we are literally just calling the actual cell once. Also, if this flag bit is causing you a real headache, I don't mind if you just remove it: I implemented it to stay true to the paper, but I didn't find it to be very important.
I can help further if you open a PR with your current port, so I can play around with it, or if you post the full stack trace of the error. I think the problem is that you are passing in something 3D to the LSTM cell, but I can't be entirely sure without the full trace. Thanks!
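To make the shape concrete, here is a small numpy sketch (not the repo's code; the sizes are made up) of appending the flag bit along the embedding dimension rather than the sequence length:

```python
import numpy as np

batch_size, input_dim = 4, 15       # hypothetical sizes
embeddings = np.zeros((batch_size, input_dim))
flag = np.ones((batch_size, 1))     # one flag bit per batch element

# Append the flag to the embedding dimension, not the sequence length:
input_with_flags = np.concatenate([embeddings, flag], axis=1)
print(input_with_flags.shape)       # (4, 16), i.e. (batch_size, input_dim + 1)
```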
from act-tensorflow.
When I used dynamic_rnn, it complained that the inputs were 2-dimensional and not 3, and I have previously used it for my LSTMCell like so:
```python
with tf.name_scope('flatten_input'):
    inputFlat = tf.reshape(tf.contrib.layers.flatten(layer_input),
                           [self.batch_size, self.train_length, input_shape_size])

with tf.name_scope('build_lstm') as scope:
    self.state_in = self.cell.zero_state(self.batch_size, tf.float32)
    rnn, self.rnn_state = tf.nn.dynamic_rnn(
        inputs=inputFlat,
        cell=self.cell,
        dtype=tf.float32,
        initial_state=self.state_in,
        scope=scope
    )
```
That's why I'm providing it with 3D inputs.
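For reference, tf.nn.dynamic_rnn expects inputs of shape (batch_size, max_time, features). The flatten-then-reshape above can be sketched in numpy (all sizes here are hypothetical, just to show the shape logic):

```python
import numpy as np

batch_size, train_length, input_shape_size = 4, 5, 15   # hypothetical
# Pretend layer_input arrives as (batch * time, 3, 5) feature maps:
layer_input = np.arange(batch_size * train_length * 15).reshape(
    batch_size * train_length, 3, 5)

flat = layer_input.reshape(layer_input.shape[0], -1)    # like tf.contrib.layers.flatten
inputFlat = flat.reshape(batch_size, train_length, input_shape_size)
print(inputFlat.shape)   # (4, 5, 15): the 3-D input dynamic_rnn wants
```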
So we want the dynamic_rnn code outside of act_step; is this because the previous rnn call works differently than dynamic_rnn? Furthermore, do you believe we should use static_rnn instead of dynamic_rnn?
@DeNeutoy I'm sorry I haven't been able to get back to you sooner, I was busy yesterday.
Wow thanks! You did a ton of work on that, I really appreciate that. I did not expect it.
I'm still seeing the issue, and I'm still trying to figure out where I go wrong. One thing that I guess might trigger it is that I want to feed in the batch_size as a tensor, not an integer, because it varies between my training and my testing scenarios. I can't see how that could trigger it, however.
I'm also using dynamic_rnn whereas adaptive_computation_time.py uses static_rnn, and I'm also feeding it a few more parameters: https://gist.github.com/lukaspj/3abb0a0d3225abd70276a4089faaa3e9#file-simple_ddqrn-py-L62-L68
Any clues?
Also, stack-trace in case it matters:
https://gist.github.com/lukaspj/06baef80b3e368ec2af0d44fddae6072
@DeNeutoy Oh, it just occurred to me: I'm probably passing it the wrong initial_state!
I was using:
```python
self.state_in = self.cell.zero_state(self.batch_size, tf.float32)
rnn, self.rnn_state = tf.nn.dynamic_rnn(
    inputs=inputFlat,
    cell=self.cell,
    dtype=tf.float32,
    initial_state=self.state_in,
    scope=scope
)
```
But self.cell here is the ACTCell; the initial_state that needs to be passed should be the inner_cell's initial state, as that is what ACTCell uses it for.
A note here: it might be an idea for ACTCell to override the zero_state function and simply delegate it to the inner_cell?
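A minimal sketch of that delegation idea (plain Python, with a dummy cell standing in for a real RNNCell; class and attribute names are hypothetical):

```python
class DummyInnerCell:
    """Stands in for an RNN cell; zero_state returns a (batch, state_size) block."""
    state_size = 3

    def zero_state(self, batch_size, dtype):
        return [[0.0] * self.state_size for _ in range(batch_size)]


class ACTCellSketch:
    """Wrapper that delegates zero_state to its inner cell, so callers can
    pass the wrapper itself to dynamic_rnn without reaching inside it."""

    def __init__(self, inner_cell):
        self.inner_cell = inner_cell

    def zero_state(self, batch_size, dtype):
        return self.inner_cell.zero_state(batch_size, dtype)


cell = ACTCellSketch(DummyInnerCell())
state = cell.zero_state(4, "float32")
print(len(state), len(state[0]))  # 4 3
```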
So I changed it to:
```python
self.state_in = self.inner_cell.zero_state(self.batch_size, tf.float32)
rnn, self.rnn_state = tf.nn.dynamic_rnn(
    inputs=inputFlat,
    cell=self.cell,
    dtype=tf.float32,
    initial_state=self.state_in,
    scope=scope
)
```
Now it fails somewhere else, but progress is progress! In the line right below it:
```python
output, new_state = static_rnn(cell=self.cell, inputs=[input_with_flags],
                               initial_state=state, scope=type(self.cell).__name__)
with tf.variable_scope('sigmoid_activation_for_pondering'):
    p = tf.squeeze(tf.layers.dense(new_state, 1, activation=tf.sigmoid))
```
It fails in tf.layers.dense, saying:

```
TypeError: int() argument must be a string, a bytes-like object or a number, not 'TensorShape'
```
I'm investigating it, but I suspect it might be because I use a variable batch_size, i.e. the batch_size is a tensor, not a number.
I tried reverting tf.layers.dense to:

```python
p = tf.squeeze(tf.sigmoid(core_rnn_cell_impl._linear(new_state, 1, True)))
```

But then I got:

```
ValueError: The shape for main_DDQRN/lstm_layer/build_lstm/while/ACTCell/while/Merge_7:0 is not an invariant for the loop. It enters the loop with shape (2, ?, 15), but has shape after one iteration. Provide shape invariants using either the shape_invariants argument of tf.while_loop or set_shape() on the loop variables.
```
I'm pretty unfamiliar with TensorFlow loops, but the 15 seems to match my input size, and I'm guessing the 2 comes from the size of the tuple in the state. So it seems to me like the state is going from having a partially known shape to having an unknown one.
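One way to see where the leading 2 could come from (just my guess, sketched in numpy with hypothetical sizes): if the LSTM state tuple (c, h) gets stacked into a single tensor somewhere, the tuple size becomes the leading dimension:

```python
import numpy as np

batch, hidden = 4, 15           # 15 matches the size in the error above
c = np.zeros((batch, hidden))   # cell state
h = np.zeros((batch, hidden))   # hidden state
state = np.stack([c, h])        # a tuple of two -> leading dimension of 2
print(state.shape)              # (2, 4, 15), same pattern as (2, ?, 15)
```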
Nothing inside the while loop in the ACT cell should be 3-dimensional; I would suggest that checking this would be a good start. Probably try using the static_rnn implementation as well, to see if you get the same error.