Comments (7)
Hi @lukaspj, I am still maintaining this repo (kind of).
It looks like the problem is that input_with_flags should be of shape (batch_size, input_dim + 1), i.e. the flag is appended to the end of the word-embedding dimension, not to the length of the sentence. Additionally, the cell is run a single step at a time, so the input to the cell doesn't include the sequence length. This means we don't want the call to tf.nn.dynamic_rnn within act_step, as that runs a cell over a whole sequence of inputs; here we are literally just calling the actual cell once. Also, if this flag bit is causing you a real headache, I don't mind if you just remove it: I implemented it to stay true to the paper, but I didn't find it to be very important.
I can help further if you open a PR with your current port, so I can play around with it, or if you post the full stack trace of the error. I think the problem is that you are passing in something 3D to the LSTM cell, but I can't be entirely sure without the full trace. Thanks!
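To make the shape concrete, here is a small numpy sketch (not the repo's code; the sizes are made up) of appending the flag bit along the embedding dimension rather than the sequence length:

```python
import numpy as np

batch_size, input_dim = 4, 15       # hypothetical sizes
embeddings = np.zeros((batch_size, input_dim))
flag = np.ones((batch_size, 1))     # one flag bit per batch element

# Append the flag to the embedding dimension, not the sequence length:
input_with_flags = np.concatenate([embeddings, flag], axis=1)
print(input_with_flags.shape)       # (4, 16), i.e. (batch_size, input_dim + 1)
```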
from act-tensorflow.
When I used dynamic_rnn, it complained that the inputs were 2-dimensional and not 3, and I have previously used it for my LSTMCell like so:
```python
with tf.name_scope('flatten_input'):
    inputFlat = tf.reshape(tf.contrib.layers.flatten(layer_input),
                           [self.batch_size, self.train_length, input_shape_size])

with tf.name_scope('build_lstm') as scope:
    self.state_in = self.cell.zero_state(self.batch_size, tf.float32)
    rnn, self.rnn_state = tf.nn.dynamic_rnn(
        inputs=inputFlat,
        cell=self.cell,
        dtype=tf.float32,
        initial_state=self.state_in,
        scope=scope
    )
```
That's why I'm providing it with 3D inputs.
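For reference, tf.nn.dynamic_rnn expects inputs of shape (batch_size, max_time, features). The flatten-then-reshape above can be sketched in numpy (all sizes here are hypothetical, just to show the shape logic):

```python
import numpy as np

batch_size, train_length, input_shape_size = 4, 5, 15   # hypothetical
# Pretend layer_input arrives as (batch * time, 3, 5) feature maps:
layer_input = np.arange(batch_size * train_length * 15).reshape(
    batch_size * train_length, 3, 5)

flat = layer_input.reshape(layer_input.shape[0], -1)    # like tf.contrib.layers.flatten
inputFlat = flat.reshape(batch_size, train_length, input_shape_size)
print(inputFlat.shape)   # (4, 5, 15): the 3-D input dynamic_rnn wants
```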
So we want the dynamic_rnn code outside of act_step; is this because the previous rnn call works differently than dynamic_rnn? Furthermore, do you believe we should use static_rnn instead of dynamic_rnn?
@DeNeutoy I'm sorry I haven't been able to get back to you sooner, I was busy yesterday.
Wow thanks! You did a ton of work on that, I really appreciate that. I did not expect it.
I'm still seeing the issue, and I'm still trying to figure out where I go wrong. One thing that I guess might trigger it is that I want to feed in the batch_size as a tensor, not an integer, because it varies between my training and my testing scenarios. I can't see how that could trigger it, however.
I'm also using dynamic_rnn whereas adaptive_computation_time.py uses static_rnn, and I'm also feeding it a few more parameters: https://gist.github.com/lukaspj/3abb0a0d3225abd70276a4089faaa3e9#file-simple_ddqrn-py-L62-L68
Any clues?
Also, stack-trace in case it matters:
https://gist.github.com/lukaspj/06baef80b3e368ec2af0d44fddae6072
@DeNeutoy Oh, it just occurred to me: I'm probably passing it the wrong initial_state!
I was using:
```python
self.state_in = self.cell.zero_state(self.batch_size, tf.float32)
rnn, self.rnn_state = tf.nn.dynamic_rnn(
    inputs=inputFlat,
    cell=self.cell,
    dtype=tf.float32,
    initial_state=self.state_in,
    scope=scope
)
```
But self.cell here is the ACTCell; the initial_state that needs to be passed should be the inner_cell's initial state, as that is what ACTCell uses it for.
A note here: it might be an idea for ACTCell to override the zero_state function and simply delegate it to the inner_cell?
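A minimal sketch of that delegation idea (plain Python, with a dummy cell standing in for a real RNNCell; class and attribute names are hypothetical):

```python
class DummyInnerCell:
    """Stands in for an RNN cell; zero_state returns a (batch, state_size) block."""
    state_size = 3

    def zero_state(self, batch_size, dtype):
        return [[0.0] * self.state_size for _ in range(batch_size)]


class ACTCellSketch:
    """Wrapper that delegates zero_state to its inner cell, so callers can
    pass the wrapper itself to dynamic_rnn without reaching inside it."""

    def __init__(self, inner_cell):
        self.inner_cell = inner_cell

    def zero_state(self, batch_size, dtype):
        return self.inner_cell.zero_state(batch_size, dtype)


cell = ACTCellSketch(DummyInnerCell())
state = cell.zero_state(4, "float32")
print(len(state), len(state[0]))  # 4 3
```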
So I changed it to:
```python
self.state_in = self.inner_cell.zero_state(self.batch_size, tf.float32)
rnn, self.rnn_state = tf.nn.dynamic_rnn(
    inputs=inputFlat,
    cell=self.cell,
    dtype=tf.float32,
    initial_state=self.state_in,
    scope=scope
)
```
Now it fails somewhere else, but progress is progress! In the line right below it:
```python
output, new_state = static_rnn(cell=self.cell, inputs=[input_with_flags],
                               initial_state=state, scope=type(self.cell).__name__)
with tf.variable_scope('sigmoid_activation_for_pondering'):
    p = tf.squeeze(tf.layers.dense(new_state, 1, activation=tf.sigmoid))
```
It fails in tf.layers.dense, saying:

```
TypeError: int() argument must be a string, a bytes-like object or a number, not 'TensorShape'
```
I'm investigating it, but I suspect it might be because I use a variable batch_size, i.e. the batch_size is a tensor, not a number.
I tried reverting tf.layers.dense to:

```python
p = tf.squeeze(tf.sigmoid(core_rnn_cell_impl._linear(new_state, 1, True)))
```

But then I got:

```
ValueError: The shape for main_DDQRN/lstm_layer/build_lstm/while/ACTCell/while/Merge_7:0 is not an invariant for the loop. It enters the loop with shape (2, ?, 15), but has shape after one iteration. Provide shape invariants using either the shape_invariants argument of tf.while_loop or set_shape() on the loop variables.
```
I'm pretty unfamiliar with TensorFlow loops, but the 15 seems to match my input size, and I'm guessing the 2 comes from the size of the tuple in the state. So it seems to me like the state is going from having a partially known shape to having an unknown one.
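One way to see where the leading 2 could come from (just my guess, sketched in numpy with hypothetical sizes): if the LSTM state tuple (c, h) gets stacked into a single tensor somewhere, the tuple size becomes the leading dimension:

```python
import numpy as np

batch, hidden = 4, 15           # 15 matches the size in the error above
c = np.zeros((batch, hidden))   # cell state
h = np.zeros((batch, hidden))   # hidden state
state = np.stack([c, h])        # a tuple of two -> leading dimension of 2
print(state.shape)              # (2, 4, 15), same pattern as (2, ?, 15)
```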
Nothing inside the while loop in the ACT cell should be 3-dimensional; I would suggest that checking this would be a good start. Probably try using the static_rnn implementation as well, to see if you get the same error.