
Comments (7)

DeNeutoy commented on September 22, 2024

Hi @lukaspj, I am still maintaining this repo (kind of).

It looks like the problem is that input_with_flags should be of shape (batch_size, input_dim + 1) - i.e. the flag is appended to the end of the word embedding dimension, not to the length of the sentence. Additionally, the cell is run a single step at a time, so the input to the cell doesn't include the sequence length. This means we don't want the call to tf.nn.dynamic_rnn inside act_step, as that runs a cell over a whole sequence of inputs - here we are literally just calling the actual cell once. Also, if this flag bit is causing you a real headache, I don't mind if you just remove it - I implemented it to stay true to the paper, but I didn't find it to be very important.
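To make the two shape points concrete, here is a plain-Python sketch (no TensorFlow; the names are just illustrative, not from the repo):

```python
# 1) The ponder flag is appended along the feature axis, so the cell input
#    goes from (batch_size, input_dim) to (batch_size, input_dim + 1).
#    In TF this would be something like tf.concat([inputs, flag], axis=1).
batch_size, input_dim = 4, 15
word_embeddings = [[0.0] * input_dim for _ in range(batch_size)]
input_with_flags = [row + [1.0] for row in word_embeddings]
assert all(len(row) == input_dim + 1 for row in input_with_flags)

# 2) Inside act_step the cell is applied exactly once, rather than being
#    unrolled over a whole sequence the way tf.nn.dynamic_rnn unrolls it.
def toy_cell(x, state):
    # stand-in for cell(inputs, state) -> (output, new_state)
    new_state = state + sum(x)
    return new_state, new_state

output, state = toy_cell(input_with_flags[0], 0.0)  # one step, no sequence loop
```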

I can help further if you open a PR with your current port, so I can play around with it, or if you post the full stack trace of the error. I think the problem is that you are passing in something 3D to the LSTM cell, but I can't be entirely sure without the full trace. Thanks!

from act-tensorflow.

lukaspj commented on September 22, 2024

When I used dynamic_rnn, it complained that the inputs were 2-dimensional and not 3. I have previously used it for my LSTMCell like so:

            with tf.name_scope('flatten_input'):
                inputFlat = tf.reshape(tf.contrib.layers.flatten(layer_input), [self.batch_size, self.train_length, input_shape_size])

            with tf.name_scope('build_lstm') as scope:
                self.state_in = self.cell.zero_state(self.batch_size, tf.float32)
                rnn, self.rnn_state = tf.nn.dynamic_rnn(
                    inputs=inputFlat,
                    cell=self.cell,
                    dtype=tf.float32,
                    initial_state=self.state_in,
                    scope=scope
                )

That's why I'm providing it with 3D inputs.

So we want the dynamic_rnn call outside of act_step; is this because the previous rnn call works differently from dynamic_rnn?
Furthermore, do you believe we should use static_rnn instead of dynamic_rnn?


lukaspj commented on September 22, 2024

@DeNeutoy I'm sorry I haven't been able to get back to you sooner, I was busy yesterday.

Wow, thanks! You did a ton of work on that, and I really appreciate it. I did not expect that.

I'm still seeing the issue, and I'm still trying to figure out where I go wrong. One thing I guess might trigger it is that I want to feed in the batch_size as a tensor, not an integer, because it varies between my training and testing scenarios. I can't see how that could trigger it, however.

I'm also using dynamic_rnn, whereas adaptive_computation_time.py uses static_rnn:

    self.outputs, final_state = static_rnn(act, inputs, dtype=tf.float32)

and I'm also feeding it a few more parameters:
https://gist.github.com/lukaspj/3abb0a0d3225abd70276a4089faaa3e9#file-simple_ddqrn-py-L62-L68

Any clues?


lukaspj commented on September 22, 2024

Also, the stack trace, in case it matters:
https://gist.github.com/lukaspj/06baef80b3e368ec2af0d44fddae6072


lukaspj commented on September 22, 2024

@DeNeutoy Oh, it just occurred to me: I'm probably passing it the wrong initial_state!

I was using:

                self.state_in = self.cell.zero_state(self.batch_size, tf.float32)
                rnn, self.rnn_state = tf.nn.dynamic_rnn(
                    inputs=inputFlat,
                    cell=self.cell,
                    dtype=tf.float32,
                    initial_state=self.state_in,
                    scope=scope
                )

But self.cell here is the ACTCell; the initial_state that needs to be passed should be the inner_cell's initial state, as that is what ACTCell uses it for.

A note here: it might be an idea for ACTCell to override the zero_state function and simply delegate it to the inner_cell?
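Something like this (a plain-Python sketch of the delegation idea; the class and attribute names are just illustrative):

```python
class InnerCell:
    """Stand-in for e.g. an LSTMCell with a zero_state method."""
    state_size = 4

    def zero_state(self, batch_size, dtype):
        # a batch of zero-filled states, one row per example
        return [[0.0] * self.state_size for _ in range(batch_size)]

class ACTCellSketch:
    """Hypothetical wrapper that forwards zero_state to the cell it wraps,
    so callers can keep writing self.cell.zero_state(...) without knowing
    about the inner cell."""

    def __init__(self, inner_cell):
        self.inner_cell = inner_cell

    def zero_state(self, batch_size, dtype):
        # delegate straight to the wrapped cell
        return self.inner_cell.zero_state(batch_size, dtype)

state_in = ACTCellSketch(InnerCell()).zero_state(2, "float32")
```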

So I changed it to:

                self.state_in = self.inner_cell.zero_state(self.batch_size, tf.float32)
                rnn, self.rnn_state = tf.nn.dynamic_rnn(
                    inputs=inputFlat,
                    cell=self.cell,
                    dtype=tf.float32,
                    initial_state=self.state_in,
                    scope=scope
                )

Now it fails somewhere else, but progress is progress!

In the line right below it:

        output, new_state = static_rnn(cell=self.cell, inputs=[input_with_flags], initial_state=state, scope=type(self.cell).__name__)

        with tf.variable_scope('sigmoid_activation_for_pondering'):
            p = tf.squeeze(tf.layers.dense(new_state, 1, activation=tf.sigmoid))

It fails in tf.layers.dense, saying:

TypeError: int() argument must be a string, a bytes-like object or a number, not 'TensorShape'

I'm investigating it, but I suspect it might be because I use a variable batch_size, i.e. the batch_size is a tensor, not a number.
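A plain-Python sketch of another thing I suspect could be happening (an assumption, not confirmed from the trace: new_state from an LSTM cell is an LSTMStateTuple pair, not a single tensor, and the dense layer's shape inference chokes on the tuple; the function name here is illustrative):

```python
from collections import namedtuple

# LSTM cells return their state as a (c, h) pair rather than one tensor
LSTMStateTuple = namedtuple("LSTMStateTuple", ["c", "h"])

def dense_input_dim(x):
    # stand-in for the shape inference a dense layer performs: it needs
    # a single array whose last dimension is a plain integer
    if isinstance(x, tuple):
        raise TypeError("expected a single tensor, got a state tuple")
    return len(x[0])

state = LSTMStateTuple(c=[[0.0] * 15], h=[[0.0] * 15])
# dense_input_dim(state) would raise TypeError, like the dense layer does;
# passing just the hidden state gives it a single array:
hidden_dim = dense_input_dim(state.h)
```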


lukaspj commented on September 22, 2024

I tried reverting tf.layers.dense to:

            p = tf.squeeze(tf.sigmoid(core_rnn_cell_impl._linear(new_state, 1, True)))

But then I got:

ValueError: The shape for main_DDQRN/lstm_layer/build_lstm/while/ACTCell/while/Merge_7:0 is not an invariant for the loop. It enters the loop with shape (2, ?, 15), but has shape after one iteration. Provide shape invariants using either the shape_invariants argument of tf.while_loop or set_shape() on the loop variables.

I'm pretty unfamiliar with TensorFlow loops, but the 15 seems to match my input size, and I'm guessing the 2 comes from the size of the tuple in the state. So it seems to me like the state goes from having a partially known shape to having an unknown one.


DeNeutoy commented on September 22, 2024

Nothing inside the while loop in the ACT cell should be 3-dimensional, so I would suggest checking that as a good start. Probably also try the static rnn implementation to see whether you get the same error.


