
tensorflow-seq2seq-tutorials's Introduction

seq2seq with TensorFlow

Collection of unfinished tutorials. May be good for educational purposes.

Deliberately slow-moving, explicit tutorial. I tried to thoroughly explain everything that I found in any way confusing.

Implements the simple seq2seq model described in Sutskever et al., 2014, and tests it on a toy memorization task.

1-seq2seq (picture from Sutskever et al., 2014)

The encoder is bidirectional now. The decoder is implemented using tf.nn.raw_rnn; during training it feeds previously generated tokens as inputs, instead of the target sequence.

2-seq2seq-feed-previous (picture from Deep Learning for Chatbots)

A new dynamic seq2seq API appeared in r1.0. Let's try it.

UPDATE: this tutorial doesn't work with tf versions > 1.1 due to API changes. I recommend checking out the new official tutorial instead to learn the high-level seq2seq API.


tensorflow-seq2seq-tutorials's Issues

Problem with AdamOptimizer?

Hi,
You have a great tutorial. Thanks!
While running your tutorial 2 code, I ran into this error:

InvalidArgumentError Traceback (most recent call last)
in ()
      5 for batch in range(max_batches):
      6     fd = next_feed()
----> 7     _, l = sess.run([train_op, loss], fd)
      8     loss_track.append(l)
      9

InvalidArgumentError: Cannot assign a device to node 'Adam/update_Variable/sub_3': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices:
NoOp: GPU CPU
AssignSub: GPU CPU
ScatterAdd: GPU CPU
StridedSlice: GPU CPU
Shape: GPU CPU
Unique: CPU
Sub: GPU CPU
Const: GPU CPU
VariableV2: GPU CPU
UnsortedSegmentSum: GPU CPU
Identity: GPU CPU
Gather: GPU CPU
Mul: GPU CPU
RealDiv: GPU CPU
Assign: GPU CPU
Sqrt: GPU CPU
Enter: GPU CPU
Add: GPU CPU
Switch: GPU CPU
[[Node: Adam/update_Variable/sub_3 = Sub[T=DT_FLOAT, _class=["loc:@variable"]](Adam/update_Variable/sub_3/x, Adam/beta2)]]

Caused by op u'Adam/update_Variable/sub_3', defined at:
File "/home/gangeshwark/anaconda2/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"main", fname, loader, pkg_name)
File "/home/gangeshwark/anaconda2/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/main.py", line 3, in
app.launch_new_instance()
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/traitlets/config/application.py", line 658, in launch_instance
app.start()
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/zmq/eventloop/ioloop.py", line 177, in start
super(ZMQIOLoop, self).start()
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tornado/ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2821, in run_ast_nodes
if self.run_code(code, result):
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 7, in
train_op = tf.train.AdamOptimizer(lr).minimize(loss)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 289, in minimize
name=name)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 413, in apply_gradients
update_ops.append(processor.update_op(self, grad))
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 66, in update_op
return optimizer._apply_sparse_duplicate_indices(g, self._v)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 557, in _apply_sparse_duplicate_indices
return self._apply_sparse(gradient_no_duplicate_indices, var)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/adam.py", line 156, in _apply_sparse
v_scaled_g_values = (grad.values * grad.values) * (1 - beta2_t)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 808, in r_binary_op_wrapper
return func(x, y, name=name)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2775, in _sub
result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/gangeshwark/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in init
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Cannot assign a device to node 'Adam/update_Variable/sub_3': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/GPU:0'
Colocation Debug Info: (same colocation group and Node line as in the listing above)

I think it has something to do with AdamOptimizer(), since the code works with GradientDescentOptimizer.
Any idea how to solve this?

Thanks in advance!
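A workaround that may help (an assumption based on the colocation list above, where Unique supports only CPU, and Adam's sparse update uses it): let TensorFlow fall back to CPU placement for ops without GPU kernels.

config = tf.ConfigProto(allow_soft_placement=True)
sess = tf.Session(config=config)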

Typo in bidirectional encoder?

Apologies if I'm misunderstanding something here. In model_new.py, the bidirectional encoder outputs a concatenation of the forward output with itself. Shouldn't this be a concatenation of the forward and backward outputs?
(See line 176)

Multilayered Bidirectional seq2seq

Hi,
Thanks a lot for creating this tutorial.
I would appreciate it if you could help me modify this code to make it a multilayered bidirectional seq2seq.
I could not find any function in tf.contrib.seq2seq (version 1.0) to make it multilayered.
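A rough sketch of one way to do this (an assumption, not code from the tutorial): pass a MultiRNNCell per direction to tf.nn.bidirectional_dynamic_rnn, reusing the tutorial's encoder_inputs_embedded and encoder_inputs_length names.

num_layers, hidden_units = 2, 20
fw_cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.LSTMCell(hidden_units) for _ in range(num_layers)])
bw_cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.LSTMCell(hidden_units) for _ in range(num_layers)])
((fw_outputs, bw_outputs),
 (fw_state, bw_state)) = tf.nn.bidirectional_dynamic_rnn(
    fw_cell, bw_cell, encoder_inputs_embedded,
    sequence_length=encoder_inputs_length,
    dtype=tf.float32, time_major=True)
encoder_outputs = tf.concat((fw_outputs, bw_outputs), 2)  # concat on the feature axis

tf.contrib.rnn.stack_bidirectional_dynamic_rnn, if your version has it, is an alternative that interleaves the two directions at every layer.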

How does inference work? Always getting the same predictions

Hi all,

I am currently trying to perform inference on a task, using the third tutorial notebook as my guide.

The input is a sequence of English-language sentences, and the output is also an English-language phrase or sentence.

My problem is that at inference time all my input sentences return the same output sentence, even my training examples. I have to think this is a problem with how I am executing inference, because I can clearly see in the training output that the predictions match the training examples and the loss decreases to <0.1.

Here are some relevant snippets of my code:

model = Seq2SeqModel(encoder_cell=LSTMCell(250),
                     decoder_cell=LSTMCell(500),
                     vocab_size=len(vocab_dict),
                     embedding_size=300,
                     attention=True,
                     bidirectional=True,
                     debug=False)

session.run(tf.global_variables_initializer())
saver = tf.train.Saver()

if train:
    for epoch in range(3001):
        train_model(epoch, session, model,
                    length_from=df_all['n_quote'].min(), length_to=df_all['n_quote'].max(),
                    vocab_lower=2, vocab_upper=len(vocab_dict),
                    batch_size=batch_size,
                    max_batches=batches_in_epoch, verbose=True)

        if epoch % 500 == 0: saver.save(session, 'seq2seq_fixed_labels', global_step=epoch)

else:
    saver.restore(session, './seq2seq_fixed_labels_v4-1000')
    fd = model.make_inference_inputs([
        encodeSentence('Next we year we will focus on retention'),
        encodeSentence('Cybersecurity incidents could disrupt business operations  result in the loss of critical and confidential information  and adversely impact our reputation and results of operations'),
        encodeSentence('Expanding into emerging markets is important for us'),
        encodeSentence('In closing we had a strong start to the year and we expect this performance to continue throughout the balance of')])

    inf_out = session.run(model.decoder_prediction_inference, fd)
    inf_prob_out = session.run(model.decoder_prediction_prob_inference, fd)

    print(inf_out)
    for i, (e_in, dt_pred, dt_pred_prob) in enumerate(zip(fd[model.encoder_inputs].T,
                                                          inf_out.T, inf_prob_out.T)):
        # if i > 0: break
        print('    sample {}:'.format(i + 1))
        print('    enc input                > {}'.format([inv_map[i] for i in e_in]))
        print('    dec train predicted      > {}'.format([inv_map[i] for i in dt_pred]))
        print('    dec train predicted prob > {}'.format(dt_pred_prob))

The predicted output of this is always the sentence string:

u'Goal', u'', u'', u'A', u'highly', u'effective', 'majoreconomic', u'engine', u'creating', 'newpartnerships', u'to', u'build', u'a', 'strongand', u'sustainable', u'future', u'for', u'', u'Florida', u'in', u'the', u'global', u'economy', ''

I have checked that the input "fd" is different for every block of text going into model.decoder_prediction_inference. But the output is always the same. Any ideas?

Thank You,

Kuhan

There is no input argument when the loop_fn method is called

I am studying Advanced dynamic seq2seq with TensorFlow.
When I executed the source code, it worked well.
But I have a question about the source code below.

In [21]:

def loop_fn(time, previous_output, previous_state, previous_loop_state):
    if previous_state is None:    # time == 0
        assert previous_output is None and previous_state is None
        return loop_fn_initial()
    else:
        return loop_fn_transition(time, previous_output, previous_state, previous_loop_state)

decoder_outputs_ta, decoder_final_state, _ = tf.nn.raw_rnn(decoder_cell, loop_fn)
decoder_outputs = decoder_outputs_ta.stack()

decoder_outputs_ta, decoder_final_state, _ = tf.nn.raw_rnn(decoder_cell, loop_fn)

The bolded line above passes the loop_fn method, but there is no input argument anywhere.
So I wonder how the loop_fn method works.
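What happens here is that loop_fn is passed to tf.nn.raw_rnn uncalled, as a plain function object; raw_rnn itself invokes it inside its internal while_loop, supplying the time, previous_output, previous_state, and previous_loop_state arguments. A minimal pure-Python analogy:

def apply_repeatedly(f, x, n):
    for _ in range(n):
        x = f(x)  # the callee supplies the argument, not the code that passed f in
    return x

print(apply_repeatedly(lambda v: v + 1, 0, 3))  # prints 3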

Doubt about prediction.

Thanks for the tutorial!
But I wonder: is the function "simple_decoder_fn_inference" the prediction function for dynamic_rnn?

[Suggestion] PAD added twice

Your first tutorial has this code snippet:

decoder_targets_, _ = helpers.batch(
    [(sequence) + [EOS] + [PAD] * 2 for sequence in batch]
)
decoder_inputs_, _ = helpers.batch(
    [[EOS] + (sequence) + [PAD] * 2 for sequence in batch]
)

I'm opening this issue to suggest that you add an explanation of why you append PAD twice, if you have time, since I don't understand this part myself. Thanks for these tutorials; I'm finding them very helpful!
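For what it's worth, here is how the two lines line up for a batch sequence [5, 6, 7], with PAD = 0 and EOS = 1 as in the tutorial:

targets: 5   6  7  EOS  PAD  PAD
inputs : EOS 5  6  7    PAD  PAD

One plausible reading (an assumption, not the author's explanation): the extra PAD steps give the decoder room to run past the EOS it is supposed to produce, so it is explicitly trained to keep emitting PAD after EOS instead of stopping exactly at the target length.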

Using the same weights in two different positions seems wrong

Related to this jupyter notebook: https://github.com/ematvey/tensorflow-seq2seq-tutorials/blob/master/2-seq2seq-advanced.ipynb

At In[17] you set the weights W (and bias b) once, and then you use them in two different places:
first inside the loop function at In[20], and then again at In[23].

You are computing the same thing, aren't you? You want to calculate at In[23] what you have already calculated inside In[20], but you have trouble extracting it?

Cannot understand what is going on exactly. Could you help clarify it?

Increasing max sequence length and vocab size

Dear experts,

Thank you for this excellent tutorial. This is one of the first seq2seq tutorials I have read that has really helped me to internalize some of the concepts (and that I could get running off the ground without too much trouble!).

I had a question. I am currently working on the first tutorial notebook, "1-seq2seq", and trying to understand the relationship between model performance, sequence length, and vocab size. In a real-world example it may be possible to control sequence length by limiting sentences to certain sizes, but it would surely not be possible to significantly reduce the vocabulary below a threshold.

Indeed, at the end of the tutorial it is suggested to play around with these parameters to observe how training speed and quality degrade.

My question is: how would I best translate the toy model, which predicts a random sequence of length 2-8 with a vocab between 1-10, to a more realistic scenario where the vocab can be thousands of terms?

Currently I am simply trying to extend the problem to predicting a random sequence of numbers between 2-5000 instead of 2-10, and playing around with the hyperparameters to figure out which ones will help increase the quality of my results.

Is there any intuition for how the embedding size and the number of encoder units affect model quality? I already noticed that batch size directly affects quality when the sequence length increases.

Thank you!

Kuhan
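One technique worth trying for vocabularies of thousands of terms (a sketch under assumed names, not code from the tutorial): replace the full softmax cross-entropy with sampled softmax during training.

output_W = tf.get_variable('output_W', [vocab_size, decoder_hidden_units])
output_b = tf.get_variable('output_b', [vocab_size])
loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(
    weights=output_W, biases=output_b,
    labels=tf.reshape(decoder_targets, [-1, 1]),                     # [batch*time, 1]
    inputs=tf.reshape(decoder_outputs, [-1, decoder_hidden_units]),  # [batch*time, units]
    num_sampled=512, num_classes=vocab_size))

The full softmax is still needed at inference time; sampled softmax only cheapens the training loss.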

Tutorial 1: decoder is fed correct values during prediction

As far as I can tell, the decoder in tutorial 1 is always fed the true values, not just during training but also for inference.

decoder_logits, decoder_final_state = tf.nn.dynamic_rnn(
    decoder_cell, decoder_inputs_onehot,
    initial_state=encoder_final_state,
    dtype=tf.float32, time_major=True, scope="plain_decoder",
)

For training, it makes sense that the inputs are decoder_inputs_onehot. However, for inference, the inputs should be the decoder's prediction from the previous timestep.

Am I misunderstanding something, or is this a bug?

Beam search toy example

Hi, I think beam search is very important for seq2seq, but there seem to be no clear examples of it. Is there any plan for a TensorFlow beam search tutorial, especially one using raw_rnn?

new changes in tensorflow

Hi there,

Thank you for the great work! I was wondering if you figured out how to change the code to run on tf 1.1?

OOM error after running some batches

Hi, I tried to use the raw_rnn function, but after the first few successful training steps TensorFlow throws an OOM error:
which was originally created as op 'Train/HRED/my_model/tower_0/loop_utterances/rnn_3/while/lstm_cell/lstm_cell/MatMul', defined at:
File "train_multi_gpu.py", line 24, in
train_op,train_global_step = multi_gpu_model(Model, is_training=True, scope=train_scope)
File "/mnt/home/yangshao/social_bot/hred/dl_utils.py", line 186, in multi_gpu_model
model = model_class(is_training, scope)
File "/mnt/home/yangshao/social_bot/hred/hred.py", line 28, in init
self.build_graph(emb)
File "/mnt/home/yangshao/social_bot/hred/hred.py", line 172, in build_graph
decoder_outpuots_ta, decoder_final_state, _ = tf.nn.raw_rnn(self.decoder_cell, self.loop_fn)
File "/mnt/home/yangshao/tf1.1/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 1043, in raw_rnn
swap_memory=swap_memory)
File "/mnt/home/yangshao/tf1.1/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2623, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/mnt/home/yangshao/tf1.1/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2456, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/mnt/home/yangshao/tf1.1/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2406, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/mnt/home/yangshao/tf1.1/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 987, in body
(next_output, cell_state) = cell(current_input, state)
File "/mnt/home/yangshao/tf1.1/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 404, in call
lstm_matrix = _linear([inputs, m_prev], 4 * self._num_units, bias=True)
File "/mnt/home/yangshao/tf1.1/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 1048, in _linear
res = math_ops.matmul(array_ops.concat(args, 1), weights)
File "/mnt/home/yangshao/tf1.1/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1801, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[812,2048]
[[Node: Train/HRED/my_model/tower_0/gradients/Train/HRED/my_model/tower_0/loop_utterances/rnn_3/while/lstm_cell/lstm_cell/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Train/HRED/my_model/tower_0/gradients/Train/HRED/my_model/tower_0/loop_utterances/rnn_3/while/lstm_cell/lstm_cell/MatMul_grad/MatMul_1/StackPop, Train/HRED/my_model/tower_0/gradients/Train/HRED/my_model/tower_0/loop_utterances/rnn_3/while/lstm_cell/lstm_cell/BiasAdd_grad/tuple/control_dependency)]]
[[Node: Train/HRED/Adam/update/NoOp_1/_52370 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_7406_Train/HRED/Adam/update/NoOp_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]

The TensorFlow version I use is 1.1.0. Has anyone else had the same problem?
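Two low-risk mitigations that may be worth trying (a sketch reusing the names above, not a verified fix): let the while_loop swap activations to host memory, and/or reduce the batch size or maximum sequence length.

decoder_outputs_ta, decoder_final_state, _ = tf.nn.raw_rnn(
    self.decoder_cell, self.loop_fn, swap_memory=True)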

Duplicated computation of decoder outputs?

Thanks for mentioning raw_rnn in tutorial 2; I didn't know about this function before, but the documentation is a bit unclear to me, so may I ask a question? In tutorial 2, are the decoder_logits computed over all time steps the same as the output_logits already computed in the loop_fn? If so, can we save time by reusing the latter?

Logits and labels have different shapes when computing cross-entropy loss

Hi,

I tried to implement snippets of your code for a simple word-reversal problem, where I have 3 words in a sentence, but when I compute the cross-entropy it gives me an error like this: InvalidArgumentError (see above for traceback): logits and labels must be broadcastable: logits_size=[832,28] labels_size=[960,28].

I believe it is because the labels in my code have been padded, whereas the logits, which are outputs of the dynamic_decode function, have variable sequence lengths. How did you manage to get the cross-entropy to work with variable sequence lengths for the logits?

https://github.com/kcang2/Udacity-Deep-Learning-Assignment/blob/master/Assignment_6_3.ipynb

Best regards.
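A hedged sketch of one way out (assuming batch-major [batch, time, vocab] logits from dynamic_decode and padded integer targets; decoder_targets, target_lengths, and mask are assumed names): either cap decoding with dynamic_decode's maximum_iterations argument, or pad the logits up to the target length before the loss.

pad_len = tf.shape(decoder_targets)[1] - tf.shape(logits)[1]
padded_logits = tf.pad(logits, [[0, 0], [0, pad_len], [0, 0]])  # pad the time axis
mask = tf.sequence_mask(target_lengths, tf.shape(decoder_targets)[1], dtype=tf.float32)
loss = tf.contrib.seq2seq.sequence_loss(padded_logits, decoder_targets, weights=mask)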

Drop projection wrappers in tutorial #1

One-hot encoding followed by a matrix multiply for input encoding is just bad; I suspected it back when I was writing the tutorial, but thought that using wrappers simplified the tutorial a bit. Now I know that it is not worth it; time to fix it.
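The cheaper equivalent is a plain row gather; a minimal sketch of why the two match:

import tensorflow as tf

vocab_size, emb_size = 10, 4
token_ids = tf.constant([[1, 2], [3, 4]])              # [time, batch], illustrative
W_in = tf.get_variable('W_in', [vocab_size, emb_size])

onehot = tf.one_hot(token_ids, vocab_size)             # what the wrapper materializes
projected = tf.tensordot(onehot, W_in, axes=1)         # one-hot followed by matmul
embedded = tf.nn.embedding_lookup(W_in, token_ids)     # same values, no one-hot tensor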

Mixing teacher forcing with "feed previous"

As you mentioned at the start of the 2nd tutorial, it is a good idea to mix teacher forcing with the "feed previous" technique while decoding. Just thought I could share some ideas on how to do that.

prob = 0.5 # set as placeholder or tf.constant
r = tf.random_normal(shape=[],mean=prob, stddev=0.5, dtype=tf.float32) # get a random value
feed_previous = r > prob # sample -> True/False

In the loop_fn_transition function, you could add an outer condition. Note that feed_previous is a tensor, so a plain Python if/else won't work inside the graph; tf.cond does:

input = tf.cond(feed_previous,
                lambda: tf.cond(finished, padded_next_input, search_for_next_input),
                lambda: tf.cond(finished, padded_next_input, fetch_next_decoder_target))

The fetch_next_decoder_target function is supposed to fetch the next decoder target by indexing decoder_targets with time, i.e. decoder_targets[time]. Note that you need to transpose decoder_targets to "time major" format.

Hope this helps. I will try this and add a pull request if I find time.

ask for help

I enjoyed your "3-seq2seq-native-new.ipynb" tutorial, and decided to try it out myself!

I've run into an error :
AttributeError: module 'tensorflow.contrib.seq2seq' has no attribute 'prepare_attention'

I know this error occurs, but I don't know how to deal with it. Could you give me some advice on this problem?
Looking forward to your reply, thank you.
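prepare_attention was removed when the tf.contrib.seq2seq API was reworked; in later TF 1.x versions the attention pieces are BahdanauAttention (or LuongAttention) plus AttentionWrapper. A sketch under assumed names (attention_units, encoder_outputs, encoder_inputs_length, decoder_cell):

attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(
    num_units=attention_units, memory=encoder_outputs,
    memory_sequence_length=encoder_inputs_length)
decoder_cell = tf.contrib.seq2seq.AttentionWrapper(decoder_cell, attention_mechanism)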

A bug in the second model?

The code in the article Advanced dynamic seq2seq with TensorFlow (In [13])
encoder_outputs = tf.concat((encoder_fw_outputs, encoder_fw_outputs), 2)
should be
encoder_outputs = tf.concat((encoder_fw_outputs, encoder_bw_outputs), 2)
shouldn't it?

Correct Way to Prepare Data?

Hi, thanks for this wonderful tutorial. In new_seq2seq.py it seems indices 0 and 1 are taken by EOS and PAD, so should I prepare my data starting from 2 rather than from 0? For instance:

if I have these corpus:

Alice saw a big rabbit in forest.

then should I change vocab_int_map into this?

{'Alice': 2, 'saw': 3, 'a': 4,...}

so that no vocab index will be 0 or 1.
I would appreciate your help; I trained the model but I'm not sure how to deal with this problem. Thanks in advance!
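A minimal sketch of that mapping, assuming PAD = 0 and EOS = 1 as in the tutorial:

PAD, EOS = 0, 1
words = ['Alice', 'saw', 'a', 'big', 'rabbit', 'in', 'forest']
vocab_int_map = {w: i + 2 for i, w in enumerate(words)}  # real tokens start at 2
# {'Alice': 2, 'saw': 3, 'a': 4, ...}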

How to connect multilayered encoder to decoder?

I want to use an encoder cell which looks like this :

tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.DropoutWrapper(tf.contrib.rnn.LSTMCell(rnn_size), keep_prob) for _ in range(num_layers)])

When I use a multilayered encoder (say num_layers=2), this is what I get:

AttributeError: 'tuple' object has no attribute 'c'

Please help. I want to understand what the dynamic decoder returns in the case of a multilayered encoder. I also tried to read the source code in the tensorflow repo, but it is far too difficult for me to understand.
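The error appears to come from the fact that a MultiRNNCell's final state is a tuple of LSTMStateTuples, one per layer, so code expecting a single state's .c/.h attributes breaks. A hedged sketch of one fix (assumed names rnn_size, num_layers, decoder_inputs_embedded, encoder_final_state): give the decoder the same layered structure so the whole state tuple passes through unchanged.

decoder_cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.LSTMCell(rnn_size) for _ in range(num_layers)])
decoder_outputs, decoder_final_state = tf.nn.dynamic_rnn(
    decoder_cell, decoder_inputs_embedded,
    initial_state=encoder_final_state,  # tuple of per-layer states; sizes must match
    dtype=tf.float32)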

testing

All three tutorials are very good, but it would be nice if you showed us how to test the model.
We have trained the model, but how would we use it to translate unknown, real test data?

decoder_outputs_train has a different dim!

(self.decoder_outputs_train,
 self.decoder_state_train,
 self.decoder_context_state_train) = (
     seq2seq.dynamic_rnn_decoder(
         cell=self.decoder_cell,
         decoder_fn=decoder_fn_train,
         inputs=self.decoder_train_inputs_embedded,
         sequence_length=self.decoder_train_length,
         time_major=self.time_major,
         scope=scope,
     )
 )

I got a different dim for decoder_outputs_train.

The traceback looks like this:

InvalidArgumentError (see above for traceback): logits and labels must have the same first dimension, got logits shape [500,1546] and labels shape [550]

Is the format of my training data wrong?
The labels' shape is 550!

decoder input

The decoder input is not explained well. I do not understand why we have an input for the decoder,
because the decoder input should just be the start token plus the output of the previous decoder step.
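An illustration of the training-time convention (assuming the tutorial's EOS-as-GO setup):

targets: w1  w2  w3  EOS
inputs : EOS w1  w2  w3    <- ground-truth tokens shifted right by one (teacher forcing)

During training the decoder is fed the ground-truth previous token, which stabilizes and speeds up learning; only at inference, when no targets exist, is the decoder's own previous prediction fed back in (this is what tutorial 2 implements with tf.nn.raw_rnn).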

My own embedding

I am new to tensorflow, so pardon me if some of these things are too basic.

I just wanted to confirm what changes I will have to make in case I want to use my own encoding. As you described, I have an input tensor of shape [max_time, batch_size, embedding_size]. Here are my thoughts about this.
I will remove these lines:
encoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='encoder_inputs')
decoder_targets = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_targets')
decoder_inputs = tf.placeholder(shape=(None, None), dtype=tf.int32, name='decoder_inputs')

Instead of these lines:

embeddings = tf.Variable(tf.random_uniform([vocab_size, input_embedding_size], -1.0, 1.0), dtype=tf.float32)
encoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, encoder_inputs)
decoder_inputs_embedded = tf.nn.embedding_lookup(embeddings, decoder_inputs)

I'll put these:

encoder_inputs_embedded = tf.placeholder(shape=(time, None, 4), dtype=tf.float32, name='encoder_embeddings')
decoder_inputs_embedded = tf.placeholder(shape=(time, None, 4), dtype=tf.float32, name='decoder_embeddings')

But my doubt is about the projection layer. You are projecting to the vocabulary size, although I have the embedding size. So, if I change everything to embedding_size, will the logic and working of the code still remain the same? As far as I can tell, it should.

Thanks

2-seq2seq-advanced

While running the following code, I came across the errors below; please help. Many thanks.
max_batches = 3001
batches_in_epoch = 1000

try:
    for batch in range(max_batches):
        fd = next_feed()
        _, l = sess.run([train_op, loss], fd)
        loss_track.append(l)

        if batch == 0 or batch % batches_in_epoch == 0:
            print('batch {}'.format(batch))
            print('  minibatch loss: {}'.format(sess.run(loss, fd)))
            predict_ = sess.run(decoder_prediction, fd)
            for i, (inp, pred) in enumerate(zip(fd[encoder_inputs].T, predict_.T)):
                print('  sample {}:'.format(i + 1))
                print('    input     > {}'.format(inp))
                print('    predicted > {}'.format(pred))
                if i >= 2:
                    break
            print()

except KeyboardInterrupt:
    print('training interrupted')

InvalidArgumentError: Cannot assign a device to node 'Adam/update_Variable/sub_3': Could not satisfy explicit device specification '' because the node was colocated with a group of nodes that required incompatible device '/job:localhost/replica:0/task:0/GPU:0'
Colocation Debug Info:
Colocation group had the following types and devices:
NoOp: GPU CPU
AssignSub: GPU CPU
ScatterAdd: GPU CPU

Attempted an implementation

I really enjoyed your "Advanced dynamic seq2seq with TensorFlow" tutorial, and decided to try it out myself! I wanted to take a corpus of English quotes and create an encoder-decoder that could reconstruct the quotes from the meaning vector (the hidden state).

I've run into an error in softmax_cross_entropy_with_logits:
InvalidArgumentError (see above for traceback): logits and labels must be same size: logits_size=[1000,27994] labels_size=[500,27994] (sequences have 5 timesteps, batch size is 100, vocab size is 27994).

I've been looking over my code for hours now, but can't find the mistake. I know it's a long shot, but would you be willing to take a look to see where I've gone wrong?

The code is here, and the 'problem' might be around line 246:
https://github.com/scottleith/lstm/blob/master/Attempted%20encoder-decoder%20LSTM.py

The raw data can be downloaded here: https://github.com/alvations/Quotables/blob/master/author-quote.txt

I also apologize if this is an inappropriate place to ask. I wanted to contact you, but GitHub doesn't make it easy!

toy task-example1-error

I have no problem executing the test forward pass, but its output is:
batch_encoded:
[[6 3 9]
[0 4 8]
[0 0 7]]
decoder inputs:
[[1 1 1]
[0 0 0]
[0 0 0]
[0 0 0]]
decoder predictions:
[[9 3 3]
[8 3 0]
[0 3 0]
[0 3 0]]

I believed the decoder predictions need not be the same as yours, so I moved on to training on the toy task, where I am getting this error:


InvalidArgumentError Traceback (most recent call last)
/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1322 try:
-> 1323 return fn(*args)
1324 except errors.OpError as e:

/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1301 feed_dict, fetch_list, target_list,
-> 1302 status, run_metadata)
1303

/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
472 compat.as_text(c_api.TF_Message(self.status.status)),
--> 473 c_api.TF_GetCode(self.status.status))
474 # Delete the underlying status object from memory otherwise it stays alive

InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [100,20] vs. shape[1] = [3,20]
[[Node: plain_decoder/while/plain_decoder/lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](plain_decoder/while/TensorArrayReadV3, plain_decoder/while/Identity_3, plain_decoder/while/plain_decoder/lstm_cell/concat/axis)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError Traceback (most recent call last)
in ()
      7 for batch in range(max_batches):
      8     fd = next_feed()
----> 9     _, l = sess.run([train_op, loss], fd)
     10     loss_track.append(l)
     11

/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
887 try:
888 result = self._run(None, fetches, feed_dict, options_ptr,
--> 889 run_metadata_ptr)
890 if run_metadata:
891 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1118 if final_fetches or final_targets or (handle and feed_dict_tensor):
1119 results = self._do_run(handle, final_targets, final_fetches,
-> 1120 feed_dict_tensor, options, run_metadata)
1121 else:
1122 results = []

/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1315 if handle is None:
1316 return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1317 options, run_metadata)
1318 else:
1319 return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1334 except KeyError:
1335 pass
-> 1336 raise type(e)(node_def, op, message)
1337
1338 def _extend_graph(self):

InvalidArgumentError: ConcatOp : Dimensions of inputs should match: shape[0] = [100,20] vs. shape[1] = [3,20]
[[Node: plain_decoder/while/plain_decoder/lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](plain_decoder/while/TensorArrayReadV3, plain_decoder/while/Identity_3, plain_decoder/while/plain_decoder/lstm_cell/concat/axis)]]

Caused by op 'plain_decoder/while/plain_decoder/lstm_cell/concat', defined at:
File "/home/bharathi/anaconda3/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"main", mod_spec)
File "/home/bharathi/anaconda3/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/ipykernel/main.py", line 3, in
app.launch_new_instance()
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/traitlets/config/application.py", line 653, in launch_instance
app.start()
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/ipykernel/kernelapp.py", line 474, in start
ioloop.IOLoop.instance().start()
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/zmq/eventloop/ioloop.py", line 162, in start
super(ZMQIOLoop, self).start()
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tornado/ioloop.py", line 887, in start
handler_func(fd_obj, events)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
self._handle_recv()
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
self._run_callback(callback, msg)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
callback(*args, **kwargs)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tornado/stack_context.py", line 275, in null_wrapper
return fn(*args, **kwargs)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 276, in dispatcher
return self.dispatch_shell(stream, msg)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 228, in dispatch_shell
handler(stream, idents, msg)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/ipykernel/kernelbase.py", line 390, in execute_request
user_expressions, allow_stdin)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/ipykernel/ipkernel.py", line 196, in do_execute
res = shell.run_cell(code, store_history=store_history, silent=silent)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/ipykernel/zmqshell.py", line 501, in run_cell
return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2717, in run_cell
interactivity=interactivity, compiler=compiler, result=result)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2821, in run_ast_nodes
if self.run_code(code, result):
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/IPython/core/interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 8, in
dtype=tf.float32, time_major=True, scope="plain_decoder",
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 614, in dynamic_rnn
dtype=dtype)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 777, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2816, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2640, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2590, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 762, in _time_step
(output, new_state) = call_cell()
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/rnn.py", line 748, in
call_cell = lambda: cell(input_t, state)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 183, in call
return super(RNNCell, self).__call__(inputs, state)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 575, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 611, in call
lstm_matrix = self._linear1([inputs, m_prev])
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 1189, in call
res = math_ops.matmul(array_ops.concat(args, 1), self._weights)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1099, in concat
return gen_array_ops._concat_v2(values=values, axis=axis, name=name)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 706, in _concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/bharathi/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): ConcatOp : Dimensions of inputs should match: shape[0] = [100,20] vs. shape[1] = [3,20]
[[Node: plain_decoder/while/plain_decoder/lstm_cell/concat = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"](plain_decoder/while/TensorArrayReadV3, plain_decoder/while/Identity_3, plain_decoder/while/plain_decoder/lstm_cell/concat/axis)]]

Potential bug in the dynamic_decoder

If I set the num_units of the decoder cell to twice that of the encoder cell (just like the tutorial does), everything goes well. But if I set, say, both the encoder's and the decoder's cells to the same num_units, then an incompatible-shape error occurs. Is it related to the LSTM issue? Thanks.
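This matches the shape arithmetic of the bidirectional encoder: its forward and backward states are concatenated, so the encoder state has size 2*encoder_units, and a decoder cell with the same num_units cannot accept it as an initial state. Besides doubling the decoder's units as the tutorial does, a hedged alternative sketch (the dense projection is an assumption, not the tutorial's code) is to project the state down:

state_c = tf.layers.dense(encoder_final_state.c, decoder_units, use_bias=False)
state_h = tf.layers.dense(encoder_final_state.h, decoder_units, use_bias=False)
decoder_initial_state = tf.contrib.rnn.LSTMStateTuple(c=state_c, h=state_h)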
