
dl4mt-tutorial's Introduction

dl4mt-material

dl4mt-tutorial's People

Contributors

franck-dernoncourt, jakezhaojb, jli05, kyunghyuncho, mbartoli, michaelpulsewidth, orhanf, ozancaglayan


dl4mt-tutorial's Issues

trying to run CPU multicore

I added OMP_NUM_THREADS=8
in front of
python ./train_lm.py
but I get the feeling it is still running on one core.

Any ideas?

Cost is NaN after one epoch if maxlen > 50

For some reason, the cost becomes NaN after one epoch if I increase the 'maxlen' parameter to any value greater than 50.

➜  session2 git:(master) ✗ THEANO_FLAGS=floatX=float32 python train_nmt.py
WARNING (theano.configdefaults): Only clang++ is supported. With g++, we end up with strange g++/OSX bugs.
ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
{'use-dropout': [True], 'dim': [1024], 'optimizer': ['adadelta'], 'dim_word': [512], 'reload': [True], 'clip-c': [1.0], 'n-words': [30000], 'model': ['model_hal.npz'], 'learning-rate': [0.0001], 'decay-c': [0.0]}
Reloading model options
Loading data
Building model
Reloading model parameters
Building sampler
Building f_init... Done
Building f_next.. Done
Building f_log_probs... Done
Building f_cost... Done
Computing gradient... Done
Building optimizers... Done
Optimization
Seen 5 samples
NaN detected

setup_local_env.sh was used for setup

dim == dim_nonlin and nin == nin_nonlin must always be true?

Hello,
It's possible that I've just misunderstood the code, but I think that in the param_init_gru_cond function,
https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L390
the variable dim must equal dim_nonlin. The same is true for nin and nin_nonlin.

This is because the matrices W and Wx have shapes (nin, 2*dim) and (nin_nonlin, dim_nonlin), respectively (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L339).

However, both W and Wx are multiplied with state_below_ (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L429), which would imply that nin == nin_nonlin.

Similarly, at (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L445), r1 (of size dim) is multiplied with tensor.dot(h_, Ux) (of size dim_nonlin), which would imply dim == dim_nonlin. Is my understanding correct? If so, is there a reason for having dim_nonlin and nin_nonlin?

Thank you.
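To illustrate the shape argument with a small numpy sketch (shapes as described above; this is only an illustration, not the repository's Theano code):

import numpy

nin, nin_nonlin = 4, 4    # must match: both W and Wx consume the same input vector
dim, dim_nonlin = 3, 3    # must match: r1 (size dim) gates dot(h_, Ux) (size dim_nonlin)

state_below_ = numpy.ones(nin)
W = numpy.ones((nin, 2 * dim))
Wx = numpy.ones((nin_nonlin, dim_nonlin))

x_ = state_below_.dot(W)     # requires state_below_.shape[0] == nin
xx_ = state_below_.dot(Wx)   # requires state_below_.shape[0] == nin_nonlin, hence nin == nin_nonlin

h_ = numpy.ones(dim)
Ux = numpy.ones((dim, dim_nonlin))
r1 = numpy.ones(dim)
out = r1 * h_.dot(Ux)        # elementwise product: requires dim == dim_nonlin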

NaN detected

After some iterations, the following error occurs:

...
128 samples computed
256 samples computed
384 samples computed
462 samples computed
Valid 2.78936
Epoch 0 Update 88100 Cost 0.587470054626 UD 1.63904905319
NaN detected

Reloading the model with the reload=True parameter lets the learning process continue.
It may be related to this issue.

Asymmetry in read gate application

This probably does not make much difference, but I noticed that the read gates r1 and r2 in the gru_cond_layer method are used slightly differently:

Here (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L448) the hidden state is computed as:

h1 = tanh(xx_ + r1*(Ux*h)), where xx_ is Wx*state_below + bx [notice that the read gate r1 is not applied to the bias bx]

However, the second hidden state h2 at (https://github.com/nyu-dl/dl4mt-tutorial/blob/master/session3/nmt.py#L477) is computed as:

h2 = tanh(Wcx*ctx_ + r2*(Ux_nl*h1 + bx_nl)) [notice that the read gate r2 is applied to the bias bx_nl]
If r2 "kills" some dimensions of the bias term bx_nl, then some decision hyperplanes of Wcx are forced to go through the origin.

Is this asymmetry intended?
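To make the asymmetry concrete, here is a plain-numpy restatement of the two update rules quoted above (single-sample vectors; an illustration only, not the repository's Theano code):

import numpy as np

def h1_step(state_below, h, Wx, bx, Ux, r1):
    # xx_ = Wx*state_below + bx is formed first, so the bias bx is NOT gated by r1
    xx_ = Wx.dot(state_below) + bx
    return np.tanh(xx_ + r1 * Ux.dot(h))

def h2_step(ctx_, h1, Wcx, Ux_nl, bx_nl, r2):
    # here the bias bx_nl sits inside the gate, so r2 can zero out its dimensions
    return np.tanh(Wcx.dot(ctx_) + r2 * (Ux_nl.dot(h1) + bx_nl))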

MT corpus and approach

  1. I'm training session3/nmt.py with attention on the Europarl corpus with a 5000-word vocabulary, 250-dimensional word vectors, and a 500-dimensional internal representation. It takes a couple of days for a full epoch on an AWS GPU instance (Nvidia K40) with 4 GB of GPU memory. I'm just wondering if there is any parallel corpus known to be more basic (e.g. understandable by a 10-year-old) for training.
  2. The BLEU metric can make it look harmless for a translation to omit pivotal words, e.g. if the correct translation is "I believe, that xxxx" and the machine translation omits "believe". Is there an MT approach that could better preserve the structural/compositional information?

How to define a task-specific validation measure?

In the current implementation, validError is computed from log probabilities (the 'pred_probs' function). Is it possible to define a task-specific validation error?
Do you recommend it?
Please explain which part of the source code should be modified.
Thanks.
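For what it's worth, one way I can imagine doing this is to keep the early-stopping machinery and only swap the number it compares against history_errs. A hypothetical sketch (my_task_error and valid_iterator are made-up names, not part of the repository):

def my_task_error(f_log_probs, prepare_data, valid_iterator):
    # any task-specific score works here as long as lower means better,
    # e.g. (100 - BLEU) on greedy samples instead of the mean NLL below
    errs = []
    for x, y in valid_iterator:
        x, x_mask, y, y_mask = prepare_data(x, y)
        errs.extend(list(f_log_probs(x, x_mask, y, y_mask)))
    return sum(errs) / float(len(errs))

# in train(), where valid_err is currently derived from pred_probs:
# valid_err = my_task_error(f_log_probs, prepare_data, valid_iterator)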

what is the role of 'maxlen' parameter?

'maxlen' is one of the parameters in 'train_nmt.py', set to 50 by default.
I get the following message during the training process: "Minibatch with zero sample under length 100".
Investigating the source code shows that this message appears when every sample in a batch has a source or target longer than 'maxlen'.
On the other hand, in 'data_iterator.py' training samples are already skipped when the source or target length is greater than 'maxlen'.

  1. Why does this contradiction exist? Samples are passed through data_iterator and then filtered again in 'prepare_data'.
  2. If I set maxlen to a large value (1000, for example), the time per update increases significantly; could you explain why?
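For reference, a simplified sketch of the filtering that produces the "Minibatch with zero sample under length ..." message (prepare_data does roughly this before building the padded arrays; this is not the exact code):

def filter_by_maxlen(seqs_x, seqs_y, maxlen):
    # keep only the pairs where BOTH sides are shorter than maxlen; if nothing
    # survives, the training loop prints "Minibatch with zero sample under
    # length ..." and skips the batch
    kept_x, kept_y = [], []
    for sx, sy in zip(seqs_x, seqs_y):
        if len(sx) < maxlen and len(sy) < maxlen:
            kept_x.append(sx)
            kept_y.append(sy)
    return kept_x, kept_y

As for question 2, minibatches are padded to the length of their longest surviving sentence and the recurrent scan then runs over that many time steps, so a larger maxlen generally means more computation per update.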

I can't re-run a previously working train_nmt.py

I trained an NMT network about a month ago, and it worked great.
Today, I can't run the same code on the same dataset. Here is the output:

`Loading data
Building model
Buliding sampler
Building f_log_probs... Done
Building f_cost... Done
Computing gradient... Done
Building optimizers... Done
Optimization
Traceback (most recent call last):
File "train_nmt.py", line 54, in
'learning-rate': [0.001]})
File "train_nmt.py", line 33, in main
use_dropout=params['use-dropout'][0])
File "/Users/AmirHJ/projects/deep-learning/seq2seq/nmt2.py", line 1176, in train
cost = f_grad_shared(x, x_mask, y, y_mask)
File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in call
storage_map=getattr(self.fn, 'storage_map', None))
File "/usr/local/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in call
outputs = self.fn()
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Apply node that caused the error: Join(TensorConstant{0}, InplaceDimShuffle{x,0,1,2}.0, InplaceDimShuffle{x,0,1,2}.0, InplaceDimShuffle{x,0,1,2}.0)
Toposort index: 999
Inputs types: [TensorType(int8, scalar), TensorType(float32, (True, False, False, False)), TensorType(float32, (True, False, False, False)), TensorType(float32, (True, False, False, False))]
Inputs shapes: [(), (1, 26, 32, 200), (1, 26, 32, 400), (1, 26, 32, 24)]
Inputs strides: [(), (665600, 25600, 800, 4), (1331200, 51200, 1600, 4), (79872, 3072, 96, 4)]
Inputs values: [array(0, dtype=int8), 'not shown', 'not shown', 'not shown']
Outputs clients: [[InplaceDimShuffle{0,1,2,3}(Join.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "train_nmt.py", line 54, in
'learning-rate': [0.001]})
File "train_nmt.py", line 33, in main
use_dropout=params['use-dropout'][0])
File "/Users/AmirHJ/projects/deep-learning/seq2seq/nmt2.py", line 1076, in train
build_model(tparams, model_options)
File "/Users/AmirHJ/projects/deep-learning/seq2seq/nmt2.py", line 639, in build_model
prefix='ff_logit_lstm', activ='linear')
File "/Users/AmirHJ/projects/deep-learning/seq2seq/nmt2.py", line 220, in fflayer
tensor.dot(state_below, tparams[_p(prefix, 'W')]) +

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

If you suspect this is an IPython bug, please report it at:
https://github.com/ipython/ipython/issues
or send an email to the mailing list at [email protected]

You can print a more detailed traceback right now with "%tb", or use "%debug"
to interactively debug it.

Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
%config Application.verbose_crash=True`

Output contains lots of UNKs

I changed the paths in train_nmt.py and nmt.py to point to where the data files are located locally. I have pasted the output below. It doesn't look right to me, since most of the tokens are UNKs. Please let me know whether this output is expected.

Using gpu device 2: Graphics Device (CNMeM is disabled)
{'use-dropout': [False], 'dim': [1024], 'optimizer': ['adadelta'], 'dim_word': [512], 'reload': [True], 'clip-c': [1.0], 'n-words': [30000], 'model': ['model_hal.npz'], 'learning-rate': [0.0001], 'decay-c': [0.0]}
Loading data
Building model
/share/apps/python-2.7.10/lib/python2.7/site-packages/theano/scan_module/scan.py:1019: Warning: In the strict mode, all neccessary shared variables must be passed as a part of non_sequences
'must be passed as a part of non_sequences', Warning)
Building sampler
Building f_init... Done
Building f_next.. Done
Building f_log_probs... Done
Building f_cost... Done
Computing gradient... Done
Building optimizers... Done
Optimization
Minibatch with zero sample under length 50
Epoch 0 Update 10 Cost 44.2246017456 UD 0.444377183914
Epoch 0 Update 20 Cost 52.9744186401 UD 0.699009895325
Epoch 0 Update 30 Cost 29.5254020691 UD 0.416825056076
Epoch 0 Update 40 Cost 43.1912841797 UD 0.670005083084
Epoch 0 Update 50 Cost 21.6586799622 UD 0.362446069717
Minibatch with zero sample under length 50
Epoch 0 Update 60 Cost 32.2629852295 UD 0.534498929977
Epoch 0 Update 70 Cost 15.2928380966 UD 0.297899007797
Minibatch with zero sample under length 50
Epoch 0 Update 80 Cost 24.2991943359 UD 0.480392932892
Epoch 0 Update 90 Cost 87.2202224731 UD 0.497414112091
Epoch 0 Update 100 Cost 25.941608429 UD 0.506604909897
Saving the best model... Done
Saving the model at iteration 100... Done
Source 0 : UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK .
Truth 0 : À UNK UNK UNK UNK UNK UNK , UNK UNK a UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK .
Sample 0 : UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK à UNK UNK UNK UNK UNK
Source 1 : I UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK , UNK UNK a UNK UNK UNK UNK UNK UNK UNK UNK .
Truth 1 : UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK , UNK UNK UNK UNK UNK UNK UNK UNK UNK .
Sample 1 : UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK , UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK
Source 2 : UNK UNK , UNK UNK UNK , UNK UNK UNK UNK UNK UNK UNK UNK UNK .
Truth 2 : UNK UNK UNK , j UNK UNK UNK d UNK UNK UNK UNK à UNK UNK UNK UNK UNK UNK UNK .
Sample 2 : UNK UNK UNK UNK UNK , UNK UNK UNK UNK UNK UNK UNK UNK à UNK UNK UNK UNK UNK , UNK UNK
Source 3 : UNK , UNK UNK UNK UNK UNK UNK UNK UNK .
Truth 3 : UNK UNK , UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK .
Sample 3 : UNK UNK UNK , UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK
Source 4 : UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK a UNK UNK UNK UNK UNK , UNK UNK UNK UNK UNK .
Truth 4 : UNK UNK UNK UNK UNK UNK UNK UNK , UNK UNK UNK , UNK UNK UNK UNK UNK UNK UNK .
Sample 4 : UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK UNK

GRU attention with 2-step process - session 2

First you compute the hidden activation as in a normal GRU layer (h1), then you add context information through another set of GRU computations to get the final output h2.
Could you cite the reference for the GRU attention equations in session 2? Or is there a reason for this design?
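As I understand it, the session-2 decoder is a "conditional GRU": one ordinary GRU step over the embedded previous target word gives an intermediate state h1, attention is computed from h1 over the source annotations, and a second GRU step then folds the resulting context vector into h1 to give h2. A rough plain-numpy sketch of that structure (the attention scoring is simplified to a bilinear form here; the actual code uses a small tanh network, and the parameter names only loosely follow nmt.py):

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h, p):
    # one ordinary GRU step; p is a dict of weight matrices
    r = sigmoid(p['Wr'].dot(x) + p['Ur'].dot(h))
    u = sigmoid(p['Wu'].dot(x) + p['Uu'].dot(h))
    h_tilde = np.tanh(p['Wh'].dot(x) + r * p['Uh'].dot(h))
    return u * h + (1.0 - u) * h_tilde

def cond_gru_step(y_emb, h_prev, annotations, p1, Wa, p2):
    # step 1: plain GRU over the embedded previous target word
    h1 = gru_step(y_emb, h_prev, p1)
    # attention weights computed from the intermediate state h1
    e = annotations.dot(Wa.dot(h1))                     # one score per source position
    alpha = np.exp(e - e.max()); alpha /= alpha.sum()   # softmax over source positions
    ctx = alpha.dot(annotations)                        # context vector
    # step 2: a second GRU step that folds the context vector into h1
    h2 = gru_step(ctx, h1, p2)
    return h2, ctx, alpha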

Why decouple 'rescore' from 'translate'?

It seems that the role of 'restore_with_lm' is to rescore the output of 'translate.py' using a language model. Is that true?
I think the decoder ('translate.py') MUST leverage a language model when generating the output. Why have these two modules been decoupled?
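My understanding (not confirmed by the authors) is that translate.py produces an n-best list scored by the translation model alone, and the LM-based rescoring step then re-ranks that list with a weighted combination of scores; keeping the two separate means the language model only has to score a handful of finished hypotheses rather than every partial hypothesis inside the beam. A toy sketch of such a re-ranking, with made-up scoring functions:

def rerank(hypotheses, tm_score, lm_score, lm_weight=0.5):
    # hypotheses: candidate target sentences for one source sentence
    # tm_score / lm_score: callables returning log-probabilities (hypothetical here)
    scored = [(tm_score(h) + lm_weight * lm_score(h), h) for h in hypotheses]
    scored.sort(reverse=True)   # higher combined log-probability is better
    return scored[0][1]         # best hypothesis after rescoring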

On decoder to softmax layer projection in session 1

In session 1, nmt.py (https://github.com/kyunghyuncho/dl4mt-material/blob/master/session1/nmt.py),
line 529, I'm wondering whether 'proj_h = proj[0]' should be replaced with 'proj_h = proj'.

proj[0] only records the decoder hidden state at time 0 (note that in gru_cond_simple_layer you are returning rval, not [rval]). Once the shapes of logit_lstm and logit_prev are printed, you will find them inconsistent.

For session 2 with attention, there is no problem.

How to encode an input text?

I have trained a simple encoder-decoder model (session 1). Everything is OK and I can generate translations using translate.py.
I'm interested in encoding the input text with the RNN model. Would you please help me?

How to import (source,target) pairs into the project?

I have a list of (source, target) sentence pairs and I want to build an encoder-decoder from scratch. It seems that nmt.py is built around standard machine translation datasets and requires specific inputs (datasets and dictionaries).
What is the best way to feed in (source, target) pairs?
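I'm not sure this is the intended workflow, but the train() entry point essentially needs a tokenized source file and a tokenized target file (one sentence per line, line-aligned) plus one pickled word-index dictionary per language. A minimal sketch of building such dictionaries from plain-text pairs, loosely modelled on what data/build_dictionary.py produces (the file names below are just examples):

import cPickle as pkl
from collections import Counter

def build_dictionary(text_file, dict_file):
    counts = Counter()
    with open(text_file) as f:
        for line in f:
            counts.update(line.strip().split())
    # indices 0 and 1 are conventionally reserved for eos and UNK in the tutorial code
    worddict = {'eos': 0, 'UNK': 1}
    for i, (w, _) in enumerate(counts.most_common()):
        worddict[w] = i + 2
    with open(dict_file, 'wb') as f:
        pkl.dump(worddict, f)

build_dictionary('train.src.tok', 'train.src.tok.pkl')
build_dictionary('train.tgt.tok', 'train.tgt.tok.pkl')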

With the RNN encoder, how can it learn without an exact output vector?

According to this tutorial: http://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-gpus-part-2/,
the output of the RNN encoder depends on the weight matrix and the input matrix (the tanh function), and while we are learning from the training data this weight matrix keeps changing, so there is no exact target output for the encoder while the model is training. So how are the weights and biases of the RNN encoder learned? Do we need backpropagation from the RNN decoder back to the RNN encoder to update the encoder's weights and biases, or is it done some other way?

Where is dataset='/ichec/work/dl4mt_data/nec_files/wiki.tok.txt.gz'?

Hi

When I try to run the demo in session0, I notice there should be a wiki dataset, but it is missing. I went through the data folder and ran the .sh file to download the europarl-v7.fr-en and newstest datasets, but I cannot find the wiki data. Can you point out how to obtain the wiki.tok.txt.gz file?

Thanks

Random Translations?

Hi,
I've managed to run train_nmt.py on my parallel (monolingual: sarcastic English to non-sarcastic English) dataset, but the samples generated during training have nothing to do with the source sentences.

[screenshots of sampled translations omitted]

When the training is finished, the translations created are also meaningless - "I I I " and "UNK" and such.

What am I doing wrong? thanks !

Lotem

Session 1

Hi,
Is it "correct" that session 1 test after training gives meaningless sentences ?

thanks

ValueError: unsupported pickle protocol: 3

Traceback (most recent call last):
  File "./train_lm.py", line 41, in <module>
    'reload': [False]})
  File "./train_lm.py", line 27, in main
    use_dropout=params['use-dropout'][0])
  File "/mnt/f/Programing/dl4mt_codes/codes/dl4mt-tutorial/session0/lm.py", line 641, in train
    worddicts = pkl.load(f)
ValueError: unsupported pickle protocol: 3
I used the europarl-v7.fr-en data suggested in #62 instead of the wiki data, and ran into the problem above.

Thanks
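For reference, this error usually means the dictionary .pkl file was written by Python 3 (pickle protocol 3) while train_lm.py is running under Python 2, which only reads protocols 0-2. One workaround, assuming you still have the Python 3 environment that created the file, is to re-save it with an older protocol (the path below is just an example):

# run this under Python 3, i.e. the interpreter that created the file
import pickle

with open('europarl-v7.fr-en.en.tok.pkl', 'rb') as f:   # example path, use your own
    worddict = pickle.load(f)

with open('europarl-v7.fr-en.en.tok.pkl', 'wb') as f:
    pickle.dump(worddict, f, protocol=2)                 # protocol 2 is readable by Python 2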

reload training has unpredictable error

When reload is set to True, an unpredictable error occurs during the validation error calculation. The traceback is shown below:

Traceback (most recent call last):
  File "train_nmt.py", line 51, in <module>
    'reload': [True]})
  File "train_nmt.py", line 32, in main
    use_dropout=params['use-dropout'][0])
  File "/home/jz1672/Projects/charNMT/archive/CN_EN/2015_1203_cn_en_conv3x256_gru_unseg/nmt.py", line 1320, in train
    bad_counter += 1
UnboundLocalError: local variable 'bad_counter' referenced before assignment
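For what it's worth, the traceback suggests bad_counter is only assigned on the code path taken when training starts from scratch, so a reloaded run reaches 'bad_counter += 1' during validation before any assignment. A sketch of the obvious guard (exact placement will differ in your copy of nmt.py):

# near the top of train(), before the main epoch loop
bad_counter = 0   # patience counter for early stopping; initialise it unconditionally
                  # so that runs reloaded with reload=True can increment it too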

Question

Hi,
I read your articles and your posts in different places.
Is this repo the best starting point for building a proof-of-concept system?

I would love to try with a big corpus, because my guess is that it will perform well with lots of data.

cheers,
Vinny

How to get started ???

Theano is installed.
dl4mt-material is pulled.
I ran setup_local_env.sh and it finished properly.

Now what?
I tried to run ./train.sh in session0 and session1 but got an error:
python can't open file ./train_lm.py, even though the file is there in session0.

What is the "cd $PBS_O_WORKDIR" for? I don't see this variable defined anywhere.

help please.

NaN with modified attention

@laulysta, @jakezhaojb, @kyunghyuncho

This might be related to issue #29. I'm having numerical issues (NaNs) with the modified attention introduced in pull request #15.

I'll go through the code soon to see what it does. What are the equations that are supposed to describe the modified GRU with attention? Do you have empirical results comparing both versions?

Why do the grads have to be shared?

We find that in the Theano tutorial, the update in the theano function uses the grads directly.
However, in the "sgd" function in your code, the grads are put into gshared; could you tell me the reason? Thank you!

Bug in session3/translate.py

It seems that there is a bug in session3/translate.py: using session2/translate.py leads to meaningful translations, but when I use session3/translate.py with the -b 1 parameter (other parameters being the same), completely meaningless sentences are generated.
I can't find the bug; would you please check the source code?

Multiple translations for a single input

It is well known in the field of translation that a single sentence can have multiple valid translations. Although current translation corpora don't support this feature, I think it would be better if the tools supported it.
The training data would contain an input sentence and a list of corresponding translations. During training, the output of the decoder would be compared with all possible translations and the minimum cost would be back-propagated.
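To make the suggestion concrete, a toy sketch of the proposed objective (nothing like this exists in the repository; per_reference_nll is a hypothetical function returning the decoder's negative log-likelihood for one reference translation):

def multi_reference_cost(source, references, per_reference_nll):
    # score the decoder against every acceptable translation of this source sentence
    costs = [per_reference_nll(source, ref) for ref in references]
    # back-propagate only the cheapest one, i.e. the closest reference
    return min(costs)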

What's the role of 'dim_word' parameter in the 'train()' method?

It seems that a kind of embedding is used in nmt.py; specifically, there is a 'dim_word' parameter in the train() method.
The embedding matrix is defined in the 'init_params()' method, but I don't understand the role of the embedding in the architecture of the encoder or decoder. I thought the representation of words was 1-of-K coding, as described here. Would you please clarify why this kind of embedding is used and what its purpose is?
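For concreteness, the 1-of-K coding and the embedding matrix are two views of the same operation: multiplying a one-hot vector by Wemb just selects a row, so the code can use a lookup (Wemb[x]) instead of an explicit one-hot product. dim_word is the number of columns of Wemb, i.e. the size of those word vectors, while dim is the size of the recurrent hidden state. A small numpy check of the equivalence:

import numpy as np

n_words, dim_word = 30000, 512
rng = np.random.RandomState(0)
Wemb = rng.randn(n_words, dim_word).astype('float32')

word_index = 1234
one_hot = np.zeros(n_words, dtype='float32')
one_hot[word_index] = 1.0

# the one-hot matrix product and the row lookup give the same vector
assert np.allclose(one_hot.dot(Wemb), Wemb[word_index])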

How can I run the language model?

Hi,
I think it is just a simple question.

I'm new to dl4mt, and I wonder how I can run the neural language model of session0, since I can't find the code to download the wiki data needed.

Thanks.

Preparing dataset for neural MT

I want to train the Attention-based encoder-decoder model for machine translation (Session 2). It is unclear to me how to prepare the dataset for this. I tried running preprocess.sh in data/ with S=en and T=fr and specifying appropriate paths for P1 and P2. I get the following error:
./preprocess.sh: line 18: all_en-fr.en: No such file or directory
./preprocess.sh: line 19: all_en-fr.fr: No such file or directory
./preprocess.sh: line 23: all_en-fr.en.tok: No such file or directory
./preprocess.sh: line 26: all_en-fr.fr.tok: No such file or directory
./preprocess.sh: line 29: all_en-fr.en.tok: No such file or directory
./preprocess.sh: line 30: all_en-fr.fr.tok: No such file or directory

Do the files all_en-fr.en and all_en-fr.fr need to be downloaded? Sorry if these questions have already been answered and I just didn't find them. Thanks for the help!

savez() argument after ** must be a mapping, not NoneType

I'm trying to train a model (using 'Session1') without a validation set, so I pass the training set as the 'valid_datasets' parameter and set the 'validFreq' parameter to a large number; please correct me if that's not the right approach.

Anyway, after training the model, the following error appeared:

...
17184 samples computed
17188 samples computed
Valid  20.967
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Users/AmirHJ/projects/test/train_nmt.py in <module>()
     42         'use-dropout': [False],
     43         'learning-rate': [0.0001],
---> 44         'reload': [False]})
     45
     46

/Users/AmirHJ/projects/test/train_nmt.py in main(job_id, params)
     27                 sampleFreq=1000,
     28                 max_epochs=2,
---> 29                 use_dropout=params['use-dropout'][0])
     30     return validerr
     31

/Users/AmirHJ/projects/test/nmt.py in train(dim_word, dim, encoder, decoder, patience, max_epochs, finish_after, dispFreq, decay_c, alpha_c, lrate, n_words_src, n_words, maxlen, optimizer, batch_size, valid_batch_size, saveto, validFreq, saveFreq, sampleFreq, datasets, valid_datasets, dictionaries, use_dropout, reload_)
   1138     numpy.savez(saveto, zipped_params=best_p,
   1139                 history_errs=history_errs,
-> 1140                 **params)
   1141
   1142     return valid_err

TypeError: savez() argument after ** must be a mapping, not NoneType
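From the traceback, the final numpy.savez receives **params with params still None; best_p is only assigned when a validation pass produces a new best score, so with validFreq larger than the total number of updates it never gets set. A sketch of the guard I would add just before that final save in train() (placement approximate, against my local copy):

# just before the final numpy.savez(saveto, ...) at the end of train()
if best_p is None:
    best_p = unzip(tparams)   # fall back to the current parameters
params = copy.copy(best_p)
numpy.savez(saveto, zipped_params=best_p,
            history_errs=history_errs, **params)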

"reload = true" behavior ?

Hi,
Just a quick question regarding reload=true.

I would expect, after reloading an existing model, to get the same range of cost and the same "accuracy" on the displayed translation samples.
After a reload, I observe much higher costs, and the samples during retraining get pretty bad translations; it seems to take time to "recatch" the accuracy.

Is this normal behavior?

thanks

Refactoring

Would you mind if I ported the entire codebase to Python 3.4+, separated out common code such as the optimizers, and possibly rewrote a bit of the command-line handling using argparse?

Would you please document the code?

Hi,
I'm going to play with this project to train an encoder-decoder model. Unfortunately, the source code is not documented, so reading it is difficult. Would you please refer me to some pages that would help me better understand the code?
Thank you.
