
dl4mt-multi's Introduction

Multi-Way Neural Machine Translation

This repo implements the multi-way neural machine translation described in the paper "Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism" (NAACL 2016).

With this repo, you can build a multi-encoder, multi-decoder, or multi-way NMT model. If you reduce the number of encoders and decoders to one each, you essentially recover a single-pair NMT model with an attention mechanism.

Dependencies:

The code has three major dependencies, one for each of its components:

  1. Core computational graphs (Theano)
  2. Data streams (Fuel)
  3. Training loop and extensions (Blocks)

Please use setup.sh to set up your development environment.

Navigation:

The core computational graphs are written in pure Theano and are based on the implementations in dl4mt-tutorial.

We refer to each source-target pair as a computational graph, since we build an actual separate computational graph for each pair; some of the parameters in these computational graphs are shared with the others.
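The sharing idea can be sketched in plain Python. All names below (build_graphs, W_att, and so on) are hypothetical illustrations, not the identifiers used in this codebase:

```python
# Sketch of per-pair computational graphs that share parameters.
# Encoders are shared per source language, decoders per target language,
# and a single attention mechanism is shared by every pair.

def build_graphs(pairs):
    """One parameter dict per source-target pair."""
    shared_attention = {'W_att': 'attention-weights'}   # single shared copy
    encoders, decoders, graphs = {}, {}, {}
    for src, trg in pairs:
        encoders.setdefault(src, {'W_enc': 'enc-' + src})
        decoders.setdefault(trg, {'W_dec': 'dec-' + trg})
        graphs[(src, trg)] = {
            'encoder': encoders[src],
            'decoder': decoders[trg],
            'attention': shared_attention,  # same object in every graph
        }
    return graphs

graphs = build_graphs([('en', 'fr'), ('de', 'fr')])
# Both pairs reference the very same attention parameters:
assert graphs[('en', 'fr')]['attention'] is graphs[('de', 'fr')]['attention']
```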

In order to train multiple computational graphs, we need multiple data streams and a scheduler over them. This part is handled by Fuel and custom streams, along with development and test decoding streams.
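The scheduling idea can be sketched as a round-robin over named streams. This is an illustration only; the repo uses Fuel streams, and `multi_scheduler` is a hypothetical name, not the actual class:

```python
def multi_scheduler(streams):
    """Round-robin over named streams, yielding (pair_name, batch)
    and dropping a stream once it is exhausted (sketch only)."""
    iters = {name: iter(s) for name, s in streams.items()}
    while iters:
        for name in list(iters):
            try:
                yield name, next(iters[name])
            except StopIteration:
                del iters[name]  # this pair's data is exhausted

streams = {'en-fr': [1, 2], 'de-fr': [10]}
print(list(multi_scheduler(streams)))
# -> [('en-fr', 1), ('de-fr', 10), ('en-fr', 2)]
```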

Given the computational graphs and their corresponding data streams, training the parameters in the computational graphs is carried out by a training loop adapted from Blocks.

Finally, this codebase is a refined combination of multiple codebases. The layer structure and handling of parameters are somewhat similar to dl4mt-tutorial, the class hierarchy and experiment configuration resemble a pruned version of GroundHog, and the main loop and extensions are quite similar to blocks-examples.

During the development of this codebase, we tried to be pragmatic and inherit the lessons learned from other NMT implementations; we hope we picked the best parts, not the worst.

Preparing Text Corpora:

The original text corpora can be downloaded from here.

In this repo, we do not handle downloading or tokenizing the data. Please follow the steps described in dl4mt-tutorial for downloading and tokenizing it. Once you have downloaded and tokenized the data, you can use scripts/encode_with_bpe_parallel.sh and scripts/encode_with_bpe_joint.sh to use sub-word units as input and output tokens (check the scripts for details).

dl4mt-multi's People

Contributors

jayparks, kyunghyuncho, orhanf


dl4mt-multi's Issues

Issues related to validation

Hi, thanks for sharing this great work.

I'm trying to train a multi-way model but encountered some issues related to intermediate validation.

The first is that the code never executes BLEU validation.

I set the 'bleu_val_freq' and 'bleu_script' config options properly, but nothing happens; validation is just skipped.

The second one is about log_prob_freq.

Computing the log probability on the validation set is executed at every log_prob_freq I specified, but it outputs an error like this:

Traceback (most recent call last):
  File "train_mlnmt.py", line 25, in
    get_logprob_streams(config))
  File "/home/pjh/dl4mt-multi/mlnmt.py", line 152, in train
    main_loop.run()
  File "/home/pjh/dl4mt-multi/mcg/algorithm.py", line 328, in run
    reraise_as(e)
  File "/home/pjh/codes/blocks/blocks/utils/__init__.py", line 225, in reraise_as
    six.reraise(type(new_exc), new_exc, orig_exc_traceback)
  File "/home/pjh/dl4mt-multi/mcg/algorithm.py", line 314, in run
    while self._run_epoch():
  File "/home/pjh/dl4mt-multi/mcg/algorithm.py", line 352, in _run_epoch
    while self._run_iteration():
  File "/home/pjh/dl4mt-multi/mcg/algorithm.py", line 374, in _run_iteration
    self._run_extensions('after_batch', batch)
  File "/home/pjh/dl4mt-multi/mcg/algorithm.py", line 382, in _run_extensions
    extension.dispatch(CallbackName(method_name), *args)
  File "/home/pjh/codes/blocks/blocks/extensions/__init__.py", line 328, in dispatch
    self.do(callback_invoked, *(from_main_loop + tuple(arguments)))
  File "/home/pjh/dl4mt-multi/mcg/extensions.py", line 375, in do
    if numpy.isnan(numpy.mean(probs[cg_name])):
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 2885, in mean
    out=out, keepdims=keepdims)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/_methods.py", line 72, in _mean
    ret = ret / rcount
TypeError: unsupported operand type(s) for /: 'list' and 'int'

Original exception:
TypeError: unsupported operand type(s) for /: 'list' and 'int'

It seems to be a numpy-related issue; my numpy version is 1.11.0.

Other than these, sampling_freq, save_freq, and the actual training work fine.
Any ideas or comments would be appreciated! FYI, I'm attaching my configuration.

config['normalized_bleu'] = True
config['track_n_models'] = 3
config['output_val_set'] = True
config['beam_size'] = 12

config['log_prob_freq'] = 20
config['lob_prob_bs'] = 10

config['reload'] = True
config['save_freq'] = 25000
config['sampling_freq'] = 2500
config['bleu_val_freq'] = 25
config['val_burn_in'] = 1
config['finish_after'] = 2000000
config['incremental_dump'] = True

AttributeError: 'module' object has no attribute 'OrderedDict'

Hello,
I am using python 2.7.6. while running 'python train_mlnmt.py' I got following error.

  File "train_mlnmt.py", line 27, in
    train(config, get_tr_stream(config), get_dev_streams(config),
  File "/home/development/jigar/jigar/dl4mt-multi/mcg/stream.py", line 214, in get_tr_stream
    for k, v in config['src_vocabs'].iteritems()}
  File "/home/development/jigar/jigar/dl4mt-multi/mcg/stream.py", line 214, in
    for k, v in config['src_vocabs'].iteritems()}
AttributeError: 'module' object has no attribute 'OrderedDict'

I guess this error is specific to Python versions earlier than 2.7 (collections.OrderedDict was added in Python 2.7).
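For reference, a common guard for this is to fall back to the PyPI backport on interpreters without collections.OrderedDict. This is a generic sketch, not code from this repo:

```python
# Fall back to the `ordereddict` backport package on Python 2.6.
try:
    from collections import OrderedDict
except ImportError:                       # Python < 2.7
    from ordereddict import OrderedDict   # pip install ordereddict

d = OrderedDict([('b', 2), ('a', 1)])
assert list(d.keys()) == ['b', 'a']  # insertion order is preserved
```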

Questions about _step_slice in line 285 of layers.py

Hi!

I have two questions:

  1. The _step_slice function has 15 arguments listed in its definition, while len(sequences)+len(outputs_info)+len(non_sequences) is 17. Why?
  2. _step_slice returns h, ctx, alpha.T; why don't ctx_ and alpha.T disturb enc_ls and cc_ (the arguments)?
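I have not checked line 285 of layers.py, but a common source of such mismatches is Theano's scan convention: the step function receives sequences, then the recurrent outputs, then non_sequences, and any outputs_info entry given as None is an output only and is never fed back as an argument. A plain-Python sketch of that bookkeeping (the numbers below are illustrative, not taken from layers.py):

```python
def scan_step_arg_count(n_sequences, outputs_info, n_non_sequences):
    """How many arguments Theano's scan passes to the step function:
    outputs_info entries that are None are outputs only and are not
    fed back in, so they contribute no argument (sketch of the rule)."""
    recurrent = sum(1 for info in outputs_info if info is not None)
    return n_sequences + recurrent + n_non_sequences

# 17 entries in total, but two outputs_info slots set to None
# would leave only 15 step arguments:
print(scan_step_arg_count(6, ['init_state', None, None], 8))  # -> 15
```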

Stuck after entering main loop

Hi,
I am trying to train a 2-encoder/1-decoder model, but the training process seems to get stuck right before the first iteration.

The computational graphs seem to be built (ca. 3.5 GB allocated on gpu0), and the main loop of Blocks is entered:

-------------------------------------------------------------------------------
BEFORE FIRST EPOCH
-------------------------------------------------------------------------------
Training status:
         batch_interrupt_received: False
         epoch_interrupt_received: False
         epoch_started: True
         epochs_done: 0
         iterations_done: 0
         received_first_batch: False
         resumed_from: None
         training_started: True
Log records from the iteration 0:
         time_initialization: 3.81469726562e-06

but then nothing happens. The GPU shows 0% load; I waited for an hour and also tested with very small training files. The main CPU process is running at 100%. Has anybody seen behaviour like that? Any suggestions?

Thanks,
Marcin
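A generic way to see where a hung Python process is blocked (not specific to this repo) is to make it dump every thread's traceback on a signal, then send that signal from another shell:

```python
import faulthandler
import signal
import tempfile

# After this, `kill -USR1 <pid>` prints all thread tracebacks to stderr,
# showing exactly where the process is stuck (e.g. a blocking read).
faulthandler.register(signal.SIGUSR1)

# The same dump can also be triggered programmatically:
with tempfile.TemporaryFile(mode='w+') as f:
    faulthandler.dump_traceback(file=f)  # writes directly to the fd
    f.seek(0)
    assert 'most recent call first' in f.read()
```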
