dl4mt-multi's Introduction

Multi-Way Neural Machine Translation

This repo implements the multi-way Neural Machine Translation model described in the paper "Multi-way Multilingual Neural Machine Translation with a Shared Attention Mechanism" (NAACL 2016).

With this repo, you can build a multi-encoder, multi-decoder, or multi-way NMT model. When you reduce the number of encoders and decoders to one each, you recover a single-pair NMT model with an attention mechanism.

Dependencies:

The code depends on three major components:

  1. Core computational graphs (Theano)
  2. Data streams (Fuel)
  3. Training loop and extensions (Blocks)

Please use setup.sh for setting up your development environment.

Navigation:

The core computational graphs are written in pure Theano and are based on the implementations in dl4mt-tutorial.

We refer to each source-target pair as a computational graph, since we build a separate computational graph for each pair. Some of the parameters in each graph are shared with the other graphs.
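The idea of one graph per pair with shared parameters can be illustrated with a small sketch. This is not the repo's code: the function names (`init_params`, `build_graphs`), the language codes, and the toy list-of-floats parameters are all hypothetical, and plain Python objects stand in for Theano shared variables. It only shows that every graph holds references to the same encoder, decoder, and attention parameter sets rather than copies.

```python
from itertools import product


def init_params(name, size):
    """Return a toy parameter set: a dict holding one list of floats.

    Hypothetical stand-in for a set of Theano shared variables.
    """
    return {name: [0.0] * size}


def build_graphs(sources, targets, size=4):
    """Build one 'graph' per source-target pair from shared parameter sets."""
    # Each encoder, each decoder, and the attention parameters are created once.
    encoders = {s: init_params("enc_" + s, size) for s in sources}
    decoders = {t: init_params("dec_" + t, size) for t in targets}
    attention = init_params("attention", size)

    graphs = {}
    for s, t in product(sources, targets):
        # A graph stores references to the shared sets, not copies of them.
        graphs[(s, t)] = {
            "encoder": encoders[s],
            "decoder": decoders[t],
            "attention": attention,
        }
    return graphs


# Hypothetical language pairs: two encoders and two decoders give four graphs.
graphs = build_graphs(["en", "fr"], ["de", "es"])

# An update made through one graph is visible in every graph sharing that encoder.
graphs[("en", "de")]["encoder"]["enc_en"][0] = 1.0
shared_value = graphs[("en", "es")]["encoder"]["enc_en"][0]

# With one encoder and one decoder, a single graph (single-pair model) remains.
single = build_graphs(["en"], ["de"])
```

In this sketch, an update to the `en` encoder through the `("en", "de")` graph also changes what the `("en", "es")` graph sees, which is the sharing behaviour the paragraph above describes.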

To train multiple computational graphs, we need multiple data streams and a scheduler over them. This part is handled by Fuel and custom streams, along with development and test decoding streams.

Given the computational graphs and their corresponding data streams, the parameters are trained by a training loop adapted from Blocks.
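A scheduler over several data streams can be sketched in a few lines. The round-robin policy, the `schedule` function, and the pair names below are assumptions made for illustration. The repo's actual scheduling policy and its Fuel stream classes are not shown here. Each stream is modelled as a plain re-iterable list of batches.

```python
from itertools import cycle


def schedule(streams, num_batches):
    """Yield (pair_name, batch) tuples, visiting streams in round-robin order.

    `streams` maps a pair name to a re-iterable collection of batches.
    An exhausted stream is restarted, which mimics beginning a new epoch.
    Streams are assumed to be non-empty.
    """
    if not streams:
        return
    iterators = {name: iter(stream) for name, stream in streams.items()}
    order = cycle(streams)  # cycles over the pair names in insertion order
    for _ in range(num_batches):
        name = next(order)
        try:
            batch = next(iterators[name])
        except StopIteration:
            # Restart the exhausted stream and take its first batch.
            iterators[name] = iter(streams[name])
            batch = next(iterators[name])
        yield name, batch


# Hypothetical streams: one pair has two batches per epoch, the other has one.
streams = {
    "en-de": ["en-de batch 0", "en-de batch 1"],
    "fr-de": ["fr-de batch 0"],
}
scheduled = list(schedule(streams, 5))
```

Each yielded pair name tells the training loop which computational graph should consume the batch. A weighted or sampled schedule would only need to replace the `cycle` call.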

Finally, this codebase is a refined combination of multiple codebases. The layer structure and handling of parameters are somewhat similar to dl4mt-tutorial. The class hierarchy and experiment configuration resemble a pruned version of GroundHog, and the main loop and extensions are quite similar to blocks-examples.

During the development of this codebase, we tried to be pragmatic and to inherit the lessons learned from other NMT implementations. We hope we picked the best parts, not the worst 😌

Preparing Text Corpora:

The original text corpora can be downloaded from here.

In this repo, we do not handle downloading the data and tokenizing it. Please follow the steps described in dl4mt-tutorial for downloading and tokenization of the data. Once you've downloaded and tokenized the data, you can use scripts/encode_with_bpe_parallel.sh and scripts/encode_with_bpe_joint.sh to use sub-word units as input and output tokens (check scripts for details).
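For readers unfamiliar with sub-word units, the following sketch shows how a learned list of byte-pair-encoding merges splits a word into sub-word tokens. It is a simplified illustration, not what the scripts above run: the `apply_bpe` function and the merge list are hypothetical, and details such as end-of-word markers and output separators are left out.

```python
def apply_bpe(word, merges):
    """Segment `word` with an ordered list of merge pairs (earlier = higher priority)."""
    rank = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        # Pick the adjacent pair with the best (lowest) merge rank.
        best = min(pairs, key=lambda p: rank.get(p, float("inf")))
        if best not in rank:
            break  # no learned merge applies any more
        merged = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


# Hypothetical merge list, as if learned from a training corpus.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]
print(apply_bpe("lower", merges))   # ['low', 'er']
print(apply_bpe("lowest", merges))  # ['low', 'e', 's', 't']
```

A word whose parts were never merged during learning, such as the ending of "lowest" here, falls back to smaller units, so the vocabulary stays fixed while rare words remain representable.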
