This repository is deprecated. Please refer to the updated codebase here: https://github.com/DevSinghSachan/multilingual_nmt
This repository implements the Transformer model in the PyTorch framework, as introduced in the paper "Attention Is All You Need" (NIPS 2017): https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
The overall model architecture is shown in the figure below:
The code in this repository implements the following features (illustrative sketches of several of them follow this list):
- Sinusoidal positional encoding
- Multi-head dot-product attention
- Positional attention, from "Non-Autoregressive Neural Machine Translation"
- Label smoothing
- Adam optimizer with a warm-up learning-rate schedule
- Shared weights between the embedding and softmax layers
- Beam search with length normalisation
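As a rough illustration of the positional-encoding feature, here is a minimal sketch of the sinusoidal encoding following the formulas in the paper; the function name and shapes are illustrative and are not taken from this repository's code.

```python
import math
import torch

def sinusoidal_positional_encoding(max_len, d_model):
    """Fixed sinusoidal encodings, shape (max_len, d_model); assumes d_model is even.

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    """
    position = torch.arange(0, max_len).float().unsqueeze(1)              # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-math.log(10000.0) / d_model))                # (d_model / 2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
    return pe
```

The encoding is added to the (scaled) token embeddings before the first encoder and decoder layers.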
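Multi-head attention splits the model dimension into several heads and applies scaled dot-product attention to each head in parallel. A sketch of the per-head step; the function name and mask convention (0 = disallowed position) are assumptions, not this repository's API:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k); returns the attended values."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, -1e9)  # block masked positions
    weights = F.softmax(scores, dim=-1)
    return torch.matmul(weights, v)
```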
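Label smoothing replaces the one-hot training target with a softened distribution, which regularises the model and typically improves BLEU. A minimal sketch, smoothing uniformly over the whole vocabulary and omitting padding handling; the repository's exact formulation may differ:

```python
import torch.nn.functional as F

def label_smoothing_loss(logits, target, smoothing=0.1):
    """logits: (N, vocab); target: (N,) gold class indices."""
    log_probs = F.log_softmax(logits, dim=-1)
    # Negative log-likelihood of the gold class.
    nll = -log_probs.gather(dim=-1, index=target.unsqueeze(1)).squeeze(1)
    # Cross-entropy against a uniform distribution over the vocabulary.
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - smoothing) * nll + smoothing * uniform).mean()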
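The warm-up schedule from the paper increases the learning rate linearly for the first `warmup` steps and then decays it with the inverse square root of the step number. The constants below are the paper's defaults, not necessarily the defaults used by `train.py`:

```python
def noam_learning_rate(step, d_model=512, warmup=4000):
    """lr = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # avoid 0^-0.5 at the very first update
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# Typical use: create Adam with the paper's betas/eps, then overwrite the
# learning rate before every optimizer.step().
# optimizer = torch.optim.Adam(params, lr=0, betas=(0.9, 0.98), eps=1e-9)
# for group in optimizer.param_groups:
#     group['lr'] = noam_learning_rate(step)
```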
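Weight sharing between the target embedding and the pre-softmax projection (enabled here via the `--tied` flag) amounts to a single assignment, since both matrices have shape (vocab, d_model). The module names and sizes below are hypothetical:

```python
import torch.nn as nn

d_model, vocab = 512, 32000  # illustrative sizes, not this repository's defaults
target_embedding = nn.Embedding(vocab, d_model)
output_projection = nn.Linear(d_model, vocab, bias=False)
# Both weights are (vocab, d_model), so the parameter can be shared directly.
output_projection.weight = target_embedding.weight
```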
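Length normalisation divides a hypothesis's accumulated log-probability by a function of its length, so beam search does not systematically prefer short translations. The GNMT-style penalty below is one common choice; whether this repository uses it or a plain division by length is an assumption:

```python
def length_penalty(length, alpha=0.6):
    """GNMT-style penalty; score(hyp) = log_prob(hyp) / length_penalty(len(hyp))."""
    return ((5.0 + length) / 6.0) ** alpha
```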
Requirements:
- Python 3.6
- PyTorch v0.4 (needs manual installation from source: https://github.com/pytorch/pytorch)
- torchtext
- numpy
These packages can be installed via the requirements file:

```bash
pip install -r requirements.txt
```
```bash
# Preprocess the parallel corpus and build the vocabularies
python preprocess.py -i data/ja_en -s-train train-big.ja -t-train train-big.en -s-valid dev.ja -t-valid dev.en -s-test test.ja -t-test test.en --save_data demo

# Train the model
python train.py -i data/ja_en --data demo --wbatchsize 4096 --batchsize 60 --tied --beam_size 5 --epoch 40 --layers 6 --multi_heads 8 --gpu 0

# Translate the test set with the trained model
python translate.py -i data/ja_en --data demo --batchsize 60 --beam_size 5 --model_file "results/model.ckpt" --src data/ja_en/test.ja --gpu 0
```
Statistics (number of sentence pairs) of the datasets included in the `data` directory:
Dataset | Train Set | Dev Set | Test Set |
---|---|---|---|
Japanese-English | 148,850 | 500 | 500 |
IWSLT'15 English-Vietnamese | 133,317 | 1,553 | 1,268 |
IWSLT'16 German-English | 98,132 | 887 | 1,565 |
All experiments were performed on a single NVIDIA Titan Xp GPU with 12 GB of memory. BLEU scores are computed on beam-search output. Blank cells in the tables below are numbers that were not recorded.
Method | Layers | BLEU (dev) | BLEU (test) | Parameters | Words / Sec |
---|---|---|---|---|---|
Transformer (self) | 1 | 33.16 | 36.52 | 32.5 M | 60.1K |
Transformer (self) | 6 | 34.65 | | 69.3 M | 15.5K |
BiLSTM encoder (OpenNMT-py) | 1 | 29.55 | | 41.3 M | 31.5K |
LSTM encoder (OpenNMT-py) | 1 | 30.15 | | 41.8 M | 35.5K |
Transformer (OpenNMT-py) | 1 | 26.83 | | 42.3 M | 52.5K |
BiLSTM encoder (XNMT) | 1 | 29.58 | 31.39 | | 9.1K* (Target Words) |
Transformer (XNMT) | 1 | 25.55 | | | 2.2K (Target Words) |
\*One epoch completes in roughly 180 seconds.
Method | Layers | BLEU (dev) | BLEU (test) | Parameters | Words / Sec |
---|---|---|---|---|---|
Transformer (self) | 1 | 21.96 | | 41.2 M | 57.8K |
Transformer (self) | 2 | 22.96 | | 48.5 M | 40.2K |
BiLSTM encoder (OpenNMT-py) | 1 | 21.99 | | 53.5 M | 30.5K |
LSTM encoder (OpenNMT-py) | 1 | 21.04 | | 53.9 M | 29.5K |
Transformer (OpenNMT-py) | 1 | 19.26 | | 55.3 M | 48.5K |
BiLSTM encoder (XNMT) | 1 | 21.31 | 23.87 | | 7.2K (Target Words) |
Transformer (XNMT) | 1 | | | | |
Dataset URL. This dataset is distributed already tokenised (using NLTK) and lowercased.
Method | Layers | BLEU (dev) | BLEU (test) | Parameters | Words / Sec |
---|---|---|---|---|---|
Transformer (self) | 1 | 21.91 | | 54.5 M | 44.5K |
Transformer (self) | 2 | | | | |
BiLSTM encoder (OpenNMT-py) | 1 | 23.10 | 23.71 | 73.7 M | |
LSTM encoder (OpenNMT-py) | 1 | | | | |
Transformer (OpenNMT-py) | 1 | | | | |
BiLSTM encoder (XNMT) | 1 | 22.87 | 23.43 | | 8K (Target Words) |
Transformer (XNMT) | 1 | | | | |
- Thanks to Graham Neubig (@gneubig) and Matt Sperber (@msperber) for their suggestions.
- The code in this repository was originally based on, and adapted from, Sosuke Kobayashi's Chainer implementation: https://github.com/soskek/attention_is_all_you_need
- Some parts of the code were borrowed from XNMT (based on DyNet) and OpenNMT-py (based on PyTorch).