zero's Introduction

Zero

A neural machine translation system implemented in Python 2 and TensorFlow.

Features

  1. Multi-Process Data Loading/Processing (known issues exist)
  2. Multi-GPU Training/Decoding
  3. Gradient Aggregation
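Gradient aggregation lets several small batches contribute to a single parameter update, emulating a larger batch when GPU memory is limited. The following is a minimal pure-Python sketch, assuming a toy one-parameter linear model with squared-error loss; `grad_mse` and `aggregated_step` are hypothetical names, not part of zero. The `update_cycle` parameter in the training command quoted in the issues appears to set the number of accumulated batches, but that mapping is an assumption.

```python
# Hypothetical sketch of gradient aggregation (accumulation); not zero's code.
# Toy model: prediction = w * x, loss = 0.5 * (w * x - y) ** 2, averaged per batch.

def grad_mse(w, batch):
    """Mean gradient of the squared-error loss w.r.t. w over one batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def aggregated_step(w, micro_batches, lrate):
    """Accumulate gradients over all micro-batches, then apply ONE update."""
    accum = 0.0
    for batch in micro_batches:
        accum += grad_mse(w, batch)  # no parameter update inside the loop
    avg_grad = accum / len(micro_batches)
    return w - lrate * avg_grad


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    micro = [data[:2], data[2:]]  # two equal-sized micro-batches
    w_aggregated = aggregated_step(0.0, micro, lrate=0.1)
    w_full_batch = 0.0 - 0.1 * grad_mse(0.0, data)
    print(w_aggregated, w_full_batch)
```

With equal-sized micro-batches, averaging their mean gradients gives exactly the full-batch mean gradient, so both printed values agree. With unequal sizes (for example token-based batching), the average would need to be weighted by each batch's size.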

Papers

Each paper below is linked to a README file. Click the link of the paper you are interested in for more details.

Supported Models

Requirements

  • python2.7
  • tensorflow <= 1.13.2

Usage

How to use this toolkit for machine translation?

TODO:

  1. organize the parameters and their explanations in config
  2. reformat and complete code comments
  3. simplify the code and remove unnecessary parts
  4. improve RNN models

Citation

If you use the source code, please consider citing the following paper:

@InProceedings{D18-1459,
  author = 	"Zhang, Biao
		and Xiong, Deyi
		and Su, Jinsong
		and Lin, Qian
		and Zhang, Huiji",
  title = 	"Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks",
  booktitle = 	"Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"4273--4283",
  location = 	"Brussels, Belgium",
  url = 	"http://aclweb.org/anthology/D18-1459"
}

If you are interested in the CAEncoder model, please consider citing our TASLP paper:

@article{Zhang:2017:CRE:3180104.3180106,
 author = {Zhang, Biao and Xiong, Deyi and Su, Jinsong and Duan, Hong},
 title = {A Context-Aware Recurrent Encoder for Neural Machine Translation},
 journal = {IEEE/ACM Trans. Audio, Speech and Lang. Proc.},
 issue_date = {December 2017},
 volume = {25},
 number = {12},
 month = dec,
 year = {2017},
 issn = {2329-9290},
 pages = {2424--2432},
 numpages = {9},
 url = {https://doi.org/10.1109/TASLP.2017.2751420},
 doi = {10.1109/TASLP.2017.2751420},
 acmid = {3180106},
 publisher = {IEEE Press},
 address = {Piscataway, NJ, USA},
}

Reference

When developing this repository, I referred to the following projects:

Contact

For any questions or suggestions, please feel free to contact Biao Zhang.

zero's People

Contributors

bzhanggo

zero's Issues

Vanishing Gradient Analysis

Hi Biao,

Is there a convenient way to reproduce Table 1 from your paper "Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention"?

Thanks a lot!

Cheers,
Stephan

About reproducing the results of the ACL 2020 paper

Hi,

I'm trying to reproduce the results of your ACL 2020 paper "Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation".
I followed the instructions here:
https://github.com/bzhangGo/zero/tree/master/docs/multilingual_laln_lalt

I used example_train.sh and example_evaluation.sh, and my experiments were conducted on the one-to-many data.

However, my reproduced results are far from those reported in your paper: about 10 BLEU points lower.

Do you have any idea what causes this drop? Thanks.

Here is my training script.
[Image: training script]

Unable to train the model for En-De translation

Hi, I followed all the steps presented here up to training the model with the command:
python zero/config.py --mode train --parameters=hidden_size=1024,embed_size=512,dropout=0.1,label_smooth=0.1,max_len=80,batch_size=80,eval_batch_size=240,token_size=3000,batch_or_token='token',model_name="rnnsearch",buffer_size=3200,clip_grad_norm=5.0,lrate=5e-4,epoches=25,update_cycle=8,gpus=[8],disp_freq=100,eval_freq=10000,sample_freq=1000,checkpoints=5,caencoder=True,cell='atr',max_training_steps=100000000,nthreads=8,swap_memory=True,layer_norm=True,max_queue_size=100,random_seed=1234,src_vocab_file="$data_dir/vocab.en",tgt_vocab_file="$data_dir/vocab.de",src_train_file="$data_dir/train.40k.en",tgt_train_file="$data_dir/train.40k.de",src_dev_file="$data_dir/dev.40k.en",tgt_dev_file="$data_dir/dev.40k.de",src_test_file="",tgt_test_file="",output_dir="train",test_output=""

However, I get the following error, and training does not start:

Traceback (most recent call last):
  File "zero/config.py", line 223, in <module>
    tf.app.run()
  File "/miniconda3/envs/myenv_s_m/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/miniconda3/envs/myenv_s_m/lib/python3.6/site-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/miniconda3/envs/myenv_s_m/lib/python3.6/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "zero/config.py", line 215, in main
    graph.train(params)
  File "/Sugeeth/translation_from_atr/zero/main.py", line 108, in train
    loss, gradients = tower_train_graph(train_features, optimizer, params)
  File "/Sugeeth/translation_from_atr/zero/main.py", line 35, in tower_train_graph
    params.gpus, use_cpu=(len(params.gpus) == 0))
  File "/Sugeeth/translation_from_atr/zero/utils/parallel.py", line 179, in parallel_model
    features, device_mask = shard_features(features, num_devices)
  File "/Sugeeth/translation_from_atr/zero/utils/parallel.py", line 132, in shard_features
    pieces = util.uniform_splits(tf.shape(features.values()[0])[0],
TypeError: 'dict_values' object does not support indexing

Did I miss something?
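The traceback paths point to a Python 3.6 environment, whereas the Requirements list Python 2.7. In Python 2, `dict.values()` returns a list, so `features.values()[0]` works; in Python 3 it returns a `dict_values` view that cannot be indexed, which raises exactly this `TypeError`. Running under Python 2.7 matches the stated requirements, and other Python 2 specific code may still fail under Python 3. If one prefers to patch this line instead, `next(iter(features.values()))` or `list(features.values())[0]` works in both versions. The sketch illustrates that pattern with a hypothetical `shard_batch` function that splits plain Python lists across devices; it is not zero's `shard_features`, which operates on TensorFlow tensors.

```python
# Hypothetical sketch, not zero's shard_features: split every feature in a
# dict of equally long batches into near-uniform per-device pieces.

def uniform_sizes(total, num_devices):
    """Split `total` examples into `num_devices` sizes differing by at most 1."""
    base, extra = divmod(total, num_devices)
    return [base + 1 if i < extra else base for i in range(num_devices)]


def shard_batch(features, num_devices):
    # features.values()[0] fails on Python 3 (dict_values is not indexable);
    # next(iter(...)) works on both Python 2 and Python 3.
    first = next(iter(features.values()))
    sizes = uniform_sizes(len(first), num_devices)
    shards = [dict() for _ in range(num_devices)]
    for name, values in features.items():
        start = 0
        for device, size in enumerate(sizes):
            shards[device][name] = values[start:start + size]
            start += size
    return shards


if __name__ == "__main__":
    batch = {"source": [1, 2, 3, 4, 5], "target": [10, 20, 30, 40, 50]}
    for shard in shard_batch(batch, 2):
        print(shard)
```

Giving the first `extra` devices one additional example keeps the per-device load balanced when the batch size is not divisible by the number of devices.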

Fuse mask implementation

Hi,

The mask implementation for AAN seems to differ from what you describe in the paper (https://arxiv.org/pdf/1805.00631.pdf), specifically in Figure 3.

Based on Figure 3, I would expect the mask computation for a sequence of length L to be as follows:

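import tensorflow as tf  # TensorFlow 1.x API
L = 4  # example sequence length
mask = tf.matrix_band_part(tf.ones([L, L]), -1, 0)  # lower-triangular ones
factor = tf.reciprocal(tf.cumsum(tf.ones_like(mask)))  # row i holds 1/(i+1)
masked_mask = factor * mask  # row i averages positions 0..i
masked_mask = tf.expand_dims(masked_mask, 0)  # add a leading batch axis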

The mask computation at the following location seems to be completely different, as it includes a gate and a softmax computation:

zero/func.py, line 390 (commit 7bb3250):

elif mode == "aan":
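For comparison, here is a minimal pure-Python sketch of the cumulative-average mask that the TensorFlow snippet in this issue constructs: row i of the mask holds 1/(i+1) for positions 0..i, so multiplying it with the input sequence yields the average of all inputs up to each position. It also computes the same averages incrementally with a running sum, which is convenient for step-by-step decoding. The function names are hypothetical, and the sketch deliberately leaves out the gate and softmax computation that the issue reports in zero's `func.py`.

```python
# Hypothetical sketch of a cumulative-average mask; not zero's implementation.

def average_mask(length):
    """Lower-triangular matrix: row i has 1/(i+1) at columns 0..i, else 0."""
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(length)]
            for i in range(length)]


def masked_average(inputs):
    """Matrix product with the mask: out[i] = sum_j mask[i][j] * inputs[j]."""
    mask = average_mask(len(inputs))
    dim = len(inputs[0])
    return [[sum(mask[i][j] * inputs[j][k] for j in range(len(inputs)))
             for k in range(dim)]
            for i in range(len(inputs))]


def running_average(inputs):
    """Same averages computed position by position with a running sum."""
    outputs, total = [], [0.0] * len(inputs[0])
    for i, vec in enumerate(inputs):
        total = [t + v for t, v in zip(total, vec)]
        outputs.append([t / (i + 1) for t in total])
    return outputs


if __name__ == "__main__":
    seq = [[1.0], [3.0], [5.0]]
    print(masked_average(seq))
    print(running_average(seq))
```

Both functions return the prefix averages 1, 2 and 3 (up to floating-point rounding): the mask form processes all positions at once, while the running sum needs only the previous total at each step.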

Speech data preprocessing

Hi

I'm using your AFS ST model as a baseline for my thesis project. I'm trying to run the speech preprocessing script, but the tensorflow.contrib package has been removed in TensorFlow 2.0. Any suggestions on how to adapt your code?

Thank you!
