xmunlp / xmunmt
An implementation of RNNsearch using TensorFlow
License: BSD 3-Clause "New" or "Revised" License
Hi, I tried to run the code with the default settings on a GTX 1080 Ti, which has 11 GB of memory, but still got a ResourceExhaustedError. This is surprising, because RNNSearch is a relatively small model. How could it occupy that much memory?
How much GPU memory (or which GPU) do you use in your experiment?
I'm using tf1.4.0-rc0 with CUDA 8, and the error message looks like:
INFO:tensorflow:rnnsearch/decoder/attention/k_transform/matrix_0 shape (2000, 1000)
INFO:tensorflow:rnnsearch/decoder/attention/logits/matrix_0 shape (1000, 1)
INFO:tensorflow:rnnsearch/decoder/attention/q_transform/matrix_0 shape (1000, 1000)
INFO:tensorflow:rnnsearch/decoder/gru_cell/candidate/bias shape (1000,)
INFO:tensorflow:rnnsearch/decoder/gru_cell/candidate/matrix_0 shape (620, 1000)
...... NMT parameters info ......
INFO:tensorflow:rnnsearch/softmax/bias shape (36166,)
INFO:tensorflow:rnnsearch/softmax/matrix_0 shape (620, 36166)
INFO:tensorflow:rnnsearch/source_embedding/bias shape (620,)
INFO:tensorflow:rnnsearch/source_embedding/embedding shape (36166, 620)
INFO:tensorflow:rnnsearch/target_embedding/bias shape (620,)
INFO:tensorflow:rnnsearch/target_embedding/embedding shape (36166, 620)
INFO:tensorflow:Total trainable variables size: 95822166
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Create EvaluationHook.
INFO:tensorflow:Making dir: train/eval
2017-11-21 22:40:19.596248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2017-11-21 22:40:19.596352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
INFO:tensorflow:loss = 9.27618, step = 1, target = [128 48], source = [128 48]
INFO:tensorflow:Saving checkpoints for 1 into train/model.ckpt.
INFO:tensorflow:loss = 8.80296, step = 2, target = [128 24], source = [128 24] (3.997 sec)
INFO:tensorflow:loss = 9.28678, step = 3, target = [128 32], source = [128 32] (0.429 sec)
INFO:tensorflow:loss = 8.62557, step = 4, target = [128 48], source = [128 47] (0.637 sec)
INFO:tensorflow:loss = 8.68332, step = 5, target = [128 24], source = [128 24] (0.306 sec)
INFO:tensorflow:loss = 8.25838, step = 6, target = [128 64], source = [128 64] (0.937 sec)
INFO:tensorflow:loss = 8.36041, step = 7, target = [128 32], source = [128 32] (0.400 sec)
INFO:tensorflow:loss = 7.80727, step = 8, target = [128 16], source = [128 16] (0.218 sec)
INFO:tensorflow:loss = 8.16378, step = 9, target = [128 48], source = [128 48] (0.661 sec)
INFO:tensorflow:loss = 7.60664, step = 10, target = [128 24], source = [128 24] (0.299 sec)
INFO:tensorflow:loss = 7.55223, step = 11, target = [128 32], source = [128 32] (0.417 sec)
INFO:tensorflow:loss = 7.39985, step = 12, target = [128 48], source = [128 48] (0.665 sec)
INFO:tensorflow:loss = 7.14252, step = 13, target = [128 12], source = [128 12] (0.164 sec)
INFO:tensorflow:loss = 7.12088, step = 14, target = [128 24], source = [128 24] (0.306 sec)
INFO:tensorflow:loss = 7.08299, step = 15, target = [128 32], source = [128 32] (0.403 sec)
INFO:tensorflow:loss = 7.25178, step = 16, target = [128 64], source = [128 64] (0.929 sec)
INFO:tensorflow:loss = 6.85371, step = 17, target = [128 16], source = [128 16] (0.200 sec)
INFO:tensorflow:loss = 7.00198, step = 18, target = [128 48], source = [128 48] (0.658 sec)
INFO:tensorflow:loss = 6.67163, step = 19, target = [128 8], source = [128 8] (0.116 sec)
INFO:tensorflow:loss = 6.76071, step = 20, target = [128 24], source = [128 24] (0.323 sec)
INFO:tensorflow:loss = 6.8594, step = 21, target = [128 32], source = [128 32] (0.414 sec)
INFO:tensorflow:loss = 7.01129, step = 22, target = [128 48], source = [128 48] (0.675 sec)
2017-11-21 22:40:51.626475: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.38GiB. Current allocation summary follows.
..... Many memory footprint info.......
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10240,36166]
[[Node: rnnsearch/smoothed_softmax_cross_entropy_with_logits/one_hot = OneHot[T=DT_FLOAT, TI=DT_INT32, axis=-1, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rnnsearch/smoothed_softmax_cross_entropy_with_logits/Reshape/_935, rnnsearch/smoothed_softmax_cross_entropy_with_logits/strided_slice, training/train/beta1, rnnsearch/smoothed_softmax_cross_entropy_with_logits/truediv)]]
[[Node: truediv/_1015 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4870_truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
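For what it's worth, the failed 1.38 GiB allocation is exactly the size of that one-hot tensor, so the label-smoothing one-hot over the full vocabulary appears to be the culprit (10240 is presumably batch_size times the maximum target length):

```python
# The tensor the allocator failed on has shape [10240, 36166] with
# float32 elements (4 bytes each)
rows, vocab = 10240, 36166
size_gib = rows * vocab * 4 / 2 ** 30
print(round(size_gib, 2))  # 1.38, matching "trying to allocate 1.38GiB"
```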
Did you do this experiment? Could you provide some benchmark results? Thanks very much.
I've read through a large part of the source code and found it to be well organized and easy to read.
But I still have a few questions. Could you please verify them?
1. Is the following comment correct?
XMUNMT/xmunmt/layers/attention.py, Line 76 in ed7ae9a
hidden in L74 is [batch_size * mem_size, hidden_size], logits in L77 is of shape [batch_size * mem_size, 1], logits in L78 is of shape [batch_size, mem_size], and no variable has a shape of [batch_size, mem_size, 1]. Or am I getting it wrong?
2. In XMUNMT/xmunmt/models/rnnsearch.py, Line 280 in ed7ae9a
Hi,
I'm wondering whether this reproduction can achieve performance similar to the original RNNSearch implementation (GroundHog). Are there any benchmark results?
All kinds of problems show up at runtime. What is going on? It simply won't run.
File "trainer.py", line 405, in
main(parse_args())
File "trainer.py", line 286, in main
collect_params(params, model_cls.get_parameters())
File "trainer.py", line 138, in collect_params
collected.add_hparam(k, getattr(all_params, str(k)))
AttributeError: 'HParams' object has no attribute '('rnn_cell', 'LegacyGRUCell')'
Hi @XMU-NLPLAB,
I am trying to run your code but I cannot find the dataset or any download link. I've tried the openmt15 dataset, but it seems the official registration and data-download link is no longer available.
BTW, in the command
python preprocess.py -d vocab.zh.pkl -v 30000 -b bintext.zh.pkl -p zh.txt
What do vocab.zh.pkl, bintext.zh.pkl, and zh.txt represent, respectively?
I am a beginner in NMT. Can you offer any information or resources? That would be really helpful!
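For orientation, a vocabulary file like vocab.zh.pkl is typically built by counting tokens in the raw text and keeping the -v most frequent ones. A minimal sketch of that counting step (the `<unk>` placeholder and the exact pickle format are my assumptions, not the repo's actual code):

```python
from collections import Counter

# Hypothetical sketch of how a -v 30000 vocabulary could be built from a
# tokenized corpus like zh.txt; the repo's real preprocess.py may differ.
corpus = ["the cat sat", "the dog sat"]
counts = Counter(tok for line in corpus for tok in line.split())
vocab = ["<unk>"] + [w for w, _ in counts.most_common(30000)]
print(vocab[:3])  # most frequent tokens after the <unk> placeholder
```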
The instructions in readme.md describe how to train an English-to-German translation model and apply it to test data. But how did you evaluate the results?
This is what I did:
src=en
tgt=de
# Merge subwords
sed -r 's/(@@ )|(@@ ?$)//g' $nmt_output_dir/test.txt > $nmt_output_dir/test.merged-subwords.txt
# Detruecase NMT outputs
$moses_scripts/recaser/detruecase.perl < $nmt_output_dir/test.merged-subwords.txt > $nmt_output_dir/test.merged-bpe32k.detc
# Detokenize
$moses_scripts/tokenizer/detokenizer.perl -l $tgt < $nmt_output_dir/test.merged-bpe32k.detc > $nmt_output_dir/test.merged-bpe32k.txt
# Evaluation
# Method II: using mteval
# wrap up outputs with SGML format
$moses_scripts/ems/support/wrap-xml.perl $tgt $test_sgm_dir/newstest2017-$src$tgt-src.$src.sgm < $nmt_output_dir/test.merged-bpe32k.txt > $nmt_output_dir/test.merged-bpe32k.sgm
$moses_scripts/generic/mteval-v14.pl -r $test_sgm_dir/newstest2017-$src$tgt-ref.$tgt.sgm -s $test_sgm_dir/newstest2017-$src$tgt-src.$src.sgm -t $nmt_output_dir/test.merged-bpe32k.sgm > mteval-result.txt
I used the default hyper-parameters to train the model (except for batch_size=80), and got a BLEU of only 22.47:
Evaluation of any-to-de translation using:
src set "newstest2017" (130 docs, 3004 segs)
ref set "newstest2017" (1 refs)
tst set "newstest2017" (1 systems)
length ratio: 1.01165010524255 (62001/61287), penalty (log): 0
NIST score = 6.6085 BLEU score = 0.2247 for system "Edinburgh"
# ------------------------------------------------------------------------
Individual N-gram scoring
1-gram 2-gram 3-gram 4-gram 5-gram 6-gram 7-gram 8-gram 9-gram
------ ------ ------ ------ ------ ------ ------ ------ ------
NIST: 5.0732 1.2981 0.2044 0.0291 0.0037 0.0007 0.0001 0.0000 0.0000 "Edinburgh"
BLEU: 0.5593 0.2821 0.1633 0.0989 0.0616 0.0388 0.0250 0.0164 0.0108 "Edinburgh"
# ------------------------------------------------------------------------
Cumulative N-gram scoring
1-gram 2-gram 3-gram 4-gram 5-gram 6-gram 7-gram 8-gram 9-gram
------ ------ ------ ------ ------ ------ ------ ------ ------
NIST: 5.0732 6.3713 6.5757 6.6048 6.6085 6.6092 6.6094 6.6094 6.6094 "Edinburgh"
BLEU: 0.5593 0.3972 0.2954 0.2247 0.1735 0.1352 0.1062 0.0841 0.0669 "Edinburgh"
What could be wrong?
Also, in my experience with NMT, a BLEU score of 30 is quite high for an English-to-German system on the newstest2017 data. For example, in this work the English-to-German system got a BLEU below 26, and the winner of WMT'17 only reached 28.3; see http://matrix.statmt.org/.
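As a sanity check on the numbers above: BLEU-4 is the brevity penalty times the geometric mean of the 1- to 4-gram precisions, and recomputing it from the individual precisions in the mteval table reproduces the reported score.

```python
import math

# Individual n-gram precisions and lengths taken from the mteval output above
precisions = [0.5593, 0.2821, 0.1633, 0.0989]
hyp_len, ref_len = 62001, 61287

# Brevity penalty is 1 when the hypothesis is longer than the reference
bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
bleu = bp * math.exp(sum(math.log(p) for p in precisions) / 4)
print(round(bleu, 4))  # 0.2247, matching "BLEU score = 0.2247"
```

So the scoring itself is internally consistent; any gap to higher reported numbers would have to come from data, preprocessing, or hyperparameters rather than the evaluation step.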
In utils/parallel.py, the function parallel_model is a wrapper which handles multiple computing devices.
But, if I understand it correctly, it achieves the following effect:
Suppose the function model_fn returns k scalars (maybe multiple losses or metrics, e.g. accuracy), i.e. a tuple (o1, o2, ..., ok), and m devices are available: d1, d2, ..., dm.
Then the return value of the function is of shape:
You see, in the second case the return value is inconsistent. Say my model_fn has 2 return values: in the multiple-device case, I can use sharded_loss1, sharded_loss2 = parallel.parallel_model(fn, features, device_list) to capture the two losses; but if I specify only a single device on the command line, the code breaks.
Certainly, I could check isinstance(return_value_from_parallel_model, tuple) and decide how to handle the return value, but that is clumsy. It would be better to return ([o1_d1], [o2_d1], ..., [ok_d1]), i.e. a tuple of lists, in the "multiple return values + single device" case, which leads to a more consistent design.
Hope I've made myself clear.
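To make the proposal concrete, here is a hypothetical sketch (not the repo's actual code) of a wrapper that always returns a tuple of per-device lists, so callers unpack the result the same way for one device as for many:

```python
# Hypothetical sketch: when model_fn has k outputs, always return a tuple
# of k per-device lists, even when only a single device (shard) is used.
def parallel_model(model_fn, shards):
    per_device = [model_fn(shard) for shard in shards]  # one result per device
    if isinstance(per_device[0], tuple):
        # transpose m device tuples of k outputs into k lists of m values
        return tuple(list(outs) for outs in zip(*per_device))
    return per_device  # single output: plain list of m values

# With k = 2 outputs and m = 1 device, callers can still unpack two lists
losses, accs = parallel_model(lambda s: (s * 2, s + 1), [3])
print(losses, accs)  # [6] [4]
```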
Hi, I saw that you add a dropout layer after the word embedding, which is not mentioned in the RNNSearch paper "Neural Machine Translation by Jointly Learning to Align and Translate". Does this trick improve performance? Is it implemented in the vanilla Theano version, GroundHog?
Thanks!
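For context, dropout applied to an embedding vector in the usual inverted-dropout form looks roughly like this (a generic sketch, not this repo's implementation):

```python
import random

# Generic inverted dropout: zero each unit with probability `rate` and
# scale survivors by 1 / (1 - rate) so the expected value is unchanged
def dropout(vec, rate, rng):
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in vec]

rng = random.Random(0)
embedded = [0.5] * 10
out = dropout(embedded, rate=0.2, rng=rng)  # survivors become 0.5 / 0.8
```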
In XMUNMT/xmunmt/layers/attention.py, Line 22 in 0e1539b
Hi, is it possible to share some pre-trained models (checkpoints)? Thanks.
In
Line 309 in 84d2d70
Ideally, it should be a weighted average of all losses, with the weights being the number of valid tokens on each device, shouldn't it?
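In other words, the token-weighted average differs from the plain mean whenever the shards are unbalanced; a small numeric sketch (the numbers are made up):

```python
# Made-up example: two devices whose shards contain different token counts
device_losses = [2.0, 3.0]   # mean loss reported by each device
valid_tokens = [100, 300]    # non-padding tokens processed on each device

weighted = sum(l * n for l, n in zip(device_losses, valid_tokens)) / sum(valid_tokens)
print(weighted)  # 2.75, vs. the unweighted mean of 2.5
```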
This code runs very well.
But when I tried to adapt it to implement VNMT, it converged very slowly. All I did was feed the average of the encoder hidden states to the decoder GRU cells (through several transforming matrices).
I think I should adjust the hyperparameters for my new model, and I found that the default initializer in your code is random_uniform with a range of [-0.08, 0.08]. At the very least, it's not Xavier initialization, and I don't really get it.
Does it have a theoretical basis, or did you get the hyperparameter by trial and error?
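For comparison, the Xavier/Glorot uniform bound for a square 1000x1000 matrix would be noticeably narrower than the fixed 0.08 range (this just illustrates the difference, without claiming which works better here):

```python
import math

# Xavier/Glorot uniform limit: sqrt(6 / (fan_in + fan_out))
fan_in, fan_out = 1000, 1000   # e.g. a square hidden-to-hidden matrix
xavier_limit = math.sqrt(6 / (fan_in + fan_out))
print(round(xavier_limit, 4))  # 0.0548, vs. the fixed 0.08 used here
```

A fixed uniform range like [-0.08, 0.08] also appears in early seq2seq work, so a hand-tuned choice would not be unusual.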
Hi @Playinf,
I want to do experiments on GAN-based neural machine translation, but I don't know whether this model can be used for that or whether the code needs to be modified.
I am a beginner in NMT. I hope you can offer answers or suggestions. That would be really helpful!