xmunmt's People

Contributors

playinf

xmunmt's Issues

What's the minimum memory requirement for the default configuration?

Hi, I tried to run the code with the default settings on a GTX 1080 Ti, which has 11 GB of memory, but still got a ResourceExhaustedError. This is surprising, because RNNSearch is a relatively small model. How could it occupy that much memory?
How much GPU memory (or which GPU) do you use in your experiment?
I'm using tf1.4.0-rc0 with CUDA 8, and the error message looks like:

INFO:tensorflow:rnnsearch/decoder/attention/k_transform/matrix_0                                        shape    (2000, 1000)
INFO:tensorflow:rnnsearch/decoder/attention/logits/matrix_0                                             shape    (1000, 1)
INFO:tensorflow:rnnsearch/decoder/attention/q_transform/matrix_0                                        shape    (1000, 1000)
INFO:tensorflow:rnnsearch/decoder/gru_cell/candidate/bias                                               shape    (1000,)
INFO:tensorflow:rnnsearch/decoder/gru_cell/candidate/matrix_0                                           shape    (620, 1000)

...... (many NMT parameter lines elided) ......

INFO:tensorflow:rnnsearch/softmax/bias                                                                  shape    (36166,)    
INFO:tensorflow:rnnsearch/softmax/matrix_0                                                              shape    (620, 36166)
INFO:tensorflow:rnnsearch/source_embedding/bias                                                         shape    (620,)      
INFO:tensorflow:rnnsearch/source_embedding/embedding                                                    shape    (36166, 620)
INFO:tensorflow:rnnsearch/target_embedding/bias                                                         shape    (620,)      
INFO:tensorflow:rnnsearch/target_embedding/embedding                                                    shape    (36166, 620)

INFO:tensorflow:Total trainable variables size: 95822166
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Create EvaluationHook.
INFO:tensorflow:Making dir: train/eval
2017-11-21 22:40:19.596248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:02:00.0
totalMemory: 10.91GiB freeMemory: 10.75GiB
2017-11-21 22:40:19.596352: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
INFO:tensorflow:loss = 9.27618, step = 1, target = [128  48], source = [128  48]
INFO:tensorflow:Saving checkpoints for 1 into train/model.ckpt.
INFO:tensorflow:loss = 8.80296, step = 2, target = [128  24], source = [128  24] (3.997 sec)
INFO:tensorflow:loss = 9.28678, step = 3, target = [128  32], source = [128  32] (0.429 sec)
INFO:tensorflow:loss = 8.62557, step = 4, target = [128  48], source = [128  47] (0.637 sec)
INFO:tensorflow:loss = 8.68332, step = 5, target = [128  24], source = [128  24] (0.306 sec)
INFO:tensorflow:loss = 8.25838, step = 6, target = [128  64], source = [128  64] (0.937 sec)
INFO:tensorflow:loss = 8.36041, step = 7, target = [128  32], source = [128  32] (0.400 sec)
INFO:tensorflow:loss = 7.80727, step = 8, target = [128  16], source = [128  16] (0.218 sec)
INFO:tensorflow:loss = 8.16378, step = 9, target = [128  48], source = [128  48] (0.661 sec)
INFO:tensorflow:loss = 7.60664, step = 10, target = [128  24], source = [128  24] (0.299 sec)
INFO:tensorflow:loss = 7.55223, step = 11, target = [128  32], source = [128  32] (0.417 sec)
INFO:tensorflow:loss = 7.39985, step = 12, target = [128  48], source = [128  48] (0.665 sec)
INFO:tensorflow:loss = 7.14252, step = 13, target = [128  12], source = [128  12] (0.164 sec)
INFO:tensorflow:loss = 7.12088, step = 14, target = [128  24], source = [128  24] (0.306 sec)
INFO:tensorflow:loss = 7.08299, step = 15, target = [128  32], source = [128  32] (0.403 sec)
INFO:tensorflow:loss = 7.25178, step = 16, target = [128  64], source = [128  64] (0.929 sec)
INFO:tensorflow:loss = 6.85371, step = 17, target = [128  16], source = [128  16] (0.200 sec)
INFO:tensorflow:loss = 7.00198, step = 18, target = [128  48], source = [128  48] (0.658 sec)
INFO:tensorflow:loss = 6.67163, step = 19, target = [128   8], source = [128   8] (0.116 sec)
INFO:tensorflow:loss = 6.76071, step = 20, target = [128  24], source = [128  24] (0.323 sec)
INFO:tensorflow:loss = 6.8594, step = 21, target = [128  32], source = [128  32] (0.414 sec)
INFO:tensorflow:loss = 7.01129, step = 22, target = [128  48], source = [128  48] (0.675 sec)
2017-11-21 22:40:51.626475: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.38GiB.  Current allocation summary follows.

...... (detailed memory-footprint dump elided) ......

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[10240,36166]
         [[Node: rnnsearch/smoothed_softmax_cross_entropy_with_logits/one_hot = OneHot[T=DT_FLOAT, TI=DT_INT32, axis=-1, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rnnsearch/smoothed_softmax_cross_entropy_with_logits/Reshape/_935, rnnsearch/smoothed_softmax_cross_entropy_with_logits/strided_slice, training/train/beta1, rnnsearch/smoothed_softmax_cross_entropy_with_logits/truediv)]]
         [[Node: truediv/_1015 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4870_truediv", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
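A back-of-the-envelope check suggests where the 1.38 GiB request comes from: the smoothed-softmax loss materializes a dense one-hot tensor of shape [10240, 36166] (the shape in the traceback), which at float32 is already about 1.38 GiB on its own. A quick sketch (reading 10240 as batch size times target length is an assumption):

```python
# Estimate the size of the dense one-hot tensor that the smoothed
# softmax cross-entropy materializes, using the traceback's shape.
rows = 10240     # first dim from the traceback (presumably batch * length)
vocab = 36166    # target vocabulary size, from the parameter dump above

bytes_needed = rows * vocab * 4   # float32 is 4 bytes per element
gib = bytes_needed / 2**30        # convert to GiB

print(f"{gib:.2f} GiB")  # 1.38 GiB, matching the allocator's request
```

So a single intermediate of the label-smoothing loss accounts for the failed allocation; shrinking the batch size or the maximum target length shrinks this tensor proportionally.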

Questions about source code

I've read through a large part of the source code and found it to be well-organized and easy to read.
But I still have a few questions. Could you please verify them?

1. Is the following comment correct?

# Shape: [batch, mem_size, 1]

As far as I understand, hidden at L74 has shape [batch_size * mem_size, hidden_size], logits at L77 has shape [batch_size * mem_size, 1], and logits at L78 has shape [batch_size, mem_size]; no variable has a shape of [batch, mem_size, 1]. Or am I getting it wrong?
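These shapes can be sketched with numpy (batch_size, mem_size, and hidden_size are made-up values, and hidden/logits are stand-ins for the variables in the attention code, not the project's actual implementation):

```python
import numpy as np

batch_size, mem_size, hidden_size = 2, 5, 8

# stand-in for `hidden`: rows are flattened (batch, memory) pairs
hidden = np.zeros((batch_size * mem_size, hidden_size))

# stand-in for the first `logits`: one scalar score per (batch, memory) pair
logits = hidden @ np.zeros((hidden_size, 1))      # [batch * mem, 1]

# stand-in for the reshaped `logits`: one score row per batch entry
logits = logits.reshape(batch_size, mem_size)     # [batch, mem]

print(hidden.shape, logits.shape)
```

At no point in this sketch does a [batch, mem_size, 1] tensor appear, which is the crux of the question about the comment.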

2. In

# Special case for non-incremental decoding

should it be "incremental decoding"? In that case, the decoder only runs for one step.

Any benchmark result?

Hi,
I'm wondering if this reimplementation can achieve performance similar to the original RNNSearch implementation (GroundHog). Are there any benchmark results?

I have set up the environment, but the code won't run

All kinds of problems appear at runtime. What is going on? It simply won't run.
File "trainer.py", line 405, in
main(parse_args())
File "trainer.py", line 286, in main
collect_params(params, model_cls.get_parameters())
File "trainer.py", line 138, in collect_params
collected.add_hparam(k, getattr(all_params, str(k)))
AttributeError: 'HParams' object has no attribute '('rnn_cell', 'LegacyGRUCell')'
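The error can be reproduced without xmunmt: the parameter-override key has apparently been parsed as a Python tuple, so collect_params calls getattr with the stringified tuple as the attribute name, which no object has. A minimal sketch with a toy class standing in for HParams (FakeHParams and the key value are made up for illustration):

```python
class FakeHParams:              # toy stand-in for TensorFlow's HParams
    rnn_cell = "LegacyGRUCell"  # the attribute that was meant to be read

k = ("rnn_cell", "LegacyGRUCell")   # override key parsed as a tuple
try:
    # looks up the literal name "('rnn_cell', 'LegacyGRUCell')"
    getattr(FakeHParams(), str(k))
    raised = False
except AttributeError as e:
    raised = True
    print(e)  # same kind of AttributeError as in the traceback above
```

This points at how the overrides were supplied rather than at collect_params itself: the keys need to reach it as plain attribute names, not tuples.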

Where can I download the dataset and how to prepare training corpus exactly?

Hi @XMU-NLPLAB

I am trying to run your code but I cannot find the dataset or any download link. I've tried the openmt15 dataset, but it seems the official registration and data-download links are no longer available.
BTW, in the command
python preprocess.py -d vocab.zh.pkl -v 30000 -b bintext.zh.pkl -p zh.txt
What do vocab.zh.pkl, bintext.zh.pkl and zh.txt represent respectively?

I am a beginner in NMT. Can you offer any information or resources? That would really help!

What's the exact command/dataset to reproduce a BLEU of 30.42 on test set?

The instructions in readme.md describe how to train an English-to-German translation model and apply it to test data. But how did you evaluate the result?

This is what I did:

src=en
tgt=de

# Merge subwords
sed -r 's/(@@ )|(@@ ?$)//g' $nmt_output_dir/test.txt > $nmt_output_dir/test.merged-subwords.txt

# Detruecase NMT outputs
$moses_scripts/recaser/detruecase.perl < $nmt_output_dir/test.merged-subwords.txt > $nmt_output_dir/test.merged-bpe32k.detc

# Detokenize
$moses_scripts/tokenizer/detokenizer.perl -l $tgt < $nmt_output_dir/test.merged-bpe32k.detc > $nmt_output_dir/test.merged-bpe32k.txt

# Evaluation
# Method II: using mteval
# wrap up outputs with SGML format
$moses_scripts/ems/support/wrap-xml.perl $tgt $test_sgm_dir/newstest2017-$src$tgt-src.$src.sgm < $nmt_output_dir/test.merged-bpe32k.txt > $nmt_output_dir/test.merged-bpe32k.sgm

$moses_scripts/generic/mteval-v14.pl -r $test_sgm_dir/newstest2017-$src$tgt-ref.$tgt.sgm -s $test_sgm_dir/newstest2017-$src$tgt-src.$src.sgm -t $nmt_output_dir/test.merged-bpe32k.sgm > mteval-result.txt

I used the default hyperparameters to train the model (except for batch_size=80), and got a BLEU of only 22.47:

 Evaluation of any-to-de translation using:
    src set "newstest2017" (130 docs, 3004 segs)
    ref set "newstest2017" (1 refs)
    tst set "newstest2017" (1 systems)

length ratio: 1.01165010524255 (62001/61287), penalty (log): 0
NIST score = 6.6085  BLEU score = 0.2247 for system "Edinburgh"

# ------------------------------------------------------------------------

Individual N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
        ------   ------   ------   ------   ------   ------   ------   ------   ------
 NIST:  5.0732   1.2981   0.2044   0.0291   0.0037   0.0007   0.0001   0.0000   0.0000  "Edinburgh"

 BLEU:  0.5593   0.2821   0.1633   0.0989   0.0616   0.0388   0.0250   0.0164   0.0108  "Edinburgh"

# ------------------------------------------------------------------------
Cumulative N-gram scoring
        1-gram   2-gram   3-gram   4-gram   5-gram   6-gram   7-gram   8-gram   9-gram
        ------   ------   ------   ------   ------   ------   ------   ------   ------
 NIST:  5.0732   6.3713   6.5757   6.6048   6.6085   6.6092   6.6094   6.6094   6.6094  "Edinburgh"

 BLEU:  0.5593   0.3972   0.2954   0.2247   0.1735   0.1352   0.1062   0.0841   0.0669  "Edinburgh"
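As a sanity check, the cumulative 4-gram BLEU can be recomputed from the individual n-gram precisions in the table above: BLEU is the brevity penalty times the geometric mean of the n-gram precisions, and the penalty is 1 here because the length ratio exceeds 1. A minimal sketch:

```python
import math

# individual n-gram precisions from the mteval table above
precisions = [0.5593, 0.2821, 0.1633, 0.0989]
brevity_penalty = 1.0   # length ratio 1.0117 > 1, so penalty (log) = 0

# BLEU-4 = BP * geometric mean of the first four precisions
bleu4 = brevity_penalty * math.exp(
    sum(math.log(p) for p in precisions) / len(precisions)
)
print(f"{bleu4:.4f}")   # 0.2247, the cumulative 4-gram score above
```

So the 22.47 figure is internally consistent; the gap to 30.42 must come from the model, data, or evaluation pipeline rather than from the scoring arithmetic.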

What could be wrong?
And in my experience with NMT, a BLEU score of 30 is rather high for an English-to-German system on newstest2017 data. For example, in this work, the English-to-German system got a BLEU below 26. And the winner of WMT'17 only got a BLEU of 28.3; see http://matrix.statmt.org/.

feat: return value of parallel_model is unnatural

In utils/parallel.py, the function parallel_model is a wrapper which handles multiple computing devices.
But, if I understand it correctly, it achieves the following effect:
Suppose model_fn returns k scalars (perhaps multiple losses or metrics, e.g. accuracy), i.e. a tuple (o1, o2, ..., ok),
and m devices are available: d1, d2, ..., dm.
Then the return value takes the following shapes:

  1. multiple return values + multiple devices: ([o1_d1, o1_d2, ..., o1_dm], [o2_d1, o2_d2, ..., o2_dm], ..., [ok_d1, ok_d2, ..., ok_dm]), i.e.: a tuple of lists;
  2. multiple return values + single device: [(o1_d1, o2_d1, ..., ok_d1)], i.e. a length-1 list of tuple
  3. single return value + multiple devices: [o1_d1, o1_d2, ..., o1_dm]
  4. single return value + single device: [o1_d1]

You see, in the second case the return value is inconsistent. Say my model_fn has 2 return values: in the multiple-device case I can write sharded_loss1, sharded_loss2 = parallel.parallel_model(fn, features, device_list) to catch the two losses, but if I specify only a single device on the command line, the same code breaks.
Certainly, I could check isinstance(return_value_from_parallel_model, tuple) and branch on the result, but that is clumsy. It would be better to return ([o1_d1], [o2_d1], ..., [ok_d1]), i.e. a tuple of lists, in the "multiple return values + single device" case as well, which gives a more consistent design.

Hope I've made myself clear.
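A minimal sketch of the proposed behavior (run_parallel and the lambda are made-up illustrations, not the actual parallel_model code): normalize every model_fn result to a tuple before transposing, so the caller always gets a tuple of per-device lists, even with one device.

```python
def run_parallel(model_fn, inputs_per_device):
    """Run model_fn on each device's shard and regroup the outputs.

    Toy sketch: always returns a tuple of per-device lists, one list
    per return value of model_fn, regardless of the device count.
    """
    per_device = [model_fn(x) for x in inputs_per_device]
    # Normalize single-return-value fns to 1-tuples before transposing.
    per_device = [r if isinstance(r, tuple) else (r,) for r in per_device]
    # Transpose: list of per-device tuples -> tuple of per-value lists.
    return tuple(list(vals) for vals in zip(*per_device))

# Two return values, one device: still a tuple of (length-1) lists,
# so the two-target unpacking below works the same as with many devices.
losses, accs = run_parallel(lambda x: (x * 2, x + 1), [3])
print(losses, accs)   # [6] [4]
```

With this shape convention, the unpacking code at the call site never needs to know how many devices were configured.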

"casual" or "causal"?

In

if mode == "casual":
, there is a string literal "casual" (meaning informal); do you mean "causal" (meaning cause-and-effect)?
I guess this mode is designed to prevent the network from seeing future information, so it should probably be "causal" instead of "casual."
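For reference, a causal mask is the lower-triangular matrix that lets position i attend only to positions j <= i; a numpy sketch (not the project's actual code):

```python
import numpy as np

length = 4
# Lower-triangular mask: entry (i, j) is 1 iff position j is visible
# from position i, i.e. j <= i -- no peeking at future positions.
mask = np.tril(np.ones((length, length)))
print(mask)
```

Since the mode exists to enforce exactly this cause-before-effect structure, "causal" is almost certainly the intended spelling.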

pre-trained models?

Hi, is it possible to share some pre-trained models (checkpoints)? Thanks.

Loss with multiple GPUs

In

loss = tf.add_n(sharded_losses) / len(sharded_losses)
, it's simply an arithmetic average of losses collected from different devices.
This code is fine in a single-device setting. But if you have multiple devices and they perform different amounts of computation in a mini-batch (say, device 0 handles shorter sentences while device 1 gets longer ones), the loss computed this way will be biased. (Nevertheless, this should not have much impact unless the data distribution is extremely unbalanced.)

Ideally, it should be a weighted average of all losses, with the weights being the number of valid tokens on each device, shouldn't it?
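The token-weighted average suggested above might look roughly like this (a plain-Python sketch with made-up numbers, not the project's code):

```python
# Per-device mean losses and the number of valid (non-padding) tokens
# each device actually processed.  Values are made up for illustration.
sharded_losses = [2.0, 4.0]
token_counts = [100, 300]

# Unweighted mean -- biased when devices see different token counts.
naive = sum(sharded_losses) / len(sharded_losses)

# Token-weighted mean -- each device contributes in proportion to
# how many valid tokens its shard contained.
weighted = (
    sum(l * n for l, n in zip(sharded_losses, token_counts))
    / sum(token_counts)
)
print(naive, weighted)   # 3.0 vs. 3.5
```

The two averages only agree when every device processes the same number of valid tokens, which is exactly the point of the issue.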

Why initialize weights from Uniform[-0.08, 0.08]?

This code runs very well.
But when I tried to adapt it to implement VNMT, it converged very slowly. All I did was feed the average of the encoder hidden states to the decoder GRU cells (through several transformation matrices).
I think I need to adjust the hyperparameters for my new model, and I found that the default initializer in your code is random_uniform with range [-0.08, 0.08]. At least it's not Xavier initialization, and I don't really get it.
Does it have a theoretical basis, or did you find the hyperparameter by trial and error?
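For comparison, the Xavier/Glorot uniform bound is sqrt(6 / (fan_in + fan_out)); for the matrix shapes in the parameter dump above it comes out somewhat smaller than the fixed 0.08 (a quick check only, not a claim about which initializer trains better here):

```python
import math

def xavier_bound(fan_in, fan_out):
    # Glorot & Bengio (2010) uniform-initialization bound: U[-b, b]
    return math.sqrt(6.0 / (fan_in + fan_out))

# A 1000x1000 GRU transform vs. the fixed U[-0.08, 0.08] default.
print(xavier_bound(1000, 1000))   # ~0.0548
# An embedding-to-hidden sized matrix, e.g. (620, 1000).
print(xavier_bound(620, 1000))    # ~0.0609
```

So U[-0.08, 0.08] is in the same ballpark as Xavier for these layer sizes, which may be why it works in practice even without a per-layer derivation.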
