Comments (12)
@zhangpengGenedock Use a small batch_size.
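For example (assuming the tutorial's standard --batch_size flag; the default shown in the hparams below is 128):
python -m nmt.nmt ... --batch_size=32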
I downloaded the tmp.zip file, but there is nothing in it?
@mingfengwuye tmp.zip has been updated.
@hxsnow10 I have downloaded the tmp.zip file and started training the model. It has worked well so far; the final result will be ready next week. One more question: how can I get the vocab.zh and vocab.en files? Previously I built the vocab following wmt16_en_de.sh, but it seems different from yours.
@mingfengwuye Strange. Please try my updated, bigger data; I made sure I hit a Segmentation fault this time without any change to the code. My vocab was counted from an en-zh corpus by myself.
By the way, my system is CentOS 7, the TensorFlow version is 1.2.1, I run on CPU, and memory is sufficient.
python -m nmt.nmt --src=zh --tgt=en --vocab_prefix=/tmp/nmt_data/vocab --train_prefix=/tmp/nmt_data/dev3 --dev_prefix=/tmp/nmt_data/dev3 --test_prefix=/tmp/nmt_data/dev3 --out_dir=/tmp/nmt_model_zh2en --num_train_steps=12000 --steps_per_stats=100 --num_layers=2 --num_units=128 --dropout=0.2 --metrics=ble
# Job id 0
# hparams:
src=zh
tgt=en
train_prefix=/tmp/nmt_data/dev3
dev_prefix=/tmp/nmt_data/dev3
test_prefix=/tmp/nmt_data/dev3
out_dir=/tmp/nmt_model_zh2en
# Vocab file /tmp/nmt_data/vocab.zh exists
# Vocab file /tmp/nmt_data/vocab.en exists
saving hparams to /tmp/nmt_model_zh2en/hparams
saving hparams to /tmp/nmt_model_zh2en/best_ble/hparams
attention=
attention_architecture=standard
batch_size=128
beam_width=0
best_ble=0
best_ble_dir=/tmp/nmt_model_zh2en/best_ble
bpe_delimiter=None
colocate_gradients_with_ops=True
decay_factor=0.98
decay_steps=10000
dev_prefix=/tmp/nmt_data/dev3
dropout=0.2
encoder_type=uni
eos=</s>
epoch_step=0
forget_bias=1.0
infer_batch_size=32
init_weight=0.1
learning_rate=1.0
length_penalty_weight=0.0
log_device_placement=False
max_gradient_norm=5.0
max_train=0
metrics=['ble']
num_buckets=5
num_gpus=1
num_layers=2
num_residual_layers=0
num_train_steps=12000
num_units=128
optimizer=sgd
out_dir=/tmp/nmt_model_zh2en
pass_hidden_state=True
random_seed=None
residual=False
share_vocab=False
sos=<s>
source_reverse=False
src=zh
src_max_len=50
src_max_len_infer=None
src_vocab_file=/tmp/nmt_data/vocab.zh
src_vocab_size=459879
start_decay_step=0
steps_per_external_eval=None
steps_per_stats=100
test_prefix=/tmp/nmt_data/dev3
tgt=en
tgt_max_len=50
tgt_max_len_infer=None
tgt_vocab_file=/tmp/nmt_data/vocab.en
tgt_vocab_size=570651
time_major=True
train_prefix=/tmp/nmt_data/dev3
unit_type=lstm
vocab_prefix=/tmp/nmt_data/vocab
# creating train graph ...
num_layers = 2, num_residual_layers=0
cell 0 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
cell 0 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DropoutWrapper, dropout=0.2 DeviceWrapper, device=/gpu:0
start_decay_step=0, learning_rate=1, decay_steps 10000,decay_factor 0.98
# Trainable variables
embeddings/encoder/embedding_encoder:0, (459879, 128),
embeddings/decoder/embedding_decoder:0, (570651, 128),
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/decoder/output_projection/kernel:0, (128, 570651), /device:GPU:0
# creating eval graph ...
num_layers = 2, num_residual_layers=0
cell 0 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 0 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
start_decay_step=0, learning_rate=1, decay_steps 10000,decay_factor 0.98
# Trainable variables
embeddings/encoder/embedding_encoder:0, (459879, 128),
embeddings/decoder/embedding_decoder:0, (570651, 128),
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/decoder/output_projection/kernel:0, (128, 570651), /device:GPU:0
# creating infer graph ...
num_layers = 2, num_residual_layers=0
cell 0 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 0 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
cell 1 LSTM, forget_bias=1 DeviceWrapper, device=/gpu:0
start_decay_step=0, learning_rate=1, decay_steps 10000,decay_factor 0.98
# Trainable variables
embeddings/encoder/embedding_encoder:0, (459879, 128),
embeddings/decoder/embedding_decoder:0, (570651, 128),
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0, (256, 512), /device:GPU:0
dynamic_seq2seq/decoder/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0, (512,), /device:GPU:0
dynamic_seq2seq/decoder/output_projection/kernel:0, (128, 570651),
# log_file=/tmp/nmt_model_zh2en/log_1500642274
2017-07-21 21:04:34.111711: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-21 21:04:34.111760: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-21 21:04:34.111771: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-21 21:04:34.111780: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-21 21:04:34.111790: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
created train model with fresh parameters, time 0.00s.
2017-07-21 21:04:34.636797: I tensorflow/core/common_runtime/simple_placer.cc:675] Ignoring device specification /job:localhost/replica:0/task:0/device:GPU:0 for node 'gradients/dynamic_seq2seq/decoder/decoder/while/TensorArrayWrite/TensorArrayWriteV3_grad/TensorArrayGrad/TensorArrayGradV3/Enter' because the input edge from 'dynamic_seq2seq/decoder/decoder/TensorArray' is a reference connection and already has a device field set to /job:localhost/replica:0/task:0/device:CPU:0
created infer model with fresh parameters, time 0.01s.
# 14506
src: 尽管 明确 的 分辨 因果 关系 有一 些 困难 , 还是 有一 些 证据 表明 , 建立 了 财政 规则 体系 的 国家 具有 更为 合理 的 财政 状况 。
ref: There is some evidence that countries with fiscal rules have sounder public finances , though it is tricky to separate cause from effect .
nmt: ZigBee acti acti bonderizing party.169 superstitious superstitious superstitious 05/23/08 05/23/08 05/23/08 kvetching kvetching herbut Hamidzada Hamidzada Hamidzada Hamidzada Hamidzada Hamidzada Nudibranchs Nudibranchs present.family IMBALANCE September-1 September-1 September-1 interleaving interleaving interleaving interleaving cIients cIients Bootloader Piaohong Piaohong Piaohong discovery.But prayerWe prayerWe carpas carpas carpas 57、Hope 57、Hope 57、Hope evidenced-based highs.The yesterdayhave DVE DVE deathA.She satellite-to-ground satellite-to-ground satellite-to-ground infantsweresix died.Because died.Because died.Because insK'fiSnt
created eval model with fresh parameters, time 0.00s.
Segmentation fault
@hxsnow10 I trained the model over the weekend, and training completed without any error. I did not change the source code. The OS is Ubuntu, with CUDA 8.0 + nvidia-375, and the TensorFlow version is 1.2.1. My script is:
python -m nmt.nmt
--src=zh --tgt=en --vocab_prefix=./temp/nmt_problem/tmp/vocab --train_prefix=./temp/nmt_problem/tmp/dev2 --dev_prefix=./temp/nmt_problem/tmp/dev2 --test_prefix=./temp/nmt_problem/tmp/dev2 --out_dir=./temp/nmt_problem/nmt_model_tmp --num_train_steps=12000 --steps_per_stats=100 --num_layers=2 --num_units=128 --dropout=0.2 --metrics=bleu
@mingfengwuye Sorry, can you try it once with dev3 on the CPU? Thanks:
python -m nmt.nmt --src=zh --tgt=en --vocab_prefix=./temp/nmt_problem/tmp/vocab --train_prefix=./temp/nmt_problem/tmp/dev3 --dev_prefix=./temp/nmt_problem/tmp/dev3 --test_prefix=./temp/nmt_problem/tmp/dev3 --out_dir=./temp/nmt_problem/nmt_model_tmp --num_train_steps=12000 --steps_per_stats=100 --num_layers=2 --num_units=128 --dropout=0.2 --metrics=bleu --num_gpus 0
@hxsnow10 I will try it later.
@mingfengwuye I found that in my environment, on CPU, the problem lies in crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(...) in model.py: when max_time is large (e.g. 39), it raises a Segmentation fault.
When I use tf.one_hot and tf.nn.softmax_cross_entropy_with_logits instead, the error disappears.
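Roughly, the change looks like this (a sketch only; variable names follow the tutorial's model.py, and tgt_vocab_size here stands for the target vocabulary size, e.g. hparams.tgt_vocab_size):
# original line, which segfaults on CPU with a very large vocab:
# crossent = tf.nn.sparse_softmax_cross_entropy_with_logits(
#     labels=target_output, logits=logits)
# workaround: expand the sparse labels to one-hot and use the dense op.
# Note this materializes a [max_time, batch_size, tgt_vocab_size] tensor,
# so it trades the crash for extra memory.
onehot_labels = tf.one_hot(target_output, depth=tgt_vocab_size)
crossent = tf.nn.softmax_cross_entropy_with_logits(
    labels=onehot_labels, logits=logits)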
@hxsnow10 How did you get the vocab? Could you give me some guidance? Thank you very much.
@mingfengwuye I'm not sure I follow. My vocab was built by counting and sorting words from several English-Chinese corpora at http://www.statmt.org/wmt17/translation-task.html#download after tokenization (using NLTK and a Chinese tokenizer); a rough sketch follows.
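Something like this (file names are placeholders; the three special tokens at the top are the ones the tutorial's vocab files expect):
import collections

counts = collections.Counter()
with open("train.tok.zh", encoding="utf-8") as f:  # tokenized corpus, one sentence per line
    for line in f:
        counts.update(line.split())

with open("vocab.zh", "w", encoding="utf-8") as out:
    for tok in ("<unk>", "<s>", "</s>"):  # special tokens first
        out.write(tok + "\n")
    for word, _ in counts.most_common():  # sorted by frequency, descending
        out.write(word + "\n")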
The Segmentation fault happens because sparse softmax does not support such big tensors, so in the end I tried batch_size=32 with sparse softmax, and it works.
Another question: what does iterator.source look like when I set source_reverse=False? For a sentence word0, ..., wordk, does it look like [id0, id1, ..., idk, id_</s>, id_</s>, ..., id_</s>], or something else? Thanks!
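(To make the layout being asked about concrete, here is a toy pure-Python illustration; this is not the tutorial's code, just the padding scheme described in the question:)
def pad_batch(batch, eos_id):
    # pad every sentence (a list of token ids) to the batch's max length
    # using the id of </s>
    max_len = max(len(sent) for sent in batch)
    return [sent + [eos_id] * (max_len - len(sent)) for sent in batch]

# e.g. with eos_id = 2:
# pad_batch([[5, 6, 7], [8, 9]], eos_id=2) -> [[5, 6, 7], [8, 9, 2]]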
@hxsnow10 Can you give a complete and clear solution? I am facing the same problem.