tensorflow / tensor2tensor
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
License: Apache License 2.0
Is it possible to resume training for a certain amount of additional steps?
Hi all,
I'm a little confused by the vocab_size defined in problem_hparams.py. In wmt_ende_bpe32k, the vocab_size param is set to 40960, while wmt_enfr_tokens takes a wrong_vocab_size param, which is set to 2**13 in wmt_enfr_tokens_8k. I'm guessing that this might not be the actual size of the vocabulary.
My question is: when setting input_modality, how should I set the vocab_size?
More specifically, if I have separate source and target vocabularies, of sizes n_src and n_tgt respectively, should I set the vocab_size to n_src + n_tgt or to something else?
Thank you
Hi,
I added a dataset for classification with the Transformer model. I generated the dataset and trained the model successfully, but when I run decoding, an InvalidArgumentError is thrown.
The following code generates the dataset. It simply picks data from several datasets and forms combinations; each combination is a category. The task is to determine which category a new set of data belongs to.
For example, there are four datasets, A, B, C and D, each with several members:
A: a0,a1,a2
B: b0,b1,b2,b3
C: c0,c1
D: d0,d1,d2
So there are 15 combinations (i.e. categories): A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD. My generator produces a random number of sequences, each picking members from sets A, B, C and D, together with a target label for one of the 15 categories (0 to 14).
Data generation and training both worked well, but when I tried to decode, an InvalidArgumentError was thrown. The error is too unspecific and I have no idea how to solve it.
Could you please review my code and tell me what's wrong with it? Thanks a lot!
Generator:
This generator picks members from several files, where each file name is the name of one set, and assigns a target classification label to each case.
import numpy as np
import os, sys
from tensor2tensor.data_generators import generator_utils
from tensor2tensor.data_generators import text_encoder
from six.moves import xrange
import ssl
import itertools

def generateDicts(dataDir,nameFiles,idFrom=100):
    dictFileName=dataDir+"/name.dict"
    categories=len(nameFiles)
    nameset=dict()
    for name in nameFiles:
        nameFile=open(dataDir+"/"+name)
        names=list()
        for individual in nameFile:
            names.append(individual.strip())
        nameset[name]=names
        nameFile.close()
    allNames=set()
    for (k,v) in zip(nameset.keys(),nameset.values()) :
        allNames=allNames.union(set(v))
        print("lenOfAllNames after add "+k+" is "+str(len(allNames)))
    allNamesList=sorted(allNames)
    names=dict()
    idx=idFrom+1
    for name in allNamesList:
        names[idx]=name
        idx+=1
    dictFile=open(dictFileName,"w")
    for (k,v) in zip(names.keys(),names.values()):
        dictFile.write(v+"\n")
    dictFile.close()
    combos=dict()
    for i in xrange(len(nameFiles)):
        combination=itertools.combinations(nameFiles,i+1)
        for signleComb in combination:
            combos["_".join(signleComb)]=i
    return nameset,names,combos

def generateCase(nameset,names,nameFiles,combos,maxMembers):
    categories=len(nameFiles)
    categorySize=list()
    members = np.random.randint(maxMembers)+1
    leftMembers = members
    categoryList=list()
    inputs=list()
    for i in xrange(categories-1):
        categorySize.append(np.random.randint(leftMembers))
        leftMembers-=categorySize[i]
        if categorySize[i] != 0:
            categoryList.append(nameFiles[i])
            names=nameset[nameFiles[i]]
            nameLen=len(names)
            for j in xrange(categorySize[i]):
                nameIndex=np.random.randint(nameLen)
                inputs.append(names[nameIndex])
    i+=1
    if leftMembers != 0:
        categoryList.append(nameFiles[i])
        names=nameset[nameFiles[i]]
        nameLen=len(names)
        for j in xrange(leftMembers):
            nameIndex=np.random.randint(nameLen)
            inputs.append(names[nameIndex])
    cateStr="_".join(categoryList)
    outputs=[cateStr]
    return inputs,outputs

def party_party_generator(dataDir,nameFiles,maxMembers,numOfCases):
    nameset,names,combos=generateDicts(dataDir,nameFiles)
    #targetDict=dataDir+"/targets.dict"
    #targetDictFile=open(targetDict,"w")
    #for combo in combos:
    #    targetDictFile.write(combo+"\n")
    #targetDictFile.close()
    dictFileName=dataDir+"/name.dict"
    inputsTextToken=text_encoder.TokenTextEncoder(dictFileName)
    #targetsTextToken=text_encoder.TokenTextEncoder(targetDict)
    for i in xrange(numOfCases):
        inputs,outputs=generateCase(nameset,names,nameFiles,combos,maxMembers)
        strInput=" ".join(inputs)
        encodedInputs=inputsTextToken.encode(strInput)
        np.random.shuffle(encodedInputs)
        encodedOutputs=[combos[outputs[0]]]
        yield {"inputs":encodedInputs,"targets":encodedOutputs}
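The 15 categories described above are the non-empty subsets of {A, B, C, D}, i.e. 2**4 - 1 = 15. For comparison with `generateDicts`, where `combos["_".join(signleComb)]=i` stores the loop index over combination sizes (so every combination of the same size gets the same label), here is a minimal standalone sketch that gives each combination its own label from 0 to 14, using the same `"_".join` key format. The function name `combination_labels` is hypothetical.

```python
import itertools


def combination_labels(set_names):
    """Map every non-empty combination of set_names to a distinct integer label."""
    labels = {}
    for size in range(1, len(set_names) + 1):
        for combo in itertools.combinations(set_names, size):
            # Same key format as the generator: member set names joined with "_".
            # len(labels) acts as a running counter, so every key gets a new id.
            labels["_".join(combo)] = len(labels)
    return labels


labels = combination_labels(["A", "B", "C", "D"])
print(len(labels))                      # 15
print(labels["A"], labels["A_B_C_D"])   # 0 14
```

Combinations are numbered by size first (A, B, C, D, then AB, AC, ...), so the 15 labels are exactly 0 through 14.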
HParams I added:
I'm a little confused about input_space_id and target_space_id; it seems I can set them to anything without any problems.
def party_party(model_hparams):
    """Party Party."""
    p = default_problem_hparams()
    nameDict=model_hparams.data_dir+"/name.dict"
    targetDict=model_hparams.data_dir+"/targets.dict"
    num_lines = sum(1 for line in open(nameDict))
    target_lines= sum(1 for line in open(targetDict))
    p.input_modality = {"inputs": (registry.Modalities.SYMBOL, num_lines)}
    p.vocabulary = {
        "inputs": text_encoder.TokenTextEncoder(vocab_filename=nameDict),
        "targets": text_encoder.TextEncoder(),
    }
    p.target_modality = (registry.Modalities.CLASS_LABEL, target_lines)
    p.batch_size_multiplier = 4
    p.max_expected_batch_size_per_shard = 8
    p.loss_multiplier = 3.0
    p.input_space_id = 1
    p.target_space_id = 1
    return p

"party_party": lambda p:party_party(p),
The training script I am using:
Since I need to change the t2t code frequently, I don't install it into my site-packages directory.
tensor2tensor/bin/t2t-trainer --data_dir /mnt/5efa3937-4221-48b5-9660-85a4a7eb0cfd/data/ --problems=party_party --model=transformer --hparams_set=transformer_base_single_gpu --keep_checkpoint_max=5 --save_checkpoints_secs=3600 --hparams='batch_size=2048' --output_dir /mnt/5efa3937-4221-48b5-9660-85a4a7eb0cfd/model
The predict script I'm using:
# Decode
DATA_DIR=/mnt/5efa3937-4221-48b5-9660-85a4a7eb0cfd/data
PROBLEM=party_party
MODEL=transformer
DECODE_FILE=decode.txt
TRAIN_DIR=/mnt/5efa3937-4221-48b5-9660-85a4a7eb0cfd/model
HPARAMS=transformer_base_single_gpu
BEAM_SIZE=4
ALPHA=0.6
tensor2tensor/bin/t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR \
--train_steps=0 \
--eval_steps=0 \
--decode_beam_size=$BEAM_SIZE \
--decode_alpha=$ALPHA \
--decode_from_file=$DECODE_FILE
The decode.txt is pretty simple:
324 f2r32f2r5 3f2fsfda fewfoIFE fselfj203 fselfj203 fj203rf2 3jr22iofj dsfslfkj23LJ dsf2f dsfslfkj23LJ dsflw2mf>K
This text is encoded to integer ids the same way the generator encodes its inputs.
Although this text is randomly generated, the procedure is the usual one for classifying a text corpus.
Please help me find where I went wrong. Thanks a lot!
There are something like a hundred GANs out there; it would be great if you could also provide a general framework in tensor2tensor that standardizes the typical GAN components.
Hi all,
When I run the walkthrough script for training a good English-to-German translation model with the Transformer, I encounter a problem.
My problem is:
INFO:tensorflow:Total trainable variables size: 60276736
INFO:tensorflow:Total embedding variables size: 16384
INFO:tensorflow:Total non-embedding variables size: 60260352
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-30 15:05:58.562782: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562814: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562820: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562824: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562829: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
status, run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/contextlib.py", line 66, in __exit__
next(self.gen)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,4,0] = -1 is not in [0, 31488)
[[Node: symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Squeeze)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 532, in run_locally
exp.train()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 505, in run
run_metadata=run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 842, in run
run_metadata=run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 952, in run
run_metadata=run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,4,0] = -1 is not in [0, 31488)
[[Node: symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Squeeze)]]
Caused by op 'symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather', defined at:
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 532, in run_locally
exp.train()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 955, in _train_model
model_fn_ops = self._get_train_ops(features, labels)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1162, in _get_train_ops
return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.TRAIN)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 424, in model_fn
len(hparams.problems) - 1)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 751, in _cond_on_index
return fn(cur_idx)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 406, in nth_model
features, skip=(skipping_is_on and skip_this_one))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/t2t_model.py", line 377, in model_fn
sharded_features[key], dp)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/modality.py", line 91, in bottom_sharded
return data_parallelism(self.bottom, xs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/expert_utils.py", line 294, in __call__
outputs.append(fns[i](*my_args[i], **my_kwargs[i]))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/models/modalities.py", line 88, in bottom
return self.bottom_simple(x, "shared", reuse=None)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/models/modalities.py", line 80, in bottom_simple
ret = tf.gather(var, x)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1179, in gather
validate_indices=validate_indices, name=name)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): indices[27,4,0] = -1 is not in [0, 31488)
[[Node: symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Squeeze)]]
INFO:tensorflow:Creating experiment, storing model files in /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Using config: {'_task_type': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f884bf21630>, '_model_dir': '/root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base', '_save_checkpoints_secs': 600, '_save_summary_steps': 100, '_session_config': allow_soft_placement: true
graph_options {
optimizer_options {
}
}
, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_task_id': 0, '_tf_random_seed': None, '_num_ps_replicas': 0, '_evaluation_master': '', '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_master': '', '_is_chief': True, '_num_worker_replicas': 0, '_save_checkpoints_steps': None, '_environment': 'local'}
INFO:tensorflow:Performing Decoding from a file.
INFO:tensorflow:Getting sorted inputs
Traceback (most recent call last):
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 544, in run_locally
decode_from_file(estimator, FLAGS.decode_from_file)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 648, in decode_from_file
as_iterable=True)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
as_iterable=as_iterable)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 878, in _infer_model
% self._model_dir)
tensorflow.contrib.learn.python.learn.estimators._sklearn.NotFittedError: Couldn't find trained model at /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base.
cat: /root/t2t_data/decode_this.txt.transformer.transformer_base.beam4.alpha0.6.decodes: No such file or directory
How should I solve this problem?
Thank you
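The training traceback above fails inside the `tf.gather` call in the symbol modality's `bottom_simple`, and the message says index -1 is not in [0, 31488): every id fed to a SYMBOL modality has to be a valid row of its embedding table. A minimal plain-Python sketch, assuming the encoded examples can be loaded as lists of ints (the helper name `out_of_range_ids` is hypothetical), that finds such ids before they reach the graph:

```python
def out_of_range_ids(examples, vocab_size):
    """Return (example_index, position, id) for every id outside [0, vocab_size)."""
    bad = []
    for ex_idx, ids in enumerate(examples):
        for pos, token_id in enumerate(ids):
            # tf.gather on a [vocab_size, hidden] table only accepts these ids.
            if not 0 <= token_id < vocab_size:
                bad.append((ex_idx, pos, token_id))
    return bad


examples = [[12, 7, 31487], [5, -1, 3]]
print(out_of_range_ids(examples, vocab_size=31488))  # [(1, 1, -1)]
```

An empty result means all ids are in range for that vocab size; a non-empty one points at the exact example and position to inspect.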
Does the trainer currently write out logs for TensorBoard? I looked through the code in utils/trainer_utils.py, and while I see calls to tf.summary.scalar, I don't see a call to tf.summary.FileWriter.
If it does support TensorBoard, how do I configure it?
If not, I'll start working on a pull request tomorrow to implement this.
I have run into some issues installing on my GPU machine with Ubuntu 14.04.5. It reaches the
Installing collected packages: mpmath, sympy, tensor2tensor
line and then fails with a permission denied error.
I tried pip installing each individual package into the specified directory, and pip then said everything was installed, but t2t-trainer --registry_help did not work.
This step works fine on my CPU machine, but there I run into the known issue with downloading the dataset.
Is it possible to run the Walkthrough example from the website with data other than WMT?
I've tried changing the data paths in wmt.py:
_ENDE_TRAIN_DATASETS = [
    [
        "http://data.statmt.org/wmt16/translation-task/training-parallel-nc-v11.tgz",  # pylint: disable=line-too-long
        ("training-parallel-nc-v11/news-commentary-v11.de-en.en",
         "training-parallel-nc-v11/news-commentary-v11.de-en.de")
    ],
    [
        "http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz",
        ("commoncrawl.de-en.en", "commoncrawl.de-en.de")
    ],
    [
        "http://www.statmt.org/wmt13/training-parallel-europarl-v7.tgz",
        ("training/europarl-v7.de-en.en", "training/europarl-v7.de-en.de")
    ],
]
_ENDE_TEST_DATASETS = [
    [
        "http://data.statmt.org/wmt16/translation-task/dev.tgz",
        ("dev/newstest2013.en", "dev/newstest2013.de")
    ],
]
But when I run the example with the new paths, it still downloads the WMT data...
I trained the WMT transformer_base model and see model checkpoints like these:
-rw-rw-r-- 1 public public 24 Jun 23 09:47 model.ckpt-55975.data-00000-of-00002
-rw-rw-r-- 1 public public 1320167432 Jun 23 09:47 model.ckpt-55975.data-00001-of-00002
-rw-rw-r-- 1 public public 10449 Jun 23 09:47 model.ckpt-55975.index
-rw-rw-r-- 1 public public 24328177 Jun 23 09:47 model.ckpt-55975.meta
-rw-rw-r-- 1 public public 24 Jun 23 09:57 model.ckpt-56426.data-00000-of-00002
-rw-rw-r-- 1 public public 1320167432 Jun 23 09:57 model.ckpt-56426.data-00001-of-00002
-rw-rw-r-- 1 public public 10414 Jun 23 09:57 model.ckpt-56426.index
There are 2 questions:
In readme.md, I see that the --model param is set to transformer, which I thought should be set to a specific model file. Will t2t-trainer automatically use the latest checkpoint of the saved model?
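On the checkpoint question: TensorFlow's `tf.train.latest_checkpoint` reads the `checkpoint` state file in the output directory rather than scanning file names, so the following is only an approximation for inspecting a listing like the one above. A minimal sketch (the helper name `highest_step_checkpoint` is hypothetical) that picks the checkpoint prefix with the largest step from the `.index` files:

```python
import re

# Each saved checkpoint writes one model.ckpt-<step>.index file.
_CKPT_INDEX = re.compile(r"^model\.ckpt-(\d+)\.index$")


def highest_step_checkpoint(filenames):
    """Return the checkpoint prefix with the largest step, or None if there is none."""
    steps = [int(m.group(1)) for m in map(_CKPT_INDEX.match, filenames) if m]
    if not steps:
        return None
    return "model.ckpt-%d" % max(steps)


listing = [
    "model.ckpt-55975.data-00000-of-00002",
    "model.ckpt-55975.index",
    "model.ckpt-55975.meta",
    "model.ckpt-56426.data-00001-of-00002",
    "model.ckpt-56426.index",
]
print(highest_step_checkpoint(listing))  # model.ckpt-56426
```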
DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE
BEAM_SIZE=4
ALPHA=0.6
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR \
--train_steps=0 \
--eval_steps=0 \
--decode_beam_size=$BEAM_SIZE \
--decode_alpha=$ALPHA \
--decode_from_file=$DECODE_FILE
When I try to decode a file that was not part of the training/testing set, the following error occurs:
INFO:tensorflow:Performing Decoding from a file.
INFO:tensorflow:Getting sorted inputs
INFO:tensorflow: batch 94
INFO:tensorflow:Deocding batch 0
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 543, in run_locally
decode_from_file(estimator, FLAGS.decode_from_file)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 645, in decode_from_file
result_iter = estimator.predict(input_fn=input_fn.next, as_iterable=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
as_iterable=as_iterable)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 883, in _infer_model
features = self._get_features_from_input_fn(input_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 863, in _get_features_from_input_fn
result = input_fn()
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 725, in _decode_batch_input_fn
input_ids = vocabulary.encode(inputs)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 132, in encode
ret = [self._token_to_id[tok] for tok in sentence.strip().split()]
KeyError: '@-@'
The decoding command I use is as follows:
PROBLEM=wmt_ende_bpe32k
MODEL=transformer
HPARAMS=transformer_base
DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS
BEAM_SIZE=4
ALPHA=0.6
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR \
--train_steps=0 \
--eval_steps=0 \
--decode_beam_size=$BEAM_SIZE \
--decode_alpha=$ALPHA \
--decode_from_file /tmp/t2t_datagen/newsdev2016.bpe.en
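The last frame of the traceback shows the mechanism: `encode` looks up every whitespace-separated token with `self._token_to_id[tok]`, so any token missing from the vocabulary (here `@-@`) raises a `KeyError`. A minimal sketch, assuming the vocabulary is available as a set of token strings, that lists such tokens in a decode file before running the decoder (the function name `find_oov_tokens` is hypothetical):

```python
def find_oov_tokens(lines, vocab):
    """Return the sorted tokens that are not in vocab.

    Tokenizes each line the same way as the traceback: line.strip().split().
    """
    missing = set()
    for line in lines:
        for tok in line.strip().split():
            if tok not in vocab:
                missing.add(tok)
    return sorted(missing)


vocab = {"Hello", "world", "state", "of", "the", "art"}
lines = ["Hello world\n", "state @-@ of @-@ the @-@ art\n"]
print(find_oov_tokens(lines, vocab))  # ['@-@']
```

For a real file, `lines` can simply be the open file object, since iterating over it yields one line at a time.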
I'm trying to train a bytes-to-subwords model:
def problem(model_hparams):
    # This vocab file must be present within the data directory.
    vocab_filename = os.path.join(model_hparams.data_dir, 'vocab')
    source_encoder = text_encoder.ByteTextEncoder()
    target_encoder = text_encoder.SubwordTextEncoder(vocab_filename)
    p = problem_hparams.default_problem_hparams()
    p.input_modality = {"inputs": (registry.Modalities.SYMBOL, source_encoder.vocab_size)}
    p.target_modality = (registry.Modalities.SYMBOL, target_encoder.vocab_size)
    p.vocabulary = {
        "inputs": source_encoder,
        "targets": target_encoder,
    }
    return p
This fails catastrophically during model construction. It appears to work if the input and target modalities have the same vocab size (e.g. when both share the same SubwordTextEncoder), but fails if the sizes differ. This does not appear to be the case for other modalities (e.g. changing both of the above to CLASS_LABEL appears to work).
I think it would be very cool and attractive to add support for the KITTI datasets.
TensorFlow version: 1.1.0
OS: CentOS 7.0
tensor2tensor version: 1.0.7 (from pip)
I encounter an exception when I run the walkthrough.
tensor2tensor/utils/trainer_utils.py uses tf.contrib.learn.Estimator and invokes the Estimator constructor with session_config, but in TensorFlow 1.1.0 the tf.contrib.learn.Estimator constructor has no session_config argument.
So is tensor2tensor not compatible with TensorFlow 1.1.0?
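One way to confirm whether an installed constructor accepts a given keyword is to inspect its signature. A minimal sketch using the standard `inspect` module; the two estimator classes below are hypothetical stand-ins so the example runs without TensorFlow, and with TensorFlow installed you would pass `tf.contrib.learn.Estimator` instead:

```python
import inspect


def accepts_kwarg(func, name):
    """Return True if `func` (a function or class) accepts keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY):
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer Estimator constructor.
class EstimatorWithoutSessionConfig:
    def __init__(self, model_fn=None, model_dir=None, config=None):
        pass


class EstimatorWithSessionConfig:
    def __init__(self, model_fn=None, model_dir=None, config=None,
                 session_config=None):
        pass


print(accepts_kwarg(EstimatorWithoutSessionConfig, "session_config"))  # False
print(accepts_kwarg(EstimatorWithSessionConfig, "session_config"))     # True
```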
During inference, I'm not able to create a file containing the inference output.
I've tried --decode_to_file, but no output file is created...
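For locating decoder output: the walkthrough log earlier on this page runs `cat` on `/root/t2t_data/decode_this.txt.transformer.transformer_base.beam4.alpha0.6.decodes`, which suggests that `--decode_from_file` writes its results next to the input file, named after the model, hparams set, beam size and alpha. A sketch that builds that expected path, inferred only from that one log line and not from the t2t source (the helper `expected_decodes_path` is hypothetical), so you can check whether the file exists:

```python
import os


def expected_decodes_path(decode_file, model, hparams_set, beam_size, alpha):
    """Build the output path pattern seen in the walkthrough's `cat` command."""
    return "%s.%s.%s.beam%d.alpha%s.decodes" % (
        decode_file, model, hparams_set, beam_size, alpha)


path = expected_decodes_path("/root/t2t_data/decode_this.txt",
                             "transformer", "transformer_base", 4, 0.6)
print(path)
# /root/t2t_data/decode_this.txt.transformer.transformer_base.beam4.alpha0.6.decodes
print(os.path.exists(path))
```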
Hi all,
I tested the training example in the README.
I found that the Volatile GPU-Util of every GPU except the first is 0%, even though the process took all of the memory on every GPU. I'm not sure whether this is a TensorFlow or a tensor2tensor issue.
Thank you
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26 Driver Version: 375.26 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M40 24GB Off | 0000:04:00.0 Off | 0 |
| N/A 56C P0 187W / 250W | 21871MiB / 22939MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla M40 24GB Off | 0000:05:00.0 Off | 0 |
| N/A 28C P0 56W / 250W | 21806MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla M40 24GB Off | 0000:08:00.0 Off | 0 |
| N/A 28C P0 55W / 250W | 21804MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla M40 24GB Off | 0000:09:00.0 Off | 0 |
| N/A 29C P0 55W / 250W | 21804MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla M40 24GB Off | 0000:86:00.0 Off | 0 |
| N/A 29C P0 56W / 250W | 21808MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla M40 24GB Off | 0000:87:00.0 Off | 0 |
| N/A 27C P0 57W / 250W | 21806MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla M40 24GB Off | 0000:8A:00.0 Off | 0 |
| N/A 30C P0 57W / 250W | 21804MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla M40 24GB Off | 0000:8B:00.0 Off | 0 |
| N/A 27C P0 56W / 250W | 21804MiB / 22939MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
Hi, I am new to distributed TensorFlow. I followed the instructions here:
https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/docs/distributed_training.md
I created the TF_CONFIG, but I don't know how to use it during training, either on the command line or in any other way.
Could you give me some advice or point me to some documentation?
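TF_CONFIG is read from the environment of each TensorFlow process, so it has to be set (for example exported in the shell) before the trainer starts on each machine, with a different `task` entry per process. A minimal sketch that builds the JSON value for one process; the host names, ports and cluster layout are made-up placeholders, the helper `make_tf_config` is hypothetical, and any t2t-specific flags from the guide linked above are not shown:

```python
import json
import os


def make_tf_config(cluster, task_type, task_index):
    """Return the TF_CONFIG JSON string for one process in `cluster`."""
    if task_index >= len(cluster[task_type]):
        raise ValueError("no %s task with index %d" % (task_type, task_index))
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })


# Hypothetical cluster: one parameter server and two workers.
cluster = {
    "ps": ["host0:2222"],
    "worker": ["host1:2222", "host2:2222"],
}
tf_config = make_tf_config(cluster, "worker", 1)
# Visible to this Python process and to any child process it starts
# (e.g. a trainer launched with subprocess); the shell equivalent is
# export TF_CONFIG='<json>' before running the trainer.
os.environ["TF_CONFIG"] = tf_config
print(tf_config)
```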
I downloaded and ran the TensorFlow Docker image, then followed the walkthrough: I installed tensor2tensor with pip install, set the environment variables, and ran t2t-datagen.
Next, I ran t2t-trainer:
t2t-trainer --data_dir=$DATA_DIR --problems=$PROBLEM --model=$MODEL --hparams_set=$HPARAMS --output_dir=$TRAIN_DIR
It looked like it was training for a minute, until it failed with:
t2t-trainer --data_dir=$DATA_DIR --problems=$PROBLEM --model=$MODEL --hparams_set=$HPARAMS --output_dir=$TRAIN_DIR
INFO:tensorflow:Creating experiment, storing model files in /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Using config: {'_model_dir': '/root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base', '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f054c81bb50>, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': '', '_session_config': allow_soft_placement: true
graph_options {
optimizer_options {
}
}
}
INFO:tensorflow:Performing local training.
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Doing model_fn_body took 2.320 sec.
INFO:tensorflow:This model_fn took 2.521 sec.
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/target_space_embedding/kernel shape (32, 512) size 16384
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_0 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_10 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_11 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_12 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_13 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_14 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_15 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_1 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_2 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_3 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_4 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_5 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_6 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_7 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_8 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_9 shape (1952, 512) size 999424
INFO:tensorflow:Total trainable variables size: 60147712
INFO:tensorflow:Total embedding variables size: 16384
INFO:tensorflow:Total non-embedding variables size: 60131328
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-27 04:34:58.910748: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-27 04:34:58.910798: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-27 04:34:58.910821: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
INFO:tensorflow:Saving checkpoints for 1 into /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt.
INFO:tensorflow:loss = 8.79561, step = 1
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 531, in run_locally
exp.train()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 505, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 842, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 952, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[16,1,0] = -1 is not in [0, 31236)
[[Node: symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Squeeze)]]
Caused by op u'symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Gather', defined at:
File "/usr/local/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 531, in run_locally
exp.train()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 955, in _train_model
model_fn_ops = self._get_train_ops(features, labels)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1162, in _get_train_ops
return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.TRAIN)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 423, in model_fn
len(hparams.problems) - 1)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 748, in _cond_on_index
return fn(cur_idx)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 405, in nth_model
features, train, skip=(skipping_is_on and skip_this_one))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 387, in model_fn
sharded_features["targets"], dp)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/modality.py", line 115, in targets_bottom_sharded
return data_parallelism(self.targets_bottom, xs)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/expert_utils.py", line 294, in __call__
outputs.append(fns[i](*my_args[i], **my_kwargs[i]))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/modalities.py", line 94, in targets_bottom
return self.bottom_simple(x, "shared", reuse=True)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/modalities.py", line 80, in bottom_simple
ret = tf.gather(var, x)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1179, in gather
validate_indices=validate_indices, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): indices[16,1,0] = -1 is not in [0, 31236)
[[Node: symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Squeeze)]]
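The error above says the target embedding lookup (tf.gather) received a token id of -1, while valid ids for this 31236-entry vocabulary lie in [0, 31236). One way to narrow this down is to scan the generated token ids for out-of-range values. A minimal diagnostic sketch, assuming the id sequences have already been read into Python lists of ints (reading the TFRecord files is not shown, and find_invalid_ids is a hypothetical helper, not part of tensor2tensor):

```python
def find_invalid_ids(sequences, vocab_size):
    """Return (sequence_index, position, token_id) for ids outside [0, vocab_size)."""
    bad = []
    for i, seq in enumerate(sequences):
        for j, tok in enumerate(seq):
            # tf.gather with validate_indices=True rejects exactly these ids.
            if not 0 <= tok < vocab_size:
                bad.append((i, j, tok))
    return bad
```

An empty result means the generated data is in range, which would point instead at a mismatch between the vocabulary used by t2t-datagen and the one the trainer loads.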
Hi,
I ran the basic wmt_ende_tokens_32k problem as shown in the Walkthrough in README.md, using 8 GPUs. The loss curve is as follows:
But when I run decoding with decode_from_file on the standard WMT2014 en-de test set (newstest2014.en), the output is meaningless: the decoded sentences are unrelated to the inputs, as follows:
The inputs and outputs do not match, and the BLEU score is 0.
When I run the eval script following the answer provided by @lukaszkaiser in #36, the BLEU score is nearly 0.006.
So what's the problem? Thanks!
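BLEU scores output line i against reference line i, so if the decoded lines are misaligned with newstest2014, or simply unrelated to it, the n-gram matches collapse and the score approaches zero even when individual translations are fluent. The minimal sketch below illustrates that effect. It computes unsmoothed corpus BLEU with uniform 4-gram weights, whitespace tokens and one reference per line. It is not the tensor2tensor eval script or the scorer used in the paper, and its numbers will differ from official tools.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU with uniform weights and one reference per line."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):  # line i is scored against line i
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clipped matches: an n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero n-gram precision makes unsmoothed BLEU zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def tokenize(lines):
    return [line.split() for line in lines]


refs = ["the cat sat on the mat", "a dog ran in the park today"]
shifted = refs[1:] + refs[:1]  # same sentences, misaligned by one line
print(corpus_bleu(tokenize(refs), tokenize(refs)))     # 1.0
print(corpus_bleu(tokenize(shifted), tokenize(refs)))  # 0.0
```

Comparing the number of lines in the decode output with newstest2014 and spot-checking that line i of each file corresponds would be a cheap first step before looking at the model itself.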
Hi all,
I'm wondering whether tensor2tensor supports multi-GPU decoding yet (WMT translation task).
I'm asking because when I tried to use multiple GPU cards to decode data (translation task), the following exception was raised, while decoding on a single GPU raises no exception.
I'm putting the decoding script and full exception trace here. Thank you.
t2t-trainer --data_dir=/tensor2tensor/t2t_data --problems=wmt_ende_tokens_32k \
--model=transformer --hparams_set=transformer_base --worker_gpu=3 \
--output_dir=/tensor2tensor/exp/8cards/wmt_ende_tokens_32k/transformer-transformer_base \
--train_steps=0 --eval_steps=0 --decode_beam_size=4 --decode_alpha=0.6 \
--decode_use_last_position_only --decode_batch_size=128 \
--decode_from_file=/tensor2tensor/t2t_data/validate.en
INFO:tensorflow:Restoring parameters from /search/odin/public/experiments/tensor2tensor/exp/8cards/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt-56426
2017-06-23 11:34:20.978020: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 128) and num_split 3
[[Node: while/split = Split[T=DT_INT32, num_split=3, _device="/job:localhost/replica:0/task:0/cpu:0"](while/split/split_dim, while/split/Enter)]]
2017-06-23 11:34:20.978210: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 128) and num_split 3
[[Node: while/split = Split[T=DT_INT32, num_split=3, _device="/job:localhost/replica:0/task:0/cpu:0"](while/split/split_dim, while/split/Enter)]]
Traceback (most recent call last):
File "/search/odin/public/anaconda2/bin/t2t-trainer", line 4, in <module>
__import__('pkg_resources').run_script('tensor2tensor==1.0.4', 't2t-trainer')
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 739, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1507, in run_script
exec(script_code, namespace, namespace)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensor2tensor-1.0.4-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 55, in <module>
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensor2tensor-1.0.4-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 51, in main
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 240, in run
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 543, in run_locally
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 646, in decode_from_file
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 902, in _predict_generator
preds = mon_sess.run(predictions, feed_fn() if feed_fn else None)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 505, in run
run_metadata=run_metadata)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 842, in run
run_metadata=run_metadata)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 952, in run
run_metadata=run_metadata)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 128) and num_split 3
[[Node: while/split = Split[T=DT_INT32, num_split=3, _device="/job:localhost/replica:0/task:0/cpu:0"](while/split/split_dim, while/split/Enter)]]
[[Node: while/GatherNd/_1405 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_4080_while/GatherNd", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](^_cloopwhile/parallel_0/Identity/_1292)]]
Caused by op u'while/split', defined at:
File "/search/odin/public/anaconda2/bin/t2t-trainer", line 4, in <module>
__import__('pkg_resources').run_script('tensor2tensor==1.0.4', 't2t-trainer')
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 739, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1507, in run_script
exec(script_code, namespace, namespace)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensor2tensor-1.0.4-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 55, in <module>
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensor2tensor-1.0.4-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 51, in main
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 543, in run_locally
decode_from_file(estimator, FLAGS.decode_from_file)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 645, in decode_from_file
result_iter = estimator.predict(input_fn=input_fn.next, as_iterable=True)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
as_iterable=as_iterable)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 884, in _infer_model
infer_ops = self._get_predict_ops(features)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1218, in _get_predict_ops
return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.INFER)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 423, in model_fn
len(hparams.problems) - 1)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 748, in _cond_on_index
return fn(cur_idx)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 396, in nth_model
decode_length=FLAGS.decode_extra_length)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/t2t_model.py", line 154, in infer
last_position_only, alpha)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/t2t_model.py", line 211, in _beam_decode
alpha)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/beam_search.py", line 405, in beam_search
back_prop=False)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2766, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2595, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2545, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/beam_search.py", line 336, in inner_loop
i, alive_seq, alive_log_probs)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/beam_search.py", line 240, in grow_topk
flat_logits = symbols_to_logits_fn(flat_ids)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/t2t_model.py", line 181, in symbols_to_logits_fn
features, False, last_position_only=last_position_only)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/t2t_model.py", line 352, in model_fn
sharded_features = self._shard_features(features)
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/t2t_model.py", line 332, in _shard_features
0))
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1214, in split
split_dim=axis, num_split=num_or_size_splits, value=value, name=name)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3261, in _split
num_split=num_split, name=name)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 128) and num_split 3
[[Node: while/split = Split[T=DT_INT32, num_split=3, _device="/job:localhost/replica:0/task:0/cpu:0"](while/split/split_dim, while/split/Enter)]]
[[Node: while/GatherNd/_1405 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_4080_while/GatherNd", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](^_cloopwhile/parallel_0/Identity/_1292)]]
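The message says dimension 0 of the tensor being split has size 128 and cannot be divided into 3 equal shards, which matches --worker_gpu=3 in the command above. Judging from the trace (grow_topk → symbols_to_logits_fn → _shard_features), the split happens on the features fed to the model at each beam-search step, so the size that must divide evenly might be batch × beam rather than --decode_batch_size alone, and a final partial batch could have yet another size. The sketch below is therefore only a quick arithmetic check, not a guaranteed fix: it lists shard counts that split a given size evenly, and the largest size not above a limit that a given shard count can split. Both helper names are hypothetical.

```python
def even_shard_counts(size, max_shards):
    """Shard counts in [1, max_shards] that split `size` rows evenly."""
    return [n for n in range(1, max_shards + 1) if size % n == 0]


def largest_divisible_size(size, num_shards):
    """Largest value <= size that num_shards can split evenly (0 if none)."""
    return (size // num_shards) * num_shards


print(even_shard_counts(128, 8))       # [1, 2, 4, 8]
print(largest_divisible_size(128, 3))  # 126
```

In other words, with a dimension of 128, 2, 4 or 8 GPUs would split evenly while 3 would not.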
Would a toy dataset (e.g. predicting a reversed sentence), as used in tf-seq2seq, be useful as a sanity check?
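A minimal sketch of such a toy generator is below. It assumes the convention of yielding dicts with "inputs" and "targets" lists of integer ids, with ids 0 and 1 reserved for padding and EOS. Both of these are assumptions for illustration, not something stated in this thread, and the function name is hypothetical. The tensor2tensor data generators already include algorithmic reverse problems such as algorithmic_reverse_decimal40, which may serve the same purpose.

```python
import random

EOS_ID = 1  # assumption: 0 = padding, 1 = end of sequence; symbols start at 2


def reverse_sequence_generator(vocab_size, max_length, num_samples, seed=None):
    """Yield toy examples whose target is the input sequence reversed."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        length = rng.randint(1, max_length)
        # Draw symbols from the non-reserved ids [2, vocab_size).
        symbols = [rng.randrange(2, vocab_size) for _ in range(length)]
        yield {"inputs": symbols + [EOS_ID],
               "targets": symbols[::-1] + [EOS_ID]}


for example in reverse_sequence_generator(vocab_size=12, max_length=5,
                                          num_samples=2, seed=0):
    print(example)
```

Because the mapping is deterministic and trivially learnable, a model that fails to reach near-perfect accuracy on it points to a pipeline bug rather than a modeling limitation.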
I followed the following guideline:
You can currently do so for models, hyperparameter sets, and modalities. Please do submit a pull request if your component might be useful to others.
Here's an example with a new hyperparameter set:
from tensor2tensor.models import transformer
from tensor2tensor.utils import registry

@registry.register_hparams
def transformer_my_very_own_hparams_set():
  hparams = transformer.transformer_base()
  hparams.hidden_size = 1024
  ...
  return hparams
import my_registrations
t2t-trainer --t2t_usr_dir=~/usr/t2t_usr --registry_help
You'll see under the registered HParams your transformer_my_very_own_hparams_set, which you can directly use on the command line with the --hparams_set flag.
I did the same, but I could not find "transformer_my_very_own_hparams_set" in the output. Here is the log:
Models: ['attention_lm', 'attention_lm_moe', 'baseline_lstm_seq2seq', 'byte_net', 'diagonal_neural_gpu', 'multi_model', 'neural_gpu', 'slice_net', 'transformer', 'xception']
HParams (by model):
* attention: ['attention_lm_base', 'attention_lm_moe_base', 'attention_lm_moe_large', 'attention_lm_moe_small']
* basic: ['basic_1']
* bytenet: ['bytenet_base']
* multimodel: ['multimodel_1p8']
* neuralgpu: ['neuralgpu_1']
* slicenet: ['slicenet_1', 'slicenet_1noam', 'slicenet_1tiny']
* transformer: ['transformer_base', 'transformer_base_single_gpu', 'transformer_big', 'transformer_big_dr1', 'transformer_big_dr2', 'transformer_big_enfr', 'transformer_big_single_gpu', 'transformer_dr0', 'transformer_dr2', 'transformer_ff1024', 'transformer_ff4096', 'transformer_h1', 'transformer_h16', 'transformer_h32', 'transformer_h4', 'transformer_hs1024', 'transformer_hs256', 'transformer_k128', 'transformer_k256', 'transformer_l2', 'transformer_l4', 'transformer_l8', 'transformer_ls0', 'transformer_ls2', 'transformer_parsing_base', 'transformer_parsing_big', 'transformer_tiny']
* xception: ['xception_base']
RangedHParams: ['basic1', 'slicenet1', 'transformer_big_single_gpu']
Modalities: ['audio:audio_spectral_modality', 'audio:default', 'audio:identity', 'class_label:class_label_2d', 'class_label:default', 'class_label:identity', 'generic:default', 'image:default', 'image:identity', 'image:small_image_modality', 'symbol:default', 'symbol:identity']
[@nmyjs_160_20 t2t_usr]# ls
__init__.py  my_registrations.py
Can anyone help me?
The current problems include NMT and image classification problems.
But there are no text classification problems of the kind usually tackled with LSTM or CNN models.
I've implemented a toy dataset that generates random sequences representing people's names.
Some are boys' names, while others are girls' names.
If all names in the sequence are boys' names, it's classified as 1.
If all names in the sequence are girls' names, it's classified as 2.
Otherwise, it's classified as 3.
I'm now training on my toy dataset using the default Transformer model.
Do you think it would be worth submitting this problem to master?
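The three labeling rules can be written as a small function. The function name and the name sets are hypothetical. The rules do not say what an empty sequence should be (Python's all() would label it 1), so this sketch rejects it explicitly.

```python
def label_name_sequence(names, boys, girls):
    """Return 1 if all names are boys' names, 2 if all are girls' names, else 3."""
    if not names:
        raise ValueError("empty sequence: the rules do not define a label")
    if all(name in boys for name in names):
        return 1
    if all(name in girls for name in names):
        return 2
    return 3


BOYS = {"tom", "jack"}
GIRLS = {"anna", "mary"}
print(label_name_sequence(["tom", "jack"], BOYS, GIRLS))  # 1
print(label_name_sequence(["anna"], BOYS, GIRLS))         # 2
print(label_name_sequence(["tom", "mary"], BOYS, GIRLS))  # 3
```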
Hi,
I read the paper "Attention Is All You Need". The results on the WMT tasks are really exciting.
But I found that the paper gives no detailed explanation of exactly which metric was used for the WMT translation tasks.
What I really mean by detailed explanation:
A tiny misspelling here: deocding -> decoding
Thank you so much
When running the decode step of the Walkthrough:
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR \
--train_steps=0 \
--eval_steps=0 \
--decode_beam_size=$BEAM_SIZE \
--decode_alpha=$ALPHA \
--decode_from_file=$DECODE_FILE
the error information is as follows:
INFO:tensorflow:Creating experiment, storing model files in /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 126, in experiment_fn
eval_steps=eval_steps)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 138, in create_experiment
model_name=model_name)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 174, in create_experiment_components
keep_checkpoint_max=FLAGS.keep_checkpoint_max))
TypeError: __init__() got an unexpected keyword argument 'session_config'
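The TypeError means the constructor called at trainer_utils.py line 174 (presumably the RunConfig it builds) does not accept a session_config keyword. That usually points to an installed TensorFlow that is older or newer than the one this tensor2tensor release expects. This is an assumption, since the versions are not stated here. The Python 3 sketch below checks whether a callable accepts a given keyword. OlderConfig is a hypothetical stand-in; in practice you would pass the real class instead.

```python
import inspect


def accepts_keyword(fn, name):
    """True if fn's signature allows passing `name` as a keyword argument."""
    params = inspect.signature(fn).parameters
    param = params.get(name)
    if param is not None and param.kind is not inspect.Parameter.POSITIONAL_ONLY:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class OlderConfig(object):
    """Hypothetical stand-in for a config class without session_config."""

    def __init__(self, master=None, keep_checkpoint_max=5):
        self.master = master
        self.keep_checkpoint_max = keep_checkpoint_max


print(accepts_keyword(OlderConfig, "keep_checkpoint_max"))  # True
print(accepts_keyword(OlderConfig, "session_config"))       # False
```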
When running the demo (also in the README: English-to-German translation model using the Transformer model from Attention Is All You Need on WMT data), downloading the data produces a corrupted archive.
Eventually this causes the tokenizer to run into errors.
PROBLEM=wmt_ende_tokens_32k
MODEL=transformer
HPARAMS=transformer_base
DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS
mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
t2t-datagen \
--data_dir=$DATA_DIR \
--tmp_dir=$TMP_DIR \
--num_shards=100
The output of the previous data generation commands:
INFO:tensorflow:Generating training data for wmt_ende_tokens_32k.
INFO:tensorflow:Downloading http://data.statmt.org/wmt16/translation-task/training-parallel-nc-v11.tgz to /tmp/t2t_datagen/training-parallel-nc-v11.tgz
INFO:tensorflow:Succesfully downloaded training-parallel-nc-v11.tgz, 75178032 bytes.
INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.en
INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.de
INFO:tensorflow:Downloading http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz to /tmp/t2t_datagen/training-parallel-commoncrawl.tgz
At this point, the download just keeps hanging, even though the data has been downloaded successfully (checked in /tmp/t2t_datagen), so I abort with Ctrl-C. When I try again, it gives the following error:
Traceback (most recent call last):
File "/usr/local/bin/t2t-datagen", line 361, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-datagen", line 344, in main
training_gen(), FLAGS.problem + UNSHUFFLED_SUFFIX + "-train",
File "/usr/local/bin/t2t-datagen", line 140, in <lambda>
lambda: wmt.ende_wordpiece_token_generator(FLAGS.tmp_dir, True, 2**15),
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/wmt.py", line 224, in ende_wordpiece_token_generator
tmp_dir, "tokens.vocab.%d" % vocab_size, vocab_size)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/generator_utils.py", line 220, in get_or_generate_vocab
corpus_tar.extractall(tmp_dir)
File "/usr/lib/python2.7/tarfile.py", line 2079, in extractall
self.extract(tarinfo, path)
File "/usr/lib/python2.7/tarfile.py", line 2116, in extract
self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
File "/usr/lib/python2.7/tarfile.py", line 2192, in _extract_member
self.makefile(tarinfo, targetpath)
File "/usr/lib/python2.7/tarfile.py", line 2233, in makefile
copyfileobj(source, target)
File "/usr/lib/python2.7/tarfile.py", line 266, in copyfileobj
shutil.copyfileobj(src, dst)
File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
buf = fsrc.read(length)
File "/usr/lib/python2.7/tarfile.py", line 831, in read
buf += self.fileobj.read(size - len(buf))
File "/usr/lib/python2.7/tarfile.py", line 743, in read
return self.readnormal(size)
File "/usr/lib/python2.7/tarfile.py", line 758, in readnormal
return self.__read(size)
File "/usr/lib/python2.7/tarfile.py", line 748, in __read
buf = self.fileobj.read(size)
File "/usr/lib/python2.7/gzip.py", line 268, in read
self._read(readsize)
File "/usr/lib/python2.7/gzip.py", line 315, in _read
self._read_eof()
File "/usr/lib/python2.7/gzip.py", line 354, in _read_eof
hex(self.crc)))
IOError: CRC check failed 0x75d9e49c != 0xd122220fL
One workaround might be to manually download the tar.gz from http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz and unpack it in /tmp/t2t_data. When I tried, the download was extremely slow: approx. 2 hours for 876 MB.
Results of manual download:
Working:
INFO:tensorflow:Not downloading, file already found: /tmp/t2t_datagen/training-parallel-commoncrawl.tgz
INFO:tensorflow:Reading file: commoncrawl.de-en.en
INFO:tensorflow:Reading file: commoncrawl.de-en.de
INFO:tensorflow:Reading file: commoncrawl.fr-en.en
INFO:tensorflow:Reading file: commoncrawl.fr-en.fr
Next in line for (hopefully not too slow) download:
INFO:tensorflow:Downloading http://www.statmt.org/wmt13/training-parallel-europarl-v7.tgz to /tmp/t2t_datagen/training-parallel-europarl-v7.tgz
I'm posting the above as an FYI, or as a possible issue to be resolved.
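The IOError comes from gzip's CRC check, which means the cached .tgz in /tmp/t2t_datagen is incomplete or corrupted. Since the log shows datagen skips files it already finds ("Not downloading, file already found"), a broken archive is presumably reused until it is deleted. The minimal Python 3 sketch below tests an archive before re-running datagen. It decompresses the whole stream, which forces gzip's CRC and length checks. The helper names are hypothetical.

```python
import gzip
import io
import zlib


def gzip_stream_is_intact(fileobj, chunk_size=1 << 20):
    """Decompress the whole gzip stream; False on truncation or CRC/data errors."""
    try:
        with gzip.GzipFile(fileobj=fileobj, mode="rb") as f:
            while f.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True


def gzip_file_is_intact(path):
    with open(path, "rb") as raw:
        return gzip_stream_is_intact(raw)


good = gzip.compress(bytes(range(256)) * 100)
print(gzip_stream_is_intact(io.BytesIO(good)))                    # True
print(gzip_stream_is_intact(io.BytesIO(good[: len(good) // 2])))  # False
```

For example, gzip_file_is_intact("/tmp/t2t_datagen/training-parallel-commoncrawl.tgz") returning False would suggest deleting that file before the next datagen run.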
I got quite low performance compared to the paper.
So I did some digging and found that the wmt_ende_tokens_32k-{dev,train}* files seem too small:
444K wmt_ende_tokens_32k-dev-00000-of-00001
730M wmt_ende_tokens_32k-train-00000-of-00001
I ran t2t-datagen again (with the 100-shard option) and got the following sizes:
820K wmt_ende_tokens_32k-dev-00000-of-00001
14M wmt_ende_tokens_32k-train-00000-of-00100
....
(total 1400M)
What is the expected size of the wmt_ende_tokens_32k-* files?
When called from t2t-datagen --problem=algorithmic_calculus_integrate ..., generate_calculus_integrate_sample() in tensor2tensor/data_generators/algorithmic_math.py raises an exception or causes a subsequent KeyError exception in int_encoder().
The reason is an attempt to integrate expressions like "(b-d)/(a-a)" w.r.t. "b": this leads to sympy.polys.polyerrors.PolynomialDivisionFailed, or builds expressions containing ComplexInfinity (aka zoo), which confuses int_encoder().
The straightforward fix would be to put a retry loop into calculus_integrate(). Alternatively, random_expr_with_required_var() could be refined.
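The retry-loop fix could look roughly like the generic sketch below. In the real generator, sample_fn would presumably wrap the expression sampling and sympy integration, retry_on would include sympy's PolynomialDivisionFailed, and is_valid would reject results containing zoo. That wiring, and every name below, is hypothetical and not taken from algorithmic_math.py. The toy sampler mimics the "(a-a)" case with plain floats so the sketch runs without sympy.

```python
import math
import random


def sample_with_retry(sample_fn, is_valid, rng, max_attempts=100,
                      retry_on=(Exception,)):
    """Call sample_fn(rng) until it returns a value accepted by is_valid.

    Exceptions listed in retry_on count as a rejected sample.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            candidate = sample_fn(rng)
        except retry_on as err:
            last_error = err
            continue
        if is_valid(candidate):
            return candidate
    raise RuntimeError(
        "no valid sample after %d attempts" % max_attempts) from last_error


def toy_sample(rng):
    """Hypothetical sampler: 1/(a-b) fails when a == b, like (a-a) above."""
    a, b = rng.randint(0, 3), rng.randint(0, 3)
    return 1.0 / (a - b)


print(sample_with_retry(toy_sample, math.isfinite, random.Random(0)))
```

Capping the attempts keeps a badly tuned expression sampler from looping forever, while still surfacing the last underlying error.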
Here is my run script:
[g@pc:/home/g/Desktop/tensor2tensor/reverse]$ cat run.sh
PROBLEM=algorithmic_reverse_decimal40
MODEL=baseline_lstm_seq2seq
HPARAMS=basic1
DATA_DIR=./t2t_data
TMP_DIR=./t2t_datagen
TRAIN_DIR=./t2t_train/$PROBLEM/$MODEL-$HPARAMS
mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
# Generate data
t2t-datagen \
--data_dir=$DATA_DIR \
--tmp_dir=$TMP_DIR \
--problem=$PROBLEM
# mv $TMP_DIR/tokens.vocab.32768 $DATA_DIR
# Train
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR
# Decode
DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE
BEAM_SIZE=4
ALPHA=0.6
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR \
--train_steps=0 \
--eval_steps=0 \
--beam_size=$BEAM_SIZE \
--alpha=$ALPHA \
--decode_from_file=$DECODE_FILE
cat $DECODE_FILE.$MODEL.$HPARAMS.beam$BEAM_SIZE.alpha$ALPHA.decodes
Output:
[g@pc:/home/g/Desktop/tensor2tensor/reverse]$ bash run.sh
INFO:tensorflow:Generating training data for algorithmic_reverse_decimal40.
INFO:tensorflow:Generating case 0 for algorithmic_reverse_decimal40-unshuffled-train.
INFO:tensorflow:Generating development data for algorithmic_reverse_decimal40.
INFO:tensorflow:Generating case 0 for algorithmic_reverse_decimal40-unshuffled-dev.
INFO:tensorflow:Shuffling data...
INFO:tensorflow:read: 10000
INFO:tensorflow:read: 20000
INFO:tensorflow:read: 30000
INFO:tensorflow:read: 40000
INFO:tensorflow:read: 50000
INFO:tensorflow:read: 60000
INFO:tensorflow:read: 70000
INFO:tensorflow:read: 80000
INFO:tensorflow:read: 90000
INFO:tensorflow:read: 100000
INFO:tensorflow:write: 0
INFO:tensorflow:write: 10000
INFO:tensorflow:write: 20000
INFO:tensorflow:write: 30000
INFO:tensorflow:write: 40000
INFO:tensorflow:write: 50000
INFO:tensorflow:write: 60000
INFO:tensorflow:write: 70000
INFO:tensorflow:write: 80000
INFO:tensorflow:write: 90000
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:Registry contents:
Models: ['multi_model', 'baseline_lstm_seq2seq', 'slice_net', 'diagonal_neural_gpu', 'byte_net', 'transformer', 'attention_lm', 'neural_gpu', 'xception']
HParams: ['transformer_h32', 'transformer_big_dr2', 'transformer_big_dr3', 'transformer_big_dr1', 'slicenet1', 'transformer_tiny', 'xception_base', 'transformer_dr2', 'transformer_parsing_base_dr6', 'basic1', 'transformer_k256', 'transformer_h16', 'transformer_ff1024', 'transformer_k128', 'slicenet1tiny', 'transformer_big_enfr', 'multimodel1p8', 'transformer_dr0', 'transformer_base', 'transformer_l8', 'transformer_parsing_big', 'transformer_hs1024', 'slicenet1noam', 'transformer_big_single_gpu', 'attention_lm_base', 'transformer_ff4096', 'transformer_single_gpu', 'transformer_ls2', 'transformer_ls0', 'transformer_hs256', 'neural_gpu1', 'transformer_h1', 'transformer_h4', 'transformer_l4', 'transformer_l2', 'bytenet_base']
RangedHParams: ['transformer_big_single_gpu', 'basic1', 'slicenet1']
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f670e058c10>, '_model_dir': './t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic1', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': allow_soft_placement: true
graph_options {
optimizer_options {
}
}
, '_tf_random_seed': None, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Performing local training.
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Doing model_fn_body took 0.418 sec.
INFO:tensorflow:This model_fn took 0.649 sec.
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_0 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_10 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_11 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_12 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_13 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_14 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_15 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_1 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_2 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_3 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_4 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_5 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_6 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_7 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_8 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/input_emb/weights_9 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_0 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_10 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_11 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_12 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_13 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_14 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_15 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_1 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_2 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_3 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_4 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_5 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_6 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_7 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_8 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/softmax/weights_9 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_0 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_10 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_11 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_12 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_13 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_14 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_15 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_1 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_2 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_3 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_4 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_5 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_6 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_7 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_8 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_11_64/target_emb/weights_9 shape (1, 64) size 64
INFO:tensorflow:Total trainable variables size: 266304
INFO:tensorflow:Total embedding variables size: 0
INFO:tensorflow:Total non-embedding variables size: 266304
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-18 14:00:13.936233: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936254: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936261: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936270: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936276: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:14.068278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-18 14:00:14.068727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.91GiB
Free memory: 10.57GiB
2017-06-18 14:00:14.068740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-06-18 14:00:14.068744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-06-18 14:00:14.068749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
2017-06-18 14:00:17.068205: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4488 get requests, put_count=3034 evicted_count=1000 eviction_rate=0.329598 and unsatisfied allocation rate=0.569073
2017-06-18 14:00:17.068238: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Saving checkpoints for 1 into ./t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic1/model.ckpt.
INFO:tensorflow:loss = inf, step = 1
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 4, in <module>
__import__('pkg_resources').run_script('tensor2tensor==1.0.2', 't2t-trainer')
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 55, in <module>
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 51, in main
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 234, in run
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 562, in run_locally
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 505, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 842, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 960, in run
run_metadata=run_metadata))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 477, in after_run
raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.
INFO:tensorflow:Registry contents:
Models: ['multi_model', 'baseline_lstm_seq2seq', 'slice_net', 'diagonal_neural_gpu', 'byte_net', 'transformer', 'attention_lm', 'neural_gpu', 'xception']
HParams: ['transformer_h32', 'transformer_big_dr2', 'transformer_big_dr3', 'transformer_big_dr1', 'slicenet1', 'transformer_tiny', 'xception_base', 'transformer_dr2', 'transformer_parsing_base_dr6', 'basic1', 'transformer_k256', 'transformer_h16', 'transformer_ff1024', 'transformer_k128', 'slicenet1tiny', 'transformer_big_enfr', 'multimodel1p8', 'transformer_dr0', 'transformer_base', 'transformer_l8', 'transformer_parsing_big', 'transformer_hs1024', 'slicenet1noam', 'transformer_big_single_gpu', 'attention_lm_base', 'transformer_ff4096', 'transformer_single_gpu', 'transformer_ls2', 'transformer_ls0', 'transformer_hs256', 'neural_gpu1', 'transformer_h1', 'transformer_h4', 'transformer_l4', 'transformer_l2', 'bytenet_base']
RangedHParams: ['transformer_big_single_gpu', 'basic1', 'slicenet1']
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f35ad359c10>, '_model_dir': './t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic1', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': allow_soft_placement: true
graph_options {
optimizer_options {
}
}
, '_tf_random_seed': None, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Performing Decoding from a file.
INFO:tensorflow:Getting sorted inputs
INFO:tensorflow: batch 1
INFO:tensorflow:Deocding batch 0
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 4, in <module>
__import__('pkg_resources').run_script('tensor2tensor==1.0.2', 't2t-trainer')
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
exec(script_code, namespace, namespace)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 55, in <module>
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 51, in main
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 234, in run
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 623, in run_locally
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
as_iterable=as_iterable)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 883, in _infer_model
features = self._get_features_from_input_fn(input_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 863, in _get_features_from_input_fn
result = input_fn()
File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 743, in _decode_batch_input_fn
File "build/bdist.linux-x86_64/egg/tensor2tensor/data_generators/text_encoder.py", line 60, in encode
ValueError: invalid literal for int() with base 10: 'Goodbye'
cat: ./t2t_data/decode_this.txt.baseline_lstm_seq2seq.basic1.beam4.alpha0.6.decodes: No such file or directory
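The `ValueError` is raised in `text_encoder.py` by a call to `int()` on the decode input. The algorithmic problems therefore appear to expect decode inputs made of whitespace-separated integers, not words such as `Goodbye`. The sketch below uses a hypothetical stand-in, `NumericEncoder` (not the library class). It assumes only what the traceback suggests, namely that each whitespace-separated token is converted with `int()`. It reproduces the failure and shows the kind of input that would parse.

```python
class NumericEncoder(object):
    """Hypothetical stand-in for a numeric text encoder.

    Assumes every whitespace-separated token is an integer id, which is what
    the int() call in the traceback suggests.
    """

    def encode(self, s):
        # int() raises ValueError on any non-numeric token, e.g. 'Goodbye'.
        return [int(w) for w in s.split()]

    def decode(self, ids):
        return " ".join(str(i) for i in ids)


encoder = NumericEncoder()
print(encoder.encode("3 1 4 1 5"))  # [3, 1, 4, 1, 5]

try:
    encoder.encode("Goodbye world")
except ValueError as err:
    print("decode input rejected:", err)
```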
t2t-trainer runs out of GPU memory when training on a single NVIDIA GTX 1080 (8 GB) with the following parameters:
PROBLEM=wmt_ende_tokens_32k
MODEL=transformer
HPARAMS=transformer_base_single_gpu
Any hints on this?
More specifically, a ResourceExhaustedError is raised; see the following dump:
2017-06-22 11:55:49.859461: I tensorflow/core/common_runtime/bfc_allocator.cc:693] Summary of in-use Chunks by size:
2017-06-22 11:55:49.859468: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 142 Chunks of size 256 totalling 35.5KiB
2017-06-22 11:55:49.859473: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
2017-06-22 11:55:49.859477: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 414 Chunks of size 2048 totalling 828.0KiB
2017-06-22 11:55:49.859482: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 24 Chunks of size 4096 totalling 96.0KiB
2017-06-22 11:55:49.859487: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 48 Chunks of size 6144 totalling 288.0KiB
2017-06-22 11:55:49.859491: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 48 Chunks of size 8192 totalling 384.0KiB
2017-06-22 11:55:49.859495: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 32768 totalling 160.0KiB
2017-06-22 11:55:49.859500: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 45824 totalling 44.8KiB
2017-06-22 11:55:49.859505: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 65536 totalling 256.0KiB
2017-06-22 11:55:49.859509: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 96 Chunks of size 1048576 totalling 96.00MiB
2017-06-22 11:55:49.859514: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 24 Chunks of size 2097152 totalling 48.00MiB
2017-06-22 11:55:49.859518: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 46 Chunks of size 3145728 totalling 138.00MiB
2017-06-22 11:55:49.859523: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 65 Chunks of size 4030464 totalling 249.84MiB
2017-06-22 11:55:49.859527: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 97 Chunks of size 4194304 totalling 388.00MiB
2017-06-22 11:55:49.859532: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 227 Chunks of size 16711680 totalling 3.53GiB
2017-06-22 11:55:49.859536: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 43 Chunks of size 20889600 totalling 856.64MiB
2017-06-22 11:55:49.859541: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 64487424 totalling 61.50MiB
2017-06-22 11:55:49.859546: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 12 Chunks of size 66846720 totalling 765.00MiB
2017-06-22 11:55:49.859550: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1288182528 totalling 1.20GiB
2017-06-22 11:55:49.859554: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 7.28GiB
2017-06-22 11:55:49.859561: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit: 7969800192
InUse: 7813304576
MaxInUse: 7966874112
NumAllocs: 1912
MaxAllocSize: 1288182528
2017-06-22 11:55:49.859613: W tensorflow/core/common_runtime/bfc_allocator.cc:277] *************************************************************************************************xxx
2017-06-22 11:55:49.859627: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[102,80,1,1,31488]
Traceback (most recent call last):
File "/home/villi/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "/home/villi/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
status, run_metadata)
File "/usr/lib/python3.6/contextlib.py", line 89, in __exit__
next(self.gen)
File "/home/villi/tf/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[102,80,1,1,31488]
[[Node: symbol_modality_31488_512_2/parallel_0_1/symbol_modality_31488_512/padded_cross_entropy/smoothing_cross_entropy/one_hot = OneHot[T=DT_FLOAT, TI=DT_INT32, axis=-1, _device="/job:localhost/replica:0/task:0/gpu:0"](symbol_modality_31488_512_2/parallel_0_1/symbol_modality_31488_512/padded_cross_entropy/pad_with_zeros/pad_to_same_length/Pad_1/_2643, symbol_modality_31488_512_2/parallel_0_1/symbol_modality_31488_512/strided_slice, symbol_modality_31488_512_2/parallel_0_1/symbol_modality_31488_512/padded_cross_entropy/smoothing_cross_entropy/one_hot/on_value, symbol_modality_31488_512_2/parallel_0_1/symbol_modality_31488_512/padded_cross_entropy/smoothing_cross_entropy/truediv)]]
[[Node: training/train/update/_2730 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_13171_training/train/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
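The dump already contains enough numbers to see why this particular allocation fails. The node name `symbol_modality_31488_512` suggests a target vocabulary of 31488, which is the last dimension of the label-smoothing `one_hot` tensor. The first two dimensions appear to be batch and sequence length. The sketch below assumes float32 elements (the failing `OneHot` op reports `T=DT_FLOAT`, i.e. 4 bytes each) and compares that tensor with the free space implied by the allocator stats.

```python
from functools import reduce
import operator

# Shape from "OOM when allocating tensor with shape[102,80,1,1,31488]".
shape = [102, 80, 1, 1, 31488]
bytes_per_elem = 4  # DT_FLOAT (float32)

num_elems = reduce(operator.mul, shape, 1)
tensor_bytes = num_elems * bytes_per_elem

# Allocator stats printed just before the failure.
limit = 7969800192
in_use = 7813304576
free = limit - in_use

print("one_hot tensor: %.2f GiB" % (tensor_bytes / 2.0 ** 30))  # 0.96 GiB
print("free before alloc: %.1f MiB" % (free / 2.0 ** 20))       # 149.2 MiB
print("fits:", tensor_bytes <= free)                             # False
```

The one-hot targets alone need close to 1 GiB while only about 149 MiB is left. Lowering `batch_size` (for example `--hparams='batch_size=2048'`, as the README comment suggests) should shrink the leading dimensions roughly in proportion, although the exact batch and length split depends on the data.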
Training on a single NVIDIA GTX 1080 (12 GB) with the following parameters:
--problems=wmt_ende_tokens_32k
--model=transformer
--hparams_set=transformer_big_single_gpu
--hparams='batch_size=2048'
Once the loss reaches about 3, it drops very slowly:
Steps 0 to 30k: the loss drops from 8.4 to about 3
INFO:tensorflow:global_step/sec: 2.03158
INFO:tensorflow:loss = 3.62252, step = 29601 (49.221 sec)
INFO:tensorflow:global_step/sec: 2.03539
INFO:tensorflow:loss = 3.64336, step = 29701 (49.130 sec)
INFO:tensorflow:global_step/sec: 2.03153
INFO:tensorflow:loss = 3.58582, step = 29801 (49.226 sec)
INFO:tensorflow:global_step/sec: 2.02831
INFO:tensorflow:loss = 3.38816, step = 29901 (49.301 sec)
INFO:tensorflow:global_step/sec: 2.02674
INFO:tensorflow:loss = 3.40213, step = 30001 (49.340 sec)
INFO:tensorflow:global_step/sec: 2.03157
INFO:tensorflow:loss = 3.44571, step = 30101 (49.235 sec)
INFO:tensorflow:global_step/sec: 2.03308
INFO:tensorflow:loss = 3.15277, step = 30201 (49.175 sec)
After step 120k: the loss jitters around 3
INFO:tensorflow:loss = 3.12874, step = 125101 (76.470 sec)
INFO:tensorflow:global_step/sec: 2.04413
INFO:tensorflow:loss = 3.09151, step = 125201 (48.925 sec)
INFO:tensorflow:global_step/sec: 2.03683
INFO:tensorflow:loss = 3.2518, step = 125301 (49.093 sec)
INFO:tensorflow:global_step/sec: 2.03616
INFO:tensorflow:loss = 3.90474, step = 125401 (49.113 sec)
INFO:tensorflow:global_step/sec: 2.04036
INFO:tensorflow:loss = 2.87875, step = 125501 (49.010 sec)
INFO:tensorflow:global_step/sec: 2.0414
INFO:tensorflow:loss = 3.47175, step = 125601 (48.986 sec)
INFO:tensorflow:global_step/sec: 2.03132
INFO:tensorflow:loss = 3.00751, step = 125701 (49.230 sec)
INFO:tensorflow:global_step/sec: 2.0305
INFO:tensorflow:loss = 2.81739, step = 125801 (49.247 sec)
INFO:tensorflow:global_step/sec: 2.03291
INFO:tensorflow:loss = 3.60361, step = 125901 (49.191 sec)
INFO:tensorflow:global_step/sec: 2.03915
INFO:tensorflow:loss = 2.91831, step = 126001 (49.041 sec)
INFO:tensorflow:global_step/sec: 2.02992
INFO:tensorflow:loss = 2.98262, step = 126101 (49.263 sec)
INFO:tensorflow:global_step/sec: 2.03459
Is this normal?
Can you give a reference value for the loss?
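The per-batch loss values are noisy, so comparing a single log line at 30k with a single line at 125k says little. A rough check is to average the logged values from each window. The sketch below does this under the assumption that the two excerpts above are representative. `parse_losses` is a hypothetical helper that pulls `loss`/`step` pairs out of trainer output.

```python
import re

# Matches trainer lines such as:
#   INFO:tensorflow:loss = 3.62252, step = 29601 (49.221 sec)
LOSS_RE = re.compile(r"loss = ([0-9.]+), step = (\d+)")


def parse_losses(log_text):
    """Return (step, loss) pairs from a t2t-trainer log excerpt."""
    return [(int(step), float(loss)) for loss, step in LOSS_RE.findall(log_text)]


def mean(values):
    return sum(values) / float(len(values))


# Per-batch losses copied from the two excerpts above.
around_30k = [3.62252, 3.64336, 3.58582, 3.38816, 3.40213, 3.44571, 3.15277]
around_125k = [3.12874, 3.09151, 3.2518, 3.90474, 2.87875, 3.47175,
               3.00751, 2.81739, 3.60361, 2.91831, 2.98262]

print("mean loss near step 30k:  %.3f" % mean(around_30k))
print("mean loss near step 125k: %.3f" % mean(around_125k))
```

On these excerpts the averages come out to roughly 3.46 and 3.19. The loss therefore still seems to be creeping down rather than sitting completely flat. A smoothed curve over many more steps (for example in TensorBoard) would be a more reliable signal than a dozen values.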
T2T supports many models; it would be great if some pretrained models were provided for us.
Hi,
when I run distributed training following the guide at https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/docs/distributed_training.md,
configured with 1 ps and 2 workers, the ps works fine, but all the workers fail with this error:
tensorflow.python.framework.errors_impl.NotFoundError: No session factory registered for the given session options: {target: "10.150.144.48:1111" config: allow_soft_placement: true graph_options { optimizer_options { } }} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.
The full error output is as follows:
2017-06-25 06:41:26.914625: E tensorflow/core/common_runtime/session.cc:69] Not found: No session factory registered for the given session options: {target: "10.150.144.48:1111" config: allow_soft_placement: true graph_options { optimizer_options { } }} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.
{u'cluster': {u'ps': [u'10.150.144.48:3333'], u'worker': [u'10.150.144.48:1111', u'10.150.144.48:2222']}, u'task': {u'index': 0, u'type': u'worker'}}
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 62, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 58, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 247, in run
output_dir=FLAGS.output_dir)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run
return _execute_schedule(experiment, schedule)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule
return task()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 669, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1003, in _train_model
config=self._session_config
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 352, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 477, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 822, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 827, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 538, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 412, in create_session
init_fn=self._scaffold.init_fn)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 178, in _restore_checkpoint
sess = session.Session(self._target, graph=self._graph, config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1292, in __init__
super(Session, self).__init__(target, graph, config=config)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 562, in __init__
self._session = tf_session.TF_NewDeprecatedSession(opts, status)
File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: No session factory registered for the given session options: {target: "10.150.144.48:1111" config: allow_soft_placement: true graph_options { optimizer_options { } }} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'report_uninitialized_variables_1/boolean_mask/Gather:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
File "/usr/local/bin/t2t-trainer", line 62, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 58, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 247, in run
output_dir=FLAGS.output_dir)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run
return _execute_schedule(experiment, schedule)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule
return task()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 669, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1003, in _train_model
config=self._session_config
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 352, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 477, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 822, in __init__
_WrappedSession.__init__(self, self._create_session())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 827, in _create_session
return self._sess_creator.create_session()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 538, in create_session
self.tf_sess = self._session_creator.create_session()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 403, in create_session
self._scaffold.finalize()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 192, in finalize
default_ready_for_local_init_op)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 254, in get_or_default
op = default_constructor()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 189, in default_ready_for_local_init_op
variables.global_variables())
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped
return _add_should_use_warning(fn(*args, **kwargs))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 139, in _add_should_use_warning
wrapped = TFShouldUseWarningWrapper(x)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 96, in __init__
stack = [s.strip() for s in traceback.format_stack()]
==================================
It seems no session factory accepts this target, even though {DIRECT_SESSION, GRPC_SESSION} are registered. Can you help look into this problem?
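The cluster layout printed in the worker log (the `TF_CONFIG` dict) can be rebuilt as JSON, as sketched below. The target in the error, `10.150.144.48:1111`, has no URL scheme. TensorFlow's gRPC session factory appears to accept only targets that start with `grpc://`, and the direct (in-process) session appears to accept only an empty target. A bare `host:port` would therefore match neither factory. This reading is an assumption worth checking against the distributed-training guide. The helper `grpc_target` is hypothetical and not part of t2t or TensorFlow.

```python
import json

# Cluster layout copied from the TF_CONFIG printed in the worker log above:
# one parameter server and two workers on the same host.
tf_config = {
    "cluster": {
        "ps": ["10.150.144.48:3333"],
        "worker": ["10.150.144.48:1111", "10.150.144.48:2222"],
    },
    "task": {"type": "worker", "index": 0},
}


def grpc_target(address):
    """Hypothetical helper: turn 'host:port' into a gRPC session target.

    Assumes the gRPC session factory only accepts targets that start with
    'grpc://', while the direct session only accepts the empty target.
    """
    return address if address.startswith("grpc://") else "grpc://" + address


task = tf_config["task"]
address = tf_config["cluster"][task["type"]][task["index"]]

# The JSON string one would export as TF_CONFIG for worker 0.
print(json.dumps(tf_config))
print(grpc_target(address))  # grpc://10.150.144.48:1111
```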
$t2t-datagen --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR --num_shards=100 --problem=$PROBLEM
Traceback (most recent call last):
File "/tf1.2py3_venv/venv/bin/t2t-datagen", line 39, in <module>
from tensor2tensor.data_generators import image
File "/tf1.2py3_venv/venv/lib/python3.6/site-packages/tensor2tensor/data_generators/image.py", line 21, in <module>
import cPickle
ImportError: No module named 'cPickle'
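`cPickle` exists only in Python 2. Python 3 folds the C accelerator into the standard `pickle` module, so `import cPickle` in `image.py` fails under Python 3.6, as the traceback shows. A common version-agnostic workaround is sketched below; it is not necessarily how the project fixed it. `six.moves.cPickle` serves the same purpose if `six` is available. The `record` dict is only illustrative.

```python
# Prefer the Python 2 C implementation, fall back to Python 3's pickle.
try:
    import cPickle as pickle  # Python 2
except ImportError:
    import pickle  # Python 3

# Illustrative round trip: the same code path works on either version.
record = {"label": 3, "pixels": [0, 255, 128]}
blob = pickle.dumps(record)
assert pickle.loads(blob) == record
```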
I'm trying to reproduce wmt_ende_bpe32k, but data generation fails with the following error:
INFO:tensorflow:Generating training data for wmt_ende_bpe32k.
Traceback (most recent call last):
File "/usr/local/bin/t2t-datagen", line 361, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-datagen", line 345, in main
FLAGS.data_dir, FLAGS.num_shards, FLAGS.max_cases)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/generator_utils.py", line 113, in generate_files
for case in generator:
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/wmt.py", line 83, in token_generator
source_ints = token_vocab.encode(source.strip()) + eos_list
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 120, in encode
ret = [self._token_to_id[tok] for tok in sentence.strip().split()]
AttributeError: 'TokenTextEncoder' object has no attribute '_token_to_id'
I'm using the following commands:
PROBLEM=wmt_ende_bpe32k
MODEL=transformer
HPARAMS=transformer_base
DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS
mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
# Generate data
t2t-datagen \
--data_dir=$DATA_DIR \
--tmp_dir=$TMP_DIR \
--num_shards=100 \
--problem=$PROBLEM
mv $TMP_DIR/vocab.bpe.32000 $DATA_DIR
# Train
# * If you run out of memory, add --hparams='batch_size=2048' or even 1024.
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR \
--worker_gpu=2 \
--log_dir logs
# Decode
DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE
BEAM_SIZE=4
ALPHA=0.6
t2t-trainer \
--data_dir=$DATA_DIR \
--problems=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR \
--train_steps=0 \
--eval_steps=0 \
--decode_beam_size=$BEAM_SIZE \
--decode_alpha=$ALPHA \
--decode_from_file=$DECODE_FILE
cat $DECODE_FILE.$MODEL.$HPARAMS.beam$BEAM_SIZE.alpha$ALPHA.decodes
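The last command reads a file whose name is built from the decode settings. `decodes_path` below is a hypothetical helper that mirrors the `cat` line; it is not a t2t function. With the values from the earlier `algorithmic_reverse_decimal40` log, it reproduces the file name that `cat` could not find there.

```python
def decodes_path(decode_file, model, hparams, beam_size, alpha):
    """Mirror $DECODE_FILE.$MODEL.$HPARAMS.beam$BEAM_SIZE.alpha$ALPHA.decodes."""
    return "%s.%s.%s.beam%d.alpha%s.decodes" % (
        decode_file, model, hparams, beam_size, alpha)


path = decodes_path("./t2t_data/decode_this.txt", "baseline_lstm_seq2seq",
                    "basic1", 4, 0.6)
print(path)
# ./t2t_data/decode_this.txt.baseline_lstm_seq2seq.basic1.beam4.alpha0.6.decodes
```

In that earlier run the file was presumably never written, because decoding stopped at the `ValueError` first. The `cat` error is therefore a symptom rather than a separate problem.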
I've downloaded wmt16_en_de.tar.gz and placed it in /tmp/t2t_datagen, as specified in wmt.py:
import os
import tarfile

import tensorflow as tf


def _get_wmt_ende_dataset(directory, filename):
  """Extract the WMT en-de corpus `filename` to directory unless it's there."""
  train_path = os.path.join(directory, filename)
  if not (tf.gfile.Exists(train_path + ".de") and
          tf.gfile.Exists(train_path + ".en")):
    # We expect that this file has been downloaded from:
    # https://drive.google.com/open?id=0B_bZck-ksdkpM25jRUN2X2UxMm8 and placed
    # in `directory`.
    corpus_file = os.path.join(directory, "wmt16_en_de.tar.gz")
    with tarfile.open(corpus_file, "r:gz") as corpus_tar:
      corpus_tar.extractall(directory)
  return train_path
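The function only extracts the archive when `<filename>.de` or `<filename>.en` is missing from `directory`. It is therefore worth checking that the tarball actually contains files with those names. Below is a stdlib-only sketch of the same existence check plus a member listing. `os.path.exists` stands in for `tf.gfile.Exists`, and the helper names and the `"train"` prefix in the usage lines are hypothetical.

```python
import os
import tarfile


def needs_extraction(directory, filename):
    """True if either side of the parallel corpus is missing (same test as above)."""
    train_path = os.path.join(directory, filename)
    return not (os.path.exists(train_path + ".de") and
                os.path.exists(train_path + ".en"))


def members_for(corpus_file, filename):
    """List archive members whose base name starts with `filename`."""
    with tarfile.open(corpus_file, "r:gz") as corpus_tar:
        return [name for name in corpus_tar.getnames()
                if os.path.basename(name).startswith(filename)]


if __name__ == "__main__":
    directory = "/tmp/t2t_datagen"
    corpus = os.path.join(directory, "wmt16_en_de.tar.gz")
    if os.path.exists(corpus):
        print(members_for(corpus, "train"))
    print(needs_extraction(directory, "train"))
```

If the printed member names include a leading subdirectory, `extractall` will place the files under that subdirectory rather than directly in `directory`. In that case the `.de`/`.en` check would keep failing.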
It seems that words_and_tags_from_wsj_tree can parse WSJ trees (lists) like
(A (B (C c)))
but not trees like
(A (B (C c d) e))
This is because it assumes that each token either opens or closes a parenthesis.
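One way to avoid assuming that each token opens or closes exactly one bracket is to tokenize the parentheses separately and parse with a stack. That way a node can hold any mix of words and subtrees. This is a generic sketch, not the t2t `words_and_tags_from_wsj_tree` implementation. It treats the first atom in each bracket as the node label and every other atom as a word, which is an assumption about the intended tree format.

```python
def tokenize(tree):
    """Split a bracketed tree into '(', ')' and atom tokens."""
    return tree.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Build a nested list; a node may have any number of children."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])          # open a new node
        elif tok == ")":
            node = stack.pop()        # close it and attach to its parent
            stack[-1].append(node)
        else:
            stack[-1].append(tok)     # label or word inside the current node
    return stack[0][0]


def leaves(node):
    """Words are the atoms after a node's label, collected left to right."""
    words = []
    for child in node[1:]:
        if isinstance(child, list):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


print(parse(tokenize("(A (B (C c d) e))")))  # ['A', ['B', ['C', 'c', 'd'], 'e']]
print(leaves(parse(tokenize("(A (B (C c d) e))"))))  # ['c', 'd', 'e']
```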
@lukaszkaiser @vthorsteinsson @nshazeer
@vthorsteinsson's recent PR improved compatibility between Python 2 and 3, but we seem to have lost some valuable functionality.
We want a SubwordTextEncoder that is fully invertible with a limited vocabulary and can encode anything, i.e. it should operate exclusively on bytes, so that the vocabulary needs to grow by at most 256 entries. If the input is Unicode (UTF-8 encoded or otherwise), it will be read as individual bytes rather than Unicode characters, which means decoding might break (i.e. the decoder might produce a byte sequence that is not valid Unicode).
For datasets or tasks that want to handle Unicode characters directly as part of the vocabulary, there can be a separate version of the SubwordTextEncoder that does that (e.g. the one that is currently checked in).
So the suggestion is to have two SubwordTextEncoders: one for bytes only, and another that deals with Unicode (essentially the one that's checked in).
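The byte-level idea can be illustrated with a minimal sketch that assumes a fixed 256-entry base alphabet of raw UTF-8 bytes. The function names are hypothetical, not the SubwordTextEncoder API. Every string is encodable and valid input round-trips. An arbitrary id sequence produced by a decoder, however, need not be valid UTF-8, which is the breakage described above.

```python
def encode_bytes(text):
    """Map text to ids in 0..255 via its UTF-8 bytes; any string is encodable."""
    return list(bytearray(text.encode("utf-8")))


def decode_bytes(ids, errors="strict"):
    """Map ids back to text; arbitrary id sequences may not be valid UTF-8."""
    return bytes(bytearray(ids)).decode("utf-8", errors)


ids = encode_bytes("Grüße")
assert all(0 <= i < 256 for i in ids)
assert decode_bytes(ids) == "Grüße"

# A model could emit a truncated multi-byte sequence, e.g. 'G' followed by only
# the first byte of 'ü' (0xC3): strict decoding fails, "replace" inserts U+FFFD.
broken = [0x47, 0xC3]
print(repr(decode_bytes(broken, errors="replace")))
```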
I may be misunderstanding the current functionality so please correct my mental model where it's wrong.
Before training on a new (NLP-like) generator, I tried training the baseline_lstm_seq2seq model on algorithmic_reverse_decimal40 to see how it works.
This is the result (I am inside a Docker container):
root@df1a91a7be96:/t2t# PROBLEM=algorithmic_reverse_decimal40
root@df1a91a7be96:/t2t# MODEL=baseline_lstm_seq2seq
root@df1a91a7be96:/t2t# HPARAMS=basic_1
root@df1a91a7be96:/t2t# DATA_DIR=/tmp/t2t_data
root@df1a91a7be96:/t2t# TMP_DIR=/tmp/t2t_datagen
root@df1a91a7be96:/t2t# TRAIN_DIR=/tmp/t2t_train/$PROBLEM/$MODEL-$HPARAMS
root@df1a91a7be96:/t2t# mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
root@df1a91a7be96:/t2t# # Generate data
root@df1a91a7be96:/t2t# t2t-datagen \
> --data_dir=$DATA_DIR \
> --tmp_dir=$TMP_DIR \
> --problem=$PROBLEM
INFO:tensorflow:Generating training data for algorithmic_reverse_decimal40.
INFO:tensorflow:Generating case 0 for algorithmic_reverse_decimal40-unshuffled-train.
INFO:tensorflow:Generating development data for algorithmic_reverse_decimal40.
INFO:tensorflow:Generating case 0 for algorithmic_reverse_decimal40-unshuffled-dev.
INFO:tensorflow:Shuffling data...
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
root@df1a91a7be96:/t2t#
root@df1a91a7be96:/t2t# t2t-trainer \
> --data_dir=$DATA_DIR \
> --problems=$PROBLEM \
> --model=$MODEL \
> --hparams_set=$HPARAMS \
> --output_dir=$TRAIN_DIR
INFO:tensorflow:Creating experiment, storing model files in /tmp/t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic_1
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Using config: {'_model_dir': '/tmp/t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic_1', '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fdc4e2b5390>, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': '', '_session_config': allow_soft_placement: true
graph_options {
optimizer_options {
}
}
}
INFO:tensorflow:Performing local training.
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Doing model_fn_body took 0.668 sec.
INFO:tensorflow:This model_fn took 0.983 sec.
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/bias shape (256,) size 256
INFO:tensorflow:Weight body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/kernel shape (128, 256) size 32768
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_0 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_10 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_11 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_12 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_13 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_14 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_15 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_1 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_2 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_3 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_4 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_5 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_6 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_7 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_8 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/input_emb/weights_9 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_0 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_10 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_11 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_12 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_13 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_14 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_15 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_1 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_2 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_3 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_4 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_5 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_6 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_7 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_8 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/softmax/weights_9 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_0 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_10 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_11 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_12 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_13 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_14 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_15 shape (0, 64) size 0
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_1 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_2 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_3 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_4 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_5 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_6 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_7 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_8 shape (1, 64) size 64
INFO:tensorflow:Weight symbol_modality_12_64/target_emb/weights_9 shape (1, 64) size 64
INFO:tensorflow:Total trainable variables size: 266496
INFO:tensorflow:Total embedding variables size: 0
INFO:tensorflow:Total non-embedding variables size: 266496
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-23 11:14:29.014420: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 11:14:29.014488: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 11:14:29.014519: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 11:14:29.076780: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-23 11:14:29.077245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 670MX
major: 3 minor: 0 memoryClockRate (GHz) 0.601
pciBusID 0000:01:00.0
Total memory: 2.94GiB
Free memory: 2.62GiB
2017-06-23 11:14:29.077338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-06-23 11:14:29.077374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-06-23 11:14:29.084741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 670MX, pci bus id: 0000:01:00.0)
2017-06-23 11:14:38.368302: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 5370 get requests, put_count=3301 evicted_count=1000 eviction_rate=0.302939 and unsatisfied allocation rate=0.59013
2017-06-23 11:14:38.368375: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Saving checkpoints for 1 into /tmp/t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic_1/model.ckpt.
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 531, in run_locally
exp.train()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 505, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 842, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 960, in run
run_metadata=run_metadata))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 477, in after_run
raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.
Have I missed something in the configuration? I also tried the same problem with the transformer model: training seems fine, but at inference time it does not reproduce the reversed input. (I'll post the output of this last command and my configuration later.)
Edit: With transformer everything is ok.
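For reference, the weight listing in the log above splits the 12-symbol vocabulary (symbol_modality_12_64) into 16 shards: weights_0 through weights_11 have shape (1, 64) and weights_12 through weights_15 are empty, with shape (0, 64). The sketch below reproduces those shapes with an even-split rule and checks the logged total of 266496 trainable parameters. The split rule and the shard count of 16 are inferred from the log, not taken from the T2T source, and this does not establish whether the empty shards have anything to do with the NaN loss.

```python
def shard_sizes(vocab_size, num_shards):
    """Split vocab_size rows as evenly as possible across num_shards shards."""
    base, remainder = divmod(vocab_size, num_shards)
    # The first `remainder` shards each receive one extra row.
    return [base + (1 if i < remainder else 0) for i in range(num_shards)]


sizes = shard_sizes(12, 16)

emb_dim, hidden = 64, 64
# One LSTM cell: kernel (emb_dim + hidden, 4 * hidden) = (128, 256) plus bias (256,).
lstm_cell = (emb_dim + hidden) * 4 * hidden + 4 * hidden
lstm_total = 8 * lstm_cell  # 4 encoder cells + 4 decoder cells in the log
# input_emb, softmax and target_emb each hold vocab_size x emb_dim weights.
modality_total = 3 * sum(sizes) * emb_dim
total = lstm_total + modality_total
print(sizes, total)
```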
It would be very useful to add support for the Kinetics dataset and the recent Inflated 3D ConvNet (I3D) models to tensor2tensor.
There are some holes in the Python 3 compatibility of the Tensor2Tensor code. For instance:
In data_generators/generator_utils.py,
import urllib
needs to be:
import sys
if sys.version_info[0] >= 3:
  import urllib.request as urllib
else:
  import urllib
In data_generators/image.py,
import cPickle
needs to be:
try:
  import cPickle
except ImportError:
  import pickle as cPickle
Finally, data_generators/tokenizer.py needs to be revised: it assumes that a character's ordinal is always below 256, which is not a safe assumption in Python 3. A better solution uses a set of characters instead of array subscripts based on character ordinals. Would you like me to submit a revised version in a pull request?
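A sketch of the difference, assuming the tokenizer needs an "is this character alphanumeric?" test. The names below are hypothetical, not the ones in tokenizer.py. The ordinal-indexed table only covers code points 0..255 and fails on anything else, while a set of characters works for any code point. Here "alphanumeric" is taken to mean Unicode general categories L* (letters) and N* (numbers), which is an assumption about what the tokenizer wants.

```python
import sys
import unicodedata

# Ordinal-indexed table: only covers code points 0..255.
_TABLE_256 = [chr(i).isalnum() for i in range(256)]


def is_alnum_table(c):
    # Raises IndexError for any character whose ordinal is >= 256, e.g. "ж".
    return _TABLE_256[ord(c)]


# Set-based alternative: built once over all code points, then a plain
# membership test that is safe for any character.
_ALNUM_SET = set(
    chr(i) for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i))[0] in ("L", "N"))


def is_alnum_set(c):
    return c in _ALNUM_SET
```

Building the set scans every code point once at import time; after that, each lookup is a hash-set membership test, so the per-character cost stays constant regardless of the script.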
I trained the model on two Tesla M60s, each with 8 GB of memory. I did not modify any hyperparameters. The loss does not seem to change after 50,000 steps.
INFO:tensorflow:Saving checkpoints for 54877 into /data/t2t/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt.
INFO:tensorflow:global_step/sec: 0.858429
INFO:tensorflow:loss = 1.96779, step = 54901 (116.492 sec)
INFO:tensorflow:global_step/sec: 0.876813
INFO:tensorflow:loss = 1.96174, step = 55001 (114.049 sec)
INFO:tensorflow:global_step/sec: 0.864947
INFO:tensorflow:loss = 1.98628, step = 55101 (115.614 sec)
INFO:tensorflow:global_step/sec: 0.860629
INFO:tensorflow:loss = 2.26156, step = 55201 (116.195 sec)
INFO:tensorflow:global_step/sec: 0.864128
INFO:tensorflow:loss = 1.98318, step = 55301 (115.723 sec)
INFO:tensorflow:Saving checkpoints for 55396 into /data/t2t/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt.
INFO:tensorflow:global_step/sec: 0.852946
INFO:tensorflow:loss = 2.30657, step = 55401 (117.241 sec)
INFO:tensorflow:global_step/sec: 0.870939
INFO:tensorflow:loss = 2.11571, step = 55501 (114.819 sec)
INFO:tensorflow:global_step/sec: 0.86979
INFO:tensorflow:loss = 1.99461, step = 55601 (114.970 sec)
INFO:tensorflow:global_step/sec: 0.86269
INFO:tensorflow:loss = 2.01496, step = 55701 (115.916 sec)
INFO:tensorflow:global_step/sec: 0.869183
INFO:tensorflow:loss = 1.98261, step = 55801 (115.051 sec)
INFO:tensorflow:global_step/sec: 0.862935
INFO:tensorflow:loss = 1.88075, step = 55901 (115.883 sec)
INFO:tensorflow:Saving checkpoints for 55915 into /data/t2t/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt.
INFO:tensorflow:global_step/sec: 0.855085
INFO:tensorflow:loss = 1.9415, step = 56001 (116.948 sec)
INFO:tensorflow:global_step/sec: 0.86353
INFO:tensorflow:loss = 2.26614, step = 56101 (115.804 sec)
INFO:tensorflow:global_step/sec: 0.871136
INFO:tensorflow:loss = 2.14308, step = 56201 (114.793 sec)
INFO:tensorflow:global_step/sec: 0.860752
INFO:tensorflow:loss = 1.96734, step = 56301 (116.178 sec)
INFO:tensorflow:global_step/sec: 0.871609
INFO:tensorflow:loss = 1.98928, step = 56401 (114.730 sec)
However, the decoder output does not make any sense. Does anyone know the reason?
INFO:tensorflow:Inference results INPUT: Goodbye world
INFO:tensorflow:Inference results OUTPUT: Esconnectentareaconnectentkannconnectent
INFO:tensorflow:Inference results INPUT: Hello world
INFO:tensorflow:Inference results OUTPUT: Esconnectentareaconnectentkannconnectent
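One way to check whether the training loss has really stopped improving is to extract the logged values and compare averages over windows of steps instead of eyeballing individual lines, which fluctuate between roughly 1.88 and 2.31 in the excerpt above. The sketch below assumes the "loss = X, step = N" log format shown above; the function names and window-averaging approach are illustrative choices, not a T2T tool.

```python
import re
from statistics import mean

# Matches lines such as "INFO:tensorflow:loss = 1.96779, step = 54901 (116.492 sec)".
LOSS_RE = re.compile(r"loss = ([0-9.]+), step = (\d+)")


def parse_losses(log_text):
    """Return (step, loss) pairs in the order they appear in the log."""
    return [(int(m.group(2)), float(m.group(1)))
            for m in LOSS_RE.finditer(log_text)]


def window_means(pairs, window):
    """Average consecutive losses in non-overlapping windows of `window` entries."""
    losses = [loss for _, loss in pairs]
    return [mean(losses[i:i + window]) for i in range(0, len(losses), window)]
```

Feeding the whole training log through parse_losses and then window_means with a window of a few hundred entries makes a slow downward trend (or a true plateau) much easier to see than the noisy per-100-step values.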
I've tried to add a new text dataset for a basic text classification task, and the data generator works flawlessly. I then tried to add my task to problem_hparams.py, but it crashes with:
Variable symbol_modality_13_512/shared/weights_0 does not exist
Why? I believe I added all the required hyperparameters. Should I add something elsewhere?
Thanks!
Here are my problem hparams:
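import codecs


def txtclassif_tokens(model_hparams):
  # default_problem_hparams and registry are assumed to be in scope, as they
  # are inside problem_hparams.py.
  p = default_problem_hparams()

  class DetTextEncoder(object):
    """Word-level encoder backed by a vocabulary file of "<token> <id>" lines."""

    def __init__(self, vocfile, reverse=False):
      # The original never set _reverse, so encode()/decode() raised AttributeError.
      self._reverse = reverse
      # Build the token -> id dictionary from the vocabulary file.
      self.v = {}
      with codecs.open(vocfile, "r", "utf-8") as f:
        for l in f:
          s = l.strip()
          if s:
            i = s.rfind(" ")
            self.v[s[:i]] = int(s[i + 1:])
      # Inverse mapping, so decode() does not scan the whole vocabulary per id.
      self._id_to_tok = {idx: tok for tok, idx in self.v.items()}

    def encode(self, sentence):
      """Converts a space-separated string of tokens to a list of ids."""
      ret = [self.v[tok] if tok in self.v else self.v["UNK"]
             for tok in sentence.strip().split()]
      if self._reverse:
        ret = ret[::-1]
      return ret

    def decode(self, ids):
      """Converts a list of ids back to a space-separated string of tokens."""
      if self._reverse:
        ids = ids[::-1]
      # Ids missing from the vocabulary are skipped, as in the original loop.
      return " ".join(self._id_to_tok[i] for i in ids if i in self._id_to_tok)

    @property
    def vocab_size(self):
      # Assumes ids are contiguous from 0, i.e. the largest id is len(self.v) - 1.
      return len(self.v)

  wvoc = DetTextEncoder(model_hparams.data_dir + "/voc.txt")
  lvoc = DetTextEncoder(model_hparams.data_dir + "/voclab.txt")
  p.input_modality = {
      "inputs": (registry.Modalities.SYMBOL, wvoc.vocab_size)
  }
  p.target_modality = (registry.Modalities.SYMBOL, lvoc.vocab_size)
  p.vocabulary = {
      "inputs": wvoc,
      "targets": lvoc,
  }
  p.input_space_id = 3
  p.target_space_id = 3
  return p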
The tensorflow/ecosystem Kubernetes config might be a good place to start.
Ideally, it'd be trivial to launch a distributed tensor2tensor job with Kubernetes using an arbitrary number of GPUs/TPUs.
We already have some documentation for distributed training with tensor2tensor.
If there's someone with experience with Kubernetes or GCP, we'd very much welcome the contribution.
Hi,
I modified the wmt_ende_characters problem to translate Macedonian to English (the BLEU score after training was 0.526888).
The input sentence is:
Kosovskiot proces na privatizaciјa se ispituva
Then the t2t-trainer command shows some weird output:
INFO:tensorflow:Restoring parameters from t2t_train/model.ckpt-250000
INFO:tensorflow:Inference results INPUT: Mquqxumkqv"rtqegu"pc"rtkxcvk|cekӚc"ug"kurkvwxc
INFO:tensorflow:Inference results OUTPUT: Mukwak.cwave.gurk.fe.ce.sce.gurkwe.ce.ce
INFO:tensorflow:Writing decodes into test.txt.transformer.transformer_base.beam4.alpha0.6.decodes
Tested with versions 1.0.5 and 1.0.7. Is this a bug?
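The garbled INPUT line is exactly the original sentence with every UTF-8 byte shifted up by 2: "K" (75) becomes "M" (77), the space (32) becomes '"' (34), "z" becomes "|", and the two bytes of the Cyrillic "ј" (D1 98) become D3 9A, which is "Ӛ". That pattern is consistent with a byte-level encoder that reserves ids 0 and 1 (assumed here to be padding and end-of-sequence) and a display step that turns ids back into bytes without subtracting that offset. The sketch below reproduces the shift; the constant and function names are hypothetical and this is an illustration of the arithmetic, not the T2T code.

```python
NUM_RESERVED_IDS = 2  # assumption: ids 0 and 1 are reserved (e.g. PAD and EOS)


def encode_with_offset(text):
    """Map each UTF-8 byte b to id b + NUM_RESERVED_IDS."""
    return [b + NUM_RESERVED_IDS for b in text.encode("utf-8")]


def naive_display(ids):
    """Treat ids as raw bytes WITHOUT removing the offset (reproduces the log)."""
    # bytes() rejects values above 255; fine for this input, whose largest id is 0xD3.
    return bytes(ids).decode("utf-8", errors="replace")


def proper_decode(ids):
    """Remove the reserved-id offset before decoding the bytes."""
    return bytes(i - NUM_RESERVED_IDS for i in ids).decode("utf-8")


ids = encode_with_offset("Kosovskiot proces na privatizaci\u0458a se ispituva")
print(naive_display(ids))  # matches the INPUT line in the log above
print(proper_decode(ids))  # recovers the original sentence
```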
Hi,
I have trained a transformer_big model for the wmt_ende_tokens_32k problem.
After 37118 steps, I found that it gives a decent result:
INFO:tensorflow:Saving dict for global step 37118:
global_step = 37118,
loss = 0.980365,
metrics-wmt_ende_tokens_32k/accuracy = 0.789868,
metrics-wmt_ende_tokens_32k/accuracy_per_sequence = 0.0,
metrics-wmt_ende_tokens_32k/accuracy_top5 = 0.90224,
metrics-wmt_ende_tokens_32k/bleu_score = 0.493593,
metrics-wmt_ende_tokens_32k/neg_log_perplexity = -1.11336,
metrics/accuracy = 0.789868,
metrics/accuracy_per_sequence = 0.0,
metrics/accuracy_top5 = 0.90224,
metrics/bleu_score = 0.493593,
metrics/neg_log_perplexity = -1.11336
I then tried to translate the newstest2014-deen-src.en file, which consists of 10008 lines.
I followed the default HPARAMS setting for transformer_big and set BEAM_SIZE=3 and ALPHA=0.6.
However, since decoding seemed to take forever, I retried with a smaller file of just 10 lines. This time, decoding took approximately 30 seconds after the learned model parameters were loaded.
Roughly three seconds per source sentence (30 seconds for 10 lines) seems too long, as it suggests that translating the full newstest2014-deen-src.en file would take many hours.
Am I missing some options here?
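The throughput estimate can be made explicit. This treats the whole 30 seconds as per-sentence decoding time, which makes it an upper bound, since part of that time may be one-off setup after the checkpoint is restored.

```python
lines_total = 10008            # lines in newstest2014-deen-src.en
sample_lines, sample_seconds = 10, 30.0

sec_per_line = sample_seconds / sample_lines          # 3.0 s per sentence
estimated_hours = lines_total * sec_per_line / 3600   # about 8.3 hours
print(sec_per_line, round(estimated_hours, 2))
```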
@lukaszkaiser Could you please help me with an inference script that takes many hypothesis sentences and returns a score for each one using the tensor2tensor approach? The current RNN-based LM approach is quite slow. Meanwhile, I will try training a character-level language model using the same technique.
Thanks
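Whatever model produces them, the per-sentence score such a script would return is typically the sum of the token log-probabilities, optionally divided by the number of tokens so that longer hypotheses are not penalized. The sketch below shows only that scoring logic. A hypothetical toy bigram table stands in for the trained language model; it does not show how to obtain log-probabilities from a T2T model.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy bigram table standing in for a trained language model.
BIGRAM_LOGPROBS: Dict[Tuple[str, str], float] = {
    ("<s>", "the"): math.log(0.5),
    ("the", "cat"): math.log(0.2),
    ("cat", "</s>"): math.log(0.4),
    ("the", "dog"): math.log(0.1),
    ("dog", "</s>"): math.log(0.3),
}
UNSEEN_LOGPROB = math.log(1e-6)  # crude floor for bigrams not in the table


def score(sentence: str, normalize: bool = False) -> float:
    """Sum of log p(token | previous token), optionally averaged per token."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    logps = [BIGRAM_LOGPROBS.get(pair, UNSEEN_LOGPROB)
             for pair in zip(tokens, tokens[1:])]
    total = sum(logps)
    return total / len(logps) if normalize else total


def score_all(hypotheses: List[str], normalize: bool = False) -> List[float]:
    """Score a batch of hypothesis sentences independently."""
    return [score(h, normalize) for h in hypotheses]
```

With a real model, the dictionary lookup would be replaced by the model's predicted log-probability for each target token, and batching several hypotheses per forward pass is where most of the speed-up over a sentence-by-sentence RNN loop would come from.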
"one model to learn them all" relates to many tasks, but T2T problems are for single task.
How can I run multi-task MultiModel training with T2T?
Thanks a lot!
Hi,
I'd very much like to try this, but I don't have an NVIDIA GPU. Is the dependency on tensorflow-gpu a hard requirement?
Thanks a lot
Sigrid
[up-for-grabs] maybe?