tensorflow / tensor2tensor

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

License: Apache License 2.0

Python 62.87% Shell 0.15% Jupyter Notebook 35.19% JavaScript 0.96% HTML 0.43% C++ 0.40%

machine-learning machine-translation deep-learning reinforcement-learning tpu

tensor2tensor's Issues

Resume training

Is it possible to resume training for a certain amount of additional steps?

Vocabulary size in WMT translation task

Hi all,
I'm a little confused about the vocab_size defined in problem_hparams.py.

In wmt_ende_bpe32k, the vocab_size param is set to 40960, while in wmt_enfr_tokens there's a wrong_vocab_size param, which is set to 2**13 for wmt_enfr_tokens_8k. I suspect this might not be the actual vocabulary size.

My question is:
When setting input_modality, how should I set vocab_size?

More specifically, if I have separate source and target vocabularies with sizes n_src and n_tgt, respectively, should I set vocab_size to n_src + n_tgt or to something else?
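In the problem_hparams snippets elsewhere on this page, the input modality and the target modality each carry their own (modality, vocab_size) tuple, so with separate vocabularies each side is sized by its own vocabulary rather than by n_src + n_tgt. A minimal sketch of that bookkeeping, assuming one token per line in each vocab file and an encoder that reserves two ids (PAD, EOS) ahead of the file's tokens, which as far as I know matches T2T's TokenTextEncoder default; the helper name and the toy vocabularies are hypothetical:

```python
from typing import List

NUM_RESERVED_IDS = 2  # assumption: PAD=0 and EOS=1 come before the file's tokens


def vocab_size_from_lines(lines: List[str], num_reserved_ids: int = NUM_RESERVED_IDS) -> int:
    """Size of a token vocabulary: non-empty lines plus the reserved ids."""
    tokens = [line.strip() for line in lines if line.strip()]
    return len(tokens) + num_reserved_ids


# Hypothetical vocabularies, one token per line.
src_vocab = ["the", "cat", "sat"]          # n_src = 3
tgt_vocab = ["die", "Katze", "sass", "."]  # n_tgt = 4

input_vocab_size = vocab_size_from_lines(src_vocab)   # 3 + 2
target_vocab_size = vocab_size_from_lines(tgt_vocab)  # 4 + 2

# Each modality gets its own size, e.g.
#   input_modality  = {"inputs": (SYMBOL, input_vocab_size)}
#   target_modality = (SYMBOL, target_vocab_size)
print(input_vocab_size, target_vocab_size)  # 5 6
```

If the encoder object is available, reading its own vocab_size property (as the SYMBOL modality issue below does) avoids having to know the reserved-id convention at all.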

Thank you

"Invalid argument: slice index 1 of dimension 1 out of bounds." Error when decoding the text to Class_Label using Transformer model

Hi,
I added a dataset for classification with the Transformer model. I generated the dataset and trained the model successfully, but when I decode, an invalid argument error is thrown.
The code below generates the dataset: it simply picks members from several datasets and combines them. Each combination is a category, and the task is to find out which category a new set of data belongs to.
For example, there are four datasets: A, B, C, D. They have members like:
A: a0,a1,a2
B: b0,b1,b2,b3
C: c0,c1
D: d0,d1,d2
So there are 15 combinations (i.e. categories): A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD. My generator produces a random number of sequences that pick members from sets A, B, C, D and assigns each one a target label from 0 to 14.
Data generation and training both worked well, but when I tried to decode, the invalid argument error was thrown. The error is too unspecific, and I have no idea how to solve it.
Could you please review my code and tell me what's wrong with it? Thanks a lot!
Generator:
This generator picks members from several files (each file name is the name of a single set) and assigns a target classification label to the result.

import itertools
import os
import ssl
import sys

import numpy as np
from six.moves import xrange

from tensor2tensor.data_generators import generator_utils
from tensor2tensor.data_generators import text_encoder


def generateDicts(dataDir, nameFiles, idFrom=100):
    """Reads one member list per set file and writes the merged vocabulary."""
    dictFileName = os.path.join(dataDir, "name.dict")
    nameset = dict()
    for name in nameFiles:
        with open(os.path.join(dataDir, name)) as nameFile:
            nameset[name] = [individual.strip() for individual in nameFile]
    allNames = set()
    for k, v in nameset.items():
        allNames = allNames.union(set(v))
        print("lenOfAllNames after add " + k + " is " + str(len(allNames)))
    allNamesList = sorted(allNames)
    names = dict()
    idx = idFrom + 1
    for name in allNamesList:
        names[idx] = name
        idx += 1
    # One member per line, in sorted order. TokenTextEncoder assigns ids by
    # line position, so idFrom does not affect the encoded ids.
    with open(dictFileName, "w") as dictFile:
        for name in allNamesList:
            dictFile.write(name + "\n")

    # Give every non-empty combination of sets ("A", "A_B", ..., "A_B_C_D")
    # its own label. Storing the loop index i here would give all
    # combinations of the same size the same label (only 4 distinct labels).
    combos = dict()
    for i in xrange(len(nameFiles)):
        for singleComb in itertools.combinations(nameFiles, i + 1):
            combos["_".join(singleComb)] = len(combos)
    return nameset, names, combos


def generateCase(nameset, names, nameFiles, combos, maxMembers):
    """Draws a random group and returns (member list, [category key])."""
    categories = len(nameFiles)
    categorySize = list()
    members = np.random.randint(maxMembers) + 1
    leftMembers = members
    categoryList = list()
    inputs = list()

    for i in xrange(categories - 1):
        # randint(leftMembers) is at most leftMembers - 1, so at least one
        # member is always left for the last set (combinations without the
        # last set are therefore never generated).
        categorySize.append(np.random.randint(leftMembers))
        leftMembers -= categorySize[i]
        if categorySize[i] != 0:
            categoryList.append(nameFiles[i])
            setMembers = nameset[nameFiles[i]]
            for j in xrange(categorySize[i]):
                inputs.append(setMembers[np.random.randint(len(setMembers))])

    # The remaining members come from the last set. Using categories - 1
    # instead of `i += 1` also works when there is only one set.
    last = categories - 1
    if leftMembers != 0:
        categoryList.append(nameFiles[last])
        setMembers = nameset[nameFiles[last]]
        for j in xrange(leftMembers):
            inputs.append(setMembers[np.random.randint(len(setMembers))])

    cateStr = "_".join(categoryList)
    outputs = [cateStr]
    return inputs, outputs


def party_party_generator(dataDir, nameFiles, maxMembers, numOfCases):
    nameset, names, combos = generateDicts(dataDir, nameFiles)
    #targetDict=dataDir+"/targets.dict"
    #targetDictFile=open(targetDict,"w")
    #for combo in combos:
    #    targetDictFile.write(combo+"\n")
    #targetDictFile.close()
    dictFileName = os.path.join(dataDir, "name.dict")
    inputsTextToken = text_encoder.TokenTextEncoder(dictFileName)
    #targetsTextToken=text_encoder.TokenTextEncoder(targetDict)

    for i in xrange(numOfCases):
        inputs, outputs = generateCase(nameset, names, nameFiles, combos, maxMembers)
        encodedInputs = inputsTextToken.encode(" ".join(inputs))
        # Member order carries no information, so shuffle it.
        np.random.shuffle(encodedInputs)
        encodedOutputs = [combos[outputs[0]]]
        yield {"inputs": encodedInputs, "targets": encodedOutputs}
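With four sets there are 2^4 - 1 = 15 non-empty combinations, so the labels should run from 0 to 14. A minimal standalone sketch of giving every combination a distinct label, using the same "_".join(...) key format as the generator above; the set names are the A to D from the example:

```python
import itertools

sets = ["A", "B", "C", "D"]

# Every non-empty combination of the sets is one category.
labels = {}
for size in range(1, len(sets) + 1):
    for combo in itertools.combinations(sets, size):
        labels["_".join(combo)] = len(labels)  # next unused label

print(len(labels))        # 15
print(labels["A"])        # 0
print(labels["A_B_C_D"])  # 14
```

Labels are assigned in itertools.combinations order (all singletons first, then pairs, and so on), so the mapping is deterministic as long as the set order is fixed.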

HParams I added
I am a little confused about input_space_id and target_space_id; it seems I can set them to anything without any problems.

def party_party(model_hparams):
  """Party Party."""
  p = default_problem_hparams()
  nameDict = model_hparams.data_dir + "/name.dict"
  targetDict = model_hparams.data_dir + "/targets.dict"
  # TokenTextEncoder puts its reserved ids (PAD, EOS) before the tokens read
  # from the file, so its vocab_size is larger than the line count of name.dict.
  input_encoder = text_encoder.TokenTextEncoder(vocab_filename=nameDict)
  with open(targetDict) as f:
    target_lines = sum(1 for line in f)
  p.input_modality = {
      "inputs": (registry.Modalities.SYMBOL, input_encoder.vocab_size)}
  p.vocabulary = {
      "inputs": input_encoder,
      "targets": text_encoder.TextEncoder(),
  }
  p.target_modality = (registry.Modalities.CLASS_LABEL, target_lines)
  p.batch_size_multiplier = 4
  p.max_expected_batch_size_per_shard = 8
  p.loss_multiplier = 3.0
  p.input_space_id = 1
  p.target_space_id = 1
  return p

"party_party": party_party,

The training script I am using:
Since I modify the T2T code frequently, I don't install it into my site-packages directory.

tensor2tensor/bin/t2t-trainer --data_dir /mnt/5efa3937-4221-48b5-9660-85a4a7eb0cfd/data/ --problems=party_party --model=transformer --hparams_set=transformer_base_single_gpu --keep_checkpoint_max=5 --save_checkpoints_secs=3600 --hparams='batch_size=2048' --output_dir /mnt/5efa3937-4221-48b5-9660-85a4a7eb0cfd/model

The predict script I'm using:


# Decode
DATA_DIR=/mnt/5efa3937-4221-48b5-9660-85a4a7eb0cfd/data
PROBLEM=party_party
MODEL=transformer
DECODE_FILE=decode.txt
TRAIN_DIR=/mnt/5efa3937-4221-48b5-9660-85a4a7eb0cfd/model
HPARAMS=transformer_base_single_gpu
BEAM_SIZE=4
ALPHA=0.6

tensor2tensor/bin/t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=0 \
  --eval_steps=0 \
  --decode_beam_size=$BEAM_SIZE \
  --decode_alpha=$ALPHA \
  --decode_from_file=$DECODE_FILE

The decode.txt is pretty simple:

324 f2r32f2r5 3f2fsfda fewfoIFE fselfj203 fselfj203 fj203rf2 3jr22iofj dsfslfkj23LJ dsf2f dsfslfkj23LJ dsflw2mf>K

This text is encoded to integer ids the same way the generator does it.

Although this text is randomly generated, the procedure is the usual one for classifying a text corpus.
Please help me find what I did wrong. Thanks a lot!

Support framework to plug-in generative adversarial networks

There are something like a hundred GANs out there; it would be great if you could also provide a general framework that standardizes the typical GAN parts in tensor2tensor.

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,4,0] = -1 is not in [0, 31488)

Hi all,

When I run the walkthrough commands for training a good English-to-German translation model with the Transformer, I encounter the following problem:
INFO:tensorflow:Total trainable variables size: 60276736
INFO:tensorflow:Total embedding variables size: 16384
INFO:tensorflow:Total non-embedding variables size: 60260352
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-30 15:05:58.562782: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562814: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562820: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562824: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-30 15:05:58.562829: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Traceback (most recent call last):
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
return fn(*args)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
status, run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/contextlib.py", line 66, in __exit__
next(self.gen)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,4,0] = -1 is not in [0, 31488)
[[Node: symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Squeeze)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 532, in run_locally
exp.train()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 505, in run
run_metadata=run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 842, in run
run_metadata=run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 952, in run
run_metadata=run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[27,4,0] = -1 is not in [0, 31488)
[[Node: symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Squeeze)]]

Caused by op 'symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather', defined at:
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 532, in run_locally
exp.train()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 955, in _train_model
model_fn_ops = self._get_train_ops(features, labels)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1162, in _get_train_ops
return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.TRAIN)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 424, in model_fn
len(hparams.problems) - 1)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 751, in _cond_on_index
return fn(cur_idx)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 406, in nth_model
features, skip=(skipping_is_on and skip_this_one))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/t2t_model.py", line 377, in model_fn
sharded_features[key], dp)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/modality.py", line 91, in bottom_sharded
return data_parallelism(self.bottom, xs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/expert_utils.py", line 294, in __call__
outputs.append(fns[i](*my_args[i], **my_kwargs[i]))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/models/modalities.py", line 88, in bottom
return self.bottom_simple(x, "shared", reuse=None)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/models/modalities.py", line 80, in bottom_simple
ret = tf.gather(var, x)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1179, in gather
validate_indices=validate_indices, name=name)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): indices[27,4,0] = -1 is not in [0, 31488)
[[Node: symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31488_512/parallel_0/symbol_modality_31488_512/shared/Squeeze)]]

INFO:tensorflow:Creating experiment, storing model files in /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Using config: {'_task_type': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f884bf21630>, '_model_dir': '/root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base', '_save_checkpoints_secs': 600, '_save_summary_steps': 100, '_session_config': allow_soft_placement: true
graph_options {
optimizer_options {
}
}
, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_task_id': 0, '_tf_random_seed': None, '_num_ps_replicas': 0, '_evaluation_master': '', '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_master': '', '_is_chief': True, '_num_worker_replicas': 0, '_save_checkpoints_steps': None, '_environment': 'local'}
INFO:tensorflow:Performing Decoding from a file.
INFO:tensorflow:Getting sorted inputs
Traceback (most recent call last):
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 544, in run_locally
decode_from_file(estimator, FLAGS.decode_from_file)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensor2tensor/utils/trainer_utils.py", line 648, in decode_from_file
as_iterable=True)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
as_iterable=as_iterable)
File "/home/sycmss/tc/anaconda3/envs/tensorflow/lib/python3.4/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 878, in _infer_model
% self._model_dir)
tensorflow.contrib.learn.python.learn.estimators._sklearn.NotFittedError: Couldn't find trained model at /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base.
cat: /root/t2t_data/decode_this.txt.transformer.transformer_base.beam4.alpha0.6.decodes: No such file or directory

How should I solve this problem?

Thank you
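The message says the embedding lookup (tf.gather in the symbol modality) received the id -1, while valid ids for this 31488-entry vocabulary lie in [0, 31488). A minimal sketch for locating such ids in encoded examples before they reach the model, assuming the examples can be read back as plain lists of ints; the helper name and the sample data are hypothetical:

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(examples: Iterable[List[int]],
                          vocab_size: int) -> List[Tuple[int, int, int]]:
    """Return (example_index, position, id) for every id outside [0, vocab_size)."""
    bad = []
    for ex_idx, ids in enumerate(examples):
        for pos, token_id in enumerate(ids):
            if not 0 <= token_id < vocab_size:
                bad.append((ex_idx, pos, token_id))
    return bad


# Hypothetical data: the second example contains an invalid -1.
examples = [[5, 17, 1], [8, 2, 3, 9, -1, 1]]
print(find_out_of_range_ids(examples, vocab_size=31488))  # [(1, 4, -1)]
```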

Tensorboard Support?

Does the trainer currently write out logs for Tensorboard? I looked through the code in utils/trainer_utils.py, and while I see calls to tf.summary.scalar, I don't see a call to tf.summary.FileWriter.

If it does support Tensorboard, how do I configure it?
If not, I'll start working on a pull request tomorrow to implement this.
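One way to check whether summaries are already being written: TensorFlow stores them in event files whose names start with events.out.tfevents, and TensorBoard reads them with `tensorboard --logdir=<dir>`. The Estimator config logged in another issue on this page includes '_save_summary_steps': 100, which suggests summaries already go to the output directory, but treat that as an assumption to verify. A minimal stdlib sketch (the helper name is hypothetical; the demo uses a throwaway directory standing in for the output directory):

```python
import os
import tempfile
from typing import List


def find_event_files(log_dir: str) -> List[str]:
    """Recursively list TensorFlow summary event files under log_dir."""
    found = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            if name.startswith("events.out.tfevents"):
                found.append(os.path.join(root, name))
    return sorted(found)


# Demo on a temporary directory with one event file and one unrelated file.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "events.out.tfevents.1498800000.myhost"), "w").close()
open(os.path.join(demo_dir, "checkpoint"), "w").close()
event_files = find_event_files(demo_dir)
print([os.path.basename(p) for p in event_files])  # ['events.out.tfevents.1498800000.myhost']
```

On a real run, call find_event_files on the --output_dir; if it finds files, `tensorboard --logdir=` pointed at the same directory should display them.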

t2t-trainer: command not found

I have run into some issues installing on my GPU machine with Ubuntu 14.04.5. Installation reaches the
"Installing collected packages: mpmath, sympy, tensor2tensor" line and then fails with a permission denied error.

I tried pip installing each item named in the error individually into the specified directory; pip then reported that everything was installed, but t2t-trainer --registry_help did not work.

This step works fine on my CPU machine, but there I run into the known issue with downloading the dataset.
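"command not found" after a seemingly successful install usually means the console script landed in a directory that is not on PATH; with `pip install --user` on Linux that directory is typically ~/.local/bin. A minimal sketch for locating the script; the helper name is hypothetical and the demo uses a fake executable in a temporary directory:

```python
import os
import shutil
import stat
import tempfile
from typing import Iterable, Optional


def find_console_script(name: str, extra_dirs: Iterable[str] = ()) -> Optional[str]:
    """Return the path of `name` if it is on PATH or in one of extra_dirs."""
    on_path = shutil.which(name)
    if on_path:
        return on_path
    for directory in extra_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


# Real check: on PATH, or in the usual `pip install --user` script directory?
print(find_console_script("t2t-trainer", [os.path.expanduser("~/.local/bin")]))

# Demo: a fake executable in a temporary directory that is not on PATH.
demo_dir = tempfile.mkdtemp()
fake = os.path.join(demo_dir, "fake-t2t-trainer-xyz")
with open(fake, "w") as f:
    f.write("#!/bin/sh\n")
os.chmod(fake, os.stat(fake).st_mode | stat.S_IXUSR)
print(find_console_script("fake-t2t-trainer-xyz", [demo_dir]) == fake)  # True
```

If the script turns up only in ~/.local/bin, adding that directory to PATH (for example `export PATH="$HOME/.local/bin:$PATH"`) should make `t2t-trainer` resolvable.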

How to run the Walkthrough example with other data than WMT?

Is it possible to run the Walkthrough example from the website with other data than WMT?

I've tried changing the data paths in wmt.py:

_ENDE_TRAIN_DATASETS = [
    [
        "http://data.statmt.org/wmt16/translation-task/training-parallel-nc-v11.tgz",  # pylint: disable=line-too-long
        ("training-parallel-nc-v11/news-commentary-v11.de-en.en",
         "training-parallel-nc-v11/news-commentary-v11.de-en.de")
    ],
    [
        "http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz",
        ("commoncrawl.de-en.en", "commoncrawl.de-en.de")
    ],
    [
        "http://www.statmt.org/wmt13/training-parallel-europarl-v7.tgz",
        ("training/europarl-v7.de-en.en", "training/europarl-v7.de-en.de")
    ],
]
_ENDE_TEST_DATASETS = [
    [
        "http://data.statmt.org/wmt16/translation-task/dev.tgz",
        ("dev/newstest2013.en", "dev/newstest2013.de")

But when I run the example with the new paths, it still downloads the WMT data...
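A common reason edits to a module have no effect is that Python imports a different copy than the one edited, for example an installed copy in site-packages instead of a source checkout. Whether that explains this report is an assumption; the sketch below is a generic stdlib check (helper name hypothetical) that reports which file a module name resolves to, so it can be compared with the edited wmt.py:

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a module would be loaded from, or None if not found."""
    try:
        # For dotted names this imports the parent packages first.
        spec = importlib.util.find_spec(name)
    except ImportError:  # a parent package is missing or fails to import
        return None
    return spec.origin if spec is not None else None


# Compare this path with the wmt.py that was edited.
print(module_origin("tensor2tensor.data_generators.wmt"))
```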

how to use an existing model to decode in wmt problem

I trained the WMT transformer_base model and see model checkpoints like these:

-rw-rw-r-- 1 public public         24 Jun 23 09:47 model.ckpt-55975.data-00000-of-00002
-rw-rw-r-- 1 public public 1320167432 Jun 23 09:47 model.ckpt-55975.data-00001-of-00002
-rw-rw-r-- 1 public public      10449 Jun 23 09:47 model.ckpt-55975.index
-rw-rw-r-- 1 public public   24328177 Jun 23 09:47 model.ckpt-55975.meta
-rw-rw-r-- 1 public public         24 Jun 23 09:57 model.ckpt-56426.data-00000-of-00002
-rw-rw-r-- 1 public public 1320167432 Jun 23 09:57 model.ckpt-56426.data-00001-of-00002
-rw-rw-r-- 1 public public      10414 Jun 23 09:57 model.ckpt-56426.index

There are 2 questions:

Is model.ckpt-xxxx.data-00001-of-00002 the saved model?
How do I specify which model to use when decoding? In the demo in README.md, the --model param is set to transformer, whereas I thought it should be set to a specific model file. Will t2t-trainer automatically use the latest saved checkpoint?


DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE

BEAM_SIZE=4
ALPHA=0.6

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=0 \
  --eval_steps=0 \
  --decode_beam_size=$BEAM_SIZE \
  --decode_alpha=$ALPHA \
  --decode_from_file=$DECODE_FILE
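On the first question: a TensorFlow checkpoint is not a single file; the .data-*, .index and .meta files that share a prefix such as model.ckpt-55975 together make up one checkpoint. On the second: --model names the architecture, not a file, and the Estimator restores from --output_dir, normally picking the most recent checkpoint recorded in the `checkpoint` index file there. The sketch below only approximates that by grouping the file names listed above by step and picking the highest step that has an .index file; it is an illustration, not how TensorFlow itself resolves the checkpoint:

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional

# model.ckpt-<step>.(data-XXXXX-of-YYYYY | index | meta)
CKPT_RE = re.compile(r"^model\.ckpt-(\d+)\.(?:data-\d+-of-\d+|index|meta)$")


def group_checkpoints(filenames: List[str]) -> Dict[int, List[str]]:
    """Map global step -> files that belong to that checkpoint."""
    groups = defaultdict(list)
    for name in filenames:
        match = CKPT_RE.match(name)
        if match:
            groups[int(match.group(1))].append(name)
    return dict(groups)


def latest_step(filenames: List[str]) -> Optional[int]:
    """Highest step whose .index file is present, or None."""
    groups = group_checkpoints(filenames)
    complete = [step for step, files in groups.items()
                if any(f.endswith(".index") for f in files)]
    return max(complete) if complete else None


files = [
    "model.ckpt-55975.data-00000-of-00002",
    "model.ckpt-55975.data-00001-of-00002",
    "model.ckpt-55975.index",
    "model.ckpt-55975.meta",
    "model.ckpt-56426.data-00000-of-00002",
    "model.ckpt-56426.data-00001-of-00002",
    "model.ckpt-56426.index",
]
print(latest_step(files))                    # 56426
print(len(group_checkpoints(files)[55975]))  # 4
```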

Issue when trying to decode a file that was not part of the training

When I try to decode a file that was not part of the training/test set, the following error occurs:

INFO:tensorflow:Performing Decoding from a file.
INFO:tensorflow:Getting sorted inputs
INFO:tensorflow: batch 94
INFO:tensorflow:Deocding batch 0
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 83, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/usr/local/bin/t2t-trainer", line 79, in main
    schedule=FLAGS.schedule)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
    run_locally(exp_fn(output_dir))
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 543, in run_locally
    decode_from_file(estimator, FLAGS.decode_from_file)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 645, in decode_from_file
    result_iter = estimator.predict(input_fn=input_fn.next, as_iterable=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
    as_iterable=as_iterable)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 883, in _infer_model
    features = self._get_features_from_input_fn(input_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 863, in _get_features_from_input_fn
    result = input_fn()
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 725, in _decode_batch_input_fn
    input_ids = vocabulary.encode(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 132, in encode
    ret = [self._token_to_id[tok] for tok in sentence.strip().split()]
KeyError: '@-@'

The decoding command I use is as follows:

PROBLEM=wmt_ende_bpe32k
MODEL=transformer
HPARAMS=transformer_base

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS


BEAM_SIZE=4
ALPHA=0.6

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=0 \
  --eval_steps=0 \
  --decode_beam_size=$BEAM_SIZE \
  --decode_alpha=$ALPHA \
  --decode_from_file /tmp/t2t_datagen/newsdev2016.bpe.en
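The traceback shows TokenTextEncoder.encode splitting each line on whitespace and looking every token up in its vocabulary dictionary, so any token missing from the vocabulary (here '@-@', which occurs in the .bpe.en input file) raises KeyError. A minimal sketch for listing such tokens before decoding, assuming a one-token-per-line vocabulary file; the helper names are hypothetical and the file names in the usage comment are placeholders:

```python
from collections import Counter
from typing import Iterable, Set


def load_vocab(lines: Iterable[str]) -> Set[str]:
    """One token per line, as in a TokenTextEncoder vocab file."""
    return {line.strip() for line in lines if line.strip()}


def unknown_tokens(sentences: Iterable[str], vocab: Set[str]) -> Counter:
    """Count whitespace-separated tokens (as encode splits them) missing from vocab."""
    counts = Counter()
    for sentence in sentences:
        for tok in sentence.strip().split():
            if tok not in vocab:
                counts[tok] += 1
    return counts


vocab = load_vocab(["the", "cat", "sat", "on", "mat"])
sentences = ["the cat sat on the mat", "the cat @-@ sat"]
print(unknown_tokens(sentences, vocab))  # Counter({'@-@': 1})

# With real files (placeholder names):
#   with open(vocab_path) as v, open(decode_path) as d:
#       print(unknown_tokens(d, load_vocab(v)).most_common(20))
```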

SYMBOL modality vocab size

I'm trying to train a bytes-to-subwords model:

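import os

from tensor2tensor.data_generators import problem_hparams
from tensor2tensor.data_generators import text_encoder
from tensor2tensor.utils import registry


def problem(model_hparams):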
    # This vocab file must be present within the data directory.
    vocab_filename = os.path.join(model_hparams.data_dir, 'vocab')

    source_encoder = text_encoder.ByteTextEncoder()
    target_encoder = text_encoder.SubwordTextEncoder(vocab_filename)

    p = problem_hparams.default_problem_hparams()
    p.input_modality = {"inputs": (registry.Modalities.SYMBOL, source_encoder.vocab_size)}
    p.target_modality = (registry.Modalities.SYMBOL, target_encoder.vocab_size)
    p.vocabulary = {
        "inputs": source_encoder,
        "targets": target_encoder,
    }

    return p

This fails catastrophically during model construction. It appears to work if the input and target modalities have the same vocab size (e.g. switching both to share the same SubwordTextEncoder), but fails if their sizes differ. Other modalities do not seem to have this problem (e.g. changing both of the above to CLASS_LABEL appears to work).
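For reference, the two sizes in this problem really do differ. A minimal sketch of the arithmetic, assuming the byte encoder reserves two ids (PAD, EOS) ahead of the 256 byte values, which as far as I know is how T2T's ByteTextEncoder behaves; the subword vocabulary size is a made-up number:

```python
NUM_RESERVED_IDS = 2  # assumption: PAD=0, EOS=1


def byte_encode(text: str, num_reserved_ids: int = NUM_RESERVED_IDS) -> list:
    """Byte-level ids shifted past the reserved ids, so ids lie in [2, 258)."""
    return [b + num_reserved_ids for b in text.encode("utf-8")]


byte_vocab_size = 2 ** 8 + NUM_RESERVED_IDS  # input side
subword_vocab_size = 8192                    # hypothetical target side

print(byte_encode("hi"))                    # [106, 107]
print(byte_vocab_size, subword_vocab_size)  # 258 8192
```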

KITTI datasets

I think it would be very cool and attractive to add support for the KITTI datasets.

Compatibility with TensorFlow 1.1.0

TensorFlow version : 1.1.0
OS:CentOS : 7.0
tensor2tensor version : 1.0.7 from pip

I encounter an exception when I run the walkthrough.
tensor2tensor/utils/trainer_utils.py uses tf.contrib.learn.Estimator and invokes the Estimator constructor with session_config, but in TensorFlow 1.1.0 the tf.contrib.learn.Estimator constructor has no session_config argument.
So is tensor2tensor not compatible with TensorFlow 1.1.0?
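Whether an installed constructor accepts a given keyword can be checked directly instead of inferred from version numbers. A minimal sketch using the standard-library inspect module; the two stand-in classes are hypothetical, and the commented-out call assumes TensorFlow 1.x is importable:

```python
import inspect

KEYWORD_KINDS = (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                 inspect.Parameter.KEYWORD_ONLY)


def accepts_kwarg(fn, name: str) -> bool:
    """True if fn(name=...) is allowed by fn's signature."""
    for p in inspect.signature(fn).parameters.values():
        if p.kind == inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs accepts any keyword
        if p.name == name and p.kind in KEYWORD_KINDS:
            return True
    return False


# Hypothetical stand-ins for two library versions.
class OldEstimator:
    def __init__(self, model_fn=None, model_dir=None, config=None):
        pass


class NewEstimator:
    def __init__(self, model_fn=None, model_dir=None, config=None, session_config=None):
        pass


print(accepts_kwarg(OldEstimator, "session_config"))  # False
print(accepts_kwarg(NewEstimator, "session_config"))  # True

# With TensorFlow installed:
#   import tensorflow as tf
#   print(accepts_kwarg(tf.contrib.learn.Estimator, "session_config"))
```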

--decode_to_file does not create output file

During inference, I'm not able to create a file containing the inference output.
I've tried --decode_to_file, but no output file is being created...

GPU usage

Hi all,
I tested the training example in the README.
I found that GPU utilization (the Volatile GPU-Util column) is 0% on every GPU except the first, yet nearly all memory on every GPU is taken. I'm not sure whether this is a TensorFlow or a tensor2tensor issue.

Thank you

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla M40 24GB      Off  | 0000:04:00.0     Off |                    0 |
| N/A   56C    P0   187W / 250W |  21871MiB / 22939MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M40 24GB      Off  | 0000:05:00.0     Off |                    0 |
| N/A   28C    P0    56W / 250W |  21806MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M40 24GB      Off  | 0000:08:00.0     Off |                    0 |
| N/A   28C    P0    55W / 250W |  21804MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M40 24GB      Off  | 0000:09:00.0     Off |                    0 |
| N/A   29C    P0    55W / 250W |  21804MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   4  Tesla M40 24GB      Off  | 0000:86:00.0     Off |                    0 |
| N/A   29C    P0    56W / 250W |  21808MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   5  Tesla M40 24GB      Off  | 0000:87:00.0     Off |                    0 |
| N/A   27C    P0    57W / 250W |  21806MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   6  Tesla M40 24GB      Off  | 0000:8A:00.0     Off |                    0 |
| N/A   30C    P0    57W / 250W |  21804MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   7  Tesla M40 24GB      Off  | 0000:8B:00.0     Off |                    0 |
| N/A   27C    P0    56W / 250W |  21804MiB / 22939MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
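The pattern in the table above (about 21.8 GB reserved on all eight GPUs, but compute only on GPU 0) is consistent with TensorFlow's default behavior of reserving most of the memory on every visible GPU when a session starts, even if the model itself is placed on a single device. Whether t2t was actually configured to use more than one GPU here is not shown. A small stdlib sketch for spotting this symptom while training is running: it parses the CSV form of `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits` and lists GPUs that hold memory but do no work. The helper name and the 1024 MiB threshold are hypothetical choices for illustration.

```python
import csv
import io


def idle_gpus_with_memory(csv_text, min_mem_mib=1024):
    """Return indices of GPUs that hold memory but report 0% utilization.

    `csv_text` is assumed to be the output of:
      nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
                 --format=csv,noheader,nounits
    i.e. one "index, util_percent, memory_used_mib" row per GPU.
    """
    idle = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:  # skip blank lines
            continue
        index, util, mem_used = (int(field.strip()) for field in row)
        if util == 0 and mem_used >= min_mem_mib:
            idle.append(index)
    return idle


# Values copied from the nvidia-smi table above.
SAMPLE = """0, 100, 21871
1, 0, 21806
2, 0, 21804
3, 0, 21804
4, 0, 21808
5, 0, 21806
6, 0, 21804
7, 0, 21804
"""

print(idle_gpus_with_memory(SAMPLE))  # [1, 2, 3, 4, 5, 6, 7]
```

If memory is reserved on GPUs that the job never uses, limiting the visible devices with the `CUDA_VISIBLE_DEVICES` environment variable is one standard way to keep TensorFlow from touching them.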

How to use distributed tensor2tensor

Hi, I am new to distributed TensorFlow. I followed the instructions here:
https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/docs/distributed_training.md
I created the TF_CONFIG, but I don't know how to use it during training, either on the command line or in any other way.
Could you give me some advice or point me to some documentation?
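As far as I understand, TF_CONFIG is not a t2t-trainer flag but an environment variable: the `tf.contrib.learn` `RunConfig` used by the trainer reads it from `os.environ` when the process starts (the training log elsewhere on this page prints parsed fields such as `_cluster_spec`, `_task_type` and `_environment`). So each machine exports its own TF_CONFIG, with the same cluster but a different task, before launching t2t-trainer. A minimal sketch, assuming the `cluster`/`task`/`environment` layout from the linked distributed_training.md; the host addresses and the `make_tf_config` helper are hypothetical, and the trainer flags that must agree with the cluster are described in that doc rather than here.

```python
import json
import os


def make_tf_config(cluster, task_type, task_index, environment="cloud"):
    """Build the TF_CONFIG JSON string for one process in the cluster.

    `cluster` maps job names (e.g. "master", "ps") to lists of
    "host:port" strings; every process gets the same cluster dict.
    """
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
        "environment": environment,
    })


# Hypothetical two-machine cluster: one master and one parameter server.
CLUSTER = {
    "master": ["10.0.0.1:5000"],
    "ps": ["10.0.0.2:5000"],
}

if __name__ == "__main__":
    # On the master machine: copy the current environment and add TF_CONFIG.
    env = dict(os.environ)
    env["TF_CONFIG"] = make_tf_config(CLUSTER, "master", 0)
    print(env["TF_CONFIG"])
    # This env dict would then be passed to the t2t-trainer process,
    # e.g. subprocess.call(["t2t-trainer", ...], env=env).
```

The shell equivalent is simply `export TF_CONFIG='{...}'` in each machine's shell before running `t2t-trainer`, with `"task"` set to `{"type": "ps", "index": 0}` on the parameter server.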

Training step in Walkthrough fails

I downloaded and ran the TensorFlow Docker image, then started following the walkthrough by installing tensor2tensor with pip install, setting the environment variables, and running t2t-datagen.

Next, I ran t2t-trainer:
t2t-trainer --data_dir=$DATA_DIR --problems=$PROBLEM --model=$MODEL --hparams_set=$HPARAMS --output_dir=$TRAIN_DIR

It appeared to train for about a minute before failing with:
t2t-trainer --data_dir=$DATA_DIR --problems=$PROBLEM --model=$MODEL --hparams_set=$HPARAMS --output_dir=$TRAIN_DIR
INFO:tensorflow:Creating experiment, storing model files in /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Using config: {'_model_dir': '/root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base', '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f054c81bb50>, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': '', '_session_config': allow_soft_placement: true
graph_options {
optimizer_options {
}
}
}
INFO:tensorflow:Performing local training.
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Doing model_fn_body took 2.320 sec.
INFO:tensorflow:This model_fn took 2.521 sec.
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/target_space_embedding/kernel shape (32, 512) size 16384
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_0 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_10 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_11 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_12 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_13 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_14 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_15 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_1 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_2 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_3 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_4 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_5 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_6 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_7 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_8 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_9 shape (1952, 512) size 999424
INFO:tensorflow:Total trainable variables size: 60147712
INFO:tensorflow:Total embedding variables size: 16384
INFO:tensorflow:Total non-embedding variables size: 60131328
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-27 04:34:58.910748: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-27 04:34:58.910798: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-27 04:34:58.910821: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
INFO:tensorflow:Saving checkpoints for 1 into /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt.
INFO:tensorflow:loss = 8.79561, step = 1
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 531, in run_locally
exp.train()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 505, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 842, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 952, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[16,1,0] = -1 is not in [0, 31236)
[[Node: symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Squeeze)]]

Caused by op u'symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Gather', defined at:
File "/usr/local/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 531, in run_locally
exp.train()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 955, in _train_model
model_fn_ops = self._get_train_ops(features, labels)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1162, in _get_train_ops
return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.TRAIN)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 423, in model_fn
len(hparams.problems) - 1)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 748, in _cond_on_index
return fn(cur_idx)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 405, in nth_model
features, train, skip=(skipping_is_on and skip_this_one))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 387, in model_fn
sharded_features["targets"], dp)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/modality.py", line 115, in targets_bottom_sharded
return data_parallelism(self.targets_bottom, xs)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/expert_utils.py", line 294, in call
outputs.append(fns[i](*my_args[i], **my_kwargs[i]))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/modalities.py", line 94, in targets_bottom
return self.bottom_simple(x, "shared", reuse=True)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/modalities.py", line 80, in bottom_simple
ret = tf.gather(var, x)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1179, in gather
validate_indices=validate_indices, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): indices[16,1,0] = -1 is not in [0, 31236)
[[Node: symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Squeeze)]]

Decoding problem for the wmt_ende_tokens_32k task

Hi,
I ran the basic wmt_ende_tokens_32k problem as shown in the Walkthrough in README.md, using 8 GPUs. The loss curve is as follows:

But when I decode with decode_from_file on the standard WMT2014 en-de test set (newstest2014.en), the output is meaningless: it is just unrelated sentences, as follows:

The inputs and outputs do not match, and the BLEU score is 0.

When I run the eval script following the answer provided by @lukaszkaiser in #36, the BLEU score is about 0.006.

So what is the problem? Thanks!

Multi-GPU decoding support

Hi all,

I'm wondering whether tensor2tensor supports multi-GPU decoding yet (WMT translation task).

I'm asking because when I tried to decode data with multiple GPU cards (translation task), the following exception was raised, while single-GPU decoding raises no exception.

I'm putting the decoding script and full exception trace here. Thank you.

decoding script

t2t-trainer   --data_dir=/tensor2tensor/t2t_data   --problems=wmt_ende_tokens_32k \
    --model=transformer   --hparams_set=transformer_base --worker_gpu=3 \
    --output_dir=/tensor2tensor/exp/8cards/wmt_ende_tokens_32k/transformer-transformer_base \
    --train_steps=0   --eval_steps=0   --decode_beam_size=4   --decode_alpha=0.6 \
    --decode_use_last_position_only  --decode_batch_size=128 \
    --decode_from_file=/tensor2tensor/t2t_data/validate.en

exception info

INFO:tensorflow:Restoring parameters from /search/odin/public/experiments/tensor2tensor/exp/8cards/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt-56426
2017-06-23 11:34:20.978020: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 128) and num_split 3
	 [[Node: while/split = Split[T=DT_INT32, num_split=3, _device="/job:localhost/replica:0/task:0/cpu:0"](while/split/split_dim, while/split/Enter)]]
2017-06-23 11:34:20.978210: W tensorflow/core/framework/op_kernel.cc:1158] Invalid argument: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 128) and num_split 3
	 [[Node: while/split = Split[T=DT_INT32, num_split=3, _device="/job:localhost/replica:0/task:0/cpu:0"](while/split/split_dim, while/split/Enter)]]
Traceback (most recent call last):
  File "/search/odin/public/anaconda2/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.0.4', 't2t-trainer')
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 739, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1507, in run_script
    exec(script_code, namespace, namespace)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensor2tensor-1.0.4-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 55, in <module>
    
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensor2tensor-1.0.4-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 51, in main
    
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 240, in run
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 543, in run_locally
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 646, in decode_from_file
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 902, in _predict_generator
    preds = mon_sess.run(predictions, feed_fn() if feed_fn else None)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 505, in run
    run_metadata=run_metadata)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 842, in run
    run_metadata=run_metadata)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
    return self._sess.run(*args, **kwargs)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 952, in run
    run_metadata=run_metadata)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/monitored_session.py", line 798, in run
    return self._sess.run(*args, **kwargs)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 128) and num_split 3
	 [[Node: while/split = Split[T=DT_INT32, num_split=3, _device="/job:localhost/replica:0/task:0/cpu:0"](while/split/split_dim, while/split/Enter)]]
	 [[Node: while/GatherNd/_1405 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_4080_while/GatherNd", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](^_cloopwhile/parallel_0/Identity/_1292)]]

Caused by op u'while/split', defined at:
  File "/search/odin/public/anaconda2/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.0.4', 't2t-trainer')
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 739, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1507, in run_script
    exec(script_code, namespace, namespace)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensor2tensor-1.0.4-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 55, in <module>
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensor2tensor-1.0.4-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 51, in main
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 240, in run
    run_locally(exp_fn(output_dir))
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 543, in run_locally
    decode_from_file(estimator, FLAGS.decode_from_file)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 645, in decode_from_file
    result_iter = estimator.predict(input_fn=input_fn.next, as_iterable=True)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
    as_iterable=as_iterable)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 884, in _infer_model
    infer_ops = self._get_predict_ops(features)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1218, in _get_predict_ops
    return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.INFER)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
    model_fn_results = self._model_fn(features, labels, **kwargs)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 423, in model_fn
    len(hparams.problems) - 1)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 748, in _cond_on_index
    return fn(cur_idx)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 396, in nth_model
    decode_length=FLAGS.decode_extra_length)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/t2t_model.py", line 154, in infer
    last_position_only, alpha)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/t2t_model.py", line 211, in _beam_decode
    alpha)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/beam_search.py", line 405, in beam_search
    back_prop=False)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2766, in while_loop
    result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2595, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2545, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/beam_search.py", line 336, in inner_loop
    i, alive_seq, alive_log_probs)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/beam_search.py", line 240, in grow_topk
    flat_logits = symbols_to_logits_fn(flat_ids)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/t2t_model.py", line 181, in symbols_to_logits_fn
    features, False, last_position_only=last_position_only)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/t2t_model.py", line 352, in model_fn
    sharded_features = self._shard_features(features)
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/t2t_model.py", line 332, in _shard_features
    0))
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1214, in split
    split_dim=axis, num_split=num_or_size_splits, value=value, name=name)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3261, in _split
    num_split=num_split, name=name)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/search/odin/public/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): Number of ways to split should evenly divide the split dimension, but got split_dim 0 (size = 128) and num_split 3
	 [[Node: while/split = Split[T=DT_INT32, num_split=3, _device="/job:localhost/replica:0/task:0/cpu:0"](while/split/split_dim, while/split/Enter)]]
	 [[Node: while/GatherNd/_1405 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_4080_while/GatherNd", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/gpu:0"](^_cloopwhile/parallel_0/Identity/_1292)]]
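The traceback shows `tf.split` inside `_shard_features` failing because a batch of 128 cannot be split evenly across 3 GPUs. A minimal sketch of two possible workarounds, assuming the only requirement is that the batch size divides evenly by `--worker_gpu`: pick a decode batch size that is a multiple of the GPU count, or pad a batch with copies of its last line and drop the extra outputs after decoding. Both helpers are hypothetical and not part of tensor2tensor.

```python
def divisible_batch_size(batch_size, num_shards):
    """Largest batch size <= batch_size that splits evenly into num_shards."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    size = batch_size - batch_size % num_shards
    if size == 0:
        raise ValueError("batch_size is smaller than num_shards")
    return size


def pad_to_multiple(lines, num_shards):
    """Pad lines with copies of the last line so len(lines) % num_shards == 0.

    Returns (padded_lines, num_pad); drop the last num_pad outputs after decoding.
    """
    lines = list(lines)
    remainder = len(lines) % num_shards
    if not lines or remainder == 0:
        return lines, 0
    num_pad = num_shards - remainder
    return lines + [lines[-1]] * num_pad, num_pad


print(divisible_batch_size(128, 3))  # 126
print(pad_to_multiple(["a", "b"], 3))  # (['a', 'b', 'b'], 1)
```

With the script above, the first helper would suggest something like `--decode_batch_size=126` for `--worker_gpu=3`. A final partial batch from the input file could still hit the same error if the decoder does not pad it, which is what the second helper addresses.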

Toy dataset - algorithmic_reverse_nlplike

Would a toy dataset (e.g., predicting a reversed sentence) be useful as a sanity check, as in tf-seq2seq?
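A reversal dataset of this kind is easy to generate. A minimal sketch, assuming integer token ids with 0 and 1 kept free (as tensor2tensor's text encoders reserve them for padding and end-of-sequence) and yielding dicts with `inputs`/`targets` keys in the style of tensor2tensor generators; `reverse_generator` here is a hypothetical stand-alone function, not the library's own.

```python
import random


def reverse_generator(vocab_size, max_length, num_cases, seed=None):
    """Yield {"inputs": ids, "targets": reversed ids} toy examples.

    Ids 0 and 1 are left unused (assumed padding / end-of-sequence), so
    symbols are drawn from [2, vocab_size + 2).
    """
    rng = random.Random(seed)
    for _ in range(num_cases):
        length = rng.randint(1, max_length)
        inputs = [rng.randrange(2, vocab_size + 2) for _ in range(length)]
        yield {"inputs": inputs, "targets": list(reversed(inputs))}


for example in reverse_generator(vocab_size=10, max_length=5, num_cases=2, seed=0):
    print(example)
```

A model that cannot drive the loss on this task close to zero usually points to a pipeline bug rather than a modeling limitation, which is what makes it useful as a sanity check.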

I followed the guideline below to register a new hyperparameter set, but it failed.

I followed the following guideline:

You can currently do so for models, hyperparameter sets, and modalities. Please do submit a pull request if your component might be useful to others.

Here's an example with a new hyperparameter set:

In ~/usr/t2t_usr/my_registrations.py

from tensor2tensor.models import transformer
from tensor2tensor.utils import registry

@registry.register_hparams
def transformer_my_very_own_hparams_set():
  hparams = transformer.transformer_base()
  hparams.hidden_size = 1024
  ...
  return hparams

In ~/usr/t2t_usr/__init__.py

import my_registrations

t2t-trainer --t2t_usr_dir=~/usr/t2t_usr --registry_help

You'll see under the registered HParams your transformer_my_very_own_hparams_set, which you can directly use on the command line with the --hparams_set flag.

I did the same, but I could not find "transformer_my_very_own_hparams_set" in the output. Here is the log:

[@nmyjs_160_20 t2t_usr]# t2t-trainer --t2t_usr_dir=~/usr/t2t_usr --registry_help
INFO:tensorflow:
Registry contents:

Models: ['attention_lm', 'attention_lm_moe', 'baseline_lstm_seq2seq', 'byte_net', 'diagonal_neural_gpu', 'multi_model', 'neural_gpu', 'slice_net', 'transformer', 'xception']

HParams (by model):
* attention: ['attention_lm_base', 'attention_lm_moe_base', 'attention_lm_moe_large', 'attention_lm_moe_small']
* basic: ['basic_1']
* bytenet: ['bytenet_base']
* multimodel: ['multimodel_1p8']
* neuralgpu: ['neuralgpu_1']
* slicenet: ['slicenet_1', 'slicenet_1noam', 'slicenet_1tiny']
* transformer: ['transformer_base', 'transformer_base_single_gpu', 'transformer_big', 'transformer_big_dr1', 'transformer_big_dr2', 'transformer_big_enfr', 'transformer_big_single_gpu', 'transformer_dr0', 'transformer_dr2', 'transformer_ff1024', 'transformer_ff4096', 'transformer_h1', 'transformer_h16', 'transformer_h32', 'transformer_h4', 'transformer_hs1024', 'transformer_hs256', 'transformer_k128', 'transformer_k256', 'transformer_l2', 'transformer_l4', 'transformer_l8', 'transformer_ls0', 'transformer_ls2', 'transformer_parsing_base', 'transformer_parsing_big', 'transformer_tiny']
* xception: ['xception_base']

RangedHParams: ['basic1', 'slicenet1', 'transformer_big_single_gpu']

Modalities: ['audio:audio_spectral_modality', 'audio:default', 'audio:identity', 'class_label:class_label_2d', 'class_label:default', 'class_label:identity', 'generic:default', 'image:default', 'image:identity', 'image:small_image_modality', 'symbol:default', 'symbol:identity']

[@nmyjs_160_20 t2t_usr]# ls
init.py my_registrations.py

Can anyone help me?
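One way to narrow this down is to check, outside of t2t-trainer, that the directory passed to `--t2t_usr_dir` imports as a Python package. A minimal diagnostic sketch (`check_usr_dir` is hypothetical, not a tensor2tensor function), assuming two common causes: a `~` that bash may pass through unexpanded inside `--flag=~/path` if the tool does not expand it itself, and a missing `__init__.py` (the `ls` output above lists `init.py`; that may be a rendering artifact, but the file must be named `__init__.py`).

```python
import importlib
import os
import sys


def check_usr_dir(usr_dir):
    """Hypothetical diagnostic: list reasons usr_dir may not import as a package.

    Returns an empty list when the package imports cleanly.
    """
    # "~" may reach the program unexpanded (e.g. from --flag=~/path), so expand it.
    path = os.path.abspath(os.path.expanduser(usr_dir))
    if not os.path.isdir(path):
        return ["directory does not exist: %s" % path]
    if not os.path.isfile(os.path.join(path, "__init__.py")):
        return ["missing __init__.py in %s" % path]
    parent, package = os.path.split(path)
    sys.path.insert(0, parent)
    importlib.invalidate_caches()  # pick up files created after interpreter startup
    try:
        importlib.import_module(package)
    except Exception as e:  # report any error raised while importing the package
        return ["importing %s failed: %r" % (package, e)]
    finally:
        sys.path.pop(0)
    return []


print(check_usr_dir("~/usr/t2t_usr"))
```

An import error here (for example from `import my_registrations` inside `__init__.py`) would also explain why the new hyperparameter set never reaches the registry.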

Lack of LSTM(RNN)-CNN classification problem

The current problems include NMT problems and image classification problems,
but there are no LSTM-CNN classification problems.
I've implemented a toy dataset that generates random sequences of people's names.
Some are boys' names, while others are girls' names.
If all names in a sequence are boys' names, it's classified as 1.
If all names in a sequence are girls' names, it's classified as 2.
Otherwise, it's classified as 3.
I'm now training on my toy dataset with the default Transformer model.
Do you think it would be good to merge this problem into master?
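The labeling rule above can be sketched as a small generator. This is a minimal sketch with hypothetical name lists (the issue does not give its vocabulary), yielding dicts with `inputs`/`targets` keys in the style of tensor2tensor generators; a real problem would still need to map the names to integer ids.

```python
import random

# Hypothetical vocabularies; the issue does not list the names it uses.
BOY_NAMES = ["james", "john", "robert", "michael"]
GIRL_NAMES = ["mary", "linda", "susan", "karen"]


def label_sequence(names):
    """Class 1: all boys' names; class 2: all girls' names; class 3: mixed."""
    if all(name in BOY_NAMES for name in names):
        return 1
    if all(name in GIRL_NAMES for name in names):
        return 2
    return 3


def name_sequence_generator(num_cases, max_length=5, seed=None):
    """Yield {"inputs": [names], "targets": [label]} examples (names, not ids)."""
    rng = random.Random(seed)
    vocab = BOY_NAMES + GIRL_NAMES
    for _ in range(num_cases):
        names = [rng.choice(vocab) for _ in range(rng.randint(1, max_length))]
        yield {"inputs": names, "targets": [label_sequence(names)]}


for example in name_sequence_generator(3, seed=0):
    print(example)
```

With uniform sampling, a sequence of length L is all boys' names with probability (1/2)^L, so class 3 dominates as sequences get longer; a real dataset would probably want to balance the three classes.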

Evaluation metrics in WMT task

Hi,
I read the paper "Attention Is All You Need". The results on the WMT tasks are really exciting.

But I found no detailed explanation in the paper of exactly which metric was used for the WMT translation task.

By a detailed explanation, I mean:

Which evaluation script was used? For example, mteval-v11b.pl or multi-bleu.perl?
Is the evaluation case-sensitive or case-insensitive?
Do we need to detokenize the output before evaluating?
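These questions matter because BLEU is computed over tokens, so tokenization, casing, and detokenization all change the number. A minimal corpus-level BLEU sketch (one reference per sentence, whitespace tokenization, uniform weights over 1- to 4-grams, standard brevity penalty, no smoothing) that shows where case sensitivity enters; it is not a reimplementation of mteval-v11b.pl or multi-bleu.perl, which tokenize differently from it and from each other.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4, lowercase=False):
    """Corpus BLEU (0-100), one reference per hypothesis, whitespace tokens."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        if lowercase:  # case-insensitive scoring
            hyp, ref = hyp.lower(), ref.lower()
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(ref_tokens, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in ngram_counts(hyp_tokens, n).items())
            totals[n - 1] += max(len(hyp_tokens) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no matches
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references overall.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


print(corpus_bleu(["The cat sat on the mat"], ["the cat sat on the mat"]))  # below 100
print(corpus_bleu(["The cat sat on the mat"], ["the cat sat on the mat"], lowercase=True))  # 100.0
```

The same output scored on subword tokens, on tokenized words, or on detokenized text will generally give three different numbers, which is why the script and preprocessing need to be stated alongside the score.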

update

A tiny misspelling here:
deocding -> decoding

Thank you so much

Walkthrough training error

When I run the Walkthrough command

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=0 \
  --eval_steps=0 \
  --decode_beam_size=$BEAM_SIZE \
  --decode_alpha=$ALPHA \
  --decode_from_file=$DECODE_FILE

I get the following error:

INFO:tensorflow:Creating experiment, storing model files in /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 126, in experiment_fn
eval_steps=eval_steps)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 138, in create_experiment
model_name=model_name)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 174, in create_experiment_components
keep_checkpoint_max=FLAGS.keep_checkpoint_max))
TypeError: __init__() got an unexpected keyword argument 'session_config'
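An "unexpected keyword argument" error raised from a constructor usually means the installed library is a different version from the one the caller was written against. Here the call in `create_experiment_components` passes `session_config` to a constructor (most likely the `RunConfig` class) that does not accept it, which may indicate a TensorFlow version mismatch. A minimal sketch for checking whether a callable accepts a keyword before upgrading or downgrading; `accepts_kwarg` is hypothetical, and the commented usage assumes a TensorFlow 1.x install with `tf.contrib.learn.RunConfig`.

```python
import inspect


def accepts_kwarg(fn, name):
    """Return True if fn can be called with the keyword argument `name`."""
    for param in inspect.signature(fn).parameters.values():
        if param.kind == inspect.Parameter.VAR_KEYWORD:
            return True  # **kwargs accepts any keyword
        if param.name == name and param.kind in (
                inspect.Parameter.POSITIONAL_OR_KEYWORD,
                inspect.Parameter.KEYWORD_ONLY):
            return True
    return False


# Assumed usage with TensorFlow 1.x:
# import tensorflow as tf
# print(accepts_kwarg(tf.contrib.learn.RunConfig.__init__, "session_config"))
```

If this prints `False`, the installed TensorFlow does not provide the argument the installed tensor2tensor expects, and aligning the two versions is the likely fix.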

Data download corrupted when running demo

When running the demo (also in the README: the English-to-German translation model using the Transformer model from Attention Is All You Need on WMT data), downloading the data produces a corrupted file.
Eventually this causes the tokenizer to run into errors.

PROBLEM=wmt_ende_tokens_32k
MODEL=transformer
HPARAMS=transformer_base

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --num_shards=100

The output of the previous data generation commands:

INFO:tensorflow:Generating training data for wmt_ende_tokens_32k.
INFO:tensorflow:Downloading http://data.statmt.org/wmt16/translation-task/training-parallel-nc-v11.tgz to /tmp/t2t_datagen/training-parallel-nc-v11.tgz
INFO:tensorflow:Succesfully downloaded training-parallel-nc-v11.tgz, 75178032 bytes.
INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.en
INFO:tensorflow:Reading file: training-parallel-nc-v11/news-commentary-v11.de-en.de
INFO:tensorflow:Downloading http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz to /tmp/t2t_datagen/training-parallel-commoncrawl.tgz

At this point, the download just hangs, even though the data had been downloaded successfully (checked in /tmp/t2t_datagen), and I abort with Ctrl-C. When I try again, it gives the following error:

Traceback (most recent call last):
File "/usr/local/bin/t2t-datagen", line 361, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-datagen", line 344, in main
training_gen(), FLAGS.problem + UNSHUFFLED_SUFFIX + "-train",
File "/usr/local/bin/t2t-datagen", line 140, in <lambda>
lambda: wmt.ende_wordpiece_token_generator(FLAGS.tmp_dir, True, 2**15),
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/wmt.py", line 224, in ende_wordpiece_token_generator
tmp_dir, "tokens.vocab.%d" % vocab_size, vocab_size)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/generator_utils.py", line 220, in get_or_generate_vocab
corpus_tar.extractall(tmp_dir)
File "/usr/lib/python2.7/tarfile.py", line 2079, in extractall
self.extract(tarinfo, path)
File "/usr/lib/python2.7/tarfile.py", line 2116, in extract
self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
File "/usr/lib/python2.7/tarfile.py", line 2192, in _extract_member
self.makefile(tarinfo, targetpath)
File "/usr/lib/python2.7/tarfile.py", line 2233, in makefile
copyfileobj(source, target)
File "/usr/lib/python2.7/tarfile.py", line 266, in copyfileobj
shutil.copyfileobj(src, dst)
File "/usr/lib/python2.7/shutil.py", line 49, in copyfileobj
buf = fsrc.read(length)
File "/usr/lib/python2.7/tarfile.py", line 831, in read
buf += self.fileobj.read(size - len(buf))
File "/usr/lib/python2.7/tarfile.py", line 743, in read
return self.readnormal(size)
File "/usr/lib/python2.7/tarfile.py", line 758, in readnormal
return self.__read(size)
File "/usr/lib/python2.7/tarfile.py", line 748, in __read
buf = self.fileobj.read(size)
File "/usr/lib/python2.7/gzip.py", line 268, in read
self._read(readsize)
File "/usr/lib/python2.7/gzip.py", line 315, in _read
self._read_eof()
File "/usr/lib/python2.7/gzip.py", line 354, in _read_eof
hex(self.crc)))
IOError: CRC check failed 0x75d9e49c != 0xd122220fL

One workaround might be to manually download the final tar.gz from http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz and unpack it in /tmp/t2t_datagen. When I tried this, the download was extremely slow: about 2 hours for 876 MB.

Results of manual download:
Working:

INFO:tensorflow:Not downloading, file already found: /tmp/t2t_datagen/training-parallel-commoncrawl.tgz
INFO:tensorflow:Reading file: commoncrawl.de-en.en
INFO:tensorflow:Reading file: commoncrawl.de-en.de
INFO:tensorflow:Reading file: commoncrawl.fr-en.en
INFO:tensorflow:Reading file: commoncrawl.fr-en.fr

Next in line for (hopefully not too slow) download:

INFO:tensorflow:Downloading
http://www.statmt.org/wmt13/training-parallel-europarl-v7.tgz to /tmp/t2t_datagen/training-parallel-europarl-v7.tgz

I'm posting the above as an FYI, or as a possible issue to be resolved.
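The `CRC check failed` error means the cached `training-parallel-commoncrawl.tgz` is truncated or corrupted. Since t2t-datagen reuses an archive it already finds in the tmp dir (as the `Not downloading, file already found` log shows), a bad archive has to be deleted before retrying. A minimal sketch for checking an archive first; `verify_tgz` is a hypothetical helper using only the Python standard library.

```python
import gzip
import tarfile
import zlib


def verify_tgz(path, chunk_size=1 << 20):
    """Return True if path is a complete, uncorrupted .tgz archive."""
    try:
        # Read the whole gzip stream; gzip checks the CRC and length at the end.
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass
        # Then confirm the decompressed bytes parse as a tar archive.
        with tarfile.open(path, "r:gz") as tar:
            for _ in tar:
                pass
        return True
    except (OSError, EOFError, tarfile.TarError, zlib.error):
        return False


print(verify_tgz("/tmp/t2t_datagen/training-parallel-commoncrawl.tgz"))
```

If it returns `False`, deleting the file lets the next t2t-datagen run download it again instead of failing during extraction.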

Proper size of the wmt_ende_tokens_32k-{dev,train}* files?

I got quite low performance compared to the paper.

So I did some digging and found that the wmt_ende_tokens_32k-{dev,train}* files are too small:
444K wmt_ende_tokens_32k-dev-00000-of-00001
730M wmt_ende_tokens_32k-train-00000-of-00001

I ran t2t-datagen again (with the 100-shard option) and got the following sizes:
820K wmt_ende_tokens_32k-dev-00000-of-00001
14M wmt_ende_tokens_32k-train-00000-of-00100
....
(total 1400M)

What is the proper size of the wmt_ende_tokens_32k-* files?
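Counting records gives a more direct comparison between two datagen runs than byte sizes, which depend on sharding and per-file layout. The t2t-datagen output files are TFRecord files, whose framing is a little-endian uint64 length, a 4-byte CRC of the length, the record bytes, and a 4-byte CRC of the data. A minimal standard-library sketch that counts records by walking that framing (CRCs are skipped, not verified; `count_tfrecords` and `summarize_shards` are hypothetical helpers):

```python
import glob
import os
import struct


def count_tfrecords(path):
    """Count records in a TFRecord file by walking its framing (CRCs not checked).

    Each record: uint64 length (little-endian), uint32 CRC, data, uint32 CRC.
    """
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break
            if len(header) < 8:
                raise ValueError("truncated record header in %s" % path)
            (length,) = struct.unpack("<Q", header)
            f.seek(4 + length + 4, os.SEEK_CUR)  # skip length CRC, data, data CRC
            if f.tell() > size:
                raise ValueError("truncated record in %s" % path)
            count += 1
    return count


def summarize_shards(pattern):
    """Return (num_files, total_bytes, total_records) for files matching pattern."""
    paths = sorted(glob.glob(pattern))
    total_bytes = sum(os.path.getsize(p) for p in paths)
    total_records = sum(count_tfrecords(p) for p in paths)
    return len(paths), total_bytes, total_records


print(summarize_shards(os.path.expanduser("~/t2t_data/wmt_ende_tokens_32k-train-*")))
```

If two runs report very different record counts for the same problem, the difference lies in the generated data (e.g. an incomplete download or extraction) rather than in how it was sharded.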

generate_calculus_integrate_sample failure - Exceptions and/or ComplexInfinity

When called from t2t-datagen --problem=algorithmic_calculus_integrate ..., generate_calculus_integrate_sample() in tensor2tensor/data_generators/algorithmic_math.py raises an exception or causes a subsequent KeyError in int_encoder().

The reason is an attempt to integrate expressions like "(b-d)/(a-a)" w.r.t. "b": this raises sympy.polys.polyerrors.PolynomialDivisionFailed or builds expressions containing ComplexInfinity (aka zoo), which confuses int_encoder().

The straightforward fix would be to put a retry loop into calculus_integrate(). Alternatively, random_expr_with_required_var() could be refined.
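A minimal sketch of the retry-loop approach, written as a generic helper rather than as a patch to `calculus_integrate()`, whose exact signature is not shown here. `sample_with_retries` is hypothetical; for this issue, `retry_on` could include sympy's `PolynomialDivisionFailed`, and `is_valid` could reject results for which `expr.has(sympy.zoo)` is true.

```python
def sample_with_retries(generate, is_valid=lambda sample: True, max_tries=100,
                        retry_on=(Exception,)):
    """Call generate() until it returns a sample accepted by is_valid.

    Exceptions in retry_on count as a failed attempt instead of aborting
    data generation; after max_tries failures, RuntimeError is raised.
    """
    last_error = None
    for _ in range(max_tries):
        try:
            sample = generate()
        except retry_on as e:
            last_error = e
            continue
        if is_valid(sample):
            return sample
    raise RuntimeError("no valid sample after %d tries (last error: %r)"
                       % (max_tries, last_error))
```

Keeping `max_tries` finite means a generator that can never succeed fails loudly instead of looping forever; catching only the specific sympy exceptions, rather than the broad default, avoids hiding unrelated bugs.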

How to run the reverse_decimal40 task?

Here is my run script:

[g@pc:/home/g/Desktop/tensor2tensor/reverse]$ cat run.sh 
PROBLEM=algorithmic_reverse_decimal40
MODEL=baseline_lstm_seq2seq
HPARAMS=basic1
DATA_DIR=./t2t_data
TMP_DIR=./t2t_datagen
TRAIN_DIR=./t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

# mv $TMP_DIR/tokens.vocab.32768 $DATA_DIR

# Train
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

# Decode

DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE

BEAM_SIZE=4
ALPHA=0.6

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=0 \
  --eval_steps=0 \
  --beam_size=$BEAM_SIZE \
  --alpha=$ALPHA \
  --decode_from_file=$DECODE_FILE

cat $DECODE_FILE.$MODEL.$HPARAMS.beam$BEAM_SIZE.alpha$ALPHA.decodes

Output:

[g@pc:/home/g/Desktop/tensor2tensor/reverse]$ bash run.sh 
INFO:tensorflow:Generating training data for algorithmic_reverse_decimal40.
INFO:tensorflow:Generating case 0 for algorithmic_reverse_decimal40-unshuffled-train.
INFO:tensorflow:Generating development data for algorithmic_reverse_decimal40.
INFO:tensorflow:Generating case 0 for algorithmic_reverse_decimal40-unshuffled-dev.
INFO:tensorflow:Shuffling data...
INFO:tensorflow:read: 10000
INFO:tensorflow:read: 20000
INFO:tensorflow:read: 30000
INFO:tensorflow:read: 40000
INFO:tensorflow:read: 50000
INFO:tensorflow:read: 60000
INFO:tensorflow:read: 70000
INFO:tensorflow:read: 80000
INFO:tensorflow:read: 90000
INFO:tensorflow:read: 100000
INFO:tensorflow:write: 0
INFO:tensorflow:write: 10000
INFO:tensorflow:write: 20000
INFO:tensorflow:write: 30000
INFO:tensorflow:write: 40000
INFO:tensorflow:write: 50000
INFO:tensorflow:write: 60000
INFO:tensorflow:write: 70000
INFO:tensorflow:write: 80000
INFO:tensorflow:write: 90000
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:Registry contents:

  Models: ['multi_model', 'baseline_lstm_seq2seq', 'slice_net', 'diagonal_neural_gpu', 'byte_net', 'transformer', 'attention_lm', 'neural_gpu', 'xception']

  HParams: ['transformer_h32', 'transformer_big_dr2', 'transformer_big_dr3', 'transformer_big_dr1', 'slicenet1', 'transformer_tiny', 'xception_base', 'transformer_dr2', 'transformer_parsing_base_dr6', 'basic1', 'transformer_k256', 'transformer_h16', 'transformer_ff1024', 'transformer_k128', 'slicenet1tiny', 'transformer_big_enfr', 'multimodel1p8', 'transformer_dr0', 'transformer_base', 'transformer_l8', 'transformer_parsing_big', 'transformer_hs1024', 'slicenet1noam', 'transformer_big_single_gpu', 'attention_lm_base', 'transformer_ff4096', 'transformer_single_gpu', 'transformer_ls2', 'transformer_ls0', 'transformer_hs256', 'neural_gpu1', 'transformer_h1', 'transformer_h4', 'transformer_l4', 'transformer_l2', 'bytenet_base']

  RangedHParams: ['transformer_big_single_gpu', 'basic1', 'slicenet1']
  
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f670e058c10>, '_model_dir': './t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic1', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': allow_soft_placement: true
graph_options {
  optimizer_options {
  }
}
, '_tf_random_seed': None, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Performing local training.
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Doing model_fn_body took 0.418 sec.
INFO:tensorflow:This model_fn took 0.649 sec.
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/bias              shape    (256,)                 size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/kernel            shape    (128, 256)             size    32768
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_0                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_10                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_11                                            shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_12                                            shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_13                                            shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_14                                            shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_15                                            shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_1                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_2                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_3                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_4                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_5                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_6                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_7                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_8                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/input_emb/weights_9                                             shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_0                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_10                                              shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_11                                              shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_12                                              shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_13                                              shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_14                                              shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_15                                              shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_1                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_2                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_3                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_4                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_5                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_6                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_7                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_8                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/softmax/weights_9                                               shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_0                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_10                                           shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_11                                           shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_12                                           shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_13                                           shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_14                                           shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_15                                           shape    (0, 64)                size    0
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_1                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_2                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_3                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_4                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_5                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_6                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_7                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_8                                            shape    (1, 64)                size    64
INFO:tensorflow:Weight    symbol_modality_11_64/target_emb/weights_9                                            shape    (1, 64)                size    64
INFO:tensorflow:Total trainable variables size: 266304
INFO:tensorflow:Total embedding variables size: 0
INFO:tensorflow:Total non-embedding variables size: 266304
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-18 14:00:13.936233: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936254: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936261: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936270: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:13.936276: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-06-18 14:00:14.068278: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-18 14:00:14.068727: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:01:00.0
Total memory: 10.91GiB
Free memory: 10.57GiB
2017-06-18 14:00:14.068740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-06-18 14:00:14.068744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-06-18 14:00:14.068749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0)
2017-06-18 14:00:17.068205: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 4488 get requests, put_count=3034 evicted_count=1000 eviction_rate=0.329598 and unsatisfied allocation rate=0.569073
2017-06-18 14:00:17.068238: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Saving checkpoints for 1 into ./t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic1/model.ckpt.
INFO:tensorflow:loss = inf, step = 1
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.0.2', 't2t-trainer')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 55, in <module>
    
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 51, in main
    
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 234, in run
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 562, in run_locally
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
    _, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 505, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 842, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 960, in run
    run_metadata=run_metadata))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 477, in after_run
    raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.
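The `(0, 64)` shapes in the weight listing above, and the reported total of 266304 trainable parameters, can be checked by hand. The sketch below assumes that `symbol_modality_11_64` splits its 11-row vocabulary across 16 shards, with the remainder going to the lowest-numbered shards. That assumption reproduces the `(1, 64)` and `(0, 64)` shapes in the log. The helper names are hypothetical and are not tensor2tensor API. The log alone does not show whether the empty shards are related to the `loss = inf` / NaN divergence.

```python
def shard_sizes(vocab_size, num_shards):
    """Split vocab_size rows across num_shards; earlier shards absorb the remainder.

    Hypothetical helper: assumes an even split with the remainder assigned to
    the lowest-numbered shards, which matches the shapes printed in the log.
    """
    base, remainder = divmod(vocab_size, num_shards)
    return [base + (1 if i < remainder else 0) for i in range(num_shards)]


def lstm_layer_params(input_size, hidden_size):
    """One BasicLSTMCell: kernel (input + hidden, 4 * hidden) plus bias (4 * hidden)."""
    kernel = (input_size + hidden_size) * 4 * hidden_size
    bias = 4 * hidden_size
    return kernel + bias


def total_params(vocab_size=11, hidden_size=64, num_layers=4):
    # Encoder and decoder each stack num_layers LSTM cells of the same size.
    lstm = 2 * num_layers * lstm_layer_params(hidden_size, hidden_size)
    # input_emb, target_emb and softmax each hold vocab_size x hidden_size weights.
    embeddings = 3 * vocab_size * hidden_size
    return lstm + embeddings


print(shard_sizes(11, 16))  # eleven shards of 1 row, then five empty shards
print(total_params())       # 266304
```

With 11 rows and 16 shards, `divmod(11, 16)` gives `(0, 11)`, so only the first 11 shards receive a row. The kernel shape `(128, 256)` corresponds to `(64 + 64, 4 * 64)`.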
INFO:tensorflow:Registry contents:

  Models: ['multi_model', 'baseline_lstm_seq2seq', 'slice_net', 'diagonal_neural_gpu', 'byte_net', 'transformer', 'attention_lm', 'neural_gpu', 'xception']

  HParams: ['transformer_h32', 'transformer_big_dr2', 'transformer_big_dr3', 'transformer_big_dr1', 'slicenet1', 'transformer_tiny', 'xception_base', 'transformer_dr2', 'transformer_parsing_base_dr6', 'basic1', 'transformer_k256', 'transformer_h16', 'transformer_ff1024', 'transformer_k128', 'slicenet1tiny', 'transformer_big_enfr', 'multimodel1p8', 'transformer_dr0', 'transformer_base', 'transformer_l8', 'transformer_parsing_big', 'transformer_hs1024', 'slicenet1noam', 'transformer_big_single_gpu', 'attention_lm_base', 'transformer_ff4096', 'transformer_single_gpu', 'transformer_ls2', 'transformer_ls0', 'transformer_hs256', 'neural_gpu1', 'transformer_h1', 'transformer_h4', 'transformer_l4', 'transformer_l2', 'bytenet_base']

  RangedHParams: ['transformer_big_single_gpu', 'basic1', 'slicenet1']
  
INFO:tensorflow:Using config: {'_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_task_type': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f35ad359c10>, '_model_dir': './t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic1', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_session_config': allow_soft_placement: true
graph_options {
  optimizer_options {
  }
}
, '_tf_random_seed': None, '_environment': 'local', '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_evaluation_master': '', '_master': ''}
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Performing Decoding from a file.
INFO:tensorflow:Getting sorted inputs
INFO:tensorflow: batch 1
INFO:tensorflow:Deocding batch 0
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 4, in <module>
    __import__('pkg_resources').run_script('tensor2tensor==1.0.2', 't2t-trainer')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 1511, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 55, in <module>
    
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor-1.0.2-py2.7.egg/EGG-INFO/scripts/t2t-trainer", line 51, in main
    
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 234, in run
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 623, in run_locally
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 590, in predict
    as_iterable=as_iterable)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 883, in _infer_model
    features = self._get_features_from_input_fn(input_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 863, in _get_features_from_input_fn
    result = input_fn()
  File "build/bdist.linux-x86_64/egg/tensor2tensor/utils/trainer_utils.py", line 743, in _decode_batch_input_fn
  File "build/bdist.linux-x86_64/egg/tensor2tensor/data_generators/text_encoder.py", line 60, in encode
ValueError: invalid literal for int() with base 10: 'Goodbye'
cat: ./t2t_data/decode_this.txt.baseline_lstm_seq2seq.basic1.beam4.alpha0.6.decodes: No such file or directory
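The `ValueError` is raised in `text_encoder.py` while encoding the decode file. For this algorithmic problem, each whitespace-separated token is passed to `int()`, so a word such as `Goodbye` cannot be encoded. The file needs integer tokens of the kind the problem was trained on. The sketch below illustrates that behaviour. It assumes the encoder adds an offset of 2 reserved ids (padding and end-of-sequence) to each integer. The function name is hypothetical and only mirrors the failing call.

```python
NUM_RESERVED_IDS = 2  # assumption: ids 0 and 1 are reserved for PAD and EOS


def encode_integer_tokens(line, num_reserved_ids=NUM_RESERVED_IDS):
    """Map each whitespace-separated integer token to an id; words raise ValueError."""
    return [int(token) + num_reserved_ids for token in line.split()]


print(encode_integer_tokens("3 1 4"))  # [5, 3, 6]

try:
    encode_integer_tokens("Goodbye")
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: 'Goodbye'
```

The missing `.decodes` file reported by `cat` is a consequence of this crash: decoding stopped before any output was written.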

Out of memory on GPU in wmt_ende_tokens_32k

t2t-trainer runs out of GPU memory when training on a single NVIDIA GeForce GTX 1080 (8 GB) with the following parameters:

PROBLEM=wmt_ende_tokens_32k
MODEL=transformer
HPARAMS=transformer_base_single_gpu

Any hints on this?

More specifically, a ResourceExhaustedError is raised; see the following dump:

2017-06-22 11:55:49.859461: I tensorflow/core/common_runtime/bfc_allocator.cc:693]      Summary of in-use Chunks by size:
2017-06-22 11:55:49.859468: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 142 Chunks of size 256 totalling 35.5KiB
2017-06-22 11:55:49.859473: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1280 totalling 1.2KiB
2017-06-22 11:55:49.859477: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 414 Chunks of size 2048 totalling 828.0KiB
2017-06-22 11:55:49.859482: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 24 Chunks of size 4096 totalling 96.0KiB
2017-06-22 11:55:49.859487: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 48 Chunks of size 6144 totalling 288.0KiB
2017-06-22 11:55:49.859491: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 48 Chunks of size 8192 totalling 384.0KiB
2017-06-22 11:55:49.859495: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 5 Chunks of size 32768 totalling 160.0KiB
2017-06-22 11:55:49.859500: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 45824 totalling 44.8KiB
2017-06-22 11:55:49.859505: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 4 Chunks of size 65536 totalling 256.0KiB
2017-06-22 11:55:49.859509: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 96 Chunks of size 1048576 totalling 96.00MiB
2017-06-22 11:55:49.859514: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 24 Chunks of size 2097152 totalling 48.00MiB
2017-06-22 11:55:49.859518: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 46 Chunks of size 3145728 totalling 138.00MiB
2017-06-22 11:55:49.859523: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 65 Chunks of size 4030464 totalling 249.84MiB
2017-06-22 11:55:49.859527: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 97 Chunks of size 4194304 totalling 388.00MiB
2017-06-22 11:55:49.859532: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 227 Chunks of size 16711680 totalling 3.53GiB
2017-06-22 11:55:49.859536: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 43 Chunks of size 20889600 totalling 856.64MiB
2017-06-22 11:55:49.859541: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 64487424 totalling 61.50MiB
2017-06-22 11:55:49.859546: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 12 Chunks of size 66846720 totalling 765.00MiB
2017-06-22 11:55:49.859550: I tensorflow/core/common_runtime/bfc_allocator.cc:696] 1 Chunks of size 1288182528 totalling 1.20GiB
2017-06-22 11:55:49.859554: I tensorflow/core/common_runtime/bfc_allocator.cc:700] Sum Total of in-use chunks: 7.28GiB
2017-06-22 11:55:49.859561: I tensorflow/core/common_runtime/bfc_allocator.cc:702] Stats:
Limit:                  7969800192
InUse:                  7813304576
MaxInUse:               7966874112
NumAllocs:                    1912
MaxAllocSize:           1288182528

2017-06-22 11:55:49.859613: W tensorflow/core/common_runtime/bfc_allocator.cc:277] *************************************************************************************************xxx
2017-06-22 11:55:49.859627: W tensorflow/core/framework/op_kernel.cc:1158] Resource exhausted: OOM when allocating tensor with shape[102,80,1,1,31488]
Traceback (most recent call last):
  File "/home/villi/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/home/villi/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/usr/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/home/villi/tf/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[102,80,1,1,31488]
         [[Node: symbol_modality_31488_512_2/parallel_0_1/symbol_modality_31488_512/padded_cross_entropy/smoothing_cross_entropy/one_hot = OneHot[T=DT_FLOAT, TI=DT_INT32, axis=-1, _device="/job:localhost/replica:0/task:0/gpu:0"](symbol_modality_31488_512_2/parallel_0_1/symbol_modality_31488_512/padded_cross_entropy/pad_with_zeros/pad_to_same_length/Pad_1/_2643, symbol_modality_31488_512_2/parallel_0_1/symbol_modality_31488_512/strided_slice, symbol_modality_31488_512_2/parallel_0_1/symbol_modality_31488_512/padded_cross_entropy/smoothing_cross_entropy/one_hot/on_value, symbol_modality_31488_512_2/parallel_0_1/symbol_modality_31488_512/padded_cross_entropy/smoothing_cross_entropy/truediv)]]
         [[Node: training/train/update/_2730 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_13171_training/train/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
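The figures in the dump show why the allocation fails. The one-hot tensor of shape `[102, 80, 1, 1, 31488]` is assumed to be `float32` (4 bytes per element, from `T=DT_FLOAT` in the node). It needs roughly 980 MiB, while the allocator has only `Limit - InUse`, about 149 MiB, left at the moment of failure. A quick check using only the numbers printed above:

```python
import operator
from functools import reduce

MIB = 1024 ** 2


def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor in bytes; 4 bytes per element corresponds to float32."""
    return reduce(operator.mul, shape, 1) * bytes_per_element


one_hot = tensor_bytes([102, 80, 1, 1, 31488])
limit, in_use = 7969800192, 7813304576  # from the allocator "Stats" block

print(round(one_hot / MIB, 1))           # 980.2
print(round((limit - in_use) / MIB, 1))  # 149.2
```

The last dimension is the 31488-entry vocabulary and stays fixed. The leading `102 x 80` dimensions grow with the batch, so a smaller batch, set for example through `--hparams='batch_size=...'`, is one way to shrink this tensor. A suitable value is not given in this thread.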

Loss declines very slowly in wmt_ende_tokens_32k

Training on a single NVIDIA GeForce GTX 1080 (12 GB) with the following parameters:
--problems=wmt_ende_tokens_32k
--model=transformer
--hparams_set=transformer_big_single_gpu
--hparams='batch_size=2048'

Once the loss reaches about 3, it decreases very slowly:
Steps 0 to 30k: loss drops from 8.4 to 3
INFO:tensorflow:global_step/sec: 2.03158
INFO:tensorflow:loss = 3.62252, step = 29601 (49.221 sec)
INFO:tensorflow:global_step/sec: 2.03539
INFO:tensorflow:loss = 3.64336, step = 29701 (49.130 sec)
INFO:tensorflow:global_step/sec: 2.03153
INFO:tensorflow:loss = 3.58582, step = 29801 (49.226 sec)
INFO:tensorflow:global_step/sec: 2.02831
INFO:tensorflow:loss = 3.38816, step = 29901 (49.301 sec)
INFO:tensorflow:global_step/sec: 2.02674
INFO:tensorflow:loss = 3.40213, step = 30001 (49.340 sec)
INFO:tensorflow:global_step/sec: 2.03157
INFO:tensorflow:loss = 3.44571, step = 30101 (49.235 sec)
INFO:tensorflow:global_step/sec: 2.03308
INFO:tensorflow:loss = 3.15277, step = 30201 (49.175 sec)

Around step 120k: loss jitters around 3
INFO:tensorflow:loss = 3.12874, step = 125101 (76.470 sec)
INFO:tensorflow:global_step/sec: 2.04413
INFO:tensorflow:loss = 3.09151, step = 125201 (48.925 sec)
INFO:tensorflow:global_step/sec: 2.03683
INFO:tensorflow:loss = 3.2518, step = 125301 (49.093 sec)
INFO:tensorflow:global_step/sec: 2.03616
INFO:tensorflow:loss = 3.90474, step = 125401 (49.113 sec)
INFO:tensorflow:global_step/sec: 2.04036
INFO:tensorflow:loss = 2.87875, step = 125501 (49.010 sec)
INFO:tensorflow:global_step/sec: 2.0414
INFO:tensorflow:loss = 3.47175, step = 125601 (48.986 sec)
INFO:tensorflow:global_step/sec: 2.03132
INFO:tensorflow:loss = 3.00751, step = 125701 (49.230 sec)
INFO:tensorflow:global_step/sec: 2.0305
INFO:tensorflow:loss = 2.81739, step = 125801 (49.247 sec)
INFO:tensorflow:global_step/sec: 2.03291
INFO:tensorflow:loss = 3.60361, step = 125901 (49.191 sec)
INFO:tensorflow:global_step/sec: 2.03915
INFO:tensorflow:loss = 2.91831, step = 126001 (49.041 sec)
INFO:tensorflow:global_step/sec: 2.02992
INFO:tensorflow:loss = 2.98262, step = 126101 (49.263 sec)
INFO:tensorflow:global_step/sec: 2.03459

Is this normal?
Can you give a reference value for the expected loss?
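Per-step training loss is computed on single batches and is noisy, so individual values such as 3.90 or 2.82 say little on their own. A small helper can parse the `loss = X, step = N` lines from the log and average them, which makes the trend easier to compare between windows. It is a hypothetical plain-Python helper, not part of tensor2tensor.

```python
import re

# Matches e.g. "INFO:tensorflow:loss = 3.12874, step = 125101 (76.470 sec)".
LOSS_LINE = re.compile(r"loss = ([^,\s]+), step = (\d+)")


def parse_losses(log_text):
    """Return (step, loss) pairs; float() also accepts 'inf' and 'NaN'."""
    return [(int(step), float(loss)) for loss, step in LOSS_LINE.findall(log_text)]


def mean_loss(pairs):
    losses = [loss for _, loss in pairs]
    return sum(losses) / len(losses)


log = """
INFO:tensorflow:loss = 3.62252, step = 29601 (49.221 sec)
INFO:tensorflow:global_step/sec: 2.03539
INFO:tensorflow:loss = 3.64336, step = 29701 (49.130 sec)
"""
pairs = parse_losses(log)
print(pairs)  # [(29601, 3.62252), (29701, 3.64336)]
print(mean_loss(pairs))
```

Applied to the two excerpts above, the average is about 3.46 for steps 29601 to 30201 and about 3.19 for steps 125101 to 126101. That suggests the loss was still falling slowly, but two short excerpts are too few to draw a firm conclusion.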

Request: provide pretrained models

T2T supports many models; it would be very helpful if some pretrained models were provided.

Session error when running distributed training

I run distributed training following the guide at https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/docs/distributed_training.md,
configured with 1 ps and 2 workers. The ps runs fine, but every worker fails with this error:

tensorflow.python.framework.errors_impl.NotFoundError: No session factory registered for the given session options: {target: "10.150.144.48:1111" config: allow_soft_placement: true graph_options { optimizer_options { } }} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.

The full error output is as follows:

2017-06-25 06:41:26.914625: E tensorflow/core/common_runtime/session.cc:69] Not found: No session factory registered for the given session options: {target: "10.150.144.48:1111" config: allow_soft_placement: true graph_options { optimizer_options { } }} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.
{u'cluster': {u'ps': [u'10.150.144.48:3333'], u'worker': [u'10.150.144.48:1111', u'10.150.144.48:2222']}, u'task': {u'index': 0, u'type': u'worker'}}
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 62, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/usr/local/bin/t2t-trainer", line 58, in main
    schedule=FLAGS.schedule)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 247, in run
    output_dir=FLAGS.output_dir)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run
    return _execute_schedule(experiment, schedule)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule
    return task()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
    hooks=self._train_monitors + extra_hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 669, in _call_train
    monitors=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1003, in _train_model
    config=self._session_config
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 352, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 477, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 822, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 827, in _create_session
    return self._sess_creator.create_session()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 538, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 412, in create_session
    init_fn=self._scaffold.init_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 273, in prepare_session
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 178, in _restore_checkpoint
    sess = session.Session(self._target, graph=self._graph, config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1292, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 562, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: No session factory registered for the given session options: {target: "10.150.144.48:1111" config: allow_soft_placement: true graph_options { optimizer_options { } }} Registered factories are {DIRECT_SESSION, GRPC_SESSION}.
ERROR:tensorflow:==================================
Object was never used (type <class 'tensorflow.python.framework.ops.Tensor'>):
<tf.Tensor 'report_uninitialized_variables_1/boolean_mask/Gather:0' shape=(?,) dtype=string>
If you want to mark it as used call its "mark_used()" method.
It was originally created here:
  File "/usr/local/bin/t2t-trainer", line 62, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/usr/local/bin/t2t-trainer", line 58, in main
    schedule=FLAGS.schedule)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 247, in run
    output_dir=FLAGS.output_dir)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 210, in run
    return _execute_schedule(experiment, schedule)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 47, in _execute_schedule
    return task()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
    hooks=self._train_monitors + extra_hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 669, in _call_train
    monitors=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1003, in _train_model
    config=self._session_config
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 352, in MonitoredTrainingSession
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 648, in __init__
    stop_grace_period_secs=stop_grace_period_secs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 477, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 822, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 827, in _create_session
    return self._sess_creator.create_session()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 538, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 403, in create_session
    self._scaffold.finalize()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 192, in finalize
    default_ready_for_local_init_op)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 254, in get_or_default
    op = default_constructor()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 189, in default_ready_for_local_init_op
    variables.global_variables())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 170, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 139, in _add_should_use_warning
    wrapped = TFShouldUseWarningWrapper(x)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py", line 96, in __init__
    stack = [s.strip() for s in traceback.format_stack()]
==================================

It seems that none of the registered session factories ({DIRECT_SESSION, GRPC_SESSION}) accepts these session options. Can you help me look into this problem?

ImportError: No module named 'cPickle'; however, cPickle is obsolete in Python 3

$t2t-datagen --data_dir=$DATA_DIR --tmp_dir=$TMP_DIR --num_shards=100 --problem=$PROBLEM
Traceback (most recent call last):
File "/tf1.2py3_venv/venv/bin/t2t-datagen", line 39, in
from tensor2tensor.data_generators import image
File "/tf1.2py3_venv/venv/lib/python3.6/site-packages/tensor2tensor/data_generators/image.py", line 21, in
import cPickle
ImportError: No module named 'cPickle'

Trying to reproduce wmt_ende_bpe32k

I'm trying to reproduce wmt_ende_bpe32k. However, data generation fails with the following error:

INFO:tensorflow:Generating training data for wmt_ende_bpe32k.
Traceback (most recent call last):
  File "/usr/local/bin/t2t-datagen", line 361, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/usr/local/bin/t2t-datagen", line 345, in main
    FLAGS.data_dir, FLAGS.num_shards, FLAGS.max_cases)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/generator_utils.py", line 113, in generate_files
    for case in generator:
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/wmt.py", line 83, in token_generator
    source_ints = token_vocab.encode(source.strip()) + eos_list
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/data_generators/text_encoder.py", line 120, in encode
    ret = [self._token_to_id[tok] for tok in sentence.strip().split()]
AttributeError: 'TokenTextEncoder' object has no attribute '_token_to_id'

I'm using the following command:

PROBLEM=wmt_ende_bpe32k
MODEL=transformer
HPARAMS=transformer_base

DATA_DIR=$HOME/t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=$HOME/t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

# Generate data
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --num_shards=100 \
  --problem=$PROBLEM

mv $TMP_DIR/vocab.bpe.32000 $DATA_DIR

# Train
# *  If you run out of memory, add --hparams='batch_size=2048' or even 1024.
t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --worker_gpu=2 \
  --log_dir logs

# Decode

DECODE_FILE=$DATA_DIR/decode_this.txt
echo "Hello world" >> $DECODE_FILE
echo "Goodbye world" >> $DECODE_FILE

BEAM_SIZE=4
ALPHA=0.6

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=0 \
  --eval_steps=0 \
  --decode_beam_size=$BEAM_SIZE \
  --decode_alpha=$ALPHA \
  --decode_from_file=$DECODE_FILE

cat $DECODE_FILE.$MODEL.$HPARAMS.beam$BEAM_SIZE.alpha$ALPHA.decodes

I've downloaded wmt16_en_de.tar.gz and placed it in /tmp/t2t_datagen as specified in wmt.py:

import os
import tarfile

import tensorflow as tf


def _get_wmt_ende_dataset(directory, filename):
  """Extract the WMT en-de corpus `filename` to directory unless it's there."""
  train_path = os.path.join(directory, filename)
  if not (tf.gfile.Exists(train_path + ".de") and
          tf.gfile.Exists(train_path + ".en")):
    # We expect that this file has been downloaded from:
    # https://drive.google.com/open?id=0B_bZck-ksdkpM25jRUN2X2UxMm8 and placed
    # in `directory`.
    corpus_file = os.path.join(directory, "wmt16_en_de.tar.gz")
    with tarfile.open(corpus_file, "r:gz") as corpus_tar:
      corpus_tar.extractall(directory)
  return train_path

WSJ parsing can parse one child only

It seems that words_and_tags_from_wsj_tree can parse wsj trees (lists) like
(A (B (C c)))
but not trees like
(A (B (C c d) e))

This is because it assumes that every token either opens or closes a parenthesis.
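A minimal recursive-descent sketch that accepts any number of children per node: every bare token is treated as a word, and the label of the node that directly contains it is treated as its tag. This is an illustrative stand-in, not the repository's `words_and_tags_from_wsj_tree`; the helper names and the `(words, tags)` return format are assumptions.

```python
import re
from typing import List, Tuple

Tree = Tuple[str, list]  # (label, children); children are str words or Trees


def tokenize(s: str) -> List[str]:
    # Split into "(", ")" and whitespace-free atoms.
    return re.findall(r"\(|\)|[^\s()]+", s)


def parse(tokens: List[str], pos: int = 0) -> Tuple[Tree, int]:
    """Parse "(" LABEL child* ")" starting at pos; return (tree, next_pos)."""
    assert tokens[pos] == "(", "expected '(' at token %d" % pos
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)  # nested subtree
            children.append(child)
        else:
            children.append(tokens[pos])  # a terminal word
            pos += 1
    return (label, children), pos + 1


def words_and_tags(tree: Tree) -> Tuple[List[str], List[str]]:
    words, tags = [], []

    def walk(node: Tree) -> None:
        label, children = node
        for child in children:
            if isinstance(child, str):
                words.append(child)
                tags.append(label)  # tag = label of the enclosing node
            else:
                walk(child)

    walk(tree)
    return words, tags


def words_and_tags_from_string(s: str) -> Tuple[List[str], List[str]]:
    tree, _ = parse(tokenize(s))
    return words_and_tags(tree)


print(words_and_tags_from_string("(A (B (C c d) e))"))
# (['c', 'd', 'e'], ['C', 'C', 'B'])
```

Because the parser recurses on "(" and otherwise consumes a word, sibling words and sibling subtrees can be mixed freely, which is exactly the case `(A (B (C c d) e))` needs.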

SubwordTextEncoder should be bytes-based

@lukaszkaiser @vthorsteinsson @nshazeer

@vthorsteinsson's recent PR improved the compatibility between Python 2 and 3 but we seem to have lost some valuable functionality.

We want a SubwordTextEncoder that is fully invertible with a limited vocabulary and that can encode anything; i.e., it should operate exclusively on bytes, so that the vocabulary only needs to grow by at most 256 entries. If the input is Unicode (UTF-8 encoded or otherwise), it will be read in as individual bytes rather than Unicode characters, which means that decoding might break (i.e., the decoder might produce a byte sequence that is not valid Unicode).

For datasets or tasks that wish to handle Unicode characters directly as part of the vocabulary, there can be a different version of the SubwordTextEncoder that does that (e.g. the one that is currently checked in).

So the suggestion is to have two SubwordTextEncoders: one for raw bytes, and another that handles Unicode (essentially the one that is currently checked in).
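To illustrate the byte-level idea: if every UTF-8 byte gets its own id, any string can be encoded with at most 256 base entries, but an arbitrary id sequence may decode to invalid UTF-8. The sketch below assumes 2 reserved ids (e.g. padding and end-of-sequence) at the bottom of the vocabulary; `encode_bytes` and `decode_bytes` are hypothetical helpers, not SubwordTextEncoder's API, and subword merging is omitted.

```python
from typing import List, Sequence

NUM_RESERVED_IDS = 2  # assumption: ids 0 and 1 are reserved (e.g. PAD, EOS)


def encode_bytes(text: str) -> List[int]:
    """Map each UTF-8 byte of `text` to an id in [2, 258)."""
    return [b + NUM_RESERVED_IDS for b in text.encode("utf-8")]


def decode_bytes(ids: Sequence[int]) -> str:
    """Invert encode_bytes; reserved ids are dropped and invalid UTF-8
    byte runs become U+FFFD instead of raising an error."""
    raw = bytes(i - NUM_RESERVED_IDS for i in ids
                if NUM_RESERVED_IDS <= i < NUM_RESERVED_IDS + 256)
    return raw.decode("utf-8", errors="replace")


print(encode_bytes("é"))  # [197, 171]
print(decode_bytes([197]))  # only half of "é": prints the replacement char
```

Decoding only the first id of "é" yields the replacement character, which is the "decoding might break" case described above; a Unicode-aware encoder avoids it at the cost of a larger base alphabet.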

I may be misunderstanding the current functionality so please correct my mental model where it's wrong.

algorithmic_reverse_decimal40 with baseline_lstm_seq2seq model produces NaN loss error

Steps to reproduce

Before training on a new generator (nlplike), I tried training the baseline_lstm_seq2seq model on algorithmic_reverse_decimal40 to see how it works. This is the result (I am inside a Docker container):

root@df1a91a7be96:/t2t# PROBLEM=algorithmic_reverse_decimal40
root@df1a91a7be96:/t2t# MODEL=baseline_lstm_seq2seq
root@df1a91a7be96:/t2t# HPARAMS=basic_1
root@df1a91a7be96:/t2t# DATA_DIR=/tmp/t2t_data
root@df1a91a7be96:/t2t# TMP_DIR=/tmp/t2t_datagen
root@df1a91a7be96:/t2t# TRAIN_DIR=/tmp/t2t_train/$PROBLEM/$MODEL-$HPARAMS
root@df1a91a7be96:/t2t# mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
root@df1a91a7be96:/t2t# # Generate data
root@df1a91a7be96:/t2t# t2t-datagen \
>   --data_dir=$DATA_DIR \
>   --tmp_dir=$TMP_DIR \
>   --problem=$PROBLEM

INFO:tensorflow:Generating training data for algorithmic_reverse_decimal40.
INFO:tensorflow:Generating case 0 for algorithmic_reverse_decimal40-unshuffled-train.
INFO:tensorflow:Generating development data for algorithmic_reverse_decimal40.
INFO:tensorflow:Generating case 0 for algorithmic_reverse_decimal40-unshuffled-dev.
INFO:tensorflow:Shuffling data...
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
INFO:tensorflow:read: 10000
INFO:tensorflow:write: 0
root@df1a91a7be96:/t2t# 
root@df1a91a7be96:/t2t# t2t-trainer \
>   --data_dir=$DATA_DIR \
>   --problems=$PROBLEM \
>   --model=$MODEL \
>   --hparams_set=$HPARAMS \
>   --output_dir=$TRAIN_DIR
INFO:tensorflow:Creating experiment, storing model files in /tmp/t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic_1
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Using config: {'_model_dir': '/tmp/t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic_1', '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fdc4e2b5390>, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': '', '_session_config': allow_soft_placement: true
graph_options {
  optimizer_options {
  }
}
}
INFO:tensorflow:Performing local training.
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Doing model_fn_body took 0.668 sec.
INFO:tensorflow:This model_fn took 0.983 sec.
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias        	shape    (256,)              	size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel      	shape    (128, 256)          	size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias        	shape    (256,)              	size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel      	shape    (128, 256)          	size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias        	shape    (256,)              	size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel      	shape    (128, 256)          	size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/bias        	shape    (256,)              	size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/decoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/kernel      	shape    (128, 256)          	size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias        	shape    (256,)              	size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel      	shape    (128, 256)          	size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias        	shape    (256,)              	size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel      	shape    (128, 256)          	size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/bias        	shape    (256,)              	size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_2/basic_lstm_cell/kernel      	shape    (128, 256)          	size    32768
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/bias        	shape    (256,)              	size    256
INFO:tensorflow:Weight    body/lstm_seq2seq/encoder/rnn/multi_rnn_cell/cell_3/basic_lstm_cell/kernel      	shape    (128, 256)          	size    32768
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_0                                       	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_10                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_11                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_12                                      	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_13                                      	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_14                                      	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_15                                      	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_1                                       	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_2                                       	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_3                                       	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_4                                       	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_5                                       	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_6                                       	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_7                                       	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_8                                       	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/input_emb/weights_9                                       	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_0                                         	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_10                                        	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_11                                        	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_12                                        	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_13                                        	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_14                                        	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_15                                        	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_1                                         	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_2                                         	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_3                                         	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_4                                         	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_5                                         	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_6                                         	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_7                                         	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_8                                         	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/softmax/weights_9                                         	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_0                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_10                                     	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_11                                     	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_12                                     	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_13                                     	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_14                                     	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_15                                     	shape    (0, 64)             	size    0
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_1                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_2                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_3                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_4                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_5                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_6                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_7                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_8                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Weight    symbol_modality_12_64/target_emb/weights_9                                      	shape    (1, 64)             	size    64
INFO:tensorflow:Total trainable variables size: 266496
INFO:tensorflow:Total embedding variables size: 0
INFO:tensorflow:Total non-embedding variables size: 266496
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-23 11:14:29.014420: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 11:14:29.014488: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 11:14:29.014519: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-06-23 11:14:29.076780: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-23 11:14:29.077245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties: 
name: GeForce GTX 670MX
major: 3 minor: 0 memoryClockRate (GHz) 0.601
pciBusID 0000:01:00.0
Total memory: 2.94GiB
Free memory: 2.62GiB
2017-06-23 11:14:29.077338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0 
2017-06-23 11:14:29.077374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y 
2017-06-23 11:14:29.084741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 670MX, pci bus id: 0000:01:00.0)
2017-06-23 11:14:38.368302: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:247] PoolAllocator: After 5370 get requests, put_count=3301 evicted_count=1000 eviction_rate=0.302939 and unsatisfied allocation rate=0.59013
2017-06-23 11:14:38.368375: I tensorflow/core/common_runtime/gpu/pool_allocator.cc:259] Raising pool_size_limit_ from 100 to 110
INFO:tensorflow:Saving checkpoints for 1 into /tmp/t2t_train/algorithmic_reverse_decimal40/baseline_lstm_seq2seq-basic_1/model.ckpt.
ERROR:tensorflow:Model diverged with loss = NaN.
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 83, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/usr/local/bin/t2t-trainer", line 79, in main
    schedule=FLAGS.schedule)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
    run_locally(exp_fn(output_dir))
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 531, in run_locally
    exp.train()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
    hooks=self._train_monitors + extra_hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
    monitors=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
    _, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 505, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 842, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 960, in run
    run_metadata=run_metadata))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/basic_session_run_hooks.py", line 477, in after_run
    raise NanLossDuringTrainingError
tensorflow.python.training.basic_session_run_hooks.NanLossDuringTrainingError: NaN loss during training.

Have I missed something in the configuration? I also tried the same problem with the transformer model: training seems fine, but at inference time it does not reproduce the reversed input. (I'll post the output of this last command and my configuration later.)

Edit: With the transformer model, everything works.

Kinetics dataset and I3D models integration

I think it would be very cool and attractive to add support for the Kinetics dataset and the recent Inflated 3D ConvNet (I3D) models to tensor2tensor.

Python3 compatibility

There are some holes in the Python 3 compatibility of the Tensor2tensor code. For instance:

In data_generators/generator_utils.py, import urllib needs to be:

import sys
if sys.version_info[0] >= 3:
  import urllib.request as urllib
else:
  import urllib

In data_generators/image.py, import cPickle needs to be:

try:
  import cPickle
except ImportError:
  import pickle as cPickle

Finally, data_generators/tokenizer.py needs to be revised, as it assumes that a character ordinal is always in the range [0, 256), which is not a safe assumption in Python 3. A better solution is to use a set instead of array subscripts based on character ordinals. Would you like me to submit a revised version in a pull request?

Precompiled binaries are Unix-only; no Win32 version exists

Something wrong with the decoder result of Walkthrough example

I trained the model on two Tesla M60s, each with 8 GB of memory. I did not modify any hyperparameters. The loss does not seem to change after 50,000 steps.

INFO:tensorflow:Saving checkpoints for 54877 into /data/t2t/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt.
INFO:tensorflow:global_step/sec: 0.858429
INFO:tensorflow:loss = 1.96779, step = 54901 (116.492 sec)
INFO:tensorflow:global_step/sec: 0.876813
INFO:tensorflow:loss = 1.96174, step = 55001 (114.049 sec)
INFO:tensorflow:global_step/sec: 0.864947
INFO:tensorflow:loss = 1.98628, step = 55101 (115.614 sec)
INFO:tensorflow:global_step/sec: 0.860629
INFO:tensorflow:loss = 2.26156, step = 55201 (116.195 sec)
INFO:tensorflow:global_step/sec: 0.864128
INFO:tensorflow:loss = 1.98318, step = 55301 (115.723 sec)
INFO:tensorflow:Saving checkpoints for 55396 into /data/t2t/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt.
INFO:tensorflow:global_step/sec: 0.852946
INFO:tensorflow:loss = 2.30657, step = 55401 (117.241 sec)
INFO:tensorflow:global_step/sec: 0.870939
INFO:tensorflow:loss = 2.11571, step = 55501 (114.819 sec)
INFO:tensorflow:global_step/sec: 0.86979
INFO:tensorflow:loss = 1.99461, step = 55601 (114.970 sec)
INFO:tensorflow:global_step/sec: 0.86269
INFO:tensorflow:loss = 2.01496, step = 55701 (115.916 sec)
INFO:tensorflow:global_step/sec: 0.869183
INFO:tensorflow:loss = 1.98261, step = 55801 (115.051 sec)
INFO:tensorflow:global_step/sec: 0.862935
INFO:tensorflow:loss = 1.88075, step = 55901 (115.883 sec)
INFO:tensorflow:Saving checkpoints for 55915 into /data/t2t/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt.
INFO:tensorflow:global_step/sec: 0.855085
INFO:tensorflow:loss = 1.9415, step = 56001 (116.948 sec)
INFO:tensorflow:global_step/sec: 0.86353
INFO:tensorflow:loss = 2.26614, step = 56101 (115.804 sec)
INFO:tensorflow:global_step/sec: 0.871136
INFO:tensorflow:loss = 2.14308, step = 56201 (114.793 sec)
INFO:tensorflow:global_step/sec: 0.860752
INFO:tensorflow:loss = 1.96734, step = 56301 (116.178 sec)
INFO:tensorflow:global_step/sec: 0.871609
INFO:tensorflow:loss = 1.98928, step = 56401 (114.730 sec)

However, the decoder output makes no sense. Does anyone know why?

INFO:tensorflow:Inference results INPUT: Goodbye world
INFO:tensorflow:Inference results OUTPUT: Esconnectentareaconnectentkannconnectent
INFO:tensorflow:Inference results INPUT: Hello world
INFO:tensorflow:Inference results OUTPUT: Esconnectentareaconnectentkannconnectent

Adding a new dataset + problem

I've tried to add a new text dataset for a basic text classification task: the data generator works flawlessly. I then tried to add my task in problem_hparams.py, but it crashes with:

Variable symbol_modality_13_512/shared/weights_0 does not exist

Why? I believe I added all the required hyperparameters. Do I need to add something elsewhere?

Thanks !

Here's my problem hparms:

import codecs

from tensor2tensor.utils import registry

# Meant to live in problem_hparams.py, where default_problem_hparams() is
# defined.


def txtclassif_tokens(model_hparams):
    p = default_problem_hparams()

    class DetTextEncoder(object):
        def __init__(self, vocfile, reverse=False):
            # self._reverse was read in encode/decode but never set.
            self._reverse = reverse
            # Build a token -> id dict from lines of the form "<token> <id>".
            self.v = {}
            with codecs.open(vocfile, "r", "utf-8") as f:
                for l in f:
                    s = l.strip()
                    if len(s) > 0:
                        i = s.rfind(' ')
                        self.v[s[0:i]] = int(s[i + 1:])
            # Inverse mapping so decode() needs no linear scan per id.
            self.inv = {idx: tok for tok, idx in self.v.items()}

        def encode(self, sentence):
            """Converts a space-separated string of tokens to a list of ids."""
            ret = []
            for tok in sentence.strip().split():
                if tok in self.v:
                    ret.append(self.v[tok])
                else:
                    ret.append(self.v['UNK'])  # requires 'UNK' in the voc file
            if self._reverse:
                ret = ret[::-1]
            return ret

        def decode(self, ids):
            """Converts a list of ids back to a space-separated string."""
            if self._reverse:
                ids = ids[::-1]
            # Unknown ids are skipped, as in the original loop.
            return ' '.join(self.inv[i] for i in ids if i in self.inv)

        @property
        def vocab_size(self):
            return len(self.v)

    wvoc = DetTextEncoder(model_hparams.data_dir + "/voc.txt")
    lvoc = DetTextEncoder(model_hparams.data_dir + "/voclab.txt")
    p.input_modality = {
        "inputs": (registry.Modalities.SYMBOL, wvoc.vocab_size)
    }
    p.target_modality = (registry.Modalities.SYMBOL, lvoc.vocab_size)

    p.vocabulary = {
        "inputs": wvoc,
        "targets": lvoc,
    }
    p.input_space_id = 3
    p.target_space_id = 3
    return p

Distributed training configs (Kubernetes, Docker Swarm, etc.)

The tensorflow/ecosystem Kubernetes config might be a good place to start.

Ideally, it'd be trivial to launch a distributed tensor2tensor job with Kubernetes using an arbitrary number of GPUs/TPUs.

We already have some documentation for distributed training with tensor2tensor.

If there's someone with experience with Kubernetes or GCP, we'd very much welcome the contribution.
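As a possible starting point, here is a sketch of how a launcher might build the `TF_CONFIG` environment variable that TensorFlow's `RunConfig` reads for distributed training: a JSON object with a `cluster` map of job names to `host:port` lists and a `task` entry naming this replica. The hostname scheme `<job>-ps-<i>` / `<job>-worker-<i>` (for example, one Kubernetes Service per replica), the port 2222, and the function name `make_tf_config` are assumptions for illustration; how tensor2tensor's own flags map onto this is not covered here.

```python
import json
from typing import Dict, List


def make_tf_config(job: str, num_ps: int, num_workers: int,
                   task_type: str, task_index: int, port: int = 2222) -> str:
    """Return the TF_CONFIG JSON string for one replica of a job.

    Hostnames follow a hypothetical "<job>-<type>-<index>" naming scheme.
    """
    cluster: Dict[str, List[str]] = {
        "ps": [f"{job}-ps-{i}:{port}" for i in range(num_ps)],
        "worker": [f"{job}-worker-{i}:{port}" for i in range(num_workers)],
    }
    # Refuse to describe a replica that is not part of the cluster.
    if task_type not in cluster or not 0 <= task_index < len(cluster[task_type]):
        raise ValueError(f"no {task_type} replica with index {task_index}")
    return json.dumps({"cluster": cluster,
                       "task": {"type": task_type, "index": task_index}})


# Each replica's pod would export its own TF_CONFIG before starting training.
print(make_tf_config("t2t", num_ps=1, num_workers=2,
                     task_type="worker", task_index=1))
```

Generating one string per replica keeps the cluster description identical everywhere while only the `task` entry varies, which is what a templated Kubernetes manifest needs.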

Decoding problem for char-based translation

Hi,

I modified wmt_ende_characters to translate Macedonian to English (the BLEU score after training was 0.526888).

The input sentence is:

Kosovskiot proces na privatizaciјa se ispituva

Then the t2t_trainer command shows some weird output:

INFO:tensorflow:Restoring parameters from t2t_train/model.ckpt-250000
INFO:tensorflow:Inference results INPUT: Mquqxumkqv"rtqegu"pc"rtkxcvk|cekӚc"ug"kurkvwxc
INFO:tensorflow:Inference results OUTPUT: Mukwak.cwave.gurk.fe.ce.sce.gurkwe.ce.ce
INFO:tensorflow:Writing decodes into test.txt.transformer.transformer_base.beam4.alpha0.6.decodes

Tested with versions 1.0.5 and 1.0.7. Is this a bug?
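One observation that may help narrow this down: the logged INPUT line can be reproduced exactly by adding 2 to every UTF-8 byte of the input sentence (the Cyrillic ј, bytes D1 98, becomes D3 9A, which is Ӛ). That pattern would be consistent with byte ids offset by two reserved tokens being printed without removing the offset, but this is an inference from the log, not something confirmed in the issue. `shift_utf8_bytes` is a hypothetical helper for checking it.

```python
def shift_utf8_bytes(text: str, offset: int = 2) -> str:
    """Add `offset` to every UTF-8 byte of `text` and decode the result."""
    return bytes(b + offset for b in text.encode("utf-8")).decode("utf-8")


# The Macedonian input from the issue; \u0458 is the Cyrillic letter je.
sentence = "Kosovskiot proces na privatizaci\u0458a se ispituva"
print(shift_utf8_bytes(sentence))
# Mquqxumkqv"rtqegu"pc"rtkxcvk|cekӚc"ug"kurkvwxc
```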

Decoding speed per sentence

Hi,

I have trained a transformer_big model for the wmt_ende_tokens_32k problem.
After 37118 steps, I found that it gives a decent result:

INFO:tensorflow:Saving dict for global step 37118:
	global_step = 37118,
	loss = 0.980365,
	metrics-wmt_ende_tokens_32k/accuracy = 0.789868,
	metrics-wmt_ende_tokens_32k/accuracy_per_sequence = 0.0,
	metrics-wmt_ende_tokens_32k/accuracy_top5 = 0.90224,
	metrics-wmt_ende_tokens_32k/bleu_score = 0.493593,
	metrics-wmt_ende_tokens_32k/neg_log_perplexity = -1.11336,
	metrics/accuracy = 0.789868,
	metrics/accuracy_per_sequence = 0.0,
	metrics/accuracy_top5 = 0.90224,
	metrics/bleu_score = 0.493593,
	metrics/neg_log_perplexity = -1.11336

I then tried to translate a newstest2014-deen-src.en file which consists of 10008 lines.
I followed the default HPARAMS setting for transformer_big, and set BEAM_SIZE=3 and ALPHA=0.6.

However, because decoding seemed to take forever, I retried the same process with a smaller file of just 10 lines. This time, decoding took approximately 30 seconds after the learned model parameters were loaded.
Taking about three seconds to decode each source sentence seems too long, as it would mean that translating newstest2014-deen-src.en (10008 lines) would take roughly eight hours.

Am I missing some options here?

attention_lm based rescoring

@lukaszkaiser Could you please help me with an inference script that takes a large number of hypothesis sentences and gives a score for each sentence using the tensor2tensor approach? The current RNN-based LM approach is quite slow. Meanwhile, I will try training a character-level language model using the same technique.
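Independently of how the model is run, the scoring step itself is simple: a hypothesis's language-model score is the sum of the log-probabilities of its tokens, optionally divided by its length. The sketch below assumes you already have per-step logits for each hypothesis (for example from a forward pass of an attention_lm model, which is not shown); all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sentence_log_prob(step_logits: Sequence[Sequence[float]],
                      token_ids: Sequence[int]) -> float:
    """Sum of log p(token_t | prefix); step_logits[t] predicts token_ids[t]."""
    return sum(log_softmax(l)[t] for l, t in zip(step_logits, token_ids))


def rescore(hyps: Sequence[Tuple[str, Sequence[Sequence[float]], Sequence[int]]],
            length_normalize: bool = True) -> List[Tuple[str, float]]:
    """Score (text, step_logits, token_ids) triples; return best first."""
    scored = []
    for text, step_logits, token_ids in hyps:
        score = sentence_log_prob(step_logits, token_ids)
        if length_normalize and token_ids:
            score /= len(token_ids)  # per-token average log-probability
        scored.append((text, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Length normalization keeps shorter hypotheses from winning just because they accumulate fewer negative log-probability terms; drop it if all hypotheses have similar lengths.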

Thanks

Thanks a lot
Sigrid

Data Importer: Faces in the wild

[up-for-grabs] maybe?

decoding script

exception info

In ~/usr/t2t_usr/my_registrations.py

In ~/usr/t2t_usr/init.py

[@nmyjs_160_20 t2t_usr]# t2t-trainer --t2t_usr_dir=~/usr/t2t_usr --registry_help
INFO:tensorflow:
Registry contents:

update

Generate data

Steps to reproduce

[@nmyjs_160_20 t2t_usr]# t2t-trainer --t2t_usr_dir=~/usr/t2t_usr --registry_help
INFO:tensorflow:
Registry contents: