I downloaded and ran the TensorFlow Docker image, then started following the walkthrough: I installed tensor2tensor with pip install, set the environment variables, and ran t2t-datagen.
Next, I ran t2t-trainer:
t2t-trainer --data_dir=$DATA_DIR --problems=$PROBLEM --model=$MODEL --hparams_set=$HPARAMS --output_dir=$TRAIN_DIR
It looked like it was training for a minute, until it failed with:
INFO:tensorflow:Creating experiment, storing model files in /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Using config: {'_model_dir': '/root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base', '_save_checkpoints_secs': 600, '_num_ps_replicas': 0, '_keep_checkpoint_max': 20, '_tf_random_seed': None, '_task_type': None, '_environment': 'local', '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f054c81bb50>, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1.0
}
, '_num_worker_replicas': 0, '_task_id': 0, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_evaluation_master': '', '_keep_checkpoint_every_n_hours': 10000, '_master': '', '_session_config': allow_soft_placement: true
graph_options {
optimizer_options {
}
}
}
INFO:tensorflow:Performing local training.
INFO:tensorflow:datashard_devices: ['gpu:0']
INFO:tensorflow:caching_devices: None
INFO:tensorflow:Doing model_fn_body took 2.320 sec.
INFO:tensorflow:This model_fn took 2.521 sec.
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_0/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_0/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_1/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_1/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_2/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_2/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_3/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_3/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_4/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_4/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/decoder/layer_5/decoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/kv_transform_single/bias shape (1024,) size 1024
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/kv_transform_single/kernel shape (1, 1, 512, 1024) size 524288
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/q_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/encdec_attention/q_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_2/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/decoder/layer_5/layer_norm_2/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_0/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_0/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_1/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_1/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_2/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_2/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_3/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_3/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_4/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_4/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv1_single/bias shape (2048,) size 2048
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv1_single/kernel shape (1, 1, 512, 2048) size 1048576
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv2_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/conv_hidden_relu/conv2_single/kernel shape (1, 1, 2048, 512) size 1048576
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/output_transform_single/bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/output_transform_single/kernel shape (1, 1, 512, 512) size 262144
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/qkv_transform_single/bias shape (1536,) size 1536
INFO:tensorflow:Weight body/encoder/layer_5/encoder_self_attention/qkv_transform_single/kernel shape (1, 1, 512, 1536) size 786432
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm_1/layer_norm_bias shape (512,) size 512
INFO:tensorflow:Weight body/encoder/layer_5/layer_norm_1/layer_norm_scale shape (512,) size 512
INFO:tensorflow:Weight body/target_space_embedding/kernel shape (32, 512) size 16384
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_0 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_10 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_11 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_12 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_13 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_14 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_15 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_1 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_2 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_3 shape (1953, 512) size 999936
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_4 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_5 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_6 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_7 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_8 shape (1952, 512) size 999424
INFO:tensorflow:Weight symbol_modality_31236_512/shared/weights_9 shape (1952, 512) size 999424
INFO:tensorflow:Total trainable variables size: 60147712
INFO:tensorflow:Total embedding variables size: 16384
INFO:tensorflow:Total non-embedding variables size: 60131328
INFO:tensorflow:Computing gradients for global model_fn.
INFO:tensorflow:Global model_fn finished.
INFO:tensorflow:Create CheckpointSaverHook.
2017-06-27 04:34:58.910748: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-27 04:34:58.910798: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-06-27 04:34:58.910821: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
INFO:tensorflow:Saving checkpoints for 1 into /root/t2t_train/wmt_ende_tokens_32k/transformer-transformer_base/model.ckpt.
INFO:tensorflow:loss = 8.79561, step = 1
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 531, in run_locally
exp.train()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1007, in _train_model
_, loss = mon_sess.run([model_fn_ops.train_op, model_fn_ops.loss])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 505, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 842, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 952, in run
run_metadata=run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 798, in run
return self._sess.run(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[16,1,0] = -1 is not in [0, 31236)
[[Node: symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Squeeze)]]
Caused by op u'symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Gather', defined at:
File "/usr/local/bin/t2t-trainer", line 83, in <module>
tf.app.run()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "/usr/local/bin/t2t-trainer", line 79, in main
schedule=FLAGS.schedule)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 240, in run
run_locally(exp_fn(output_dir))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 531, in run_locally
exp.train()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 275, in train
hooks=self._train_monitors + extra_hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/experiment.py", line 665, in _call_train
monitors=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 289, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 455, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 955, in _train_model
model_fn_ops = self._get_train_ops(features, labels)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1162, in _get_train_ops
return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.TRAIN)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 423, in model_fn
len(hparams.problems) - 1)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 748, in _cond_on_index
return fn(cur_idx)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_utils.py", line 405, in nth_model
features, train, skip=(skipping_is_on and skip_this_one))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 387, in model_fn
sharded_features["targets"], dp)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/modality.py", line 115, in targets_bottom_sharded
return data_parallelism(self.targets_bottom, xs)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/expert_utils.py", line 294, in __call__
outputs.append(fns[i](*my_args[i], **my_kwargs[i]))
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/modalities.py", line 94, in targets_bottom
return self.bottom_simple(x, "shared", reuse=True)
File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/modalities.py", line 80, in bottom_simple
ret = tf.gather(var, x)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1179, in gather
validate_indices=validate_indices, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): indices[16,1,0] = -1 is not in [0, 31236)
[[Node: symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/ConvertGradientToTensor_cc661786, symbol_modality_31236_512_1/parallel_0/symbol_modality_31236_512/shared/Squeeze)]]
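
Reading the error: the embedding lookup (the tf.gather call in bottom_simple) received a target id of -1, which is outside the valid range [0, 31236). Below is a minimal sketch, in plain Python rather than TensorFlow, of the kind of range check that fails here. The function name and the toy batch are hypothetical and only for illustration; 31236 is the vocabulary size taken from the error message. It shows what the check rejects, not where the -1 comes from in the generated data.

```python
def find_out_of_range_ids(batch, vocab_size):
    """Return ((row, col), value) pairs for ids outside [0, vocab_size)."""
    bad = []
    for row, sequence in enumerate(batch):
        for col, token_id in enumerate(sequence):
            # An embedding lookup needs every id to be a valid row index.
            if not 0 <= token_id < vocab_size:
                bad.append(((row, col), token_id))
    return bad


# Hypothetical toy batch of token ids; the -1 mimics the failing index.
batch = [[5, 17, 1], [42, -1, 0]]
print(find_out_of_range_ids(batch, 31236))
```

Running a check like this over the decoded training examples (assuming they can be read back as lists of integer ids) might help tell whether the -1 is already present in the data written by t2t-datagen or is introduced later in the input pipeline.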