asyml / texar

Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow. This is part of the CASL project: http://casl-project.ai/

Home Page: https://asyml.io

License: Apache License 2.0

Python 99.63% Perl 0.30% Shell 0.08%
machine-learning natural-language-processing tensorflow deep-learning text-generation python machine-translation dialog-systems texar bert

texar's People

Contributors

abiu123, avinashbukkittu, devsinghsachan, digo, eridgd, gpengzhi, guotong1988, hadifar, haoransh, hunterhector, huzecong, jueliangguke, jxhe, michaelpulsewidth, qkaren, shyaoni, snakeztc, swapnull7, tanyuqian, tianzhiliang, tomnong, vegb, weiwei718, wwt17, xuezhemax, zcyang, zhitinghu

texar's Issues

Why is the seq2seq_rl loss -0.000000? Is this normal?

2018-10-09 09:52:43.044248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 21534 MB memory) -> physicalGPU (device: 2, name: Tesla P40, pci bus id: 0000:07:00.0, compute capability: 6.1)
2018-10-09 09:52:43.386811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 19354 MB memory) -> physicalGPU (device: 3, name: Tesla P40, pci bus id: 0000:08:00.0, compute capability: 6.1)
2018-10-09 09:52:43.724525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 21534 MB memory) -> physicalGPU (device: 4, name: Tesla P40, pci bus id: 0000:0c:00.0, compute capability: 6.1)
2018-10-09 09:52:44.066564: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 3867 MB memory) -> physical GPU (device: 5, name: Tesla P40, pci bus id: 0000:0d:00.0, compute capability: 6.1)
2018-10-09 09:52:44.130317: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 7327 MB memory) -> physical GPU (device: 6, name: Tesla P40, pci bus id: 0000:0e:00.0, compute capability: 6.1)
2018-10-09 09:52:44.248129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1097] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 21534 MB memory) -> physicalGPU (device: 7, name: Tesla P40, pci bus id: 0000:0f:00.0, compute capability: 6.1)
step=1, loss=-25.0021, reward=7.1089
step=10, loss=-0.0015, reward=0.0912
step=20, loss=-0.0000, reward=0.0994
step=30, loss=-0.0000, reward=0.0591
step=40, loss=-0.0000, reward=0.0631
step=50, loss=-0.0000, reward=0.0736
step=60, loss=-0.0000, reward=0.0486
step=70, loss=-0.0000, reward=0.0852
step=80, loss=-0.0000, reward=0.1246
step=90, loss=-0.0000, reward=0.0655
step=100, loss=-0.0000, reward=0.1117
step=110, loss=-0.0000, reward=0.1609
step=120, loss=-0.0000, reward=0.1163
step=130, loss=-0.0000, reward=0.0672
step=140, loss=-0.0000, reward=0.0499

BLEU score suddenly drops while training with interpolation

Hi, thanks for the great work.
I've tried training an NMT model on IWSLT 14 with the interpolation algorithm (https://github.com/asyml/texar/tree/master/examples/seq2seq_exposure_bias), but during training I found that the BLEU suddenly dropped to 0.0000 at around epoch 11.

Below is the training log I ran into:

training epoch=9, lambdas=[0.04, 0.06, 0.0]
step=0, loss=48.1200, lambdas=[0.04, 0.06, 0.0]
step=500, loss=50.3024, lambdas=[0.04, 0.06, 0.0]
step=1000, loss=49.0209, lambdas=[0.04, 0.06, 0.0]
step=1500, loss=44.0876, lambdas=[0.04, 0.06, 0.0]
step=2000, loss=54.4154, lambdas=[0.04, 0.06, 0.0]
step=2500, loss=53.7328, lambdas=[0.04, 0.06, 0.0]
step=3000, loss=54.9698, lambdas=[0.04, 0.06, 0.0]
step=3500, loss=67.9883, lambdas=[0.04, 0.06, 0.0]
step=4000, loss=51.2655, lambdas=[0.04, 0.06, 0.0]
step=4500, loss=56.5977, lambdas=[0.04, 0.06, 0.0]
val epoch=9, BLEU=27.1300; best-ever=27.1300
test epoch=9, BLEU=25.2700
==================================================
training epoch=10, lambdas=[0.04, 0.06, 0.0]
step=0, loss=60.8326, lambdas=[0.04, 0.06, 0.0]
step=500, loss=39.8571, lambdas=[0.04, 0.06, 0.0]
step=1000, loss=52.8363, lambdas=[0.04, 0.06, 0.0]
step=1500, loss=47.0654, lambdas=[0.04, 0.06, 0.0]
step=2000, loss=62.2711, lambdas=[0.04, 0.06, 0.0]
step=2500, loss=64.2932, lambdas=[0.04, 0.06, 0.0]
step=3000, loss=49.2814, lambdas=[0.04, 0.06, 0.0]
step=3500, loss=53.3860, lambdas=[0.04, 0.06, 0.0]
step=4000, loss=52.4406, lambdas=[0.04, 0.06, 0.0]
step=4500, loss=53.0982, lambdas=[0.04, 0.06, 0.0]
val epoch=10, BLEU=27.0600; best-ever=27.1300
test epoch=10, BLEU=25.3000
==================================================
training epoch=11, lambdas=[0.1, 0.0, 0.0]
step=0, loss=43.5935, lambdas=[0.1, 0.0, 0.0]
step=500, loss=6.5808, lambdas=[0.1, 0.0, 0.0]
step=1000, loss=3.1541, lambdas=[0.1, 0.0, 0.0]
step=1500, loss=2.2091, lambdas=[0.1, 0.0, 0.0]
step=2000, loss=2.9512, lambdas=[0.1, 0.0, 0.0]
step=2500, loss=1.2280, lambdas=[0.1, 0.0, 0.0]
step=3000, loss=1.1169, lambdas=[0.1, 0.0, 0.0]
step=3500, loss=1.3231, lambdas=[0.1, 0.0, 0.0]
step=4000, loss=1.2344, lambdas=[0.1, 0.0, 0.0]
step=4500, loss=1.1418, lambdas=[0.1, 0.0, 0.0]
val epoch=11, BLEU=0.0000; best-ever=27.1300
test epoch=11, BLEU=0.0000  // <-- BLEU suddenly dropped!
==================================================
training epoch=12, lambdas=[0.1, 0.0, 0.0]
step=0, loss=1.7246, lambdas=[0.1, 0.0, 0.0]
step=500, loss=1.3470, lambdas=[0.1, 0.0, 0.0]
step=1000, loss=1.0208, lambdas=[0.1, 0.0, 0.0]
step=1500, loss=1.6566, lambdas=[0.1, 0.0, 0.0]
step=2000, loss=1.4075, lambdas=[0.1, 0.0, 0.0]
step=2500, loss=1.5193, lambdas=[0.1, 0.0, 0.0]
step=3000, loss=1.1760, lambdas=[0.1, 0.0, 0.0]
step=3500, loss=0.8260, lambdas=[0.1, 0.0, 0.0]
step=4000, loss=2.0769, lambdas=[0.1, 0.0, 0.0]
step=4500, loss=1.1434, lambdas=[0.1, 0.0, 0.0]
val epoch=12, BLEU=0.0000; best-ever=27.1300
test epoch=12, BLEU=0.0000

test_results10.txt looks like this:

you know , one of the intense pleasures of travel and one of the delights of ethnographic research is the opportunity to live amongst those who have not forgotten the old ways , who still feel their past in the wind , touch it in stones polished by rain , taste it in the bitter leaves of plants . ||| you know , one of the great <UNK> travel in travel , and one of the pleasure of the <UNK> research is to live with the people who remember remember the old days , they can feel their past , they <UNK> the the <UNK> of the plants .
just to know that jaguar shamans still journey beyond the milky way , or the myths of the inuit elders still resonate with meaning , or that in the himalaya , the buddhists still pursue the breath of the dharma , is to really remember the central revelation of anthropology , and that is the idea that the world in which we live does not exist in some absolute sense , but is just one model of reality , the consequence of one particular set of adaptive choices that our lineage made , albeit successfully , many generations ago . ||| just the know that <UNK> still still beyond the milky way , or the importance of the council of the inuit , is full of the the the the the the the the world , which is the the world that the world that we &apos;re in ,
and of course , we all share the same adaptive imperatives . ||| and of course , we all share the same <UNK> .
we &apos;re all born . we all bring our children into the world . ||| we &apos;re all born . we &apos;re bringing kids to the world .
we go through initiation rites . ||| we go through <UNK> .

test_results11.txt (when the BLEU dropped) looks like this:

you know , one of the intense pleasures of travel and one of the delights of ethnographic research is the opportunity to live amongst those who have not forgotten the old ways , who still feel their past in the wind , touch it in stones polished by rain , taste it in the bitter leaves of plants . ||| you
just to know that jaguar shamans still journey beyond the milky way , or the myths of the inuit elders still resonate with meaning , or that in the himalaya , the buddhists still pursue the breath of the dharma , is to really remember the central revelation of anthropology , and that is the idea that the world in which we live does not exist in some absolute sense , but is just one model of reality , the consequence of one particular set of adaptive choices that our lineage made , albeit successfully , many generations ago . ||| just
and of course , we all share the same adaptive imperatives . ||| and
we &apos;re all born . we all bring our children into the world . ||| we
we go through initiation rites . ||| we

I guess it has something to do with the lambda values that changed, but I have no idea right now.
I've only modified the configs to set batch_size to 32 (from 64), and I'm using Python v3.5.2 with tensorflow-gpu v1.8.0.
Can you think of any reason why? Thanks.
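
On why the reported score is exactly 0.0000: a minimal sketch in plain Python, assuming the score is a BLEU-style geometric mean of 1- to 4-gram precisions. The helper names are hypothetical and this is not the evaluation script used by the example. It counts clipped n-gram matches for one sentence pair taken from the two result files above.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list (empty if the list is too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def match_counts(hyp, ref, max_n=4):
    """(clipped matches, hypothesis n-gram total) for n = 1..max_n."""
    counts = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        counts.append((matched, sum(hyp_counts.values())))
    return counts


# Reference and hypotheses copied from the result files quoted in this issue.
ref = "and of course , we all share the same adaptive imperatives .".split()
hyp_epoch10 = "and of course , we all share the same <UNK> .".split()
hyp_epoch11 = "and".split()

counts_epoch10 = match_counts(hyp_epoch10, ref)
counts_epoch11 = match_counts(hyp_epoch11, ref)
print(counts_epoch10)
print(counts_epoch11)
```

A one-token hypothesis has no bigrams, trigrams, or 4-grams at all, so every higher-order match count is zero, and a geometric mean over the four precisions is then zero as well. This only accounts for the 0.0000 value. It does not explain why the decoder started emitting a single token once the lambdas changed from [0.04, 0.06, 0.0] to [0.1, 0.0, 0.0].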

Gradient can't flow in pg_losses

In the pg_losses_with_logits function, the line
actions = tf.stop_gradient(actions)
stops the gradient from flowing to the Encoder-Decoder framework that generates the actions (samples).
So in the example seq2seq_attn_pg.py, the whole network can't be trained.
I am not sure whether your design intends to update only the Discriminator, but as written the Generator will never be updated. Maybe a better implementation would be to specify the different train_variables?
Correct me if I am wrong.
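
For reference, a minimal sketch of a REINFORCE-style loss in plain Python, assuming the loss for one step has the form -reward * log pi(action) computed from logits. This is not Texar's code, and the function names and numbers are hypothetical. The sampled action is held fixed as a constant index, which is the role a stop_gradient on the actions would play, and the sketch checks whether a gradient still reaches the logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pg_loss(logits, action, reward):
    """One-step policy-gradient loss: -reward * log pi(action)."""
    return -reward * math.log(softmax(logits)[action])


def pg_grad(logits, action, reward):
    """Analytic gradient of pg_loss with respect to the logits.

    The action is a constant index here; the gradient depends on the
    logits only: d loss / d logit_i = -reward * (1[i == action] - p_i).
    """
    probs = softmax(logits)
    return [-reward * ((1.0 if i == action else 0.0) - p)
            for i, p in enumerate(probs)]


logits = [0.5, -1.0, 2.0]  # hypothetical decoder outputs for one step
action = 1                 # hypothetical sampled token id, held fixed
reward = 0.7               # hypothetical reward

grad = pg_grad(logits, action, reward)

# Finite-difference check of the analytic gradient.
eps = 1e-6
base = pg_loss(logits, action, reward)
numeric = []
for i in range(len(logits)):
    bumped = list(logits)
    bumped[i] += eps
    numeric.append((pg_loss(bumped, action, reward) - base) / eps)

print(grad)
print(numeric)
```

In this sketch the gradient with respect to the logits is nonzero even though the action is a constant. Whether the same holds in seq2seq_attn_pg.py depends on whether the logits passed to pg_losses_with_logits are produced by the generator's trainable variables, which this sketch does not check.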

Running texar on CPU

Hi,
I'm unable to run text_style_transfer/prepare_data.py because I'm running this program on an AMD Radeon 8670M graphics card. Is it possible to make Texar use tensorflow-cpu?
Thanks

Minor attribute error in language model example

Traceback (most recent call last):
File "./examples/language_model.py", line 119, in
tf.app.run(main=_main)
File "/home/devendra/yes/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "./examples/language_model.py", line 58, in _main
embedding=embedder)
File "/home/devendra/txtgen/texar/module_base.py", line 70, in call
return self._template(*args, **kwargs)
File "/home/devendra/yes/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 273, in call
result = self._call_func(args, kwargs, check_for_new_variables=False)
File "/home/devendra/yes/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 222, in _call_func
result = self._func(*args, **kwargs)
File "/home/devendra/txtgen/texar/modules/decoders/rnn_decoder_base.py", line 296, in _build
output_time_major=output_time_major)
File "/home/devendra/yes/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 203, in dynamic_decode
zero_outputs = _create_zero_outputs(decoder.output_size,
File "/home/devendra/txtgen/texar/modules/decoders/rnn_decoders.py", line 205, in output_size
logits=self._rnn_output_size(),
File "/home/devendra/txtgen/texar/modules/decoders/rnn_decoder_base.py", line 325, in _rnn_output_size
layer_output_shape = self._output_layer.compute_output_shape(
AttributeError: 'Dense' object has no attribute 'compute_output_shape'

originally defined at:
File "./examples/language_model.py", line 53, in main
decoder = tx.modules.BasicRNNDecoder(vocab_size=train_data.vocab.size)
File "/home/devendra/txtgen/texar/modules/decoders/rnn_decoders.py", line 119, in init
self, cell, vocab_size, output_layer, cell_dropout_mode, hparams)
File "/home/devendra/txtgen/texar/modules/decoders/rnn_decoder_base.py", line 43, in init
ModuleBase.init(self, hparams)
File "/home/devendra/txtgen/texar/module_base.py", line 33, in init
create_scope_now_=True)

Label for text_style_transfer

I cannot read in a non-integer label such as 0.456 in the text_style_transfer code:
iterator = tx.data.FeedableDataIterator({'train_g': train_data,'val': val_data, 'test': test_data})

vae_train.py

Excuse me, I'd like to ask some questions. Could you help me solve them?
First, when I run vae_train.py, I encounter the error "ImportError: No module named 'config'",
so I copied the file config.py from text_style_transfer into vae_text.
After that, I ran the program again, but another error appeared:
" File "G:/NLP/texar-master/texar-master/examples/vae_text/vae_train.py", line 64, in _main train_data = tx.data.MonoTextData(config.train_data_hparams) AttributeError: module 'config' has no attribute 'train_data_hparams'"

What should I do next? Thank you.

AssertionErrors when running tests with TensorFlow 1.13-rc0

Hi,

Three AssertionErrors occur when running the tests. The environment is TensorFlow 1.13-rc0, Python 3.6.8 (Python 2.7 gives the same error messages), and Ubuntu 16.04. The errors are shown below:


================================================================================================= FAILURES =================================================================================================
_________________________________________________________________________________ MergeLayerTest.test_trainable_variables __________________________________________________________________________________

self = <texar.core.layers_test.MergeLayerTest testMethod=test_trainable_variables>

    def test_trainable_variables(self):
        """Test the trainable_variables of the layer.
        """
        layers_ = []
        layers_.append(tf.layers.Conv1D(filters=200, kernel_size=3))
        layers_.append(tf.layers.Conv1D(filters=200, kernel_size=4))
        layers_.append(tf.layers.Conv1D(filters=200, kernel_size=5))
        layers_.append(tf.layers.Dense(200))
        layers_.append(tf.layers.Dense(200))
        m_layer = layers.MergeLayer(layers_)

        inputs = tf.zeros([64, 16, 1024], dtype=tf.float32)
        _ = m_layer(inputs)

        num_vars = sum([len(layer.trainable_variables) for layer in layers_])
>       self.assertEqual(num_vars, len(m_layer.trainable_variables))
E       AssertionError: 10 != 20

texar/core/layers_test.py:302: AssertionError
____________________________________________________________________________________ SequentialLayerTest.test_seq_layer ____________________________________________________________________________________

self = <texar.core.layers_test.SequentialLayerTest testMethod=test_seq_layer>

    def test_seq_layer(self):
        """Test sequential layer.
        """
        layers_ = []
        layers_.append(tf.layers.Dense(100))
        layers_.append(tf.layers.Dense(200))
        seq_layer = layers.SequentialLayer(layers_)

        output_shape = seq_layer.compute_output_shape([None, 10])
        self.assertEqual(output_shape[1].value, 200)

        inputs = tf.zeros([10, 20], dtype=tf.float32)
        outputs = seq_layer(inputs)

        num_vars = sum([len(layer.trainable_variables) for layer in layers_])
>       self.assertEqual(num_vars, len(seq_layer.trainable_variables))
E       AssertionError: 4 != 8

texar/core/layers_test.py:323: AssertionError
______________________________________________________________________________ AttentionRNNDecoderTest.test_beam_search_cell _______________________________________________________________________________

self = <texar.modules.decoders.rnn_decoders_test.AttentionRNNDecoderTest testMethod=test_beam_search_cell>

    def test_beam_search_cell(self):
        """Tests :meth:`texar.modules.AttentionRNNDecoder._get_beam_search_cell`
        """
        seq_length = np.random.randint(
            self._max_time, size=[self._batch_size]) + 1
        encoder_values_length = tf.constant(seq_length)
        hparams = {
            "attention": {
                "kwargs": {
                    "num_units": self._attention_dim,
                    "probability_fn": "sparsemax"
                }
            }
        }
        decoder = AttentionRNNDecoder(
            memory=self._encoder_output,
            memory_sequence_length=encoder_values_length,
            vocab_size=self._vocab_size,
            hparams=hparams)

        helper_train = get_helper(
            decoder.hparams.helper_train.type,
            inputs=self._inputs,
            sequence_length=[self._max_time]*self._batch_size,
            **decoder.hparams.helper_train.kwargs.todict())

        _, _, _ = decoder(helper=helper_train)

        ## 4+1 trainable variables: cell-kernel, cell-bias,
        ## fc-weight, fc-bias, and
        ## memory_layer: For LuongAttention, we only transform the memory layer;
        ## thus num_units *must* match the expected query depth.
        self.assertEqual(len(decoder.trainable_variables), 5)

        beam_width = 3
        beam_cell = decoder._get_beam_search_cell(beam_width)
        cell_input = tf.random_uniform([self._batch_size * beam_width,
                                        self._emb_dim])
        cell_state = beam_cell.zero_state(self._batch_size * beam_width,
                                          tf.float32)
        _ = beam_cell(cell_input, cell_state)
        # Test if beam_cell is sharing variables with decoder cell.
>       self.assertEqual(len(beam_cell.trainable_variables), 0)
E       AssertionError: 2 != 0

texar/modules/decoders/rnn_decoders_test.py:368: AssertionError

I hope this information is helpful for debugging. Thanks.

A Question About the Example -- sentence_classifier

I am a new user of Texar. When I tried the example named "sentence_classifier" (https://github.com/asyml/texar/tree/master/examples/sentence_classifier), I ran into some problems that I find hard to resolve.
The data in the example is in English, but I want to try it with Chinese. I simply replaced the SST data with my own data.
Here is the format of the data I use:
2 但是您的所作所为不合适。
1 我觉得这个苹果味道一般。
0 今天终于修正了一个错误。
For the vocab, I use my own word segmenter for Chinese sentences.
The program displays an error:
[image]
I guessed it was caused by the batch size, so I changed the batch size from 50 to 64 in config_kim.py.
Then another error appeared:
[image]
I don't know how to explain it, so I hope someone can tell me whether I am using Texar incorrectly or whether there is some other problem.

Wrong dependency in bleu_moses

cur_dir = os.path.dirname(os.path.realpath(__file__))

I believe the purpose of this line is to get the path of something like path/to/texar-repo/texar/evals/bleu_moses.py, and then locate the bleu script in the bin directory in this repo.

But when we use texar as a lib, this file is actually located at something like ....../pythonX/site-packages/texar/evals/bleu_moses.py rather than the cloned repo. But we don't and shouldn't have a bin folder in the site-packages directory.

The bug can be reproduced by running the example of seq2seq_attn with toy_copy dataset, which produces something like

FileNotFoundError: [Errno 2] No such file or directory: '....../python3.6/site-packages/bin/utils/multi-bleu.perl': '....../site-packages/bin/utils/multi-bleu.perl'

I guess other examples that depend on bleu_moses can reproduce the bug too.
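
The path arithmetic can be illustrated with a minimal sketch, assuming the script is located by going two directories up from the module's folder and then into bin/utils (this layout is inferred from the error message above, not copied from bleu_moses.py). The function name and both example paths are hypothetical.

```python
import posixpath


def resolve_bleu_script(module_file):
    """Locate multi-bleu.perl relative to the given module file.

    Assumption: the lookup walks two levels up from the module's
    directory (texar/evals -> repository root) and appends
    bin/utils/multi-bleu.perl.
    """
    cur_dir = posixpath.dirname(module_file)
    root = posixpath.normpath(posixpath.join(cur_dir, "..", ".."))
    return posixpath.join(root, "bin", "utils", "multi-bleu.perl")


# Hypothetical locations of the same module in a clone and in an install.
from_clone = resolve_bleu_script(
    "/path/to/texar-repo/texar/evals/bleu_moses.py")
from_install = resolve_bleu_script(
    "/usr/lib/python3.6/site-packages/texar/evals/bleu_moses.py")

print(from_clone)
print(from_install)
```

From a cloned repository the result points into the repository's bin directory, while from an installed package it points at site-packages/bin, which matches the path in the FileNotFoundError.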

Distributed training on Horovod

I am trying to run the transformer model on multiple GPUs (4). I see that the model is replicated 4 times instead of running in parallel, and each iteration is run 4 times. Is this expected?
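
For context, Horovod trains in a data-parallel fashion: each process holds a full copy of the model and the gradients are averaged across processes. A minimal sketch of that idea in plain Python follows. It does not use Horovod's API, and the toy model, data, and function names are hypothetical.

```python
def local_gradient(weight, shard):
    """Gradient of 0.5 * (weight * x - y)**2, averaged over one worker's shard."""
    return sum((weight * x - y) * x for x, y in shard) / len(shard)


def allreduce_mean(values):
    """Stand-in for an allreduce that averages one value per worker."""
    return sum(values) / len(values)


# Toy data for y = 2 * x, split so each of the 4 workers sees a different shard.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
num_workers = 4
shards = [data[i::num_workers] for i in range(num_workers)]

# Every worker keeps its own full replica of the (one-parameter) model.
replicas = [0.0] * num_workers
learning_rate = 0.05

for step in range(200):
    # Each worker runs the same step on its own shard...
    grads = [local_gradient(w, s) for w, s in zip(replicas, shards)]
    # ...and all workers apply the same averaged gradient.
    avg_grad = allreduce_mean(grads)
    replicas = [w - learning_rate * avg_grad for w in replicas]

print(replicas)
```

In this picture, seeing four replicas and four copies of each step is what data parallelism looks like. Whether each process in the transformer example actually receives a different shard of the data is not something this sketch can confirm.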

hierarchical_dialog doesn't work

(.venv) ub16c9@ub16c9-gpu:~/ub16_prj/texar/examples/hierarchical_dialog$ python3 hred.py --config_model config_model_biminor
Traceback (most recent call last):
File "hred.py", line 243, in
main()
File "hred.py", line 82, in main
sequence_length_major=data_batch['source_utterance_cnt'])
File "/home/ub16c9/ub16_prj/texar/texar/module_base.py", line 117, in call
return self._template(*args, **kwargs)
File "/home/ub16c9/ub16_prj/texar/.venv/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 360, in call
return self._call_func(args, kwargs)
File "/home/ub16c9/ub16_prj/texar/.venv/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
result = self._func(*args, **kwargs)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/hierarchical_encoders.py", line 253, in _build
_, states_minor = self._encoder_minor(inputs, **kwargs_minor)
File "/home/ub16c9/ub16_prj/texar/texar/module_base.py", line 117, in call
return self._template(*args, **kwargs)
File "/home/ub16c9/ub16_prj/texar/.venv/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 360, in call
return self._call_func(args, kwargs)
File "/home/ub16c9/ub16_prj/texar/.venv/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 311, in _call_func
result = self._func(*args, **kwargs)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/rnn_encoders.py", line 829, in _build
**kwargs)
File "/home/ub16c9/ub16_prj/texar/.venv/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 439, in bidirectional_dynamic_rnn
time_major=time_major, scope=fw_scope)
File "/home/ub16c9/ub16_prj/texar/.venv/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py", line 623, in dynamic_rnn
"but saw shape: %s" % sequence_length.get_shape())
ValueError: sequence_length must be a vector of length batch_size, but saw shape: (?, ?)

originally defined at:
File "hred.py", line 69, in main
encoder = tx.modules.HierarchicalRNNEncoder(hparams=encoder_hparams)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/hierarchical_encoders.py", line 100, in init
['texar.modules.encoders', 'texar.custom'])
File "/home/ub16c9/ub16_prj/texar/texar/utils/utils.py", line 237, in check_or_get_instance
ret = get_instance(ret, kwargs, module_paths)
File "/home/ub16c9/ub16_prj/texar/texar/utils/utils.py", line 281, in get_instance
return class_(**kwargs)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/rnn_encoders.py", line 604, in init
RNNEncoderBase.init(self, hparams)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/rnn_encoders.py", line 212, in init
EncoderBase.init(self, hparams)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/encoder_base.py", line 33, in init
ModuleBase.init(self, hparams)
File "/home/ub16c9/ub16_prj/texar/texar/module_base.py", line 74, in init
create_scope_now_=True)
File "/home/ub16c9/ub16_prj/texar/.venv/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 154, in make_template
**kwargs)

originally defined at:
File "hred.py", line 69, in main
encoder = tx.modules.HierarchicalRNNEncoder(hparams=encoder_hparams)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/hierarchical_encoders.py", line 68, in init
EncoderBase.init(self, hparams)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/encoder_base.py", line 33, in init
ModuleBase.init(self, hparams)
File "/home/ub16c9/ub16_prj/texar/texar/module_base.py", line 74, in init
create_scope_now_=True)
File "/home/ub16c9/ub16_prj/texar/.venv/lib/python3.6/site-packages/tensorflow/python/ops/template.py", line 154, in make_template
**kwargs)

(.venv) ub16c9@ub16c9-gpu:~/ub16_prj/texar/examples/hierarchical_dialog$ python3 hred.py --config_model config_model_uniminor
Traceback (most recent call last):
File "hred.py", line 243, in
main()
File "hred.py", line 69, in main
encoder = tx.modules.HierarchicalRNNEncoder(hparams=encoder_hparams)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/hierarchical_encoders.py", line 100, in init
['texar.modules.encoders', 'texar.custom'])
File "/home/ub16c9/ub16_prj/texar/texar/utils/utils.py", line 237, in check_or_get_instance
ret = get_instance(ret, kwargs, module_paths)
File "/home/ub16c9/ub16_prj/texar/texar/utils/utils.py", line 281, in get_instance
return class_(**kwargs)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/rnn_encoders.py", line 285, in init
RNNEncoderBase.init(self, hparams)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/rnn_encoders.py", line 212, in init
EncoderBase.init(self, hparams)
File "/home/ub16c9/ub16_prj/texar/texar/modules/encoders/encoder_base.py", line 33, in init
ModuleBase.init(self, hparams)
File "/home/ub16c9/ub16_prj/texar/texar/module_base.py", line 72, in init
self._hparams = HParams(hparams, self.default_hparams())
File "/home/ub16c9/ub16_prj/texar/texar/hyperparams.py", line 156, in init
hparams, default_hparams, allow_new_hparam)
File "/home/ub16c9/ub16_prj/texar/texar/hyperparams.py", line 225, in _parse
"entries undefined in default hyperparameters." % name)
ValueError: Unknown hyperparameter: rnn_cell_fw. Only hyperparameters named 'kwargs' hyperparameters can contain new entries undefined in default hyperparameters.
(.venv) ub16c9@ub16c9-gpu:~/ub16_prj/texar/examples/hierarchical_dialog$

Problem with the prepare_data.py in text_style_transfer

Hi!
I'm learning the "text_style_transfer" example. When I run prepare_data.py, I get the following problem:
Traceback (most recent call last):
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/urllib3/util/connection.py", line 80, in create_connection
raise err
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/urllib3/util/connection.py", line 70, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/urllib3/connection.py", line 301, in connect
conn = self._new_conn()
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fdefd25a630>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='docs.google.com', port=443): Max retries exceeded with url: /uc?export=download&id=1HaUKEYDBEk6GlJGmXwqYteB-4rS9q8Lg (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fdefd25a630>: Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "prepare_data.py", line 35, in
main()
File "prepare_data.py", line 32, in main
prepare_data()
File "prepare_data.py", line 27, in prepare_data
extract=True)
File "/data/linshuai/texar/texar/data/data_utils.py", line 90, in maybe_download
filepath = _download_from_google_drive(url, filename, path)
File "/data/linshuai/texar/texar/data/data_utils.py", line 143, in _download_from_google_drive
response = sess.get(gurl, params={'id': file_id}, stream=True)
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/requests/sessions.py", line 546, in get
return self.request('GET', url, **kwargs)
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/data/linshuai/anaconda3/envs/tf018/lib/python3.5/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='docs.google.com', port=443): Max retries exceeded with url: /uc?export=download&id=1HaUKEYDBEk6GlJGmXwqYteB-4rS9q8Lg (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fdefd25a630>: Failed to establish a new connection: [Errno 111] Connection refused',))
Could you tell me how to solve this problem? Thanks a lot!

About KL annealing issue

Hi, Zhiting.

I noticed that you use a KL annealing strategy like this in vae_text.py:

anneal_r = 1.0 / (config.kl_anneal_hparams["warm_up"] * (train_data.dataset_size() / config.batch_size))
opt_vars["kl_weight"] + anneal_r

Is this a common way to do it, or is it just one approach that you chose? Different KL annealing strategies are likely to affect the final performance.

best,
Dong Qian
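
To make the schedule concrete, here is a minimal sketch of linear KL annealing in plain Python, built around the anneal_r formula quoted above. It assumes the weight is increased by anneal_r once per training step and capped at 1.0. The cap, the starting value, and the function name are assumptions for illustration, not taken from the example's code.

```python
def linear_kl_schedule(warm_up, dataset_size, batch_size, num_steps, start=0.0):
    """KL weight after each of num_steps training steps.

    anneal_r follows the quoted formula: 1 / (warm_up epochs * steps per epoch).
    The min(1.0, ...) cap and the start value are assumptions.
    """
    anneal_r = 1.0 / (warm_up * (dataset_size / batch_size))
    weight = start
    weights = []
    for _ in range(num_steps):
        weight = min(1.0, weight + anneal_r)
        weights.append(weight)
    return weights


# Hypothetical sizes: 10 steps per epoch, 10 warm-up epochs, 150 steps total.
weights = linear_kl_schedule(warm_up=10, dataset_size=1000,
                             batch_size=100, num_steps=150)
print(weights[0], weights[49], weights[-1])
```

Under these assumptions the weight grows linearly and reaches 1.0 after warm_up epochs' worth of steps, then stays there.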

transformer_main.py

Excuse me, I followed the transformer example with the IWSLT'15 EN-VI data. After I ran python transformer_main.py --run_mode=train_and_evaluate --config_model=config_model --config_data=config_iwslt15, I got this error:

Traceback (most recent call last):
File "transformer_main.py", line 271, in <module>
main()
File "transformer_main.py", line 89, in main
encoder = TransformerEncoder(hparams=config_model.encoder)
File "/home/nongshibiao/Documents/texar/texar/modules/encoders/transformer_encoders.py", line 128, in __init__
layers.get_initializer(self._hparams.initializer))
File "/home/nongshibiao/Documents/texar/texar/core/layers.py", line 413, in get_initializer
modules)
File "/home/nongshibiao/Documents/texar/texar/utils/utils.py", line 197, in check_or_get_instance
ret = get_instance(ret, kwargs, module_paths)
File "/home/nongshibiao/Documents/texar/texar/utils/utils.py", line 239, in get_instance
(class_.__module__, class_.__name__, key, list(class_args)))
ValueError: Invalid argument for class tensorflow.python.ops.init_ops.VarianceScaling: distribution, valid args: []

I went back to the code, and it seems the list of valid args should not be empty. Did I miss loading the instance at some point? Thank you.
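
One thing that may be worth checking, offered as a guess rather than a diagnosis: if the list of valid arguments is collected by introspecting the class constructor (an assumption; the traceback only shows `list(class_args)`), then a decorator wrapped around `__init__` can make that list come back empty. The sketch below reproduces the effect with the standard library only. `passthrough`, `Plain`, and `Decorated` are hypothetical stand-ins, not TensorFlow or texar code.

```python
import functools
import inspect


def passthrough(func):
    """A hypothetical decorator that forwards all arguments unchanged."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper


class Plain:
    def __init__(self, scale=1.0, distribution="normal"):
        self.scale = scale
        self.distribution = distribution


class Decorated:
    @passthrough
    def __init__(self, scale=1.0, distribution="normal"):
        self.scale = scale
        self.distribution = distribution


def declared_init_args(cls):
    """Named arguments of __init__ as seen WITHOUT following wrappers."""
    return inspect.getfullargspec(cls.__init__).args


def unwrapped_init_args(cls):
    """Named arguments of __init__ after following __wrapped__."""
    return list(inspect.signature(cls.__init__).parameters)


print(declared_init_args(Plain))       # names are visible
print(declared_init_args(Decorated))   # only *args/**kwargs remain, so no names
print(unwrapped_init_args(Decorated))  # names recovered through __wrapped__
```

If this is what happens, the symptom would depend on the installed TensorFlow version, since it is the library's own constructor that is being inspected.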

text style transfer discriminator only trained on real samples?

Hello, I'm having trouble understanding how your "text style transfer" model works, given that you changed it from discriminating between generated and real samples (i.e. the "discriminator" is only trained on real data). Please explain that, as well as the independence constraint that you removed. After 6 epochs of training (approximately 18 hours), all the samples are merely duplicates. How should I interpret this, and how many epochs of training must I complete before I can expect the samples to resemble what you show in the paper?

Update: you can expect generated output to begin diverging around epoch 12 - for me a day and a half of training.

text_style_transfer multi GPU

When I try to run the program on more than one GPU, the GPUs are never used at the same time, no matter what I change. I don't know why. Where can I modify the code so that multiple GPUs are used simultaneously?

InvalidArgumentError when running seqgan example

Hello,

Thanks for your library. I've been trying the seqgan example in the library. Pretraining works normally. However, after several iterations of adversarial training, I got the following error:

......
......
......
G train epoch 17, step 9401: mean_reward: 137.219879, expect_reward_loss:-137.225128, update_loss: -10006.482422
D train epoch 17, step 0: dis_total_loss: 2737.095215, r_loss: 0.000000, f_loss: 2737.095215
2018-09-08 16:12:16.843339: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at strided_slice_op.cc:105 : Invalid argument: slice index 1 of dimension 0 out of bounds.
Traceback (most recent call last):
File "seqgan_train.py", line 332, in <module>
tf.app.run(main=_main)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "seqgan_train.py", line 321, in _main
_g_train_epoch(sess, cur_epoch, 'train')
File "seqgan_train.py", line 180, in _g_train_epoch
rtns = sess.run(fetches)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: slice index 1 of dimension 0 out of bounds.
[[Node: strided_slice_6 = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/device:GPU:0"](OptimizeLoss_3/gradients/mul_1_grad/Shape, OptimizeLoss_3/gradients/Softmax_grad/Sum/reduction_indices, Slice/begin, OptimizeLoss_3/gradients/Softmax_grad/Sum/reduction_indices)]]
[[Node: OptimizeLoss_3/train/update/_450 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2189_OptimizeLoss_3/train/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'strided_slice_6', defined at:
File "seqgan_train.py", line 332, in <module>
tf.app.run(main=_main)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "seqgan_train.py", line 146, in _main
reward, sequence_length=tf.squeeze(sequence_length), tensor_rank=2)
File "/home/ycharn/texar/texar/losses/rewards.py", line 96, in discount_reward
reward, sequence_length, discount, dtype)
File "/home/ycharn/texar/texar/losses/rewards.py", line 195, in _discount_reward_tensor_2d
reward, sequence_length, dtype=dtype, tensor_rank=2)
File "/home/ycharn/texar/texar/utils/shapes.py", line 132, in mask_sequences
sequence, sequence_length, dtype, time_major, tensor_rank)
File "/home/ycharn/texar/texar/utils/shapes.py", line 178, in _mask_sequences_tensor
max_time = tf.to_int32(tf.shape(sequence)[1])
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 524, in _slice_helper
name=name)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 690, in strided_slice
shrink_axis_mask=shrink_axis_mask)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 8232, in strided_slice
name=name)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/ycharn/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): slice index 1 of dimension 0 out of bounds.
[[Node: strided_slice_6 = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1, _device="/job:localhost/replica:0/task:0/device:GPU:0"](OptimizeLoss_3/gradients/mul_1_grad/Shape, OptimizeLoss_3/gradients/Softmax_grad/Sum/reduction_indices, Slice/begin, OptimizeLoss_3/gradients/Softmax_grad/Sum/reduction_indices)]]
[[Node: OptimizeLoss_3/train/update/_450 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2189_OptimizeLoss_3/train/update", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

My TensorFlow version is 1.10.1. Does this error happen because of the TensorFlow version?

problem with 'allow_smaller_final_batch' and 'beam_search_decode'

Hi there, I encountered an error when trying the seq2seq_attn example with the iwslt14 dataset. The full error log is appended at the end of the post.

The error occurs when running this line:

sess.run(fetches, feed_dict=feed_dict)

This indicates that the training stage is fine and that the bug occurs in the validation stage.

Here are two excerpts from the error log:

2018-09-11 16:04:10.567407: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:1122 : Invalid argument: TensorArray bidirectional_rnn_encoder_2/bidirectional_rnn/fw/fw/dynamic_rnn/input_0_32362: Could not write to TensorArray index 0 because the value shape is [23,256] which is incompatible with the TensorArray's inferred element shape: [32,256] (consider setting infer_shape=False).
2018-09-11 16:04:10.567928: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:1122 : Invalid argument: TensorArray bidirectional_rnn_encoder_2/bidirectional_rnn/bw/bw/dynamic_rnn/input_0_32364: Could not write to TensorArray index 0 because the value shape is [23,256] which is incompatible with the TensorArray's inferred element shape: [32,256] (consider setting infer_shape=False).
......
Caused by op 'attention_rnn_decoder_5/tile_batch_1/Reshape', defined at:
  File "seq2seq_attn.py", line 161, in <module>
    main()
  File "seq2seq_attn.py", line 93, in main
    train_op, infer_outputs = build_model(batch, train_data)
  File "seq2seq_attn.py", line 77, in build_model
    max_decoding_length=60)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/texar/modules/decoders/beam_search_decode.py", line 193, in beam_search_decode
    cell = decoder_or_cell._get_beam_search_cell(beam_width=beam_width)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/texar/modules/decoders/rnn_decoders.py", line 545, in _get_beam_search_cell
    memory_seq_length, beam_width)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py", line 122, in tile_batch
    return nest.map_structure(lambda t_: _tile_batch(t_, multiplier), t)
......

As there is no problem in the training stage, I guess something might be wrong in the implementation of beam_search_decode.

The error can be summarized as: the tensor expects a dimension of 32, but we feed 23 instead.
I found that the batch size is 32 and that there are 887 validation examples in valid.de, where 887 % 32 == 23.
Also, adding 'allow_smaller_final_batch': False to the val and test items of config_iwslt14.py gets rid of the error.

But this "fix" is not what we really want. In principle, we should run validation and testing on all the dev/test samples.
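
The arithmetic behind this observation can be sketched as follows. The helper `batch_sizes` is hypothetical and only mimics the effect of an `allow_smaller_final_batch` style switch; it is not the texar data pipeline.

```python
def batch_sizes(num_examples, batch_size, allow_smaller_final_batch=True):
    """Sizes of the batches produced when iterating once over the data."""
    full_batches, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and allow_smaller_final_batch:
        sizes.append(remainder)  # the last, smaller batch
    return sizes


with_final = batch_sizes(887, 32, allow_smaller_final_batch=True)
without_final = batch_sizes(887, 32, allow_smaller_final_batch=False)

print(len(with_final), with_final[-1])   # 28 batches, the last one has 23 examples
print(887 - sum(without_final))          # 23 examples are never evaluated
```

The reshape error in the log (230 values against a requested 320) matches 23 and 32 each multiplied by 10, which would be consistent with a beam width of 10 and a batch dimension fixed at 32 when the graph was built. The beam width is an inference from the numbers, not something stated in the post.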

I am using tensorflow 1.8.
Please let me know if I need to provide any other environment information.

The full logs:

$ python seq2seq_attn.py --config_model config_model --config_data config_iwslt14
2018-09-11 15:50:03.793617: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2018-09-11 15:50:04.038875: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla P40 major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:02:00.0
totalMemory: 22.38GiB freeMemory: 22.21GiB
2018-09-11 15:50:04.038945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-09-11 15:50:04.384688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-11 15:50:04.384752: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0
2018-09-11 15:50:04.385091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N
2018-09-11 15:50:04.385631: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 21549 MB memory) -> physical GPU (device: 0, name: Tesla P40, pci bus id: 0000:02:00.0, compute capability: 6.1)
step=0, loss=481.5847
step=500, loss=101.2404
step=1000, loss=75.9185
step=1500, loss=102.7388
step=2000, loss=81.9897
step=2500, loss=64.7623
step=3000, loss=76.1445
step=3500, loss=81.1186
step=4000, loss=48.0918
step=4500, loss=54.7355
step=5000, loss=74.8126
2018-09-11 16:04:10.567407: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:1122 : Invalid argument: TensorArray bidirectional_rnn_encoder_2/bidirectional_rnn/fw/fw/dynamic_rnn/input_0_32362: Could not write to TensorArray index 0 because the value shape is [23,256] which is incompatible with the TensorArray's inferred element shape: [32,256] (consider setting infer_shape=False).
2018-09-11 16:04:10.567928: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at tensor_array_ops.cc:1122 : Invalid argument: TensorArray bidirectional_rnn_encoder_2/bidirectional_rnn/bw/bw/dynamic_rnn/input_0_32364: Could not write to TensorArray index 0 because the value shape is [23,256] which is incompatible with the TensorArray's inferred element shape: [32,256] (consider setting infer_shape=False).
Traceback (most recent call last):
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
    return fn(*args)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 230 values, but the requested shape has 320
	 [[Node: attention_rnn_decoder_5/tile_batch_1/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](attention_rnn_decoder_5/tile_batch_1/Tile/_4547, attention_rnn_decoder_5/tile_batch_1/concat)]]
	 [[Node: attention_rnn_decoder_6/decoder/while/LoopCond/_4655 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_790_attention_rnn_decoder_6/decoder/while/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopattention_rnn_decoder_6/decoder/while/BeamSearchDecoderStep/next_beam_word_ids/y/_4501)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "seq2seq_attn.py", line 161, in <module>
    main()
  File "seq2seq_attn.py", line 149, in main
    val_bleu = _eval_epoch(sess, 'val')
  File "seq2seq_attn.py", line 125, in _eval_epoch
    sess.run(fetches, feed_dict=feed_dict)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 230 values, but the requested shape has 320
	 [[Node: attention_rnn_decoder_5/tile_batch_1/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](attention_rnn_decoder_5/tile_batch_1/Tile/_4547, attention_rnn_decoder_5/tile_batch_1/concat)]]
	 [[Node: attention_rnn_decoder_6/decoder/while/LoopCond/_4655 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_790_attention_rnn_decoder_6/decoder/while/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopattention_rnn_decoder_6/decoder/while/BeamSearchDecoderStep/next_beam_word_ids/y/_4501)]]

Caused by op 'attention_rnn_decoder_5/tile_batch_1/Reshape', defined at:
  File "seq2seq_attn.py", line 161, in <module>
    main()
  File "seq2seq_attn.py", line 93, in main
    train_op, infer_outputs = build_model(batch, train_data)
  File "seq2seq_attn.py", line 77, in build_model
    max_decoding_length=60)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/texar/modules/decoders/beam_search_decode.py", line 193, in beam_search_decode
    cell = decoder_or_cell._get_beam_search_cell(beam_width=beam_width)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/texar/modules/decoders/rnn_decoders.py", line 545, in _get_beam_search_cell
    memory_seq_length, beam_width)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py", line 122, in tile_batch
    return nest.map_structure(lambda t_: _tile_batch(t_, multiplier), t)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 375, in map_structure
    structure[0], [func(*x) for x in entries])
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 375, in <listcomp>
    structure[0], [func(*x) for x in entries])
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py", line 122, in <lambda>
    return nest.map_structure(lambda t_: _tile_batch(t_, multiplier), t)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py", line 90, in _tile_batch
    ([shape_t[0] * multiplier], shape_t[1:]), 0))
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6113, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
    op_def=op_def)
  File "/home/luban/miniconda3/envs/tf18/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 230 values, but the requested shape has 320
	 [[Node: attention_rnn_decoder_5/tile_batch_1/Reshape = Reshape[T=DT_INT32, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](attention_rnn_decoder_5/tile_batch_1/Tile/_4547, attention_rnn_decoder_5/tile_batch_1/concat)]]
	 [[Node: attention_rnn_decoder_6/decoder/while/LoopCond/_4655 = _HostRecv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_790_attention_rnn_decoder_6/decoder/while/LoopCond", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"](^_cloopattention_rnn_decoder_6/decoder/while/BeamSearchDecoderStep/next_beam_word_ids/y/_4501)]]

no seq2seq_gan example

There is no example code for GAN training with seq2seq. I know it is somewhat like a combination of seq2seq_attn and seqgan, but I am still not sure how to do it specifically. Could someone show me some code that does it properly?
Thanks!!

Fixing Discriminator?

In the README, it says the attribute classifier (discriminator) is pre-trained for the first 10 epochs and is fixed afterwards. But I think that in the code the classifier is not fixed, because
train_epoch(sess, gamma, lambda_g_, epoch)
continues to train the classifier along with the generator.

shape mismatch bug in hred and cannot reproduce the result

Hi,

I am trying to reproduce the result of hred, but I encountered the following error:
"ValueError: sequence_length must be a vector of length batch_size, but saw shape: (?, ?)"

I think the bug lies in hred.py line 81. The related code block, lines 78 - 82, is:
ecdr_states = encoder(
    context_embed,
    medium=['flatten', _add_source_speaker_token],
    sequence_length_minor=data_batch['source_length'],
    sequence_length_major=data_batch['source_utterance_cnt'])

I fixed it by reshaping data_batch['source_length'] to a vector, and the program now runs. However, the result is far worse than both the paper and the reported chart. Here is my result:

-- bleu-1 prec=0.2850111431585633, recall=0.28461374427599595
-- bleu-2 prec=0.1638788596563101, recall=0.16767654177822935
-- bleu-3 prec=0.10135466013603585, recall=0.10516786599271233
-- bleu-4 prec=0.050318838607421, recall=0.05182164173830952

Also, is the n-gram BLEU score here individual or cumulative (i.e. the average of 1-gram, ..., n-gram)? Thanks.
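
To make the two readings concrete: in the standard BLEU definition, the cumulative n-gram score is the geometric mean (not the arithmetic average) of the individual 1-gram to n-gram precisions, multiplied by a brevity penalty. The sketch below shows that relationship. It assumes the values passed in are individual precisions, which the post does not confirm for the hred evaluation output, and the brevity penalty is left as a parameter because it cannot be recovered from the printed numbers.

```python
import math


def cumulative_ngram_score(precisions, brevity_penalty=1.0):
    """Geometric mean of individual n-gram precisions, times the brevity penalty."""
    if any(p <= 0.0 for p in precisions):
        return 0.0  # a zero precision drives the geometric mean to zero
    log_mean = sum(math.log(p) for p in precisions) / len(precisions)
    return brevity_penalty * math.exp(log_mean)


# Individual precisions 0.8 (1-gram) and 0.2 (2-gram) give a cumulative
# 2-gram score of sqrt(0.8 * 0.2) = 0.4 when the brevity penalty is 1.
score = cumulative_ngram_score([0.8, 0.2])
```

Because the geometric mean is dominated by the smallest precision, a cumulative score is never higher than the best individual precision it is built from.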

texar.modules.beam_search_decode raises an error when decoding with output_layer=tf.identity

texar directly builds a TensorFlow BeamSearchDecoder using the decoder's output_layer (see here). However, BeamSearchDecoder only accepts a layer instance, so tf.identity causes an error.
The solution is simple: insert an if statement to treat it specially (don't pass output_layer when it is tf.identity). This also lets beam_search_decode use tf.identity directly. If this is done, the documentation of output_layer should be updated.
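
The proposed special case could look roughly like the sketch below. It is written without TensorFlow so that it stays self-contained: `identity` stands in for tf.identity, and `output_layer_for_beam_search` is a hypothetical helper name, not an existing texar function.

```python
def identity(x):
    """Stand-in for tf.identity in this sketch."""
    return x


def output_layer_for_beam_search(output_layer, identity_fn=identity):
    """Return the value to pass on as output_layer, or None to omit it.

    When output_layer is the identity function, nothing needs to be applied
    to the cell outputs, so the layer argument is simply left out.
    """
    if output_layer is identity_fn:
        return None
    return output_layer


class DummyLayer:
    """Placeholder for a real layer instance."""


layer = DummyLayer()
print(output_layer_for_beam_search(identity) is None)  # True
print(output_layer_for_beam_search(layer) is layer)    # True
```

Passing None relies on the decoder treating a missing output layer as "apply nothing", which is how an optional `output_layer` argument is normally interpreted.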

Massive data

Is it possible to provide methods for using TFRecord with large-scale models? With a GAN model, training on massive data will take an unimaginable amount of time.

text_style_transfer

Hi.

I'm using the implementation of Toward Controlled Generation of Text i.e. text_style_transfer to see what happens if we try to generate sentences describing situations in soccer. I have two different kinds of situations i.e. a binary c. My training data is structured as follows:

UNK UNK gör UNK första mål på straff ! --> label 0
UNK UNK utökar UNK ledning med sin andra straff för matchen . --> label 0
hörnvariant där en omarkerad UNK får avlossa skott . --> label 1

To make things easy and see if things are working out, I'm using the training set as the validation set. However, I'm seeing some interesting behaviour in the training pipeline, see code below. The crashes seem to be sporadic as they can occur in the _eval_epoch after 1 epoch, 2 epochs or 5 epochs or in the _train_epoch at seemingly arbitrary steps.

I've tried to remove data shuffling to isolate any errors related to data structure but the error persists. This is the error, where the shape [64 21] also occurs as [64 20] between runs.

 tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [] [Condition x == y did not hold element-wise:] [x (sequence_sparse_softmax_cross_entropy/SparseSoftmaxCrossEntropyWithLogits/Shape_1:0) = ] [64 23] [y (sequence_sparse_softmax_cross_entropy/SparseSoftmaxCrossEntropyWithLogits/strided_slice:0) = ] [64 21]
	 [[Node: sequence_sparse_softmax_cross_entropy/SparseSoftmaxCrossEntropyWithLogits/assert_equal/Assert/Assert = Assert[T=[DT_STRING, DT_STRING, DT_STRING, DT_INT32, DT_STRING, DT_INT32], summarize=3, _device="/job:localhost/replica:0/task:0/device:CPU:0"](sequence_sparse_softmax_cross_entropy/SparseSoftmaxCrossEntropyWithLogits/assert_equal/All/_329, sequence_sparse_softmax_cross_entropy/SparseSoftmaxCrossEntropyWithLogits/assert_equal/Assert/Assert/data_0, attention_rnn_decoder_6/decoder/AttentionWrapperZeroState/assert_equal/Assert/Assert/data_1, sequence_sparse_softmax_cross_entropy/SparseSoftmaxCrossEntropyWithLogits/assert_equal/Assert/Assert/data_2, sequence_sparse_softmax_cross_entropy/SparseSoftmaxCrossEntropyWithLogits/Shape_1/_331, sequence_sparse_softmax_cross_entropy/SparseSoftmaxCrossEntropyWithLogits/assert_equal/Assert/Assert/data_4, sequence_sparse_softmax_cross_entropy/SparseSoftmaxCrossEntropyWithLogits/strided_slice/_333)]]
	 [[Node: OptimizeLoss/gradients/attention_rnn_decoder_4/TrainingHelper_1/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3/_371 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_2312_...ayGatherV3", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

As the error occurs in between epochs, I'm positive it's not related to how my data is structured. Do you have any idea what I can try next?

Thanks!

PAD token in seq2seq_attn example

I found that the PAD token is counted in the vocab and takes part in the training and decoding process. Does that mean we may generate a PAD token in the middle of a generated sentence (before generating the EOS token)?
This doesn't make sense to me, as in the TensorFlow NMT tutorial the pad ID is not counted in the vocab and is not part of the final softmax.
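
As background for the question, a common way to keep padding out of training, even when PAD has a vocabulary entry, is to mask the per-token loss by sequence length so that padded positions contribute nothing. The sketch below illustrates that general technique in plain Python. It does not claim to describe what the seq2seq_attn example does, and the function names are made up for illustration.

```python
def sequence_mask(lengths, max_len):
    """1.0 for real token positions, 0.0 for padded positions."""
    return [[1.0 if t < n else 0.0 for t in range(max_len)] for n in lengths]


def masked_mean_loss(token_losses, lengths):
    """Average per-token loss over real tokens only, ignoring padding."""
    max_len = len(token_losses[0])
    mask = sequence_mask(lengths, max_len)
    total = sum(
        loss * keep
        for loss_row, mask_row in zip(token_losses, mask)
        for loss, keep in zip(loss_row, mask_row)
    )
    return total / sum(lengths)


# Two sequences padded to length 3; the 9.0 entries sit on PAD positions.
losses = [[1.0, 2.0, 9.0],
          [3.0, 9.0, 9.0]]
print(masked_mean_loss(losses, lengths=[2, 1]))  # 2.0
```

With such a mask the model is never trained to predict PAD, so PAD should rarely receive high probability at decoding time, although nothing in the softmax strictly forbids it.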
