desire2020 / cot

(Beta Version!) Experiment Code for Paper ``CoT: Cooperative Training for Generative Modeling of Discrete Data''
License: MIT License
Hi,
When I try to run `python cot.py`, I get the following error:
File "F:\study\EBGAN\CoT\Cooperative-Training\generator.py", line 57, in _g_recurrence
T_t = tf.stop_gradient(tf.reduce_max(-log_prob, axis=-1, keepdims=True))
TypeError: reduce_max() got an unexpected keyword argument 'keepdims'
Thank you for your help!
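For what it's worth, `tf.reduce_max` only gained the `keepdims` spelling in TF 1.5; earlier releases name the same argument `keep_dims`, which would explain this TypeError. A minimal version-agnostic shim could look like the sketch below (my own code, not from this repo; `old_reduce_max` is a hypothetical stand-in that just mimics the pre-1.5 signature):

```python
import inspect

def reduce_max_compat(fn, *args, keepdims=True, **kwargs):
    """Call fn, passing `keepdims` under whichever name its signature accepts.

    TF < 1.5 spelled the argument `keep_dims`; later versions use `keepdims`.
    """
    if "keepdims" in inspect.signature(fn).parameters:
        return fn(*args, keepdims=keepdims, **kwargs)
    return fn(*args, keep_dims=keepdims, **kwargs)

# Hypothetical stand-in mimicking the pre-1.5 tf.reduce_max signature,
# used here only so the shim can be demonstrated without TensorFlow:
def old_reduce_max(values, axis=None, keep_dims=False):
    m = max(values)
    return [m] if keep_dims else m

print(reduce_max_compat(old_reduce_max, [3, 1, 2]))  # prints [3]
```

Alternatively, if you are pinned to an old TF, changing the failing line to use `keep_dims=True` instead of `keepdims=True` should be equivalent on those versions.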
Hi,
First I tried tf 1.6.0, but hit a complicated tf bug, so I switched to tf 1.13.1.
After running for a night, I get the following (see below):
I don't know whether it's over-fitting or other issues.
It would be good if the authors could state (in the README) at roughly which batch we should expect a good oracle NLL.
Thanks!
batch: 87700 nll_oracle: 9.902623
batch: 87700 nll_test 7.6996207
mediator cooptrain iter#87700, balanced_nll 6.823340
mediator cooptrain iter#87710, balanced_nll 6.853920
mediator cooptrain iter#87720, balanced_nll 6.838597
mediator cooptrain iter#87730, balanced_nll 6.765410
mediator cooptrain iter#87740, balanced_nll 6.852599
mediator cooptrain iter#87750, balanced_nll 6.825665
mediator cooptrain iter#87760, balanced_nll 6.850584
mediator cooptrain iter#87770, balanced_nll 6.827829
mediator cooptrain iter#87780, balanced_nll 6.859410
mediator cooptrain iter#87790, balanced_nll 6.784107
batch: 87800 nll_oracle: 9.896609
batch: 87800 nll_test 7.7063065
mediator cooptrain iter#87800, balanced_nll 6.833647
mediator cooptrain iter#87810, balanced_nll 6.837624
mediator cooptrain iter#87820, balanced_nll 6.833254
cooptrain epoch# 563 jsd 6.7449245
mediator cooptrain iter#87830, balanced_nll 6.858107
mediator cooptrain iter#87840, balanced_nll 6.871158
mediator cooptrain iter#87850, balanced_nll 6.824977
mediator cooptrain iter#87860, balanced_nll 6.804533
mediator cooptrain iter#87870, balanced_nll 6.796575
The paper claims that training CoT is more stable than ordinary GANs, SeqGAN/LeakGAN, and MLE in some sense. But the recommended hidden dimension for CoT is 64 for M and 32 for G, which is even smaller than in LeakGAN. Wouldn't the training stability and low computational cost allow a larger architecture to be trained?
By the way, it would be very helpful if you could release sample sentences.
Isn't RSBLEU being close to 1.0 much more important than it being lower than 1.0, since lack of diversity (mode collapse) and too much diversity (exposure bias) are equally bad? Aren't MaliGAN and RankGAN then superior to CoT in Table 3, even though their test loss is much worse?
In fact, I just realized that Texygen has two options, get_bleu_fast and get_bleu; the latter uses the whole test dataset as references rather than 500 sentences sampled from it. I hope all the published BLEU scores for WMT News came from get_bleu: the original BLEU paper by Papineni et al. notes that using different numbers of reference sentences produces different results. Also, Texygen lower-cases all sentences, which I hope you did too.
I calculated the self-BLEU-2 of the WMT test dataset and obtained 0.862. On the other hand, from the BLEU-2 of MLE in your survey paper and the self-BLEU-2 of MLE in your CoT paper, I calculated your self-BLEU-2 of the test dataset to be 0.875. This is strange, since the two values should match exactly. What do you think causes this discrepancy? If possible, could you tell me the self-BLEU-n of the test dataset for other n?
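To make the reference-set issue concrete, here is a from-scratch toy sketch of BLEU-2 and self-BLEU-2 (my own simplified implementation, not Texygen's; Texygen's tokenization and smoothing may differ, which is partly the point of the question). Because n-gram precisions are clipped against the reference set, shrinking `references`, as get_bleu_fast effectively does, changes the counts and hence the score:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu2(candidate, references):
    """Toy BLEU-2: clipped precisions for n = 1, 2, geometric mean, brevity penalty."""
    precisions = []
    for n in (1, 2):
        cand = Counter(ngrams(candidate, n))
        max_ref = Counter()  # per-n-gram max count over all references (clipping)
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        precisions.append(clipped / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    # brevity penalty against the closest-length reference
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)

def self_bleu2(corpus):
    """Score each sentence against all the others as references, then average."""
    scores = [bleu2(s, corpus[:i] + corpus[i + 1:]) for i, s in enumerate(corpus)]
    return sum(scores) / len(scores)

repetitive = [["the", "cat", "sat"]] * 3
print(self_bleu2(repetitive))  # 1.0 -- total mode collapse

diverse = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
print(self_bleu2(diverse))  # 0.0 -- no shared n-grams
```

This is only meant to illustrate the mechanics; for reproducing the paper's numbers one would of course need Texygen's own get_bleu with its exact preprocessing.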