desire2020 / cot

(Beta Version!) Experiment Code for Paper ``CoT: Cooperative Training for Generative Modeling of Discrete Data''
License: MIT License
Hi,
When I try to run `python cot.py`, I get the following error:
File "F:\study\EBGAN\CoT\Cooperative-Training\generator.py", line 57, in _g_recurrence
T_t = tf.stop_gradient(tf.reduce_max(-log_prob, axis=-1, keepdims=True))
TypeError: reduce_max() got an unexpected keyword argument 'keepdims'
Thank you for your help!
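For what it's worth, `tf.reduce_max` only gained the `keepdims` spelling in TF 1.5; earlier releases name the same argument `keep_dims`, which would explain this TypeError. A minimal version-agnostic shim could look like the sketch below (my own code, not from this repo; `old_reduce_max` is a hypothetical stand-in that just mimics the pre-1.5 signature):

```python
import inspect

def reduce_max_compat(fn, *args, keepdims=True, **kwargs):
    """Call fn, passing `keepdims` under whichever name its signature accepts.

    TF < 1.5 spelled the argument `keep_dims`; later versions use `keepdims`.
    """
    if "keepdims" in inspect.signature(fn).parameters:
        return fn(*args, keepdims=keepdims, **kwargs)
    return fn(*args, keep_dims=keepdims, **kwargs)

# Hypothetical stand-in mimicking the pre-1.5 tf.reduce_max signature,
# used here only so the shim can be demonstrated without TensorFlow:
def old_reduce_max(values, axis=None, keep_dims=False):
    m = max(values)
    return [m] if keep_dims else m

print(reduce_max_compat(old_reduce_max, [3, 1, 2]))  # prints [3]
```

Alternatively, if you are pinned to an old TF, changing the failing line to use `keep_dims=True` instead of `keepdims=True` should be equivalent on those versions.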
Hi,
First I tried tf 1.6.0, but hit a complicated tf bug, so I switched to tf 1.13.1.
After running for a night, I get the following (see below):
I don't know whether it's over-fitting or other issues.
It would be good if the authors could state (in the README) at roughly which batch we should expect a good oracle NLL.
Thanks!
batch: 87700 nll_oracle: 9.902623
batch: 87700 nll_test 7.6996207
mediator cooptrain iter#87700, balanced_nll 6.823340
mediator cooptrain iter#87710, balanced_nll 6.853920
mediator cooptrain iter#87720, balanced_nll 6.838597
mediator cooptrain iter#87730, balanced_nll 6.765410
mediator cooptrain iter#87740, balanced_nll 6.852599
mediator cooptrain iter#87750, balanced_nll 6.825665
mediator cooptrain iter#87760, balanced_nll 6.850584
mediator cooptrain iter#87770, balanced_nll 6.827829
mediator cooptrain iter#87780, balanced_nll 6.859410
mediator cooptrain iter#87790, balanced_nll 6.784107
batch: 87800 nll_oracle: 9.896609
batch: 87800 nll_test 7.7063065
mediator cooptrain iter#87800, balanced_nll 6.833647
mediator cooptrain iter#87810, balanced_nll 6.837624
mediator cooptrain iter#87820, balanced_nll 6.833254
cooptrain epoch# 563 jsd 6.7449245
mediator cooptrain iter#87830, balanced_nll 6.858107
mediator cooptrain iter#87840, balanced_nll 6.871158
mediator cooptrain iter#87850, balanced_nll 6.824977
mediator cooptrain iter#87860, balanced_nll 6.804533
mediator cooptrain iter#87870, balanced_nll 6.796575
The paper claims that training CoT is more stable than ordinary GANs, SeqGAN/LeakGAN, and MLE in some sense. But the recommended hidden dimension for CoT is 64 for M and 32 for G, which is even smaller than in LeakGAN. Wouldn't the training stability and low computational cost allow a larger architecture to be trained?
By the way, it would be very helpful if you could release sample sentences.
Isn't RSBLEU being close to 1.0 much more important than it being lower than 1.0, since lack of diversity (mode collapse) and too much diversity (exposure bias) are equally bad? Aren't MaliGAN and RankGAN then superior to CoT in Table 3, even though their test loss is much worse?
In fact, I just realized that Texygen has two options, get_bleu_fast and get_bleu; the latter uses the whole test dataset as references rather than 500 sentences sampled from it. I hope all the published BLEU scores for WMT News came from get_bleu: the original BLEU paper by Papineni et al. notes that using different numbers of reference sentences produces different results. Also, Texygen lower-cases all sentences, which I hope you did too.
I calculated the self-BLEU-2 of the WMT test dataset and obtained 0.862. On the other hand, from the BLEU-2 of MLE in your survey paper and the self-BLEU-2 of MLE in your CoT paper, I calculated your self-BLEU-2 of the test dataset to be 0.875. This is strange, since the two values should match exactly. What do you think causes this discrepancy? If possible, could you tell me the self-BLEU-n of the test dataset for other n?
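To make the reference-set issue concrete, here is a from-scratch toy sketch of BLEU-2 and self-BLEU-2 (my own simplified implementation, not Texygen's; Texygen's tokenization and smoothing may differ, which is partly the point of the question). Because n-gram precisions are clipped against the reference set, shrinking `references`, as get_bleu_fast effectively does, changes the counts and hence the score:

```python
from collections import Counter
import math

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu2(candidate, references):
    """Toy BLEU-2: clipped precisions for n = 1, 2, geometric mean, brevity penalty."""
    precisions = []
    for n in (1, 2):
        cand = Counter(ngrams(candidate, n))
        max_ref = Counter()  # per-n-gram max count over all references (clipping)
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
        precisions.append(clipped / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    # brevity penalty against the closest-length reference
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)

def self_bleu2(corpus):
    """Score each sentence against all the others as references, then average."""
    scores = [bleu2(s, corpus[:i] + corpus[i + 1:]) for i, s in enumerate(corpus)]
    return sum(scores) / len(scores)

repetitive = [["the", "cat", "sat"]] * 3
print(self_bleu2(repetitive))  # 1.0 -- total mode collapse

diverse = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
print(self_bleu2(diverse))  # 0.0 -- no shared n-grams
```

This is only meant to illustrate the mechanics; for reproducing the paper's numbers one would of course need Texygen's own get_bleu with its exact preprocessing.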