l-zhe / btmpg

Code for the paper "Pushing Paraphrase Away from Original Sentence: A Multi-Round Paraphrase Generation Approach" by Zhe Lin and Xiaojun Wan, accepted to Findings of ACL 2021.

License: MIT License

Python 100.00%

acl acl2021 paraphrase paraphrase-generation natural-language-generation natural-language-processing deep-learning paper

btmpg's Issues

Data Split

For Quora, there are actually 149,263 samples in total, which is more than the data split reported in the paper covers (129,263/3k/3k). Is there a reason the full dataset was not used? Thanks.

Overflow error

Hi,

During training, I get the following error:

Traceback (most recent call last):
  File "train.py", line 182, in <module>
    generation_save_path=args.generation_save_path)
  File "/disk/nfs/ostrom/s1717552/btmpg/utils/run.py", line 133, in __call__
    self.run()
  File "/disk/nfs/ostrom/s1717552/btmpg/utils/run.py", line 100, in run
    max_length=self.max_length)
  File "/disk/nfs/ostrom/s1717552/btmpg/model/VAE.py", line 206, in round
    out_embed = self.embed(self.GS(sentence[:, -1:, :]))
  File "/disk/nfs/ostrom/s1717552/btmpg/btmpgenv/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/disk/nfs/ostrom/s1717552/btmpg/model/gumbleSoftmax.py", line 17, in forward
    sigma = min(self.tau_max, (self.tau_max ** (self.n / self.N)))
OverflowError: (34, 'Numerical result out of range')

This happens after a few days of training, around epoch 39 for MSCOCO and epoch 77 for Quora.

The command used was:

python train.py --cuda \
                --train_source ./data/qqp_train.src \
                --train_target ./data/qqp_train.tgt \
                --test_source  ./data/qqp_dev.src \
                --test_target  ./data/qqp_dev.tgt \
                --vocab_path ./checkpoints/qqp.vocab \
                --batch_size 8 \
                --epoch 100 \
                --num_rounds 2 \
                --max_length 50 \
                --clip_length 50 \
                --model_save_path ./checkpoints/qqp.model \
                --generation_save_path ./outputs/qqp/
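
A possible explanation, judging only from the traceback: the expression tau_max ** (n / N) is evaluated before min() is applied, so once n / N grows large enough the power itself exceeds the float range and Python raises OverflowError, even though min() would have returned tau_max anyway. The sketch below is a hypothetical standalone function, not the repository's code. It reuses the names n, N and tau_max from the traceback, and it assumes tau_max > 1 and a non-negative n / N, which are the conditions under which this overflow can occur.

```python
def annealed_tau(n: int, N: int, tau_max: float) -> float:
    """Compute min(tau_max, tau_max ** (n / N)) without overflowing.

    Hypothetical helper mirroring the expression in the traceback;
    the real class in model/gumbleSoftmax.py may differ.
    """
    ratio = n / N
    if tau_max > 1.0:
        # For a base above 1, the power is >= tau_max whenever ratio >= 1,
        # so min() would pick tau_max anyway. Capping the exponent at 1
        # gives the same result and keeps the power within float range.
        ratio = min(ratio, 1.0)
    return min(tau_max, tau_max ** ratio)


# The unguarded form, 1.5 ** (10**9 / 1), raises OverflowError;
# the guarded form returns the cap instead.
print(annealed_tau(n=10**9, N=1, tau_max=1.5))
```

Capping the exponent rather than catching the exception keeps the returned value identical to the original expression for every step where that expression could be evaluated.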

What does the BERTScore in the paper stand for?

When I run the bert-score script, I get precision, recall, and F-score. Which of these does the BERTScore reported in the paper refer to?

Could you share the data split of Quora and MSCOCO?

Could you share the data splits for Quora and MSCOCO? I need them for a fair comparison. Thanks.

Lower performance with retrained model

When I use a checkpoint that I trained from scratch instead of the checkpoint downloaded from here, performance is roughly 2 iBLEU points lower. The command used to train the model was:

python train.py --cuda \
                --train_source ./data/qqp_train.src \
                --train_target ./data/qqp_train.tgt \
                --test_source  ./data/qqp_dev.src \
                --test_target  ./data/qqp_dev.tgt \
                --vocab_path ./checkpoints/qqp.vocab \
                --batch_size 8 \
                --epoch 100 \
                --num_rounds 2 \
                --max_length 50 \
                --clip_length 50 \
                --model_save_path ./checkpoints/qqp.model \
                --generation_save_path ./outputs/qqp/

Are there additional hyperparameters that I need to set?
