
transformersdataaugmentation's Introduction

Hi there πŸ‘‹

I do NLP @ Alexa.

transformersdataaugmentation's People

Contributors

varunkumar-dev


transformersdataaugmentation's Issues

Could you elaborate more on the extrinsic evaluation?

You mentioned in the paper that you randomly sampled 1% of the training set and 5 examples per class for the validation set. I tried to replicate the baseline results on SST-2 by fine-tuning bert-base-uncased (as mentioned in the paper), but my results are much higher than the reported numbers.

Your Paper: 59.08 (5.59) [15 trials]
My Attempt: 72.89 (6.36) [9 trials]

I could probably increase the number of trials to see if I was just unlucky, but it seems unlikely that statistical variance alone could account for such a large gap. Could you provide more details about your experiments? Did you sample the datasets with a different seed for each trial? (My sampling sketch is below.)

BTW I am using the dataset provided by the authors of CBERT (training set size 6,228). Thanks in advance.
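
In case it helps to compare setups, this is a minimal sketch of how I build each trial's split, assuming one fresh seed per trial; the function and argument names are my own, not from the paper's code:

import random

# Hypothetical helper: sample 1% of the training set for training, then take
# 5 dev examples per class, with a different seed for each trial.
def make_split(texts, labels, trial_seed, train_frac=0.01, dev_per_class=5):
    rng = random.Random(trial_seed)
    indices = list(range(len(texts)))
    rng.shuffle(indices)
    n_train = max(1, int(train_frac * len(indices)))
    train_idx = indices[:n_train]
    dev_idx, counts = [], {}
    for i in indices[n_train:]:
        if counts.get(labels[i], 0) < dev_per_class:
            dev_idx.append(i)
            counts[labels[i]] = counts.get(labels[i], 0) + 1
    return train_idx, dev_idx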

Dev loss does not drop when finetuning

Hello, I am reusing the code to fine-tune the models, such as CBERT. However, the dev loss does not drop, while the loss on the training set looks normal. Is this expected?

Here is the fine-tuning log:

07/09/2021 11:30:11 - INFO - main - ***** Running training *****
07/09/2021 11:30:11 - INFO - main - Num examples = 200
07/09/2021 11:30:11 - INFO - main - Batch size = 8
07/09/2021 11:30:11 - INFO - main - Num steps = 250
07/09/2021 11:30:15 - INFO - main - Epoch 0, Dev loss 68.01114630699158
07/09/2021 11:30:15 - INFO - main - Epoch 0, Train loss 30.59469723701477
07/09/2021 11:30:15 - INFO - main - Saving model. Best dev so far 68.01114630699158
07/09/2021 11:30:20 - INFO - main - Epoch 1, Dev loss 71.50074660778046
07/09/2021 11:30:20 - INFO - main - Epoch 1, Train loss 7.842194274067879
07/09/2021 11:30:24 - INFO - main - Epoch 2, Dev loss 74.61862516403198
07/09/2021 11:30:24 - INFO - main - Epoch 2, Train loss 1.4141736282035708
07/09/2021 11:30:27 - INFO - main - Epoch 3, Dev loss 75.86035788059235
07/09/2021 11:30:27 - INFO - main - Epoch 3, Train loss 0.6085959081538022
07/09/2021 11:30:31 - INFO - main - Epoch 4, Dev loss 76.05813992023468
07/09/2021 11:30:31 - INFO - main - Epoch 4, Train loss 0.12450884422287345
07/09/2021 11:30:34 - INFO - main - Epoch 5, Dev loss 76.5591652393341
07/09/2021 11:30:34 - INFO - main - Epoch 5, Train loss 0.0748452718835324
07/09/2021 11:30:38 - INFO - main - Epoch 6, Dev loss 77.40109157562256
07/09/2021 11:30:38 - INFO - main - Epoch 6, Train loss 0.09374479297548532
07/09/2021 11:30:41 - INFO - main - Epoch 7, Dev loss 77.90590262413025
07/09/2021 11:30:41 - INFO - main - Epoch 7, Train loss 0.10057421837700531
07/09/2021 11:30:44 - INFO - main - Epoch 8, Dev loss 77.9272027015686
07/09/2021 11:30:44 - INFO - main - Epoch 8, Train loss 0.03545364388264716
07/09/2021 11:30:48 - INFO - main - Epoch 9, Dev loss 77.97029888629913
07/09/2021 11:30:48 - INFO - main - Epoch 9, Train loss 0.49601738521596417
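
For reference, the model-selection logic I am using amounts to the following sketch; train_one_epoch, evaluate, and save_checkpoint are placeholder helpers, not the repo's actual functions:

# model, train_loader, dev_loader, num_epochs come from the usual setup (placeholders).
best_dev = float("inf")
patience, bad_epochs = 2, 0  # assumption: stop after 2 epochs without dev improvement
for epoch in range(num_epochs):
    train_loss = train_one_epoch(model, train_loader)
    dev_loss = evaluate(model, dev_loader)
    if dev_loss < best_dev:
        best_dev = dev_loss
        save_checkpoint(model)  # keep only the best-dev model, as the log does
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # dev loss has stopped improving; stop early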

Some finetune and augment question for bert and gpt2

Hi, I'm trying to reproduce your experiment.

I trained a BERT classifier on the 61 examples as a baseline and got 72.49% accuracy, and EDA gets 74.57%; however, BERT-finetune and GPT2-finetune augmentation only reach about 64% and 66% accuracy.

I have some questions about fine-tuning BERT with MLM and GPT-2 with CLM using these two scripts:
https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_mlm.py
https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_clm.py

  1. How do you select the best model when fine-tuning completes? Is it just a matter of setting the flag --load_best_model_at_end?

  2. How do you mask the tokens when augmenting with the fine-tuned BERT? Is it done with the DataCollator classes? Do you mask whole words or single sub-tokens? (See the sketch after this list for what I currently do.)

  3. Could you share more details about the GPT-2 fine-tuning? I get a minimum perplexity of about 47 with epochs=10, and I am confused about how to obtain the best fine-tuned GPT-2 model for augmentation.
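
For question 2, this is the masking I currently use; a minimal sketch built on the stock collators in transformers, which may or may not match your setup:

from transformers import (BertTokenizerFast, DataCollatorForLanguageModeling,
                          DataCollatorForWholeWordMask)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Option A: mask 15% of individual sub-tokens.
subtoken_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Option B: mask whole words (all sub-tokens of a selected word together).
word_collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

batch = word_collator([tokenizer("the movie was surprisingly good")])
print(tokenizer.decode(batch["input_ids"][0]))  # shows where [MASK] landed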

Thank you!!!

Loading the trained model in Python

Hello, I tried to load the trained model with:

model = BARTModel.from_pretrained(
    './src/utils/datasets/snips/exp_0_10/bart_word_mask_0.40_checkpoints',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='./src/utils/datasets/snips/exp_0_10/jointdatabin',
)

But I got this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/fairseq/models/bart/model.py", line 115, in from_pretrained
    x = hub_utils.from_pretrained(
  File "/opt/conda/lib/python3.8/site-packages/fairseq/hub_utils.py", line 70, in from_pretrained
    models, args, task = checkpoint_utils.load_model_ensemble_and_task(
  File "/opt/conda/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 279, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/opt/conda/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 232, in load_checkpoint_to_cpu
    state = _upgrade_state_dict(state)
  File "/opt/conda/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 434, in _upgrade_state_dict
    registry.set_defaults(state["args"], tasks.TASK_REGISTRY[state["args"].task])
KeyError: 'mask_s2s'

What should I do to load the trained model correctly? (My current guess is sketched below.)
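
My current guess, which may be wrong, is that the KeyError means the custom 'mask_s2s' task was never registered in this Python session, so I try something like the following before calling from_pretrained; the './src' path is only my assumption about where the task module lives:

import argparse
from fairseq import utils

# Import (and thereby register) the repo's custom 'mask_s2s' task before
# loading the checkpoint; the user_dir path is an assumption.
utils.import_user_module(argparse.Namespace(user_dir="./src"))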

Could you elaborate more on using pre-trained seq2seq model?

Thanks for your work; I gained a lot of insight from it!

When trying to implement your methods, I ran into some issues with the seq2seq model: there is very limited support for BART in existing toolkits, and the paper gives few details about using BART. Can you provide more information on the fine-tuning hyper-parameters, including the number of epochs and the learning rate?

Also, for generation, your paper says "beam search with a beam size of 5", but how exactly do you generate the augmented data? The only approach I can think of is the sample() method (see the sketch below), but I cannot figure out the details.
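
For concreteness, this is the generation call I have in mind through fairseq's hub interface; 'bart' is the model returned by BARTModel.from_pretrained, and the 'label SEP text' prompt format is my assumption about the setup:

inputs = ["positive SEP the movie was <mask> good"]  # assumed prompt format
outputs = bart.sample(inputs, beam=5)  # beam search with beam size 5, as in the paper
print(outputs[0])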

About BART generation

What is the input to the model when generating? I feed 'y SEP' (the label followed by the separator token) as input, but the output is identical to the input (see the sketch below). Thanks~
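
If it helps to show what I mean: with no masked span, the denoiser can simply copy its input, so I have started masking words first. A sketch, where 'bart' is a loaded fairseq BARTModel and the prompt format is my guess:

masked = ["PlayMusic SEP play <mask> by queen"]  # label, SEP, partially masked text (assumed format)
outputs = bart.fill_mask(masked, topk=3)  # top-3 filled-in candidates per input
print(outputs)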

Request for experimental details - how many synthetic samples per original sample?

Hi Authors!

I came across your paper and think it's a great contribution! I have a few questions on the experiments and would appreciate it if you could clarify:

Extrinsic evaluation:

  1. Could you share how (in what ratios) you combine the synthetic and original training sets for the results in Table 4? For instance, for GPT2_context you mention in Section 2.1.2 that the prompt is y_i SEP w_1 w_2 .. w_k, and that you use nucleus sampling (see the sketch at the end of this issue for what I have in mind). How many samples do you generate for augmentation? Also, since the fine-tuning set is so small (61 samples, or 1% of SST-2), wouldn't these models simply re-create the original sequences? Did you run into this issue, and if so, how did you get around it?

  2. What is the base classifier architecture used in the extrinsic evaluation, to which you feed the augmented set?

  3. During training of this base classifier, do you randomly sample examples from the augmented set, or is it necessary to maintain some ratio of original to synthetic samples?

  4. Does your c-BERT baseline in row 3 of Table 4 use the 10x generation and confidence-based selection from the original paper (which might make the comparison with other methods unfair)?

Intrinsic evaluation:

  1. In 3.3.1 you mention that the best classifier is selected based on performance on the dev partition. Does this partition refer to the set of 10 samples drawn from the dev set in a single run?

Also, finally, do you have any estimate of when you could release your implementation here? :)
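
For reference, this is the generation I am attempting for GPT2_context; a sketch with transformers, where the prompt format follows Section 2.1.2 and top_p=0.9 is my own placeholder value, not a number from the paper:

from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Prompt per Section 2.1.2: label, SEP, first k words of the original sentence.
prompt = "positive SEP the movie was"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,                  # nucleus sampling
    top_p=0.9,                       # placeholder value, not from the paper
    max_length=40,
    num_return_sequences=3,          # several synthetic samples per original example
    pad_token_id=tokenizer.eos_token_id,
)
for out in outputs:
    print(tokenizer.decode(out, skip_special_tokens=True))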

So AE models can't generate text, right?

The paper uses 3 kinds of pre-trained models. The AR and seq2seq models both have a fine-tuning phase and a text-generation phase, so do the AE models only need the pre-trained weights?
And how do the AE models generate text? Is the input something like [y_i, x_1, ..., x_m], with some tokens masked (see the sketch below)?
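
To make the question concrete, this is my rough understanding of the AE generation step; a sketch with a plain BERT MLM head, where prepending the label word is my simplification of the label conditioning:

import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Mask a token of the label-prefixed sentence and let the MLM head fill it in.
text = "positive the movie was [MASK] good"  # assumed input format
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
predicted_id = logits[0, mask_pos].argmax(-1).item()
print(tokenizer.decode([predicted_id]))  # candidate replacement token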
