
transformersdataaugmentation's Introduction

Hi there πŸ‘‹

I do NLP @ Alexa.

transformersdataaugmentation's People

Contributors

varunkumar-dev


transformersdataaugmentation's Issues

Could you elaborate more on the extrinsic evaluation?

You mentioned in the paper that you randomly sampled 1% of the training set and 5 examples per class for the validation set. I tried to replicate the baseline results on SST-2 by fine-tuning bert-base-uncased (as mentioned in the paper), but my results are much higher than the reported numbers.

Your Paper: 59.08 (5.59) [15 trials]
My Attempt: 72.89 (6.36) [9 trials]

I could probably increase the number of trials to see if I was just unlucky, but it seems unlikely that statistical variance alone could account for such a large gap. Could you provide more details about your experiments? Did you sample the datasets with a different seed for each trial? (My sampling sketch is below.)

BTW I am using the dataset provided by the authors of CBERT (training set size 6,228). Thanks in advance.
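
In case it helps to compare setups, this is a minimal sketch of how I build each trial's split, assuming one fresh seed per trial; the function and argument names are my own, not from the paper's code:

import random

# Hypothetical helper: sample 1% of the training set for training, then take
# 5 dev examples per class, with a different seed for each trial.
def make_split(texts, labels, trial_seed, train_frac=0.01, dev_per_class=5):
    rng = random.Random(trial_seed)
    indices = list(range(len(texts)))
    rng.shuffle(indices)
    n_train = max(1, int(train_frac * len(indices)))
    train_idx = indices[:n_train]
    dev_idx, counts = [], {}
    for i in indices[n_train:]:
        if counts.get(labels[i], 0) < dev_per_class:
            dev_idx.append(i)
            counts[labels[i]] = counts.get(labels[i], 0) + 1
    return train_idx, dev_idx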

Dev loss does not drop when finetuning

Hello, I am reusing the code to fine-tune the models, such as CBERT. However, the dev loss does not drop, while the loss on the training set looks normal. Is this expected?

Here is the fine-tuning log:

07/09/2021 11:30:11 - INFO - main - ***** Running training *****
07/09/2021 11:30:11 - INFO - main - Num examples = 200
07/09/2021 11:30:11 - INFO - main - Batch size = 8
07/09/2021 11:30:11 - INFO - main - Num steps = 250
07/09/2021 11:30:15 - INFO - main - Epoch 0, Dev loss 68.01114630699158
07/09/2021 11:30:15 - INFO - main - Epoch 0, Train loss 30.59469723701477
07/09/2021 11:30:15 - INFO - main - Saving model. Best dev so far 68.01114630699158
07/09/2021 11:30:20 - INFO - main - Epoch 1, Dev loss 71.50074660778046
07/09/2021 11:30:20 - INFO - main - Epoch 1, Train loss 7.842194274067879
07/09/2021 11:30:24 - INFO - main - Epoch 2, Dev loss 74.61862516403198
07/09/2021 11:30:24 - INFO - main - Epoch 2, Train loss 1.4141736282035708
07/09/2021 11:30:27 - INFO - main - Epoch 3, Dev loss 75.86035788059235
07/09/2021 11:30:27 - INFO - main - Epoch 3, Train loss 0.6085959081538022
07/09/2021 11:30:31 - INFO - main - Epoch 4, Dev loss 76.05813992023468
07/09/2021 11:30:31 - INFO - main - Epoch 4, Train loss 0.12450884422287345
07/09/2021 11:30:34 - INFO - main - Epoch 5, Dev loss 76.5591652393341
07/09/2021 11:30:34 - INFO - main - Epoch 5, Train loss 0.0748452718835324
07/09/2021 11:30:38 - INFO - main - Epoch 6, Dev loss 77.40109157562256
07/09/2021 11:30:38 - INFO - main - Epoch 6, Train loss 0.09374479297548532
07/09/2021 11:30:41 - INFO - main - Epoch 7, Dev loss 77.90590262413025
07/09/2021 11:30:41 - INFO - main - Epoch 7, Train loss 0.10057421837700531
07/09/2021 11:30:44 - INFO - main - Epoch 8, Dev loss 77.9272027015686
07/09/2021 11:30:44 - INFO - main - Epoch 8, Train loss 0.03545364388264716
07/09/2021 11:30:48 - INFO - main - Epoch 9, Dev loss 77.97029888629913
07/09/2021 11:30:48 - INFO - main - Epoch 9, Train loss 0.49601738521596417
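
For reference, the model-selection logic I am using amounts to the following sketch; train_one_epoch, evaluate, and save_checkpoint are placeholder helpers, not the repo's actual functions:

# model, train_loader, dev_loader, num_epochs come from the usual setup (placeholders).
best_dev = float("inf")
patience, bad_epochs = 2, 0  # assumption: stop after 2 epochs without dev improvement
for epoch in range(num_epochs):
    train_loss = train_one_epoch(model, train_loader)
    dev_loss = evaluate(model, dev_loader)
    if dev_loss < best_dev:
        best_dev = dev_loss
        save_checkpoint(model)  # keep only the best-dev model, as the log does
        bad_epochs = 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # dev loss has stopped improving; stop early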

Some finetune and augment question for bert and gpt2

Hi, I'm trying to reproduce your experiment.

I trained a BERT classifier on the 61 examples as a baseline and got 72.49% accuracy, and EDA gets 74.57%; however, BERT-finetune and GPT2-finetune augmentation only reach about 64% and 66% accuracy.

I have some questions about fine-tuning BERT with MLM and GPT-2 with CLM using these two scripts:
https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_mlm.py
https://github.com/huggingface/transformers/blob/master/examples/language-modeling/run_clm.py

  1. How do you select the best model when fine-tuning completes? Is it just a matter of setting the flag --load_best_model_at_end?

  2. How do you mask the tokens when augmenting with the fine-tuned BERT? Is it done with the DataCollator classes? Do you mask whole words or single sub-tokens? (See the sketch after this list for what I currently do.)

  3. Could you share more details about the GPT-2 fine-tuning? I get a minimum perplexity of about 47 with epochs=10, and I am confused about how to obtain the best fine-tuned GPT-2 model for augmentation.
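
For question 2, this is the masking I currently use; a minimal sketch built on the stock collators in transformers, which may or may not match your setup:

from transformers import (BertTokenizerFast, DataCollatorForLanguageModeling,
                          DataCollatorForWholeWordMask)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Option A: mask 15% of individual sub-tokens.
subtoken_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Option B: mask whole words (all sub-tokens of a selected word together).
word_collator = DataCollatorForWholeWordMask(tokenizer=tokenizer, mlm_probability=0.15)

batch = word_collator([tokenizer("the movie was surprisingly good")])
print(tokenizer.decode(batch["input_ids"][0]))  # shows where [MASK] landed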

Thank you!!!

Loading the trained model in Python

Hello, I tried to load the trained model with:

model = BARTModel.from_pretrained(
    './src/utils/datasets/snips/exp_0_10/bart_word_mask_0.40_checkpoints',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='./src/utils/datasets/snips/exp_0_10/jointdatabin',
)

But I got this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.8/site-packages/fairseq/models/bart/model.py", line 115, in from_pretrained
    x = hub_utils.from_pretrained(
  File "/opt/conda/lib/python3.8/site-packages/fairseq/hub_utils.py", line 70, in from_pretrained
    models, args, task = checkpoint_utils.load_model_ensemble_and_task(
  File "/opt/conda/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 279, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/opt/conda/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 232, in load_checkpoint_to_cpu
    state = _upgrade_state_dict(state)
  File "/opt/conda/lib/python3.8/site-packages/fairseq/checkpoint_utils.py", line 434, in _upgrade_state_dict
    registry.set_defaults(state["args"], tasks.TASK_REGISTRY[state["args"].task])
KeyError: 'mask_s2s'

What should I do to load the trained model correctly? (My current guess is sketched below.)
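
My current guess, which may be wrong, is that the KeyError means the custom 'mask_s2s' task was never registered in this Python session, so I try something like the following before calling from_pretrained; the './src' path is only my assumption about where the task module lives:

import argparse
from fairseq import utils

# Import (and thereby register) the repo's custom 'mask_s2s' task before
# loading the checkpoint; the user_dir path is an assumption.
utils.import_user_module(argparse.Namespace(user_dir="./src"))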

Could you elaborate more on using pre-trained seq2seq model?

Thanks for your work; I gained a lot of insight from it!

When trying to implement your methods, I ran into some issues with the seq2seq model: there is very limited support for BART in existing toolkits, and the paper gives few details about using BART. Can you provide more information on the fine-tuning hyper-parameters, including the number of epochs and the learning rate?

Also, for generation, your paper says "beam search with a beam size of 5", but how exactly do you generate the augmented data? The only approach I can think of is the sample() method (see the sketch below), but I cannot figure out the details.
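
For concreteness, this is the generation call I have in mind through fairseq's hub interface; 'bart' is the model returned by BARTModel.from_pretrained, and the 'label SEP text' prompt format is my assumption about the setup:

inputs = ["positive SEP the movie was <mask> good"]  # assumed prompt format
outputs = bart.sample(inputs, beam=5)  # beam search with beam size 5, as in the paper
print(outputs[0])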

About BART generation

What is the input to the model when generating? I feed 'y SEP' (the label followed by the separator token) as input, but the output is identical to the input (see the sketch below). Thanks~
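
If it helps to show what I mean: with no masked span, the denoiser can simply copy its input, so I have started masking words first. A sketch, where 'bart' is a loaded fairseq BARTModel and the prompt format is my guess:

masked = ["PlayMusic SEP play <mask> by queen"]  # label, SEP, partially masked text (assumed format)
outputs = bart.fill_mask(masked, topk=3)  # top-3 filled-in candidates per input
print(outputs)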

Request for experimental details - how many synthetic samples per original sample?

Hi Authors!

I came across your paper and think it's a great contribution! I have a few questions on the experiments and would appreciate it if you could clarify:

Extrinsic evaluation:

  1. Could you share how (in what ratios) you combine the synthetic and original training sets for the results in Table 4? For instance, for GPT2_context you mention in Section 2.1.2 that the prompt is y_i SEP w_1 w_2 .. w_k, and that you use nucleus sampling (see the sketch at the end of this issue for what I have in mind). How many samples do you generate for augmentation? Also, since the fine-tuning set is so small (61 samples, or 1% of SST-2), wouldn't these models simply re-create the original sequences? Did you run into this issue, and if so, how did you get around it?

  2. What is the base classifier architecture used in the extrinsic evaluation, to which you feed the augmented set?

  3. During training of this base classifier, do you randomly sample examples from the augmented set, or is it necessary to maintain some ratio of original to synthetic samples?

  4. Does your c-BERT baseline in row 3 of Table 4 use the 10x generation and confidence-based selection from the original paper (which might make the comparison with other methods unfair)?

Intrinsic evaluation:

  1. In 3.3.1 you mention that the best classifier is selected based on performance on the dev partition. Does this partition refer to the set of 10 samples drawn from the dev set in a single run?

Also, finally, do you have any estimate of when you could release your implementation here? :)
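
For reference, this is the generation I am attempting for GPT2_context; a sketch with transformers, where the prompt format follows Section 2.1.2 and top_p=0.9 is my own placeholder value, not a number from the paper:

from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Prompt per Section 2.1.2: label, SEP, first k words of the original sentence.
prompt = "positive SEP the movie was"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,                  # nucleus sampling
    top_p=0.9,                       # placeholder value, not from the paper
    max_length=40,
    num_return_sequences=3,          # several synthetic samples per original example
    pad_token_id=tokenizer.eos_token_id,
)
for out in outputs:
    print(tokenizer.decode(out, skip_special_tokens=True))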

So AE models can't generate text, right?

The paper uses 3 kinds of pre-trained models. The AR and seq2seq models both have a fine-tuning phase and a text-generation phase, so do the AE models only need the pre-trained weights?
And how do the AE models generate text? Is the input something like [y_i, x_1, ..., x_m], with some tokens masked (see the sketch below)?
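
To make the question concrete, this is my rough understanding of the AE generation step; a sketch with a plain BERT MLM head, where prepending the label word is my simplification of the label conditioning:

import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Mask a token of the label-prefixed sentence and let the MLM head fill it in.
text = "positive the movie was [MASK] good"  # assumed input format
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
predicted_id = logits[0, mask_pos].argmax(-1).item()
print(tokenizer.decode([predicted_id]))  # candidate replacement token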
