crux82 / ganbert-pytorch

91.0 4.0 18.0 111 KB

Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace

License: Apache License 2.0

Jupyter Notebook 100.00%

bert huggingface gan generative-adversarial-network semi-supervised-learning pytorch pythonbook text-classification

ganbert-pytorch's Issues

Model errors when tried to change the dataset

I am trying to run the notebook on a different dataset (QADI), and I converted the data to the same format as the example dataset.

I just changed the get_qc_examples function to directly read the line and class from the tab-separated file:

def get_qc_examples(input_file):
  """Creates examples for the training and dev sets."""
  examples = []

  with open(input_file, 'r') as f:
      contents = f.read()
      file_as_list = contents.splitlines()
      for line in file_as_list[1:]:
          # split = line.split(" ")
          # question = ' '.join(split[1:])

          # text_a = question
          # inn_split = split[0].split(":")
          # label = inn_split[0] + "_" + inn_split[1]

          split = line.split("\t")
          question = split[1]

          text_a = question
          label = split[0]
          examples.append((text_a, label))
      # No explicit f.close() is needed: the with block closes the file.

  return examples

I also added some unlabeled data in the same file format, with the label set to UNK_UNK, and changed the label list to:

label_list = ['UNK_UNK','Algeria', 'Bahrain', 'Egypt', 'Iraq', 'Jordan', 'Kuwait', 'Lebanon', 'Libya', 'Morocco', 'Oman', 'Palestine', 'Qatar', 'Saudi_Arabia', 'Sudan', 'Syria', 'Tunisia', 'United_Arab_Emirates', 'Yemen']

but I am getting an error at the end of the epoch (after training the whole epoch):

RuntimeError                              Traceback (most recent call last)
<ipython-input-11-3e8566791cab> in <module>()
    111         # so the loss evaluated for unlabeled data is ignored (masked)
    112         label2one_hot = torch.nn.functional.one_hot(b_labels, len(label_list))
--> 113         per_example_loss = -torch.sum(label2one_hot * log_probs, dim=-1)
    114         per_example_loss = torch.masked_select(per_example_loss, b_label_mask.to(device))
    115         labeled_example_count = per_example_loss.type(torch.float32).numel()

RuntimeError: The size of tensor a (63) must match the size of tensor b (64) at non-singleton dimension 0

Am I doing something wrong? Can you help me tackle this issue?
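The mismatch of 63 versus 64 right at the end of the epoch is consistent with a final batch that is smaller than the configured batch size, although the traceback alone does not confirm this. A minimal sketch in plain Python of how batch sizes fall out for a given dataset size; the helper name `batch_sizes` and the example count 1983 are hypothetical, and `drop_last` mirrors the PyTorch `DataLoader` option of the same name.

```python
def batch_sizes(num_examples, batch_size, drop_last=False):
    """Return the size of each batch a sequential loader would yield."""
    full, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full
    # Without drop_last, the leftover examples form one smaller final batch.
    if remainder and not drop_last:
        sizes.append(remainder)
    return sizes


# Hypothetical dataset size: 30 full batches of 64 plus 63 leftover examples.
print(batch_sizes(1983, 64)[-1])                  # 63
print(batch_sizes(1983, 64, drop_last=True)[-1])  # 64
```

If the smaller final batch turns out to be the cause, passing `drop_last=True` to the training `DataLoader` is one workaround to try; checking for any tensor that is built with a fixed batch size is another.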

Is it possible to obtain the text representation of generated fake examples?

Thank you for this PyTorch implementation! I'm quite new to GANs and curious whether it's possible to obtain a text representation of the generated examples. I thought this would be possible, since GANs applied to computer vision tasks can generate images (e.g., human faces). Hoping to hear from you soon :)

Feature matching loss

It seems there is a tiny mismatch between the paper formula and the code on feature matching loss:

This is not exactly an L2-norm:

g_feat_reg = torch.mean(torch.pow(torch.mean(D_real_features, dim=0) - torch.mean(D_fake_features, dim=0), 2))

It should be like:

g_feat_reg = torch.sum(torch.pow(torch.mean(D_real_features, dim=0) - torch.mean(D_fake_features, dim=0), 2))

But I guess it doesn't change the outcome other than affecting the learning rates.
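The difference between the two variants can be checked on a toy example. The sketch below uses plain Python lists in place of tensors; the function names and the feature values are made up for illustration. It shows that the sum-based version equals the mean-based version multiplied by the number of feature dimensions.

```python
def mean_feature_gap(real_features, fake_features):
    """Per-dimension difference between batch-mean real and fake features."""
    dims = len(real_features[0])
    real_mean = [sum(row[d] for row in real_features) / len(real_features)
                 for d in range(dims)]
    fake_mean = [sum(row[d] for row in fake_features) / len(fake_features)
                 for d in range(dims)]
    return [r - f for r, f in zip(real_mean, fake_mean)]


def feat_reg_mean(real_features, fake_features):
    """Mean of squared gaps (the variant currently in the code)."""
    gap = mean_feature_gap(real_features, fake_features)
    return sum(g * g for g in gap) / len(gap)


def feat_reg_sum(real_features, fake_features):
    """Sum of squared gaps (the squared L2 norm of the gap)."""
    gap = mean_feature_gap(real_features, fake_features)
    return sum(g * g for g in gap)


# Toy batches: 2 examples, 4 feature dimensions (made-up values).
real = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0]]
fake = [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]]

print(feat_reg_mean(real, fake))  # 1.0
print(feat_reg_sum(real, fake))   # 4.0
```

The two values differ only by the constant factor 4, the feature dimension, so the gradients point in the same direction and differ only in scale.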

Is the number of unlabeled examples related to the model accuracy?

Hi,

First, thank you for your great contribution. From the paper, I understand that unlabeled examples improve the inner representation, or make it generalize better. But I wonder whether the number of unlabeled examples affects the model accuracy. In the experiment, you use 5343 unlabeled examples, which is relatively large compared to the number of labeled examples and test examples. Have you run any experiments on this?

Number of discriminator output

Hello,
Thanks for sharing your code. I am trying to understand the GAN-BERT paper, and I ran into something in the code whose reason is not clear to me.

In the Python notebook, we have a dataset with 50 classes. The name of each class is given in "label_list", and "UNK_UNK" is added for unlabeled data. Therefore, wherever len(label_list) is used, it is equal to 51.

On the other hand, the discriminator's output size is the number of classes + 1, since the discriminator should not only discriminate between real and fake examples but also classify the real ones. Therefore, if an example is fake, it should be classified as class 51; otherwise, the discriminator assigns a class to the example.

Here is what I do not understand: when we initialize the discriminator, we use len(label_list) = 51 as the input argument for the number of classes. Inside the discriminator, +1 is also added to the 51, so the discriminator output size is 52.

Then, when the supervised loss is calculated in the training phase, we take logits = D_real_logits[:,0:-1], which means all the output logits except the last one (the last logit indicates whether the example is fake or real and is used to calculate the unsupervised loss). In this code, logits = D_real_logits[:,0:-1] has length 51 while we have 50 classes. Also, at evaluation time, when the test predictions and the loss on the test set are calculated, filtered_logits, which takes all the output logits except the last one, has length 51 while we have 50 labels. I wonder whether there is a problem with the code or whether I did not fully understand the paper.

Thanks!
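The counts described in this issue can be traced with a few lines of plain Python. This is only a sketch of the arithmetic; the helper name `logit_counts` and the placeholder class names are made up for illustration.

```python
def logit_counts(label_list):
    """Trace the sizes described in the issue for a given label list."""
    num_labels = len(label_list)      # value passed to the discriminator
    num_outputs = num_labels + 1      # discriminator adds one real/fake logit
    num_supervised = num_outputs - 1  # what logits[:, 0:-1] keeps
    return num_labels, num_outputs, num_supervised


# 50 real classes (placeholder names) plus the UNK_UNK entry for unlabeled data.
label_list = ["UNK_UNK"] + [f"class_{i}" for i in range(50)]

print(logit_counts(label_list))  # (51, 52, 51)
```

The 51 supervised logits therefore include one slot for UNK_UNK in addition to the 50 real classes, which is exactly the extra output the question is about.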

Deploy the model to use with live data

I have been trying to test and use the model, but I am unable to wrap it in a function that generates predictions for real/new/live data. I have made some tweaks to the model but haven't been able to really use it for testing.
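The last step of such a prediction function, mapping the discriminator's logits for one example to a label name, might look like the sketch below. It is written in plain Python and does not cover tokenization or the forward pass through the transformer and discriminator. It assumes, as the notebook's evaluation code suggests, that the last logit is the real/fake output and that the remaining positions follow the order of `label_list`; the function name `predict_label` is hypothetical.

```python
def predict_label(logits, label_list):
    """Map one example's discriminator logits to a label name."""
    # Drop the last logit, which is assumed to be the real/fake output.
    filtered = logits[:-1]
    if len(filtered) != len(label_list):
        raise ValueError("expected len(label_list) + 1 logits")
    # Index of the largest remaining logit (argmax).
    best = max(range(len(filtered)), key=filtered.__getitem__)
    return label_list[best]


# Made-up logits for a 3-entry label list; the final 9.0 is the real/fake logit.
labels = ["UNK_UNK", "Egypt", "Iraq"]
print(predict_label([0.1, 2.5, -1.0, 9.0], labels))  # Egypt
```

Note that this mirrors the evaluation loop in keeping the UNK_UNK position among the candidates; excluding it would be a separate design choice.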

Train Accuracy in evaluation mode is decreased by every epoch

I ran the same evaluation that is done on the test set after every epoch, but this time on the training set. However, the training accuracy decreases while the training loss increases. What happened here?

    # ========================================
    # TEST ON THE TRAINING DATASET
    # ========================================
    # After the completion of each training epoch, measure our performance on
    # our training set.

    print("")
    print("Running Training...")

    t1 = time.time()

    # Tracking variables for train set
    total_train_loss = 0
    all_train_preds = []
    all_train_labels_ids = []
    nll_train_loss = torch.nn.CrossEntropyLoss(ignore_index=-1) #loss

    # Evaluate data for one epoch
    for batch in train_dataloader:

        # Unpack this training batch from our dataloader.
        b_input_ids = batch[0].to(device)
        b_input_mask = batch[1].to(device)
        b_labels = batch[2].to(device)

        # Tell pytorch not to bother with constructing the compute graph during
        # the forward pass, since this is only needed for backprop (training).
        with torch.no_grad():
            model_outputs = transformer(b_input_ids, attention_mask=b_input_mask)
            hidden_states = model_outputs[-1]
            _, logits, probs = discriminator(hidden_states)
            ###log_probs = F.log_softmax(probs[:,1:], dim=-1)
            filtered_logits = logits[:,0:-1]
            # Accumulate the train loss.
            total_train_loss += nll_train_loss(filtered_logits, b_labels)

        # Accumulate the predictions and the input labels
        _, preds = torch.max(filtered_logits, 1)
        all_train_preds += preds.detach().cpu()
        all_train_labels_ids += b_labels.detach().cpu()

    # Report the final accuracy for this training-set run.
    all_train_preds = torch.stack(all_train_preds).numpy()
    all_train_labels_ids = torch.stack(all_train_labels_ids).numpy()
    train_accuracy = np.sum(all_train_preds == all_train_labels_ids) / len(all_train_preds)
    print("  Train Accuracy: {0:.3f}".format(train_accuracy))

    # Calculate the average loss over all of the batches.
    avg_train_loss = total_train_loss / len(train_dataloader)
    avg_train_loss = avg_train_loss.item()

    # Measure how long this run took.
    train_time = format_time(time.time() - t1)

    print("  Train loss: {0:.3f}".format(avg_train_loss))
    print("  Train took: {:}".format(train_time))
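One thing that may be worth checking, assuming the training dataloader also yields unlabeled examples together with a label mask (as `b_label_mask` in the training loop suggests): the loop above scores every example, so unlabeled items, whose stored label is only a placeholder, would count toward both accuracy and loss. A plain-Python sketch of accuracy restricted to labeled examples follows; the function name `masked_accuracy` and the sample values are hypothetical.

```python
def masked_accuracy(preds, labels, label_mask):
    """Accuracy over the examples whose mask entry is True."""
    kept = [(p, y) for p, y, keep in zip(preds, labels, label_mask) if keep]
    if not kept:
        return float("nan")  # no labeled examples to score
    return sum(p == y for p, y in kept) / len(kept)


# Made-up batch: the last two examples are unlabeled (placeholder label 0).
preds = [1, 2, 3, 4]
labels = [1, 2, 0, 0]
mask = [True, True, False, False]

print(masked_accuracy(preds, labels, mask))        # 1.0
print(masked_accuracy(preds, labels, [True] * 4))  # 0.5
```

In this toy case the unmasked score is halved purely by the unlabeled examples, which is the kind of effect that could distort a training-set metric.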

how do we use the trained model

Hi, thanks for the great demo. However, I am confused about how to use the trained model.

Do you have TensorFlow code? Thank you.

Accuracy problem

I am very happy to see such excellent work, and I am very interested in your article.
But when I run your code without any changes, the results vary from run to run, and it is difficult to reach an accuracy of 65.4%. Why is this?
Thank you very much.

Save and deploy Model

Hi, is there a way to save and deploy the Ganbert model for production?

Gan BERT for Multilabel Intent Classification

Hi,
Thanks for the amazing research and the code implementation. Is there any way to use this package for multi-label intent classification?

Saving Best model during training

Hi, this is super useful thanks.
Is there a way to save the best model during training? I tried to follow your AILC_Lectures_2021_Training_BERT_based_models_in_few_lines_of_code.ipynb, but it does not work here because the GAN also has a generator and a discriminator.

Unable to reproduce the results for 20-News dataset

Thanks for the excellent project! I'm new to GANs and was trying to reproduce the results (reported in the paper) for the 20News dataset. However, my test accuracy is stuck at about 5.2% regardless of whether I use a 1% or a 10% labelled training dataset. (I tried 1%, 2%, and 10%-50% but got almost the same results.) Also, the generator training loss is extremely large, up to 1343123137304389. I also used my own dataset with different ratios of labelled data, and the highest accuracy I got is only 38%.

Has anyone been able to reproduce the results, or does anyone know what is going wrong?
I trained on the 20News dataset for 15 epochs with lr = 5e-6, dropout = 0.1, noise_size = 100, max_seq_length = 256, and batch size = 64.
Appreciate your help!

Generator doesn't work

Hi,
How are you?
Thanks for this great work. I have one question: I ran the PyTorch code and the generator doesn't seem to work. That is, after the model was trained and saved, I used the generator on random noise and had the discriminator classify the generated output. However, it always predicts a single label or very few labels. I tried many things, such as changing the noise to a normal distribution and adding more layers, but it looks like the generator doesn't learn.

Can you please give some guidance or working code? Thanks a lot!

Look forward to hearing from you soon!

Calculate F1

Is it possible to calculate the F1-score of the model?
As far as I can tell, currently only accuracy and loss are reported for the completed model.
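Given the predictions and gold labels that the evaluation loop already collects, a macro-averaged F1 can be computed without extra dependencies. The following is a minimal sketch; the function name `macro_f1` and the sample values are made up for illustration.

```python
def macro_f1(preds, labels):
    """Unweighted mean of per-class F1 over classes seen in preds or labels."""
    classes = sorted(set(labels) | set(preds))
    scores = []
    for c in classes:
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); guard against an empty denominator.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up predictions and gold labels for a two-class problem.
preds = [0, 0, 1, 1]
labels = [0, 1, 1, 1]
print(round(macro_f1(preds, labels), 4))  # 0.7333
```

If scikit-learn is available, `sklearn.metrics.f1_score(labels, preds, average="macro")` is the usual way to get this kind of score.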

Normal distribution

The paper says "G inputs consist of noise vectors drawn from a normal distribution N(0, 1)", but this implementation seems to be based on uniform random inputs. Perhaps they give similar results :)
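The two sampling choices can be compared side by side. The sketch below uses only Python's `random` module; the function name `sample_noise` is hypothetical and the uniform range [0, 1] is an assumption. In PyTorch the analogous calls would be `torch.rand` for uniform and `torch.randn` for standard normal noise.

```python
import random


def sample_noise(batch_size, noise_size, distribution="uniform", rng=None):
    """Return batch_size noise vectors of length noise_size as nested lists."""
    rng = rng or random.Random()
    if distribution == "uniform":
        draw = lambda: rng.uniform(0.0, 1.0)  # assumed range [0, 1]
    elif distribution == "normal":
        draw = lambda: rng.gauss(0.0, 1.0)    # N(0, 1), as the paper describes
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return [[draw() for _ in range(noise_size)] for _ in range(batch_size)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
uniform_noise = sample_noise(8, 100, "uniform", rng)
normal_noise = sample_noise(8, 100, "normal", rng)
print(len(uniform_noise), len(uniform_noise[0]))
```

Uniform samples stay within the chosen interval, while normal samples are centered on zero and unbounded, so the generator sees differently distributed inputs in the two cases.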

Electra support

Is there any way to use an ELECTRA model with this?

How to show some illustrations after training the model?

Support for validation set instead of evaluating on test set directly.

I see you only evaluate on the test set in each epoch. Can we add a validation set, with an early stopping criterion based on the results/loss on this validation set?
This would also require a way to checkpoint the whole model, so that the best configuration on the dev set can be saved and then used on the test set at the end of training.

Please let me know if we can add that.
1- dev set support with early stopping criteria
2- checkpointing logic, to save and load the model.
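The two requested pieces could be tied together roughly as follows. This is a framework-free sketch, not code from the repository: the class name `EarlyStopping`, its parameters, and the loss values are hypothetical, and the actual checkpointing call is left as a comment.

```python
class EarlyStopping:
    """Track the best validation loss and signal when to stop."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Return (improved, should_stop) for this epoch."""
        improved = val_loss < self.best_loss - self.min_delta
        if improved:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return improved, self.bad_epochs >= self.patience


# Made-up validation losses, one per epoch.
val_losses = [0.9, 0.7, 0.72, 0.71, 0.75]
stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    improved, should_stop = stopper.step(epoch, loss)
    if improved:
        # Checkpoint here, e.g. torch.save a dict holding the state_dict()
        # of the transformer, the generator and the discriminator.
        pass
    if should_stop:
        stopped_at = epoch
        break

print(stopper.best_epoch, stopped_at)  # 1 3
```

Saving all three components in one checkpoint keeps the best dev-set configuration loadable as a unit for the final test-set run.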

One last question: can you provide a way to train only the base (BERT-based) model without the GAN components, so that I can use its numbers as a reference? That way I can report the results of the BERT-based model alone next to the results obtained after adding the GAN.