crux82 / ganbert-pytorch
Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace
License: Apache License 2.0
I am trying to run the notebook on a different dataset (QADI), which I converted to the same format as the example dataset.
I only changed the get_qc_examples function to read the text and class directly from the tab-separated file:
def get_qc_examples(input_file):
    """Creates examples for the training and dev sets."""
    examples = []
    with open(input_file, 'r') as f:
        contents = f.read()
        file_as_list = contents.splitlines()
        for line in file_as_list[1:]:
            # split = line.split(" ")
            # question = ' '.join(split[1:])
            # text_a = question
            # inn_split = split[0].split(":")
            # label = inn_split[0] + "_" + inn_split[1]
            split = line.split("\t")
            question = split[1]
            text_a = question
            label = split[0]
            examples.append((text_a, label))
    return examples
I also added some unlabeled data in the same file format, with the label set to UNK_UNK, and changed the label list to:
label_list = ['UNK_UNK','Algeria', 'Bahrain', 'Egypt', 'Iraq', 'Jordan', 'Kuwait', 'Lebanon', 'Libya', 'Morocco', 'Oman', 'Palestine', 'Qatar', 'Saudi_Arabia', 'Sudan', 'Syria', 'Tunisia', 'United_Arab_Emirates', 'Yemen']
But I get an error at the end of the epoch (after training on the whole epoch):
RuntimeError Traceback (most recent call last)
<ipython-input-11-3e8566791cab> in <module>()
111 # so the loss evaluated for unlabeled data is ignored (masked)
112 label2one_hot = torch.nn.functional.one_hot(b_labels, len(label_list))
--> 113 per_example_loss = -torch.sum(label2one_hot * log_probs, dim=-1)
114 per_example_loss = torch.masked_select(per_example_loss, b_label_mask.to(device))
115 labeled_example_count = per_example_loss.type(torch.float32).numel()
RuntimeError: The size of tensor a (63) must match the size of tensor b (64) at non-singleton dimension 0
Am I doing something wrong? Can you help me tackle this issue?
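One possible cause, stated as an assumption rather than a diagnosis: if the number of generated (fake) examples or the split point is tied to the configured batch size of 64, the last batch of the epoch (63 examples here) is split at the wrong row. The sketch below uses stand-in tensors to show the split driven by the size of the current batch instead. The names `split_real_fake`, `configured_batch_size` and `num_outputs` are illustrative and do not come from the repository.

```python
import torch

def split_real_fake(disc_logits: torch.Tensor, real_batch_size: int):
    """Split discriminator outputs for a [real; fake] batch at the real/fake boundary."""
    real_logits, fake_logits = torch.split(disc_logits, real_batch_size)
    return real_logits, fake_logits

configured_batch_size = 64
num_outputs = 20  # 19 entries in the label list above + 1 "fake" output

# The last batch of an epoch is often smaller than the configured size.
b_labels = torch.randint(0, 19, (63,))
real_batch_size = b_labels.shape[0]

# Stand-in for the discriminator output: as many fake rows as real rows.
disc_logits = torch.randn(2 * real_batch_size, num_outputs)
real_logits, fake_logits = split_real_fake(disc_logits, real_batch_size)

# Same loss computation as in the traceback, now with matching shapes.
log_probs = torch.log_softmax(real_logits[:, :-1], dim=-1)
label2one_hot = torch.nn.functional.one_hot(b_labels, num_outputs - 1)
per_example_loss = -torch.sum(label2one_hot * log_probs, dim=-1)
```

Passing `drop_last=True` to the `DataLoader` is another way to avoid a short final batch, at the cost of skipping a few examples each epoch.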
Thank you for this PyTorch implementation! I'm quite new to GANs and curious whether it's possible to get a text representation of the generated examples. I thought this might be possible, since GANs applied to computer vision tasks can generate images (e.g. human faces). Hoping to hear from you soon :)
It seems there is a tiny mismatch between the paper formula and the code on feature matching loss:
This is not exactly an L2-norm:
g_feat_reg = torch.mean(torch.pow(torch.mean(D_real_features, dim=0) - torch.mean(D_fake_features, dim=0), 2))
It should be like:
g_feat_reg = torch.sum(torch.pow(torch.mean(D_real_features, dim=0) - torch.mean(D_fake_features, dim=0), 2))
But I guess it doesn't change the outcome other than affecting the learning rates.
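A quick check of the claim above, using random stand-in features (the shapes are arbitrary): the `mean` and `sum` versions differ only by a constant factor equal to the feature dimension, so the gradient direction is the same and only its scale changes.

```python
import torch

torch.manual_seed(0)
D_real_features = torch.randn(8, 16)  # (batch, feature_dim), stand-in values
D_fake_features = torch.randn(8, 16)

diff = torch.mean(D_real_features, dim=0) - torch.mean(D_fake_features, dim=0)
reg_mean = torch.mean(torch.pow(diff, 2))  # version currently in the code
reg_sum = torch.sum(torch.pow(diff, 2))    # squared L2 norm of the difference

feature_dim = diff.shape[0]
ratio = (reg_sum / reg_mean).item()  # equals feature_dim up to rounding
```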
Hi,
First, thanks for your great contribution. From the paper, I understand that unlabeled examples improve or generalize the inner representation. But I wonder whether the number of unlabeled examples affects model accuracy. In the experiment, you use 5343 unlabeled examples, which is relatively large compared to the number of labeled and test examples. Have you run any experiments on this?
Hello,
Thanks for sharing your code. I am trying to understand the GAN-BERT paper, and I ran into something in the code that is not clear to me.
In the Python notebook, we have a dataset with 50 classes. The name of each class is given in "label_list", and "UNK_UNK" is added for unlabeled data. Therefore, wherever len(label_list) is used, it is equal to 51.
On the other hand, the discriminator's output size is the number of classes + 1, since the discriminator should not only discriminate between real and fake examples but also classify the real ones. Therefore, if an example is fake, it should be assigned to class 51; otherwise, the discriminator assigns one of the real classes to the example.
Here is what I do not understand: when we initialize the discriminator, we use len(label_list) = 51 as the input argument for the number of classes. The discriminator then adds 1 to this internally, so its output has size 52.
Then, when the supervised loss is calculated during training, we take logits = D_real_logits[:,0:-1], i.e. all output logits except the last one (the last logit indicates whether the example is fake or real and is used to calculate the unsupervised loss). In this code, logits = D_real_logits[:,0:-1] has length 51 while we have 50 classes. Likewise, at evaluation time, when the predictions and loss on the test set are calculated, filtered_logits takes all output logits except the last one and so has length 51, while we have 50 labels. Is there a problem with the code, or did I not fully understand the paper?
Thanks!
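The size arithmetic described in this post can be reproduced with stand-in tensors. This is only a sketch of the shapes; the class names are placeholders and the random logits stand in for a real discriminator.

```python
import torch

num_classes = 50
# Placeholder class names; "UNK_UNK" marks unlabeled data as in the notebook.
label_list = ["UNK_UNK"] + [f"class_{i}" for i in range(num_classes)]

# As described above, the discriminator adds one extra output for "fake".
num_outputs = len(label_list) + 1
disc_logits = torch.randn(4, num_outputs)  # stand-in for discriminator logits

# Dropping only the last (real/fake) output leaves one logit per label_list entry.
filtered_logits = disc_logits[:, 0:-1]
```

Under this reading, the extra column is the one belonging to "UNK_UNK". Whether it matters for the supervised loss depends on whether unlabeled examples are masked out before the loss is averaged, as the `b_label_mask` line in the traceback quoted earlier on this page suggests.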
I have been trying to test and use the model, but I am unable to wrap it in a function that generates predictions for real/new/live data. I have made some tweaks to the model but haven't been able to use it for testing.
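A minimal sketch of the last step of such a function, assuming the logits come from the same chain used in the evaluation snippets on this page (tokenized input, then `transformer`, then `discriminator`). The helper `logits_to_labels` and the toy label list are hypothetical and not part of the repository.

```python
import torch

def logits_to_labels(disc_logits, label_list):
    """Map discriminator logits to label names, ignoring the final real/fake output."""
    class_logits = disc_logits[:, 0:-1]
    pred_ids = torch.argmax(class_logits, dim=-1)
    return [label_list[i] for i in pred_ids.tolist()]

# Toy values: three labels plus one trailing real/fake logit per row.
label_list = ["UNK_UNK", "A", "B"]
logits = torch.tensor([[0.1, 2.0, 0.3, 5.0],
                       [0.2, 0.1, 1.5, 0.0]])
predictions = logits_to_labels(logits, label_list)
```

For new text, the same preprocessing as in training (tokenizer, maximum sequence length, attention mask) would need to be applied before calling the models, and both models should be in `eval()` mode.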
I ran the same evaluation after every epoch as for the test set, but on the training set. However, the training accuracy decreases while the training loss increases. What happened here?
`# ========================================
# TEST ON THE TRAINING DATASET
# ========================================
# After the completion of each training epoch, measure our performance on
# our training set.
print("")
print("Running Training...")
t1 = time.time()

# Tracking variables for train set
total_train_loss = 0
all_train_preds = []
all_train_labels_ids = []
nll_train_loss = torch.nn.CrossEntropyLoss(ignore_index=-1)  # loss

# Evaluate data for one epoch
for batch in train_dataloader:
    # Unpack this training batch from our dataloader.
    b_input_ids = batch[0].to(device)
    b_input_mask = batch[1].to(device)
    b_labels = batch[2].to(device)

    # Tell pytorch not to bother with constructing the compute graph during
    # the forward pass, since this is only needed for backprop (training).
    with torch.no_grad():
        model_outputs = transformer(b_input_ids, attention_mask=b_input_mask)
        hidden_states = model_outputs[-1]
        _, logits, probs = discriminator(hidden_states)
        ###log_probs = F.log_softmax(probs[:,1:], dim=-1)
        filtered_logits = logits[:,0:-1]
        # Accumulate the training loss.
        total_train_loss += nll_train_loss(filtered_logits, b_labels)

    # Accumulate the predictions and the input labels
    _, preds = torch.max(filtered_logits, 1)
    all_train_preds += preds.detach().cpu()
    all_train_labels_ids += b_labels.detach().cpu()

# Report the final accuracy for this run.
all_train_preds = torch.stack(all_train_preds).numpy()
all_train_labels_ids = torch.stack(all_train_labels_ids).numpy()
train_accuracy = np.sum(all_train_preds == all_train_labels_ids) / len(all_train_preds)
print(" Train Accuracy: {0:.3f}".format(train_accuracy))

# Calculate the average loss over all of the batches.
avg_train_loss = total_train_loss / len(train_dataloader)
avg_train_loss = avg_train_loss.item()

# Measure how long the run took.
train_time = format_time(time.time() - t1)
print(" Train loss: {0:.3f}".format(avg_train_loss))
print(" Train took: {:}".format(train_time))`
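One thing worth checking, offered as an assumption rather than a diagnosis: if the training dataloader also yields unlabeled examples tagged "UNK_UNK" together with a label mask (as `b_label_mask` in the training loop suggests), the loop above counts them in both the loss and the accuracy. The sketch below compares accuracy over all examples with accuracy over labeled examples only. The helper `labeled_accuracy` and the toy tensors are illustrative.

```python
import torch

def labeled_accuracy(preds, labels, label_mask):
    """Accuracy over the examples whose mask entry is True (the labeled ones)."""
    mask = label_mask.bool()
    n_labeled = mask.sum().item()
    if n_labeled == 0:
        return float("nan")
    correct = (preds[mask] == labels[mask]).sum().item()
    return correct / n_labeled

preds = torch.tensor([1, 2, 3, 1])
labels = torch.tensor([1, 2, 0, 0])  # 0 stands for UNK_UNK in this toy example
label_mask = torch.tensor([True, True, False, False])

acc_all = (preds == labels).float().mean().item()          # unlabeled rows count as errors
acc_labeled = labeled_accuracy(preds, labels, label_mask)  # unlabeled rows ignored
```

It may also be worth confirming that the transformer and discriminator are switched to `eval()` mode before this loop, since dropout otherwise stays active.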
Hi, thanks for the great demo. I am confused about how to use the trained model, however.
I am very happy to see such excellent work, and I am very interested in your article.
But when I run your code without any changes, the results vary from run to run, and it is difficult to reach an accuracy of 65.4%. Why is this?
Thank you very much
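Some run-to-run variation is expected when random seeds are not fixed, since the generator noise, dropout and data shuffling all draw random numbers. A minimal sketch of seeding, assuming a hypothetical helper `set_seed` called once before the models and dataloaders are built:

```python
import random
import torch

def set_seed(seed: int) -> None:
    """Seed the Python and PyTorch random number generators."""
    random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # has no effect when CUDA is unavailable

# The same seed gives the same random draws.
set_seed(42)
first = torch.rand(3)
set_seed(42)
second = torch.rand(3)
```

If NumPy is used for shuffling or sampling, `np.random.seed(seed)` would be needed as well. Even with fixed seeds, some GPU operations are not deterministic, so small differences between runs may remain.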
Hi, is there a way to save and deploy the Ganbert model for production?
Hi,
Thanks for the amazing research and the code implementation. Is there any way to use this package for multi-label intent classification?
Hi, this is super useful, thanks.
Is there a way to save the best model during training? I tried to follow your AILC_Lectures_2021_Training_BERT_based_models_in_few_lines_of_code.ipynb, but it does not work here because the GAN has a generator and a discriminator.
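One way to handle several components is to store all of their `state_dict`s in a single file. The sketch below uses small `torch.nn.Linear` layers as stand-ins for the real transformer, generator and discriminator. The helper names and the file name `ganbert_best.pt` are hypothetical.

```python
import os
import tempfile
import torch

def save_checkpoint(path, transformer, generator, discriminator):
    """Save the weights of all three components in one file."""
    torch.save({
        "transformer": transformer.state_dict(),
        "generator": generator.state_dict(),
        "discriminator": discriminator.state_dict(),
    }, path)

def load_checkpoint(path, transformer, generator, discriminator):
    """Restore the weights into already constructed modules."""
    state = torch.load(path, map_location="cpu")
    transformer.load_state_dict(state["transformer"])
    generator.load_state_dict(state["generator"])
    discriminator.load_state_dict(state["discriminator"])

# Stand-in modules; the real ones come from the notebook.
transformer = torch.nn.Linear(4, 4)
generator = torch.nn.Linear(2, 4)
discriminator = torch.nn.Linear(4, 3)

path = os.path.join(tempfile.mkdtemp(), "ganbert_best.pt")
saved_weight = discriminator.weight.detach().clone()
save_checkpoint(path, transformer, generator, discriminator)

with torch.no_grad():
    discriminator.weight.zero_()  # simulate further training changing the weights
load_checkpoint(path, transformer, generator, discriminator)
```

In a training loop, `save_checkpoint` would be called whenever the monitored metric improves. The generator is only needed to resume training, since inference uses the transformer and discriminator.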
Thanks for the excellent project! I'm new to GANs and was trying to reproduce the results reported in the paper for the 20News dataset. However, my test accuracy is stuck at about 5.2% regardless of whether I use a 1% or 10% labelled training set. (I tried 1%, 2%, and 10%-50% but got almost the same results.) Also, the generator training loss is extremely large, up to 1343123137304389. I also tried my own dataset with different ratios of labelled data, and the highest accuracy I got was only 38%.
Just wondering whether anyone has been able to reproduce the results, or knows what is going wrong?
I trained on the 20News dataset for 15 epochs, with lr = 5e-6, dropout = 0.1, noise_size = 100, max_seq_length = 256, batch size = 64.
Appreciate your help!
Hi,
How are you?
Thanks for this great work. I have one question: I ran the PyTorch code and the generator doesn't seem to work. That is, after the model was trained and saved, I used the generator to produce examples from random noise and the discriminator to classify them. However, it always predicts a single label or very few labels. I tried many things (changing the noise to a normal distribution, adding more layers), but it looks like the generator doesn't learn...
Could you please give some guidance or working code? Thanks a lot!
Look forward to hearing from you soon!
Is it possible to calculate the F1-score of the model?
As far as I can tell, currently only accuracy and loss are reported for the completed model.
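Since the evaluation loop already collects predictions and gold label ids, F1 can be computed from those two lists. Below is a dependency-free sketch of macro-averaged F1; the function name `macro_f1` and the toy labels are illustrative.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
score = macro_f1(y_true, y_pred)  # per-class F1: 2/3, 4/5, 1
```

With scikit-learn installed, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity.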
The paper says "G inputs consist of noise vectors drawn from a normal distribution N(0, 1)", but this implementation seems to be based on uniform random inputs. Perhaps they give similar results :)
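For anyone who wants to compare the two, switching the noise distribution is a one-line change. This is a sketch with stand-in sizes (`noise_size = 100` is the value quoted in another post on this page); the variable names are illustrative.

```python
import torch

batch_size, noise_size = 4, 100

# Uniform noise in [0, 1), as the post above describes the implementation.
uniform_noise = torch.zeros(batch_size, noise_size).uniform_(0, 1)

# Standard normal noise N(0, 1), as described in the paper.
normal_noise = torch.randn(batch_size, noise_size)
```

Both tensors have the same shape, so the generator's input layer does not need to change.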
Is there any way to use an electra bert model with this?
I see you only evaluate on the test set in each epoch. Can we add a validation set, with early stopping based on the results/loss on this validation set?
This would also require a way to checkpoint the whole model, so that the best configuration on the dev set can be saved and then used on the test set at the end of training.
Please let me know if we can add that.
1- dev set support with early stopping criteria
2- checkpointing logic, to save and load the model.
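A minimal sketch of the early stopping part of this request, independent of the model code. The class `EarlyStopping`, its parameters and the dev losses below are all made up for illustration.

```python
class EarlyStopping:
    """Track the best dev loss and signal when training should stop."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def update(self, dev_loss: float) -> bool:
        """Record one epoch; return True if this epoch is the new best."""
        if dev_loss < self.best_loss - self.min_delta:
            self.best_loss = dev_loss
            self.bad_epochs = 0
            return True
        self.bad_epochs += 1
        return False

    @property
    def should_stop(self) -> bool:
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
dev_losses = [0.9, 0.7, 0.75, 0.8, 0.6]  # made-up values
best_epochs = []
stopped_at = None
for epoch, loss in enumerate(dev_losses):
    if stopper.update(loss):
        best_epochs.append(epoch)  # a checkpoint would be saved here
    if stopper.should_stop:
        stopped_at = epoch
        break
```

With a patience of 2, training stops after two consecutive epochs without improvement, so the final value 0.6 is never reached in this example.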
One last question: can you provide a way to train only the base (BERT-based) model without the GAN components, so that I can use its numbers as a reference? That way I can compare the results of the BERT-based model alone against the results obtained after adding the GAN.