
dual-contrastive-learning's Introduction


hiyouga

Yaowei Zheng

Ph.D. Student

Beihang University

37 Xueyuan Rd., Haidian Dist.

Beijing, China, 100191

Education

  • 2022.09-Present School of Computer Science and Engineering, Beihang University Ph.D.
  • 2017.09-2021.06 Shen Yuan Honors College, Beihang University B.Eng.

Research Interests

  • Natural Language Processing
  • Large Language Models

Skills

  • Natural Language: Chinese (Native); English (CET-6); Japanese (JLPT-N2)
  • Programming Language: Python; C++; Java; JavaScript; PHP; Go; Verilog HDL; MATLAB
  • Typesetting Language: LaTeX; Markdown
  • Programming Framework: PyTorch; TensorFlow

Publications

  1. Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo and Yongqiang Ma: LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. Preprint. [arXiv]
  2. Junfan Chen, Richong Zhang, Yaowei Zheng, Qianben Chen, Chunming Hu and Yongyi Mao: DualCL: Principled Supervised Contrastive Learning as Mutual Information Maximization for Text Classification. WWW2024. [DOI][arXiv][Code]
  3. Richong Zhang, Qianben Chen, Yaowei Zheng, Samuel Mensah and Yongyi Mao: Aspect-level Sentiment Analysis via a Syntax-based Neural Network. IEEE/ACM Transactions on Audio, Speech, and Language Processing. [DOI]
  4. Xiaohui Guo, Richong Zhang, Yaowei Zheng and Yongyi Mao: Robust Regularization with Adversarial Labelling of Perturbed Samples. IJCAI2021. [DOI][arXiv]
  5. Yaowei Zheng, Richong Zhang and Yongyi Mao: Regularizing Neural Networks via Adversarial Model Perturbation. CVPR2021. [DOI][arXiv][Code][Poster][Video]
  6. Yaowei Zheng, Richong Zhang, Suyuchen Wang, Samuel Mensah and Yongyi Mao: Anchored Model Transfer and Soft Instance Transfer for Cross-Task Cross-Domain Learning: A Study Through Aspect-Level Sentiment Classification. WWW2020. [DOI]
  7. Yaowei Zheng, Richong Zhang, Samuel Mensah and Yongyi Mao: Replicate, Walk, and Stop on Syntax: an Effective Neural Network Model for Aspect-Level Sentiment Classification. AAAI2020. [DOI][Code]

Academic Service

  • Conference Reviewer: AAAI, EMNLP, NAACL, COLING
  • Journal Reviewer: Neural Computation

dual-contrastive-learning's People

Contributors

chenqianben, hiyouga

dual-contrastive-learning's Issues

Why does the gradient collapse with DualCL on my own Chinese dataset?

Dear author, your framework works well on English datasets, but when I applied the dual contrastive loss to my own Chinese dataset, the gradient collapsed. Each of my Chinese labels is two characters long; is this related, or do I need to adjust something else? Thank you very much; I look forward to hearing from you soon.

About the Chinese dataset

Hello author,
If I run the model on a Chinese dataset, which parts need to be modified and what should I pay attention to?
Thank you!
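
For reference, the usual first step is to swap in a Chinese pretrained backbone and check how the label words tokenize. The sketch below is generic: it assumes the code loads the encoder and tokenizer by checkpoint name, which may differ from this repository's exact arguments, and the label names are hypothetical.

```python
from transformers import AutoModel, AutoTokenizer

# Swap the English backbone for a Chinese checkpoint; "bert-base-chinese"
# is a common choice, and any checkpoint with the same interface should work.
model_name = "bert-base-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)

# Chinese label words may split into several sub-tokens, so inspect how each
# label name tokenizes before prepending the labels to the input sequence.
for label in ["积极", "消极"]:  # hypothetical two-character label names
    print(label, tokenizer.tokenize(label))
```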

Issue regarding the evaluation procedure

Hi, thank you for your exciting work. I've noticed a potential problem with the evaluation procedure. To the best of my knowledge, the best model is currently selected based on the test data. However, this is not desirable, since under real conditions it is not possible to choose the model based on the test data. Beyond making the reported numbers hard to compare, a likely issue is overfitting: although the test data is not used for gradient updates, the model is chosen by its test performance, so we have no way of knowing whether the proposed model is simply better at leaking information through model selection. As an extreme case, if you randomly guess enough times on the test set, you can eventually reach 100%. That is generally why prior work uses a validation split [1].
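
For reference, a minimal sketch of validation-based checkpoint selection, independent of this repository's training loop; every name below is illustrative, not the repository's API.

```python
import copy
import torch

@torch.no_grad()
def evaluate(model, loader, device):
    """Accuracy of `model` on a data loader."""
    model.eval()
    correct = total = 0
    for inputs, targets in loader:
        preds = model(inputs.to(device)).argmax(dim=-1).cpu()
        correct += (preds == targets).sum().item()
        total += targets.size(0)
    return correct / total

def train_with_validation(model, train_loader, val_loader, test_loader,
                          optimizer, loss_fn, num_epochs, device):
    """Pick the checkpoint by validation accuracy; touch the test set only once."""
    best_state, best_val_acc = None, 0.0
    for _ in range(num_epochs):
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss_fn(model(inputs), targets).backward()
            optimizer.step()
        # Model selection uses the validation split only.
        val_acc = evaluate(model, val_loader, device)
        if val_acc > best_val_acc:
            best_val_acc, best_state = val_acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return evaluate(model, test_loader, device)  # reported once, never used for selection
```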

Problem when saving the model

Hi, thank you for your exciting work.
When I try to save the model, I get this error:
Transformer object has no attribute "save_pretrained"

How can we save the model after training it with your code?
In fact, I want to save the model and upload it to Hugging Face so that I can load and use it later.
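
A likely cause is that the pretrained encoder is wrapped inside a custom nn.Module, so save_pretrained exists only on the wrapped Hugging Face submodule. A minimal sketch follows; the Wrapper class and its .encoder attribute are stand-ins for the repository's model, not its actual code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class Wrapper(nn.Module):
    """Stand-in for the repository's classifier: a pretrained encoder plus a head."""
    def __init__(self, name="bert-base-uncased", num_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

model = Wrapper()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Option 1: a plain PyTorch checkpoint of the whole wrapper (reload with load_state_dict).
torch.save(model.state_dict(), "dualcl_checkpoint.pt")

# Option 2: export the Hugging Face submodule and tokenizer so the encoder can be
# uploaded to the Hub and reloaded later with from_pretrained. The classification
# head is not included here, so it still needs Option 1 (or a separate file).
model.encoder.save_pretrained("dualcl-encoder")
tokenizer.save_pretrained("dualcl-encoder")
```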

Some logical problems

Using the contrastive loss calculation from the source code, the computed loss can be negative. My inputs have shapes (batch_size × dim), (batch_size × class_num × dim) and class_num, and both Lz and Lθ can be negative at the same time.
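
For reference, a supervised contrastive term written in the usual -log softmax form is non-negative by construction, because the positive's score also appears in the denominator; negative values typically mean the positive is excluded from the normalizer or the loss is written as a raw similarity difference. A generic sketch, not the repository's exact DualCL loss:

```python
import torch
import torch.nn.functional as F

def contrastive_term(anchor, candidates, positive_idx, temperature=0.1):
    """-log softmax over all candidates; >= 0 because the positive is in the denominator.

    anchor:     (dim,) feature of one example
    candidates: (num_candidates, dim) features compared against the anchor
    """
    logits = candidates @ anchor / temperature   # (num_candidates,) similarity scores
    log_prob = F.log_softmax(logits, dim=0)      # log of a probability, always <= 0
    return -log_prob[positive_idx]               # hence the term is always >= 0

anchor = F.normalize(torch.randn(128), dim=0)
candidates = F.normalize(torch.randn(10, 128), dim=1)
print(contrastive_term(anchor, candidates, positive_idx=3))  # never below zero
```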

Some questions about the baselines

Your work is very good and effective, but I have some questions about the baseline approaches. I tried different hyperparameters to fine-tune BERT with supervised or unsupervised contrastive learning before classifying, but I have never managed to beat plain cross-entropy. What might I have failed to take into account? Many papers report that contrastive learning improves classification, yet here I always observe the opposite. Could you share the hyperparameters you used for the comparison?
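
For reference, the common baseline is a joint objective: cross-entropy plus a weighted SupCon-style term over the batch. The temperature and the weight lambda_ below are illustrative placeholders, not the paper's settings, and are usually the knobs that decide whether the contrastive term helps.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """Batch-wise supervised contrastive loss (SupCon-style) on L2-normalized features."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature                     # (B, B) similarity logits
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(self_mask, -1e9)                        # exclude self-comparisons
    log_prob = F.log_softmax(sim, dim=1)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask  # same-class pairs
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_counts         # mean log-prob over positives
    return loss[pos_mask.any(dim=1)].mean()                       # skip anchors with no positive

def joint_loss(logits, features, labels, lambda_=0.5):
    """Cross-entropy plus a weighted contrastive term; lambda_ is a key hyperparameter."""
    return F.cross_entropy(logits, labels) + lambda_ * supcon_loss(features, labels)
```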

Why shuffle the order of labels when using DualCL?

It's great work, but I have a question: why does DualCL deliberately shuffle the order of the labels? In binary classification this operation does not change the true label, but in multi-class classification it does. I don't understand the significance of this.

In fact, I followed this setup and trained on my own dataset, a binary classification task like dialogue intent recognition, for 30 epochs with RoBERTa, and the results were very poor. Is DualCL unsuitable for this kind of task? I hope you can point out my misunderstanding.
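
For context, per-example label shuffling is usually implemented so that the target index is remapped with the same permutation, in which case the gold class itself never changes, only the slot it occupies. The sketch below is a generic illustration of that remapping, not necessarily this repository's code, and the label names are hypothetical.

```python
import torch

labels = ["negative", "neutral", "positive"]     # illustrative label names
target = 2                                        # gold class index ("positive")

perm = torch.randperm(len(labels))                # random order of the label slots
shuffled_labels = [labels[i] for i in perm]       # label words placed in the new order
new_target = (perm == target).nonzero().item()    # slot of the gold label after shuffling

# Training against `new_target` with `shuffled_labels` prepended to the sentence keeps
# the gold class identical; only its position among the label tokens changes.
print(shuffled_labels, new_target)
```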

t-SNE plot visualization

Hi there!
I think this code and paper are awesome! When I run the code, I can see the accuracy increasing.

But I would also like to see how the class representations and the sentence feature representations move. Could you please upload the t-SNE visualization code to GitHub as well?

Have a good day.
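
Until the original visualization script is released, a minimal t-SNE sketch like the one below (scikit-learn and matplotlib; feature extraction not shown) can produce a similar plot.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

def plot_tsne(features, labels, out_path="tsne.png"):
    """Project feature vectors to 2-D with t-SNE and color the points by class."""
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    for c in np.unique(labels):
        pts = emb[labels == c]
        plt.scatter(pts[:, 0], pts[:, 1], s=8, label=f"class {c}")
    plt.legend()
    plt.savefig(out_path, dpi=200)

# Example with random features; in practice, pass the sentence (and class)
# representations collected from the trained encoder on the test set.
plot_tsne(np.random.randn(300, 128), np.random.randint(0, 3, size=300))
```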
