
dual-contrastive-learning's Introduction


hiyouga

Yaowei Zheng

Ph.D. Student

Beihang University

37 Xueyuan Rd., Haidian Dist.

Beijing, China, 100191

Education

  • 2022.09-Present School of Computer Science and Engineering, Beihang University Ph.D.
  • 2017.09-2021.06 Shen Yuan Honors College, Beihang University B.Eng.

Research Interests

  • Natural Language Processing
  • Large Language Models

Skills

  • Natural Language: Chinese (Native); English (CET-6); Japanese (JLPT-N2)
  • Programming Language: Python; C++; Java; JavaScript; PHP; Go; Verilog HDL; MATLAB
  • Typesetting Language: LaTeX; Markdown
  • Programming Framework: PyTorch; TensorFlow

Publications

  1. Yaowei Zheng, Richong Zhang, Junhao Zhang, Yanhan Ye, Zheyan Luo and Yongqiang Ma: LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models. Preprint. [arXiv]
  2. Junfan Chen, Richong Zhang, Yaowei Zheng, Qianben Chen, Chunming Hu and Yongyi Mao: DualCL: Principled Supervised Contrastive Learning as Mutual Information Maximization for Text Classification. WWW2024. [DOI][arXiv][Code]
  3. Richong Zhang, Qianben Chen, Yaowei Zheng, Samuel Mensah and Yongyi Mao: Aspect-level Sentiment Analysis via a Syntax-based Neural Network. IEEE/ACM Transactions on Audio, Speech, and Language Processing. [DOI]
  4. Xiaohui Guo, Richong Zhang, Yaowei Zheng and Yongyi Mao: Robust Regularization with Adversarial Labelling of Perturbed Samples. IJCAI2021. [DOI][arXiv]
  5. Yaowei Zheng, Richong Zhang and Yongyi Mao: Regularizing Neural Networks via Adversarial Model Perturbation. CVPR2021. [DOI][arXiv][Code][Poster][Video]
  6. Yaowei Zheng, Richong Zhang, Suyuchen Wang, Samuel Mensah and Yongyi Mao: Anchored Model Transfer and Soft Instance Transfer for Cross-Task Cross-Domain Learning: A Study Through Aspect-Level Sentiment Classification. WWW2020. [DOI]
  7. Yaowei Zheng, Richong Zhang, Samuel Mensah and Yongyi Mao: Replicate, Walk, and Stop on Syntax: an Effective Neural Network Model for Aspect-Level Sentiment Classification. AAAI2020. [DOI][Code]

Academic Service

  • Conference Reviewer: AAAI, EMNLP, NAACL, COLING
  • Journal Reviewer: Neural Computation

dual-contrastive-learning's People

Contributors

chenqianben, hiyouga

dual-contrastive-learning's Issues

Why does the gradient collapse with DualCL on my own Chinese dataset?

Dear author, your framework works well on English datasets, but when I applied the dual contrastive loss to my own Chinese dataset, the gradient collapsed. Each of my Chinese labels is two characters long; is this related, or do I need to adjust something else? Thank you very much; I look forward to hearing from you soon.

About the Chinese dataset

Hello author,
If I run the model on a Chinese dataset, which parts need to be modified and what should I pay attention to?
Thank you!
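
For reference, the usual first step is to swap in a Chinese pretrained backbone and check how the label words tokenize. The sketch below is generic: it assumes the code loads the encoder and tokenizer by checkpoint name, which may differ from this repository's exact arguments, and the label names are hypothetical.

```python
from transformers import AutoModel, AutoTokenizer

# Swap the English backbone for a Chinese checkpoint; "bert-base-chinese"
# is a common choice, and any checkpoint with the same interface should work.
model_name = "bert-base-chinese"
tokenizer = AutoTokenizer.from_pretrained(model_name)
encoder = AutoModel.from_pretrained(model_name)

# Chinese label words may split into several sub-tokens, so inspect how each
# label name tokenizes before prepending the labels to the input sequence.
for label in ["积极", "消极"]:  # hypothetical two-character label names
    print(label, tokenizer.tokenize(label))
```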

Issue regarding the evaluation procedure

Hi, thank you for your exciting work. I've noticed a potential problem with the evaluation procedure. To the best of my knowledge, the best model is currently selected based on the test data. However, this is not desirable, since under real conditions it is not possible to choose the model based on the test data. Beyond making the reported numbers hard to compare, a likely issue is overfitting: although the test data is not used for gradient updates, the model is chosen by its test performance, so we have no way of knowing whether the proposed model is simply better at leaking information through model selection. As an extreme case, if you randomly guess enough times on the test set, you can eventually reach 100%. That is generally why prior work uses a validation split [1].
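
For reference, a minimal sketch of validation-based checkpoint selection, independent of this repository's training loop; every name below is illustrative, not the repository's API.

```python
import copy
import torch

@torch.no_grad()
def evaluate(model, loader, device):
    """Accuracy of `model` on a data loader."""
    model.eval()
    correct = total = 0
    for inputs, targets in loader:
        preds = model(inputs.to(device)).argmax(dim=-1).cpu()
        correct += (preds == targets).sum().item()
        total += targets.size(0)
    return correct / total

def train_with_validation(model, train_loader, val_loader, test_loader,
                          optimizer, loss_fn, num_epochs, device):
    """Pick the checkpoint by validation accuracy; touch the test set only once."""
    best_state, best_val_acc = None, 0.0
    for _ in range(num_epochs):
        model.train()
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss_fn(model(inputs), targets).backward()
            optimizer.step()
        # Model selection uses the validation split only.
        val_acc = evaluate(model, val_loader, device)
        if val_acc > best_val_acc:
            best_val_acc, best_state = val_acc, copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)
    return evaluate(model, test_loader, device)  # reported once, never used for selection
```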

Problem when saving the model

Hi, thank you for your exciting work.
When I try to save the model, I get this error:
Transformer object has no attribute "save_pretrained"

How can we save the model after training it with your code?
In fact, I want to save the model and upload it to Hugging Face so that I can load and use it later.
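
A likely cause is that the pretrained encoder is wrapped inside a custom nn.Module, so save_pretrained exists only on the wrapped Hugging Face submodule. A minimal sketch follows; the Wrapper class and its .encoder attribute are stand-ins for the repository's model, not its actual code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class Wrapper(nn.Module):
    """Stand-in for the repository's classifier: a pretrained encoder plus a head."""
    def __init__(self, name="bert-base-uncased", num_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

model = Wrapper()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Option 1: a plain PyTorch checkpoint of the whole wrapper (reload with load_state_dict).
torch.save(model.state_dict(), "dualcl_checkpoint.pt")

# Option 2: export the Hugging Face submodule and tokenizer so the encoder can be
# uploaded to the Hub and reloaded later with from_pretrained. The classification
# head is not included here, so it still needs Option 1 (or a separate file).
model.encoder.save_pretrained("dualcl-encoder")
tokenizer.save_pretrained("dualcl-encoder")
```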

Some logical problems

Using the contrastive loss calculation from the source code, the computed loss can be negative. My inputs have shapes (batch_size × dim), (batch_size × class_num × dim) and class_num, and both Lz and Lθ can be negative at the same time.
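
For reference, a supervised contrastive term written in the usual -log softmax form is non-negative by construction, because the positive's score also appears in the denominator; negative values typically mean the positive is excluded from the normalizer or the loss is written as a raw similarity difference. A generic sketch, not the repository's exact DualCL loss:

```python
import torch
import torch.nn.functional as F

def contrastive_term(anchor, candidates, positive_idx, temperature=0.1):
    """-log softmax over all candidates; >= 0 because the positive is in the denominator.

    anchor:     (dim,) feature of one example
    candidates: (num_candidates, dim) features compared against the anchor
    """
    logits = candidates @ anchor / temperature   # (num_candidates,) similarity scores
    log_prob = F.log_softmax(logits, dim=0)      # log of a probability, always <= 0
    return -log_prob[positive_idx]               # hence the term is always >= 0

anchor = F.normalize(torch.randn(128), dim=0)
candidates = F.normalize(torch.randn(10, 128), dim=1)
print(contrastive_term(anchor, candidates, positive_idx=3))  # never below zero
```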

Some questions about the baselines

Your work is very good and effective, but I have some questions about the baseline approaches. I tried different hyperparameters to fine-tune BERT with supervised or unsupervised contrastive learning before classifying, but I have never managed to beat plain cross-entropy. What might I have failed to take into account? Many papers report that contrastive learning improves classification, yet here I always observe the opposite. Could you share the hyperparameters you used for the comparison?
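
For reference, the common baseline is a joint objective: cross-entropy plus a weighted SupCon-style term over the batch. The temperature and the weight lambda_ below are illustrative placeholders, not the paper's settings, and are usually the knobs that decide whether the contrastive term helps.

```python
import torch
import torch.nn.functional as F

def supcon_loss(features, labels, temperature=0.1):
    """Batch-wise supervised contrastive loss (SupCon-style) on L2-normalized features."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature                     # (B, B) similarity logits
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(self_mask, -1e9)                        # exclude self-comparisons
    log_prob = F.log_softmax(sim, dim=1)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask  # same-class pairs
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_counts         # mean log-prob over positives
    return loss[pos_mask.any(dim=1)].mean()                       # skip anchors with no positive

def joint_loss(logits, features, labels, lambda_=0.5):
    """Cross-entropy plus a weighted contrastive term; lambda_ is a key hyperparameter."""
    return F.cross_entropy(logits, labels) + lambda_ * supcon_loss(features, labels)
```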

Why shuffle the order of labels when using DualCL?

It's great work, but I have a question: why does DualCL deliberately shuffle the order of the labels? In binary classification this operation does not change the true label, but in multi-class classification it does. I don't understand the significance of this.

In fact, I followed this setup and trained on my own dataset, a binary classification task like dialogue intent recognition, for 30 epochs with RoBERTa, and the results were very poor. Is DualCL unsuitable for this kind of task? I hope you can point out my misunderstanding.
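
For context, per-example label shuffling is usually implemented so that the target index is remapped with the same permutation, in which case the gold class itself never changes, only the slot it occupies. The sketch below is a generic illustration of that remapping, not necessarily this repository's code, and the label names are hypothetical.

```python
import torch

labels = ["negative", "neutral", "positive"]     # illustrative label names
target = 2                                        # gold class index ("positive")

perm = torch.randperm(len(labels))                # random order of the label slots
shuffled_labels = [labels[i] for i in perm]       # label words placed in the new order
new_target = (perm == target).nonzero().item()    # slot of the gold label after shuffling

# Training against `new_target` with `shuffled_labels` prepended to the sentence keeps
# the gold class identical; only its position among the label tokens changes.
print(shuffled_labels, new_target)
```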

t-SNE plot visualization

Hi there!
I think this code and paper are awesome! When I run the code, I can see the accuracy increasing.

But I would also like to see how the class representations and the sentence feature representations move. Could you please upload the t-SNE visualization code to GitHub as well?

Have a good day.
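
Until the original visualization script is released, a minimal t-SNE sketch like the one below (scikit-learn and matplotlib; feature extraction not shown) can produce a similar plot.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

def plot_tsne(features, labels, out_path="tsne.png"):
    """Project feature vectors to 2-D with t-SNE and color the points by class."""
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    for c in np.unique(labels):
        pts = emb[labels == c]
        plt.scatter(pts[:, 0], pts[:, 1], s=8, label=f"class {c}")
    plt.legend()
    plt.savefig(out_path, dpi=200)

# Example with random features; in practice, pass the sentence (and class)
# representations collected from the trained encoder on the test set.
plot_tsne(np.random.randn(300, 128), np.random.randint(0, 3, size=300))
```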
