
emo's Introduction

Hi there 👋

😉 I am Siyu Ren.

🎓 I am a fifth-year Ph.D. student majoring in CS at Shanghai Jiao Tong University.

🔎 Currently, my research interests include efficient methods for NLP and large language models, as well as techniques for the mechanistic understanding of LLMs.

📚 For my academic publications, please refer to https://drsy.github.io/.

[Image: DRSY's GitHub stats / Most Used Languages]

[Image: profile]

emo's People

Contributors

drsy · eltociear · xufangzhi


emo's Issues

Is it possible for you to cite "Learning with a Wasserstein Loss" in the camera-ready version?

Dear authors:
Thank you for your great work advancing the frontier of language model training.

Learning with a Wasserstein Loss (arxiv)
Learning with a Wasserstein Loss (NeurIPS 2015)

This paper represents the first use of the Wasserstein distance (i.e. earth mover’s distance) as a loss for supervised learning.
It considers the problem of learning to predict a non-negative measure over a finite set.
Language models are essentially learning to predict a non-negative measure over a finite set: the next-token distribution over the vocabulary.
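As a toy illustration (not from either paper's code), the earth mover's distance between two such measures can be computed exactly with the POT library, given a ground cost matrix; the 0/1 cost below is a hypothetical choice:

    import numpy as np
    import ot  # pip install pot

    p = np.array([0.7, 0.2, 0.1])  # target measure over a 3-token "vocabulary"
    q = np.array([0.1, 0.3, 0.6])  # predicted measure
    C = 1.0 - np.eye(3)            # hypothetical ground cost: 0 on the diagonal, 1 elsewhere

    print(ot.emd2(p, q, C))        # optimal transport cost; with 0/1 cost this is 0.6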

In summary, Learning with a Wasserstein Loss applied a similar method to a similar problem: the core idea and technique are the same, and the problem is the same in principle.

Nonetheless, your great work proposes a tractable and effective upper bound for EMD and verifies EMD's effectiveness in language model fine-tuning, which is nontrivial and impressive.

Could you please, by any chance, cite Learning with a Wasserstein Loss in the camera-ready version of the paper? I believe it will help readers find related work.

I originally wanted to post a Public Comment on OpenReview, but comments are closed now 😂

emo_loss becomes NaN

Hello, a quick question. I am fine-tuning Qwen on a downstream task with this loss; the original loss was MLE.
I am using the emo loss from EMO/language_modeling/gpt2.py.

During training, I observe the following:

[screenshot]

and:

[screenshot]

What might be causing this?
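For context, here is a minimal reconstruction of an EMO-style loss with common NaN guards (fp32 upcast, eps in the normalization); the function name and structure are my own assumptions, not the exact code in EMO/language_modeling/gpt2.py:

    import torch
    import torch.nn.functional as F

    def emo_loss_stable(logits, target_ids, embedding, eps=1e-8):
        # Upcast to fp32: softmax and normalization in fp16 are frequent NaN sources.
        logits = logits.float()
        q = F.softmax(logits, dim=-1)                        # predicted measure Q_theta, (B, V)
        p = F.one_hot(target_ids, logits.size(-1)).float()   # empirical measure P, (B, V)
        e = F.normalize(embedding.float(), dim=-1, eps=eps)  # unit-norm rows avoid divide-by-zero
        q_repr = q @ e                                       # E Q_theta, (B, d)
        p_repr = p @ e                                       # E P, (B, d)
        return (1.0 - (q_repr * p_repr).sum(-1)).mean()      # 1 - (E Q_theta)^T (E P)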

Questions about the EMO loss

gt_q = (q_grad * one_hot).detach()  # predicted probability at the ground-truth index, detached from the graph
q_final = q_grad - gt_q             # forward value zeroed at the ground-truth token; gradients flow unchanged

In theory, the loss value should be unchanged after this modification. However, the formula $1 - (EQ_{\theta})^T EP$ gives different results before and after the modification, while the formula $Q_{\theta}^T C P$ gives the same result both times. Presumably $Q_{\theta}$ itself has changed, so its entries no longer sum to 1. Does emo_loss = (1 - torch.sum(p_contextual_repr * q_contextual_repr, dim=-1)) therefore need an extra subtracted term, torch.sum(gt_q * stable_onehot, dim=-1)?

Also, because the diagonal of $1 - E^T E$ is always 0 at the ground-truth token and the gradient is multiplied by this value, the gradient at the ground-truth token is already 0 and no update happens there. Are these two lines of code still necessary?
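A quick toy check of the above (my own reconstruction; shapes and seed are arbitrary): zeroing $Q_{\theta}$ at the ground-truth index leaves $Q_{\theta}^T C P$ unchanged because the diagonal of $C$ is 0, but changes $1 - (EQ_{\theta})^T EP$ because $Q_{\theta}$ no longer sums to 1:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    V, d, gt = 8, 4, 3
    E = F.normalize(torch.randn(V, d), dim=-1)  # unit-norm embedding rows
    C = 1.0 - E @ E.t()                         # cost matrix with zero diagonal
    q = F.softmax(torch.randn(V), dim=-1)       # predicted distribution Q_theta
    p = F.one_hot(torch.tensor(gt), V).float()  # one-hot target P

    q_mod = q - (q * p).detach()                # the two lines from the issue, fused

    for Q in (q, q_mod):
        print((Q @ C @ p).item(),               # Q^T C P: same for q and q_mod
              (1.0 - (Q @ E) @ (p @ E)).item()) # 1 - (EQ)^T(EP): differs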

Is there a good way to initialize cost matrix when pretraining from-scratch?

Hi, thank you for sharing this work.

I wonder about the applicability of this method to pretraining from scratch. To do that, one needs to build a good initial cost matrix $\mathbb{C} = [C(v_i, v_j)]_{i,j}$. I can come up with a trivial uniform initialization; is there a better way?

If you ever tried this method for pretraining from scratch, it would be really helpful to know the results. Thanks!
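One hypothetical alternative to the uniform initialization, assuming some pretrained token embeddings are available (my own suggestion, not something the paper prescribes): derive the cost from their cosine similarity, $C(v_i, v_j) = 1 - \cos(e_i, e_j)$:

    import torch
    import torch.nn.functional as F

    def cosine_cost_matrix(pretrained_emb):
        # pretrained_emb: (vocab, dim), e.g. from a small pretrained LM or fastText
        e = F.normalize(pretrained_emb, dim=-1)
        return 1.0 - e @ e.t()  # zero diagonal, costs in [0, 2]

    def uniform_cost_matrix(vocab_size):
        # the trivial fallback: cost 0 to keep mass in place, 1 to move it anywhere
        return 1.0 - torch.eye(vocab_size)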

Could this cause mode collapse?

Hello, this paper is very inspiring. From the code, it effectively adds a corrected cosine loss. Could this make the model assign similar weights to tokens with the same meaning? For example, given the prefix "今天是周三,明天是" ("Today is Wednesday, tomorrow is"), the model tends to output "周四" ("Thursday"). Since "星期四" means the same thing as "周四", a model trained with the EMO loss would give "星期四" a logit almost identical to that of "周四". Semantically similar, high-logit tokens would then take up most of the probability mass, pushing the model toward a single output style. Could this be a problem?
@DRSY
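A toy illustration of the mechanism behind this concern (hypothetical numbers, not the paper's setup): with a tied inner-product head, logits = h @ E^T, so two near-duplicate embeddings receive near-identical logits and together absorb most of the probability mass:

    import torch

    h = torch.tensor([1.0, 0.5])             # hidden state
    E = torch.tensor([[0.9, 0.4],            # embedding of "周四"
                      [0.9, 0.41],           # embedding of "星期四" (near-duplicate)
                      [0.1, -0.8]])          # an unrelated token
    logits = h @ E.t()
    print(torch.softmax(logits, dim=-1))     # the first two tokens split ~89% of the mass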
