kongds / prompt-bert

PromptBERT: Improving BERT Sentence Embeddings with Prompts

Python 96.30% Shell 3.70%

prompt-bert's Issues

I'm curious about the meaning of `static token embeddings`

When reading the paper, I found a phrase that I cannot understand.

Can someone explain what static token embeddings are? Are they the output of BERT's word embedding layer?

What does p_mbv mean?

Thanks to the authors for this work.

https://github.com/kongds/Prompt-BERT/blob/main/prompt_bert/models.py#L397

What does p_mbv mean?
Why is the size 10 in self.p_mbv = torch.nn.Parameter(torch.zeros(10))?

unsupervised contrastive learning

I understand that contrastive learning pulls similar representations closer together and pushes dissimilar ones farther apart.
In unsupervised contrastive learning, you use the templates to pull positive pairs closer, but does the training also include the notion of pushing negatives farther away?

Question about the supervised training loss

Hello,
In "model.py", I found that the loss function for supervised training is CrossEntropyLoss. Since hard negatives are available, why not use TripletMarginLoss?
Looking forward to your reply, thanks!

What if I augment the positive examples in PromptBERT?

I tried to augment the positive examples using back-translation with different prompts (the prompts used for unsupervised RoBERTa). Specifically, I obtained two views of each positive example using back-translation and fed them into PromptBERT, but the average Spearman score for RoBERTa-base was only 75 (versus ~79.2 in your paper). I also tried using back-translation alone and got an average Spearman score of 77. I am confused about why the prompts do not work for augmented positive data. Do you have any ideas?

I also found that in the supervised setting, using different prompts hurts performance. Does this mean that your method only works for positive pairs of the same length? Thank you!

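About the loss function and removing template noise

Hello, I am a beginner. When reading the source code I could not even find the definition of the loss function, and I am also somewhat confused about template denoising in the paper. Could the authors post the definition of the loss function from the paper, or give a more detailed denoising example? Thanks.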

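SUBJ transfer dataset cannot be evaluated

Hello, great work!
I ran into a problem while reproducing the results. When running the SUBJ dataset, the following error appears:
/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [21,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Could you tell me what causes this problem? Have you encountered anything similar?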

Any advice for the Chinese version?

Hi, I'm trying to adapt PromptBERT to a public Chinese dataset.

I am using a prompt like the following (roughly, "The meaning of the following sentence "sent_0" is mask."):
cls下面这句话"sent_0"对应的语义是mask。sep+

The Spearman score is much worse than that of the original SimCSE.

Is there any advice for the Chinese prompts?

How should this "length" be understood?

I would like to ask how this "length" should be understood.

Questions about the paper.

Hello,

I was very fortunate to read your paper, and the experimental results are exciting.
The paper mentions two observations:
Observation 1: Original BERT layers fail to improve the performance.
Observation 2: Embedding biases harm the sentence embeddings performance.
Based on your experimental results, these two phenomena do exist and can be improved. But I don't see a connection between these two observations and prompts.

How does prompt solve the bias problem?

Looking forward to your reply, thanks!

How to represent sentence in Template Denoising step?

Hi there,

I have recently been reimplementing your work in fairseq. Your model is really impressive.

I am able to reproduce your results in Table 8: with different templates, I get an average score of 78.41 (RoBERTa_base as the backbone model).

However, when I try to reproduce your default method, which is different templates with denoising, the highest score I can get is 78.54 (RoBERTa_base as the backbone model).

At the Template Denoising step, I tried representing the template with either 1) the MASK token's representation or 2) the cls token's representation.

Can you clarify which one you use as the template bias?

Many thanks!

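Some questions about the training stage

First of all, thank you; your work is very inspiring. I have a few questions:

For prompt-bert-base trained with supervision on the NLI dataset, did you set max_sequnce_length to 32? This value is too small for some sentences in the NLI dataset, and truncation may cause a loss of semantics.
I am curious why labels is defined as follows in models.py: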

cos_sim = cls.sim(z1.unsqueeze(1), z2.unsqueeze(0))

loss_fct = nn.CrossEntropyLoss()
labels = torch.arange(cos_sim.size(0)).long().to(input_ids.device)

loss = loss_fct(cos_sim, labels)

Looking forward to your reply, thanks!
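For readers puzzling over the same snippet, here is a minimal pure-Python sketch of how a cross-entropy loss over an N x N similarity matrix behaves when the labels are 0..N-1. It is not the repository's code: the function names and the temperature value are assumptions made for illustration. Row i of the matrix compares sentence i with every second-view sentence in the batch, so label i marks the diagonal entry as the positive and every other column as an in-batch negative.

```python
import math

def cosine(u, v):
    # Cosine similarity between two plain-list vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def in_batch_contrastive_loss(z1, z2, temp=0.05):
    # temp is an assumed temperature; the quoted snippet does not show one.
    n = len(z1)
    # sim[i][j] compares the i-th first view with the j-th second view.
    sim = [[cosine(z1[i], z2[j]) / temp for j in range(n)] for i in range(n)]
    # Same role as torch.arange(n): the positive for row i is column i.
    labels = list(range(n))
    total = 0.0
    for i, row in enumerate(sim):
        # Cross-entropy for one row, using a numerically stable log-sum-exp.
        top = max(row)
        log_z = top + math.log(sum(math.exp(s - top) for s in row))
        total += log_z - row[labels[i]]
    return total / n

# Positives on the diagonal give a small loss; misaligned pairs give a large one.
aligned = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[1.0, 0.1], [0.1, 1.0]])
swapped = in_batch_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                    [[0.1, 1.0], [1.0, 0.1]])
```

Under this reading, the off-diagonal entries of each row act as negatives, which is why no separate "push apart" term appears in the loss.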

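Question about the results of supervised PromptBERT

I downloaded the supervised PromptBERT model from Hugging Face.
I then found that the test results do not match those reported in the paper: the gap is large on the semantic similarity tasks, while the transfer tasks match the paper. What could be the reason?

Command line:

python evaluation.py --model_name_or_path result/sup-PromptBERT \
                                 --pooler avg \
                                 --mode test \
                                 --task_set full \
                                 --mask_embedding_sentence \
                                 --mask_embedding_sentence_template "*cls*_This_sentence_:_'_*sent_0*_'_means*mask*.*sep+*"

Tested on a V100 GPU.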
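To make the template string in the command above easier to read, the sketch below expands it into the text a tokenizer would see. The placeholder handling is an assumption inferred from the string itself (underscores as spaces, *cls* and *sep+* as markers for special tokens the tokenizer adds on its own); expand_template is a hypothetical helper, not a function from the repository.

```python
TEMPLATE = "*cls*_This_sentence_:_'_*sent_0*_'_means*mask*.*sep+*"

def expand_template(template, sentence, mask_token="[MASK]"):
    # Hypothetical helper: the placeholder semantics are assumed, not confirmed.
    # Drop the special-token markers, which a tokenizer would normally add itself.
    text = template.replace("*cls*", "").replace("*sep+*", "")
    # Underscores stand in for spaces; this also turns *sent_0* into "*sent 0*".
    text = text.replace("_", " ")
    text = text.replace("*mask*", mask_token)
    # Insert the sentence last so underscores inside it are left untouched.
    text = text.replace("*sent 0*", sentence)
    return text.strip()

prompt = expand_template(TEMPLATE, "A cat sits.")
print(prompt)  # This sentence : ' A cat sits. ' means[MASK].
```

Comparing the expanded prompt with the one used at training time is a quick way to rule out a template mismatch when evaluation scores differ from the paper.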

The model link cannot be accessed; is it no longer valid?

As in the title.

More details about the results of unfine-tuned version.

Hello, first, congratulations on your great work.

I want to ask about the performance of the un-fine-tuned PromptBERT with a manual prompt, because when I use "The sentence of "[X]" means [MASK]." I only get an average of 59 on STS.

Did you also use template denoising for the un-fine-tuned version?

Understanding the denoising step

Thanks for the paper.
I am going through the paper and have a doubt about this passage:

To reduce the influence of the template itself on the sentence representation, we propose a novel way to denoise the template information. Given the sentence x_i, we first calculate the corresponding sentence embeddings h_i with a template. Then we calculate the template bias ĥ_i by directly feeding BERT with the template and the same template position ids. For example, if the x_i has 5 tokens, then the position ids of template tokens after the [X] will be added by 5 to make sure the position ids of template are same. Finally, we can directly use h_i − ĥ_i as the denoised sentence representation. For the template denoising, more details can be found in Discussion.

When you say sentence embeddings, do you mean the representation corresponding to the [MASK] token, or something else?
When you say you feed the template directly to BERT, which token's embedding do you take?

Can you explain with an example how you adjust the token position ids?

Thanks
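The position-id adjustment described in the quoted passage can be illustrated with a small sketch. It assumes a word-level split of the template "This sentence : ' [X] ' means [MASK] ." with [CLS] and [SEP] added, giving 5 tokens before [X] and 5 after; the function names are hypothetical and the real tokenization may differ.

```python
def template_position_ids(prefix_len, suffix_len, sentence_len):
    # Tokens before [X] (including [CLS]) keep positions 0 .. prefix_len - 1.
    prefix = list(range(prefix_len))
    # Tokens after [X] are shifted by sentence_len, as if the sentence
    # were still sitting between the two template halves.
    start = prefix_len + sentence_len
    suffix = list(range(start, start + suffix_len))
    return prefix + suffix

def denoise(h, h_hat):
    # Element-wise h_i - h_hat_i, the denoised representation in the passage.
    return [a - b for a, b in zip(h, h_hat)]

# Assumed split: [CLS] This sentence : '  |  [X]  |  ' means [MASK] . [SEP]
ids = template_position_ids(prefix_len=5, suffix_len=5, sentence_len=5)
print(ids)  # [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]

# Toy vectors only, to show the subtraction.
print(denoise([1.0, 2.0], [0.5, 0.5]))  # [0.5, 1.5]
```

With a 5-token sentence the full input occupies positions 0 to 14, so the template-only input reuses 0 to 4 and 10 to 14 and skips the positions the sentence would have taken.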

Question about the meanings of some parameters

What do these parameters mean? Can you explain them in detail? Thank you very much!
1. mask_embedding_sentence_delta
2. mask_embedding_sentence_org_mlp
3. mask_embedding_sentence_delta_freeze
4. mask_embedding_sentence_delta_no_position

How do I train from scratch?

This is about ./run.sh bert-optiprompt or ./run.sh sup-roberta.
How do I train from scratch? The README only describes evaluation via bash.
After downloading the model via sh [unsup-bert|unsup-roberta|sup-berta], I can run ./run.sh sup-roberta.
I'd appreciate it if you could give me a hint. I'm fascinated by your paper.

Standard deviation

Hi,

Thanks for your brilliant work!

I have a question about the results. I noticed that only the unsupervised models are reported with standard deviation. Why are the supervised models not reported with standard deviation?

Also, would you mind sharing the random seeds that you used?

Thanks a lot!

RuntimeError: cosine_similarity requires both inputs to have the same sizes, but x1 has [128, 1, 768] and x2 has [1, 128, 768]

A little question about the training

Hi, thank you for your excellent work. Please forgive me for asking a simple question.
I have been training an unsup-bert model. With random seed 0, I get an average score of 78.54, which matches the result reported in the paper (78.54±0.15), but both are lower than the model you provide (78.87).
So I wonder whether you have updated the training strategy or are providing your best model.

OSError: Can't load config for 'unsup-roberta'

It's my honor to read your article!
However, when I run the command bash eval_only.sh unsup-roberta, it raises an error:

OSError: Can't load config for 'unsup-roberta'. Make sure that:

- 'unsup-roberta' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'unsup-roberta' is the correct path to a directory containing a config.json file

and
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/unsup-roberta/resolve/main/config.json

I don't know how to solve this, so I came to ask for your help. I would really appreciate it!

Why adding periods during evaluation?

I noticed that you add periods during evaluation (in eval.py). I am not so sure if this operation is valid because I didn't find the same operation in SimCSE's evaluation code. I notice results under certain settings drop if I remove this operation.
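As a rough illustration of the operation being asked about, the sketch below appends a period to sentences that lack closing punctuation. It is not taken from the repository: ensure_final_period and its set of terminators are assumptions for illustration, and the actual rule in the evaluation code may differ.

```python
def ensure_final_period(sentence, terminators=".?!\"'"):
    # Hypothetical helper; the terminator set is an assumption.
    stripped = sentence.rstrip()
    # Only add a period when the sentence does not already end in punctuation.
    if stripped and stripped[-1] not in terminators:
        return stripped + "."
    return stripped

print(ensure_final_period("A man is playing a guitar"))  # A man is playing a guitar.
print(ensure_final_period("Is it raining?"))             # Is it raining?
```

Running an evaluation with and without such a step is one way to measure how much of a score difference comes from punctuation alone.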

Calculation of anisotropy

Hi, thanks for sharing the code. May I ask whether the anisotropy calculation step is included in this repo? If not, could you please share more details about this part, e.g. the dataset used (train/dev/test)?

Thanks again.

A question about the paper

Hello, I have just read your excellent paper, and I have a question. Is the "frequency" in Figure 1 calculated from the datasets mentioned in Section 5.1 (Dataset)?

Results

Hello, I downloaded the "unsup-bert" model and evaluated it, but I only get 77.80 on average.
Can you add an evaluation procedure to the README or a bash file?

Invalid address about configs.

Hello,

I was fortunate to read your paper, and the experimental results are really exciting.

However, when I run the code you provide on GitHub, I run into a problem. It reports that the config.json, which should be downloaded from "https://huggingface.co/result/unsup-bert_s0/resolve/main/config.json", is not found.

How could I solve this problem?

Looking forward to your reply, thanks!

The text matching performance for Chinese and English is worrying...

Relationships between anisotropy and removing biased tokens

First of all, thanks for your inspiring paper.

When I read the introduction part of your paper (I haven't read all yet), I wondered one thing.

In the paper, removing biased tokens improves performance.
However, is this really unrelated to anisotropy?

I guess that tokens which appear frequently in one sentence are also likely to appear in other sentences.
So I think removing such tokens makes sentence embeddings more distinguishable from each other.
As a result, I would expect the sentence embeddings to point in different directions and become more isotropic.

My opinion can be wrong !
I am just curious of your opinion.

Thank you.

How to generate sentence embedding on customized data?

Hi, thanks for releasing the code. I would like to know how to generate sentence embeddings on custom data.

Thanks

Template Denoising

As I understand it, the Template Denoising in the paper is the first method. Have you tried the second denoising method?

Fine-tuning on bert-base-uncased

Hi there,

I am trying to use your script to train unsup PromptBERT, but I keep getting a "CUDA out of memory" error even though I set the batch size to 32 (I want to use a 12 GB GPU).

I am not familiar with your script, but I do have some experience with SimCSE. Usually, if I specify the GPU with os.environ["CUDA_DEVICE_ORDER"] in train.py, it works. I am not sure how this can be done with your script.
Any help would be appreciated.

Question on fine-tuning

Hi,
When I try to fine-tune with the command ./run.sh unsup-bert 0, I get the error OSError: Can't load config for 'result/unsup-bert_s0'. Do I need to download the models listed under the "Results on STS Tasks" heading? I am also confused: shouldn't these models be produced by running this command? What should I put in the result directory?
Thanks!

Questions about the length in the get_delta function?

Hi,

Thanks for releasing the code for this amazing paper.

I want to ask some questions about the length parameters of the get_delta function (this link).

Why is the default value of length 50?
If I change the maximum sequence length of the input IDs from 32 to 256 in both training and evaluation (this link), should the default value still be 50?
I encountered some errors at line 170 of prompt_bert/model.py when I changed the maximum sequence length (32->256) and the batch size (256->32).
Why is the value of the parameter in the repeated function 128?
https://github.com/kongds/Prompt-BERT/blob/main/prompt_bert/models.py#L292-L296

Thanks in advance.
Tien-Hong

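Problem loading a Chinese model

Hello everyone.
I switched to loading hfl-chinese-roberta-wwm-ext locally: https://huggingface.co/hfl/chinese-roberta-wwm-ext
It reports problems loading all the parameters.

What should I change? Thanks!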
