
rgb's People

Contributors

chen700564


rgb's Issues

Some issues trying Rejection Rate

Hi! Congratulations on the amazing work you've done.

I'm trying to use RGB to test my RAG pipeline, but I'm running into the following issue with Rejection Rate: even when I feed the model only the negative examples, it still sometimes answers correctly, because it turns out some of the negative examples in fact contain the correct answer...

For example:

Question: How much did Elon Musk bought Twitter?
Correct answer (according to RGB): [44 billion]
My model answer: Elon Musk bought Twitter at his original offer price of $54.20 a share, with a total cost of roughly $44 billion.

The model was expected to REJECT answering this, since I only gave it negative examples, right? But in fact, a few of the negative documents contain this answer:

  • Oct 28, 2022 ... Elon Musk takes control of Twitter in $44bn deal · What next for Twitter under Elon Musk? · How the world's richest person bought Twitter · Who is ...

  • After building a stake in Twitter at the start of the year, Mr Musk made his $44bn offer in April, a price tag that looked too high almost as soon as it was agreed. He...

Is this expected behavior, or am I doing something wrong?

Thanks in advance.
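For anyone hitting the same thing, a minimal sketch of a pre-flight check that flags "negative" passages containing the gold answer, so such cases can be excluded from the rejection-rate count. The field names query, answer, and negative follow my reading of the RGB data files, and the one-JSON-object-per-line layout is an assumption; adjust to your copy of the data.

import json

def leaky_negatives(item):
    # Collect gold answer strings; answers may be nested lists of alternatives.
    golds = []
    for ans in item["answer"]:
        golds.extend(ans if isinstance(ans, list) else [ans])
    # Return the negative passages that contain any gold answer verbatim.
    return [p for p in item["negative"]
            if any(str(g).lower() in p.lower() for g in golds)]

# Assumed layout: one JSON object per line, as in the RGB data files.
with open("data/en.json") as f:
    items = [json.loads(line) for line in f]

for item in items:
    leaks = leaky_negatives(item)
    if leaks:
        print(f"{item['query']!r} leaks its answer in {len(leaks)} negative passage(s)")

Note that plain substring matching would still miss paraphrases such as "$44bn" standing in for "44 billion", which is exactly the leak reported above, so a fuzzier match (numbers normalized, currency symbols stripped) may be needed.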

How do you create the data?

Your work is excellent.
I would like to know how you created your data. For example, how was "en_fact.json" created? I noticed that there are positive and negative samples; were they created manually or automatically?

Looking forward to receiving your reply.
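While waiting for the authors: one common semi-automatic recipe, offered here only as a plausible sketch and not as the authors' actual pipeline, is to retrieve web passages per question and label each one positive if it contains the gold answer and negative otherwise, followed by manual spot checks.

def label_passages(gold_answers, passages):
    # Plausible labeling rule (an assumption, not the confirmed RGB procedure):
    # a passage is positive iff it contains some gold answer string.
    positive, negative = [], []
    for p in passages:
        if any(g.lower() in p.lower() for g in gold_answers):
            positive.append(p)
        else:
            negative.append(p)
    return positive, negative

As the Rejection Rate issue above shows, such containment rules are imperfect: a paraphrased answer ("$44bn") can slip into the negative set.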

RAG's future

RAG is certainly promising, but do you think a typical company could use RAG to address customer or product concerns?

Rejection rate of ChatGPT

In the article it says that gpt-3.5-turbo is used to measure the rejection rate. What explains this difference in results for ChatGPT, given that it is used as the reference?
[screenshot of the rejection-rate results being compared]
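One possible source of the gap, offered as an assumption rather than a confirmed explanation: the repo measures rejection in two ways, a string match against the canned refusal sentence from the prompt, and a gpt-3.5-turbo judge for free-form refusals, and the two checks can disagree. Below is a minimal sketch of the string-match side; the exact refusal sentence is quoted from the RGB prompt template as I recall it, so treat the wording as an assumption.

# Canned refusal sentence the RGB prompt asks the model to emit (assumed wording).
REFUSAL = "I can not answer the question because of the insufficient information in documents"

def is_rejection_keyword(prediction: str) -> bool:
    # String-match check: misses refusals phrased any other way,
    # which is why a model-based judge is also run on top.
    return REFUSAL.lower() in prediction.lower()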

Need an OpenAI key for evaluating LLMs?

I would like to thank you for the work you have done and encourage you to continue. I would like to know whether an OpenAI key is required for the evaluation, and whether it is possible to evaluate models that have undergone quantization.
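On the second question: a quantized model can be evaluated by swapping the generation call for a local one. Here is a minimal sketch using Hugging Face transformers with 4-bit bitsandbytes quantization; the model id and generation settings are placeholders, and this is not code from the RGB repo. As for the key, my reading is that an OpenAI key is only needed when ChatGPT is the model under test or acts as the judge.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint

quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto")

def generate(prompt: str) -> str:
    # Greedy decoding; plug this in wherever the evaluation calls the LLM.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=200, do_sample=False)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)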

Unable to reproduce Counterfactual Robustness result with ChatGPT

Here's what I did:

Step 1:
python evalue.py --dataset zh --noise_rate 0.0 --modelname chatgpt

Step 2:
python fact_evalue.py --dataset zh --modelname chatgpt

I got the file prediction_zh_chatgpt_temp0.7_noise0.0_passage5_correct0.0_result with the following content:

{
    "all_rate": 0.9473684210526315,
    "noise_rate": 0.0,
    "tt": 270,
    "nums": 285
}

And the file prediction_zh_chatgpt_temp0.7_noise0.0_passage5_correct0.0_chatgptresult.json with the following content:

{
    "reject_rate": 0.0,
    "all_rate": 0.9385245901639344,
    "correct_rate": 0,
    "tt": 229,
    "rejecttt": 0,
    "correct_tt": 0,
    "nums": 244,
    "noise_rate": 0.0
}

I failed to see how this matches the results in the paper:
[screenshot of the paper's results table]

Any ideas?
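A quick sanity check before comparing with the paper: the reported rates should just be tt/nums, and a temperature of 0.7 makes runs nondeterministic, so some spread across runs is expected. A small script to recompute the rates from the two result files (file names copied from the run above):

import json

# Recompute all_rate from the raw counts in each result file.
for path in [
    "prediction_zh_chatgpt_temp0.7_noise0.0_passage5_correct0.0_result",
    "prediction_zh_chatgpt_temp0.7_noise0.0_passage5_correct0.0_chatgptresult.json",
]:
    with open(path) as f:
        r = json.load(f)
    print(path, r["tt"], "/", r["nums"], "=", r["tt"] / r["nums"], "reported:", r["all_rate"])

Here 270/285 ≈ 0.947 and 229/244 ≈ 0.939, so the two files are internally consistent; the open question is only how they line up with the paper's table, and rerunning with a lower temperature (or averaging several runs) may narrow the gap.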

Confused by the calculation of accuracy

Is the calculation method for the accuracy metric in the paper consistent with the code in the repository? I'm a bit confused by this piece of code:

tt = 0
for i in results:
    label = i["label"]
    # When every passage is noise, a rejection (label [-1]) counts as correct.
    if noise_rate == 1 and label[0] == -1:
        tt += 1
    # Otherwise the answer counts only if every gold piece was found
    # (no 0 in label) and at least one piece exists (1 in label).
    elif 0 not in label and 1 in label:
        tt += 1
print(tt / len(results))
scores = {
    "all_rate": tt / len(results),
    "noise_rate": noise_rate,
    "tt": tt,
    "nums": len(results),
}

Benchmarking on the zh.json dataset with a noise_rate of 0.2 implies that, out of 1500 prompts, 300 are missing supplementary knowledge. The accuracy calculated by this code is much lower than the value reported in the paper's table. Am I misunderstanding something, or was this piece of code modified at some point? Thank you!
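To make the metric concrete, here is the same rule applied to three hand-made examples. My reading of the code above: label holds one entry per gold answer piece, 1 = found, 0 = missing, and [-1] marks a rejection, which only counts as correct when noise_rate == 1.

results = [
    {"label": [1, 1]},   # every answer piece found -> counted correct
    {"label": [1, 0]},   # one piece missing -> not counted
    {"label": [-1]},     # rejection -> correct only when noise_rate == 1
]

def accuracy(results, noise_rate):
    tt = 0
    for i in results:
        label = i["label"]
        if noise_rate == 1 and label[0] == -1:
            tt += 1
        elif 0 not in label and 1 in label:
            tt += 1
    return tt / len(results)

print(accuracy(results, noise_rate=0.2))  # 1/3: rejection not rewarded
print(accuracy(results, noise_rate=1))    # 2/3: rejection now counts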
