Comments (2)
Hey! Thanks for the question and research. TLDR: I think it is up to the test designer to decide what they want to look for.
Just as RAG has multiple accuracy metrics, I believe the same goes for NIAH.
In your example, the model got the answer, but it included a lot of fluff as well, and that fluff seems slightly off topic.
I agree it doesn't feel like a 1, but that is a subjective opinion.
I think the route forward is allowing users more control over which evaluator is used (and the grading criteria) to allow them to make the test they want.
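To make the idea of a user-chosen evaluator concrete, here is a minimal sketch of what a pluggable evaluator with its own grading criteria could look like. The class names and scoring rule are hypothetical illustrations, not the actual llmtest_needleinahaystack API; a real strict criterion would likely use an LLM judge rather than string matching.

```python
# Hypothetical sketch of a pluggable evaluator; names and the scoring
# heuristic are illustrative, not the library's real interface.
from abc import ABC, abstractmethod


class Evaluator(ABC):
    """Scores a model response against the needle on a 1-10 scale."""

    @abstractmethod
    def evaluate(self, response: str, needle: str) -> int:
        ...


class ExactRecallEvaluator(Evaluator):
    """Strict criterion: full marks only if the needle appears verbatim
    and the response adds little extra 'fluff'."""

    def __init__(self, max_extra_chars: int = 80):
        self.max_extra_chars = max_extra_chars

    def evaluate(self, response: str, needle: str) -> int:
        if needle.lower() not in response.lower():
            return 1  # needle not retrieved at all
        extra = len(response) - len(needle)
        if extra <= self.max_extra_chars:
            return 10
        # Deduct a point for each extra "budget" of off-topic padding.
        return max(2, 10 - extra // self.max_extra_chars)


needle = "The best thing to do in San Francisco is eat a sandwich."
concise = needle
padded = needle + " Also, here are ten unrelated facts..." * 20

evaluator = ExactRecallEvaluator()
print(evaluator.evaluate(concise, needle))  # 10
print(evaluator.evaluate(padded, needle))   # 2 (penalized for fluff)
```

Swapping in a more lenient evaluator (or one with different grading criteria) would then just be a matter of passing a different `Evaluator` instance to the test runner.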
from llmtest_needleinahaystack.
Thank you for answering my question! I asked it out of curiosity because there was a slight difference between what I thought NIAH's evaluation criteria were and the actual criteria. Thanks to Kamradt's answer, I was able to resolve that.
As you mentioned, the model response contains unrelated content that drifts from the purpose of the question, so it is hard to call it a perfect retrieval. To run experiments that better match NIAH's evaluation criteria, it would be a good idea to use methods such as prompt engineering!
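As a sketch of that prompt-engineering idea, a retrieval prompt can explicitly instruct the model to answer with only the relevant sentence and no extra commentary. The template wording below is a hypothetical example, not taken from the benchmark itself.

```python
# Hypothetical prompt template nudging the model toward terse retrieval;
# the wording is illustrative, not from llmtest_needleinahaystack.
RETRIEVAL_PROMPT = (
    "You are given a long document and a question.\n"
    "Answer using ONLY the single most relevant sentence from the document.\n"
    "Do not add commentary, context, or any other information.\n\n"
    "Document:\n{context}\n\nQuestion: {question}\nAnswer:"
)


def build_prompt(context: str, question: str) -> str:
    """Fill the template with the haystack text and the needle question."""
    return RETRIEVAL_PROMPT.format(context=context, question=question)


prompt = build_prompt(
    "...haystack text...",
    "What is the best thing to do in San Francisco?",
)
print(prompt)
```

Constraining the output this way should reduce the off-topic fluff, which in turn makes a strict "perfect retrieval" grading criterion more achievable.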
Thank you for releasing a useful benchmark for measuring LLM performance like the NIAH benchmark!
Related Issues (20)
- hard coding of 'gpt-4' for evaluation
- Install pre-commit with end-of-file-fixer
- Replace os.path with Pathlib
- Update package Anthropic
- Anthropic Naming Conflict Error
- Implement Docker for testing
- Code optimizations
- Model kwargs support
- Add Makefile target for resetting run results
- Standard Tokenizer
- Convert the repository to a PyPi package
- Remove passing of API keys as parameters and read them from environment variables
- multi-needle-eval-pizza-3 dataset not found
- Add license file
- [Feature Proposal] Multi-needle in a haystack
- does it run at all? Basic commands failed to run as per the README.
- Question: Can the Haystack have variations?
- Possibility to specify custom API endpoint address?
- Can I use local LLM as the evaluator and provider?