Comments (4)
The discrepancy comes from differences in the prompts and from inherent sampling randomness. When you run inference with vLLM or LMDeploy, scores can fluctuate noticeably from run to run, typically by 3 to 5 points.
from opencompass.
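One way to see how large that run-to-run fluctuation is on your own setup is to repeat the same evaluation several times and summarize the scores. The following is a minimal sketch using only the Python standard library; the `summarize_runs` helper and the example scores are hypothetical and illustrative, not part of OpenCompass and not real results.

```python
import statistics


def summarize_runs(scores):
    """Return the mean, sample standard deviation, and max-min spread of repeated scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate variation")
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "spread": max(scores) - min(scores),
    }


# Hypothetical scores from five repeated runs of the same benchmark (illustrative only).
example_scores = [61.0, 64.0, 59.5, 63.0, 62.5]
summary = summarize_runs(example_scores)
print(summary)
```

Reporting the mean together with the spread makes it easier to tell whether a difference between two setups is larger than the noise.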
Thanks for your reply.
Do you have any suggestions for stabilizing performance during evaluation, or for achieving relatively higher results?
I suspect that LLM developers report higher numbers in their own results. If possible, I'd like to understand how they achieve that.
Prompt engineering is an effective approach, so it is worth trying different prompts. OpenCompass (OC) also provides several alternative prompt configs that you can try.
For example: https://github.com/open-compass/opencompass/blob/main/configs/datasets/humaneval/humaneval_openai_sample_evals_gen_159614.py
Thanks for your reply.
I will give prompt engineering a try.