Comments (3)
But I ran python tools/prompt_viewer.py configs/eval_test_comsenseqa.py -n -a with the following config:
from mmengine.config import read_base
from opencompass.partitioners import SizePartitioner, NaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask, OpenICLEvalTask
with read_base():
    from .datasets.commonsenseqa.commonsenseqa_ppl_5545e2 import commonsenseqa_datasets
len of self.index_ds in commonsenseqa : 9741
len of self.test_ds in commonsenseqa : 1221
But the length of ice_idx_list is 1221, which equals the length of self.test_ds.
from opencompass.
There is a partitioner mechanism when tasks are launched, which would explain the difference between 611 and 1221. But I'm not sure why there is an out-of-bounds error...
It's my fault. I saved the intermediate index results as binary files, but did not account for the partitioner mechanism. Thanks.
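For anyone hitting the same problem, here is a minimal sketch of how a cached index can go out of bounds once the test set is partitioned. It is plain Python and does not use any opencompass API: split_evenly, build_prompts and load_cached_index are hypothetical helpers, and the two-way split is only an assumption chosen to reproduce the 611 figure mentioned above. The real partitioner may split differently.

```python
# Hypothetical illustration only; none of these helpers come from opencompass.

def split_evenly(n_items, n_parts):
    """Split n_items into n_parts contiguous (start, end) ranges."""
    base, extra = divmod(n_items, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def build_prompts(test_ds, ice_idx_list):
    """Pair each test item with its in-context example indices."""
    return [(test_ds[i], ice) for i, ice in enumerate(ice_idx_list)]


def load_cached_index(cache, expected_len):
    """Return the cache only if it matches the current (partitioned) test set."""
    if len(cache) != expected_len:
        return None  # stale cache: the caller should rebuild the index
    return cache


full_test = list(range(1221))               # full test split
shards = split_evenly(len(full_test), 2)    # assumed two-way split
start, end = shards[0]
shard = full_test[start:end]                # what a single task would see

# Index computed once on the FULL test set and saved to disk.
cached_ice_idx_list = [[0] for _ in full_test]

error_raised = False
try:
    build_prompts(shard, cached_ice_idx_list)
except IndexError:
    # The cache has 1221 entries but the shard has only 611 items.
    error_raised = True

safe_cache = load_cached_index(cached_ice_idx_list, len(shard))
```

Checking the cache length against the current test set before reusing it (or keying the cache file on the partition range) avoids silently mixing an index built for the full dataset with a partitioned task.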