Comments (4)
We recommend using English, or English and Chinese together, for issues so that we can have a broader discussion.
from opencompass.
Okay, from now on, I will ask my questions in English.
Given a model and an input text sequence, perplexity measures how likely the model is to generate that sequence.
As a metric, it can be used to evaluate how well the model has learned the distribution of the text it was trained on.
We are not predicting anything here, only measuring the likelihood of generating the sequence, which is what perplexity literally means.
You can refer to https://huggingface.co/docs/transformers/perplexity for more details.
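To make the definition concrete, here is a minimal sketch of the usual formula: perplexity is the exponential of the average negative log-likelihood per token, so a lower value means the model finds the sequence more likely. The function name `perplexity` and the per-token probabilities below are made up for illustration; this is not OpenCompass's implementation, and in practice the log-probabilities would come from the model's output.

```python
import math


def perplexity(token_log_probs):
    """Return exp(-mean(log p)) over the per-token log-probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Hypothetical probabilities a model assigns to each token of two sequences.
likely = [math.log(p) for p in (0.5, 0.5, 0.5, 0.5)]
unlikely = [math.log(p) for p in (0.1, 0.1, 0.1, 0.1)]

print(perplexity(likely))    # about 2: the model is choosing among ~2 options per token
print(perplexity(unlikely))  # about 10: the sequence is much less likely under the model
```

When perplexity is used for multiple-choice evaluation, the idea is typically to score each candidate answer this way and pick the one with the lowest value, which is why nothing has to be generated.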
OK, I got it, thank you very much.