Related Issues (20)
- Models testing themselves will always be biased.
- [Feature] support arena-hard in opencompass
- Bug in get_battles_from_judgment
- [Discussion] Methodology for bootstrapping with replacement to obtain separability confidence intervals
- Discrepancy in Scores When Switching GPT Model Versions
- Multi-threads generation support ?
- [Bug] Temperature is always `0.0`
- [Q] About hosting `arena-hard-v0.1/question.json` in the Hugging Face Hub
- Local model as a judge
- Bradley-Terry model
- How to add new models to the leaderboard?
- Is there any plan to share the full dataset (200k prompts) with the "number of hardness criteria met" label ? I think it would be quite useful to the community
- STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
- Can you add deepseek-coder-v2?
- Markdown Rendering Issue
- Majority of questions are coding questions!
- Only support baseline=True and pairwise=True?
- Evaluate local models
- CI results different for same model answer copy
- Allow to set generation sampling parameters
from arena-hard-auto.