Comments (3)
Hi! Evaluation is notoriously fickle and prompt sensitive.
To reproduce Meta's numbers, you would need to run the exact same setup (same batch size, same generation temperature, same few-shot samples in the exact same order, etc.). If you want to reproduce the results we get on the Open LLM Leaderboard, you can follow the steps in the reproducibility section of the About page.
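To make the point about few-shot order concrete, here is a minimal sketch. The prompt template, the helper names (build_few_shot_prompt, prompt_fingerprint) and the example pairs are all assumptions made up for illustration; they are not the template used by Meta or by the leaderboard. It only shows that the same samples in a different order produce a different prompt, so comparing a short hash of the final prompt between two setups is a cheap sanity check.

```python
import hashlib


def build_few_shot_prompt(examples, question):
    """Join (question, answer) pairs in the given order, then append the test question.

    The "Question:/Answer:" template is hypothetical, chosen only for this sketch.
    """
    shots = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    return "\n\n".join(shots + [f"Question: {question}\nAnswer:"])


def prompt_fingerprint(prompt):
    """Short SHA-256 fingerprint, handy for comparing prompts across two runs."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]


# Made-up few-shot samples; only their order differs between the two prompts.
examples = [("2 + 2?", "4"), ("Capital of France?", "Paris")]
same_order = build_few_shot_prompt(examples, "3 + 5?")
reversed_order = build_few_shot_prompt(list(reversed(examples)), "3 + 5?")

# Different order means a different prompt string, hence a different fingerprint.
print(prompt_fingerprint(same_order) == prompt_fingerprint(reversed_order))
```

If the fingerprints of the prompts from two evaluation runs differ, the model is not seeing the same input, and the scores are not directly comparable.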
from transformers.
Hey! I don't think it is a difference in implementation, as the model has been tested quite extensively over the past years 😓
What is happening might be that the generation_config does not include the same parameters? Or something with the prompt. We ran the evaluations on the Open LLM Leaderboard using transformers as a backend and made sure that we were able to reproduce the results.
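One way to check the generation parameters is to diff the two setups key by key. The sketch below works on plain dictionaries; the helper name diff_generation_params and every value in it are hypothetical, not the settings of any real run. In transformers, such a dictionary could presumably be obtained from the model's generation config (for example via its to_dict() method), but treat that as an assumption to verify against your installed version.

```python
MISSING = "<unset>"  # placeholder shown when a key is absent from one side


def diff_generation_params(reference, candidate):
    """Return {key: (reference_value, candidate_value)} for every key that differs."""
    keys = sorted(set(reference) | set(candidate))
    return {
        key: (reference.get(key, MISSING), candidate.get(key, MISSING))
        for key in keys
        if reference.get(key, MISSING) != candidate.get(key, MISSING)
    }


# Hypothetical settings, for illustration only.
reference_setup = {"do_sample": False, "max_new_tokens": 256}
local_setup = {"do_sample": True, "temperature": 0.6, "max_new_tokens": 256}

for name, (ref_val, local_val) in diff_generation_params(reference_setup, local_setup).items():
    print(f"{name}: reference={ref_val!r} local={local_val!r}")
```

Any key that shows up in the diff (sampling on or off, temperature, maximum length) is a candidate explanation for a score gap before suspecting the model implementation.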
cc @clefourrier the lead of LLM Leaderboards!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.