Comments (19)
Note: HuggingFace includes this in its datasets package.
https://huggingface.co/datasets/natural_questions
from lm-evaluation-harness.
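For anyone who wants to peek at the data without pulling the whole thing, here is a minimal sketch using streaming mode in the `datasets` package. It assumes the dataset identifier from the URL above (`natural_questions`) still resolves on the Hub and that the dev set is exposed as the `validation` split; `take` and `stream_nq_validation` are hypothetical helper names, not part of any library.

```python
from itertools import islice
from typing import Any, Iterable, Iterator


def take(records: Iterable[Any], n: int) -> Iterator[Any]:
    """Yield at most the first n records without consuming the rest."""
    return islice(records, n)


def stream_nq_validation(dataset_name: str = "natural_questions"):
    """Open the validation (dev) split lazily instead of downloading it all.

    Assumption: `dataset_name` is the identifier from the URL above; it may
    have been renamed on the Hub since this thread was written.
    """
    # Imported here so `take` stays usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(dataset_name, split="validation", streaming=True)


# Example (needs network access and the `datasets` package):
# for record in take(stream_nq_validation(), 2):
#     print(sorted(record.keys()))
```

Streaming returns an iterable rather than an indexable dataset, so it suits a quick look at a few records more than random access.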
Warning: This dataset is super big.
> Warning: This dataset is super big.
How big is “super big”?
97G.
what the fuck. Why are we not training on this.
Ah, the dev set is only 1G. But we should add the train set to the Pile.
We would need to dedupe this with Wikipedia, since the bulk of it is just the HTML of Wikipedia pages.
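A rough sketch of what that dedupe could look like, assuming exact matching after stripping HTML tags and normalizing case and whitespace. This is only an illustration, not the procedure anyone in this thread used; `fingerprint` and `drop_seen` are hypothetical names, and the regex tag stripper is deliberately naive (real Wikipedia HTML would need a proper parser and probably fuzzy matching).

```python
import hashlib
import re
from typing import Iterable, List, Set

_TAG = re.compile(r"<[^>]+>")  # naive tag matcher, an assumption of this sketch
_WS = re.compile(r"\s+")


def fingerprint(text: str) -> str:
    """Hash text after stripping tags, lowercasing and collapsing whitespace."""
    stripped = _TAG.sub(" ", text)
    normalized = _WS.sub(" ", stripped).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def drop_seen(candidates: Iterable[str], seen: Set[str]) -> List[str]:
    """Keep candidates whose fingerprint is not in `seen`.

    `seen` is updated in place, so repeats within `candidates` are dropped too.
    """
    kept = []
    for doc in candidates:
        fp = fingerprint(doc)
        if fp not in seen:
            seen.add(fp)
            kept.append(doc)
    return kept
```

In use, `seen` would be filled with fingerprints of the Wikipedia text already in the training set, and the NQ page HTML would be passed as `candidates`.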
I can claim this
> I can claim this
Assigned!
would love to take this on if help on implementing the evaluation is still needed?
> would love to take this on if help on implementing the evaluation is still needed?
Yes this would be quite helpful. Thanks!
I think Natural Questions is implemented already? https://github.com/EleutherAI/lm-evaluation-harness/blob/master/lm_eval/tasks/naturalqs.py
@haileyschoelkopf Some methods are not implemented; they raise NotImplementedError.
Ah, you're right, sorry! I'm not sure why this was originally merged then. It's not in the task registry, though, so it should be all right to keep it in the repo until the refactor is done, at which point we can decide what to do with it.
I wonder what the progress of NQ eval is and if any help is needed?
@memray I am under the impression that it hasn't been implemented and help is needed.
+1
+1
Closed by #789, which implements the NaturalQs dataset split used by Llama and (possibly, unconfirmed) by PaLM and more!