From the GPT-3 paper In this section we measure GPT-3’s

Implement the Natural Questions evaluation (lm-evaluation-harness, CLOSED)

StellaAthena commented on May 9, 2024

Implement the Natural Questions evaluation

from lm-evaluation-harness.

Comments (19)

cfoster0 commented on May 9, 2024

Note: HuggingFace includes this in its datasets package.

https://huggingface.co/datasets/natural_questions

cfoster0 commented on May 9, 2024

Warning: This dataset is super big.

StellaAthena commented on May 9, 2024

> Warning: This dataset is super big.

How big is “super big”?

cfoster0 commented on May 9, 2024

97G.

sdtblck commented on May 9, 2024

what the fuck. Why are we not training on this.

sdtblck commented on May 9, 2024

Ah, dev set is only 1G. But we should add train set to the pile.

cfoster0 commented on May 9, 2024

We would need to dedupe this with Wikipedia, since the bulk of it is just the HTML of Wikipedia pages.

moirage commented on May 9, 2024

I can claim this

StellaAthena commented on May 9, 2024

> I can claim this

Assigned!

cr458 commented on May 9, 2024

would love to take this on if help on implementing the evaluation is still needed?

StellaAthena commented on May 9, 2024

> would love to take this on if help on implementing the evaluation is still needed?

Yes this would be quite helpful. Thanks!

haileyschoelkopf commented on May 9, 2024

I think Natural Questions is implemented already? https://github.com/EleutherAI/lm-evaluation-harness/blob/master/lm_eval/tasks/naturalqs.py

juletx commented on May 9, 2024

@haileyschoelkopf Some methods are not implemented, they raise NotImplementedError

haileyschoelkopf commented on May 9, 2024

Ah you're right sorry!--I'm not sure why this was originally merged then. It's not in the task registry though so it should be alright to keep in the repo until the refactor is done, at which point we can decide what to do with it

memray commented on May 9, 2024

I wonder what the progress of NQ eval is and if any help is needed?

StellaAthena commented on May 9, 2024

@memray I am under the impression that it hasn't been implemented and help is needed.

wwngh1233 commented on May 9, 2024

Sea-Snell commented on May 9, 2024

haileyschoelkopf commented on May 9, 2024

Closed by #789 which implements the NaturalQs dataset split used by Llama and (possibly, unconfirmed) used by PaLM and more!
