Adding reasoning to your AI? Take these datasets; they may help you on your way.

| AGI/causality/formal grammar | | |
|---|---|---|
| Deepmind Chomsky Hierarchy | Problems crafted for FSM/PDA/TM | [1] |
| automata | a neurallambda tool to generate data from grammars | [1] |
| I am a Strange Dataset | Tough for LLMs because of self-references | [1] |
| DiagGSM8k | NL reasoning benchmark | [1] |
| CLadder | Causal reasoning | [1] |
| Cause-Effect Pairs | 108 datasets of two-variable dynamics (not NL) | [1] |
| MNLI Entailment | sentence parsing + entailment | [1] |

| AGENT/TOOL | | |
|---|---|---|
| THUDM AgentInstruct | long-form dialogs | [1] |
| WANG AgentInstruct | GPT-3 synthesized instructions | [1] |
| KnowLM Tool | prompt + tool call + answer | [1] |
| Glaive Tool Usage | system prompt lists tools + prompt + answer | [1] |
| opentoolformer retrieval | prompt + tool call | [1] |

| CODE | | |
|---|---|---|
| rosetta | same program, many different languages | [1] |
| EvoEval Tool Use | 100 prompts + code + tests | [1] |

| MATH/LOGIC | | |
|---|---|---|
| gsm8k | Grade School Math 8k | [1] |
| MetaMath | one-shot math | [1] |
| MetaMathFewShot | few-shot math | [1] |
| MathPile | 9B tokens from filtered internet | [1] |
| LogiQA | NL multiple choice, requires abstraction | [1] |
| Logic-LM | a model combining automated theorem provers and LLMs | [1] |
| Coq Facts | 270k Coq theorem prover programs | [1] |

| NATURAL LANGUAGE | | |
|---|---|---|
| UltraInteract_sft | GPT-generated iterated reasoning dialogs | [1] |
| MUD videogames | (various; could be training data) | |
| Winogrande | ambiguous sentences, fill in 1 word | [1] |
| Winograd_wsc | ambiguous sentences, choose the right word | [1] |
| Contradiction | 2 phrases, do they contradict | [1] |
| Recognizing Textual Entailment | 2 phrases, do they entail each other | [1] |
| Textual Entailment Pool | more entailment | [1] |
| Answer Validation | 2 phrases, does the answer solve the question | [1] |
| Monotonicity Entailment | x is true, does y follow | [1] |
| entailment | passage, question -> T/F | [1] |
| Commonsense QA | multiple choice QA | [1] |
| GLUE | several datasets | [1] |
| custom multi-hop | use Wikipedia's graph of articles | |

| TOY PROBLEMS | | |
|---|---|---|
| Big Bench Hard | 23 challenges (only 6k datapoints) | [1] |
| logical entailment dataset | logic strings by DeepMind | [1] |
| logical entailment dataset code | (generate it yourself) | [1] |
| FSM Game | generate strings according to grammar | |
| Adaptive Grammar | grammar rule might change | |
| String/Graph Rewriting | | string_rewriting.py |
| LibraryOfLogic | generate NL from multiple games | [1] |
| AB-XY Game | | |
| word ladder | | |
| parser | | |
| longest common subsequence | | |
| string reversal | | |
| Wisconsin card sorting | | |
| anagram | | |
| palindrome | | |
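
Several of the toy problems above have no linked dataset, so they would need to be generated. As a rough illustration of the "FSM Game" idea (generate strings according to a grammar), here is a minimal sketch. Everything in it is an assumption made for illustration: the `FSM` table (which accepts strings of the form `(ab)*c`) and the helpers `generate`, `accepts`, and `make_examples` are hypothetical and are not taken from neurallambda or any dataset listed here.

```python
import random

# Hypothetical toy FSM (an assumption, not from any listed dataset).
# It accepts strings of the form (ab)*c. Each state maps an emitted
# symbol to the next state; "accept" is the only final state.
FSM = {
    "start": {"a": "after_a", "c": "accept"},
    "after_a": {"b": "start"},
}


def generate(rng, max_len=20):
    """Random-walk the FSM, retrying until a walk reaches 'accept'."""
    while True:
        state, out = "start", []
        while state != "accept" and len(out) < max_len:
            symbol = rng.choice(sorted(FSM[state]))
            out.append(symbol)
            state = FSM[state][symbol]
        if state == "accept":
            return "".join(out)


def accepts(s):
    """Return True if the FSM consumes all of s and ends in 'accept'."""
    state = "start"
    for symbol in s:
        if state == "accept" or symbol not in FSM[state]:
            return False
        state = FSM[state][symbol]
    return state == "accept"


def make_examples(n, seed=0):
    """Build (string, label) pairs: grammatical strings plus corruptions."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        s = generate(rng)
        examples.append((s, True))
        # Overwrite one position with a random symbol, then relabel with
        # the checker, because a corruption can leave the string valid.
        i = rng.randrange(len(s))
        bad = s[:i] + rng.choice("abc") + s[i + 1:]
        examples.append((bad, accepts(bad)))
    return examples
```

Calling `make_examples(3)` returns six `(string, label)` pairs that could be formatted as prompts, for example "is this string in the grammar?". Swapping in a different transition table is one way to approximate the "Adaptive Grammar" variant, where the rule changes between examples.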