
arc-solvers's Introduction

ARC-Solvers

Library of baseline solvers for AI2 Reasoning Challenge (ARC) Set (http://data.allenai.org/arc/). These solvers retrieve relevant sentences from a large text corpus (ARC_Corpus.txt in the dataset), and use two types of models to predict the correct answer.

  1. An entailment-based model that computes the entailment score for each (retrieved sentence, question+answer choice as an assertion) pair and scores each answer choice based on the highest-scoring sentence.
  2. A reading comprehension model (BiDAF) that converts the retrieved sentences into a paragraph per question. The model is used to predict the best answer span and each answer choice is scored based on the overlap with the predicted span.
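The entailment-based scoring in step 1 can be sketched as follows. This is an editor's illustration, not the repo's actual code: `entailment_score` and `retrieve` are hypothetical stand-ins for the trained entailment model and the ElasticSearch retrieval step.

```python
# Sketch of step 1: each (retrieved sentence, question+choice-as-assertion)
# pair gets an entailment score, and an answer choice is scored by its
# best-supporting sentence. entailment_score and retrieve are placeholders.

def score_choice(retrieved_sentences, hypothesis, entailment_score):
    """Score one answer choice by its highest-scoring retrieved sentence."""
    return max(entailment_score(sent, hypothesis) for sent in retrieved_sentences)

def pick_answer(question, choices, retrieve, entailment_score):
    """Pick the choice whose best-supporting sentence scores highest."""
    scores = {}
    for choice in choices:
        hypothesis = f"{question} {choice}"  # question + choice as an assertion
        sentences = retrieve(hypothesis)     # e.g. top-k hits from the corpus index
        scores[choice] = score_choice(sentences, hypothesis, entailment_score)
    return max(scores, key=scores.get)
```

In the real solvers the entailment model is DGEM or Decomposable Attention; here any callable returning a score would do.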

Setup environment

  1. Create the arc_solvers environment using Anaconda:
conda create -n arc_solvers python=3.6
  2. Activate the environment:
source activate arc_solvers
  3. Install the requirements in the environment:
sh scripts/install_requirements.sh
  4. Install PyTorch as per the instructions on http://pytorch.org/. Command as of Feb. 26, 2018:
conda install pytorch torchvision -c pytorch

Setup data/models

  1. Download the data and models into the data/ folder. This also builds the ElasticSearch index (assumes ElasticSearch 6+ is running on the ES_HOST machine defined in the script):
sh scripts/download_data.sh
  2. Download and prepare the embeddings. This downloads glove.840B.300d.zip from https://nlp.stanford.edu/projects/glove/ and converts it to glove.840B.300d.txt.gz, which AllenNLP can read:
sh download_and_prepare_glove.sh
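The zip-to-gzip conversion that step 2 performs can be sketched in a few lines. This is a rough illustration of what such a step might do, not the script's actual internals; the file names are the ones from the text.

```python
# Hedged sketch: unpack one text member of the downloaded GloVe zip and
# re-compress it as a single .txt.gz file, streaming in 1 MB chunks so the
# 2+ GB embedding file never has to fit in memory.
import gzip
import zipfile

def zip_txt_to_gz(zip_path: str, member: str, out_path: str) -> None:
    """Copy one text member of a zip archive into a gzip file."""
    with zipfile.ZipFile(zip_path) as zf, zf.open(member) as src:
        with gzip.open(out_path, "wb") as dst:
            for chunk in iter(lambda: src.read(1 << 20), b""):
                dst.write(chunk)

# zip_txt_to_gz("glove.840B.300d.zip", "glove.840B.300d.txt", "glove.840B.300d.txt.gz")
```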

Running baseline models

Run the entailment-based baseline solvers against a question set using scripts/evaluate_solver.sh

Running a pre-trained DGEM model

For example, to evaluate the DGEM model on the Challenge Set, run:

sh scripts/evaluate_solver.sh \
	data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
	data/ARC-V1-Models-Aug2018/dgem/

Change dgem to decompatt to test the Decomposable Attention model.

Running a pre-trained BiDAF model

To evaluate the BiDAF model, use the evaluate_bidaf.sh script:

 sh scripts/evaluate_bidaf.sh \
    data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
    data/ARC-V1-Models-Aug2018/bidaf/

Training and evaluating the BiLSTM Max-out with Question to Choices Max Attention

This model implements an attention interaction between the context-encoded representations of the question and the choices. The model is described here.

To train the model, download the data and word embeddings (see Setup data/models above).

Evaluate the trained model:

python arc_solvers/run.py evaluate \
    --archive_file data/ARC-V1-Models-Aug2018/max_att/model.tar.gz \
    --evaluation_data_file data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl

or

Train a new model:

python arc_solvers/run.py train \
    -s trained_models/qa_multi_question_to_choices/serialization/ \
    arc_solvers/training_config/qa/multi_choice/reader_qa_multi_choice_max_att_ARC_Chellenge_full.json

Running against a new question set

To run the baseline solvers against a new question set, create a file using the JSONL format. For example:

{
    "id":"Mercury_SC_415702",
    "question": {
       "stem":"George wants to warm his hands quickly by rubbing them. Which skin surface will
               produce the most heat?",
       "choices":[
                  {"text":"dry palms","label":"A"},
                  {"text":"wet palms","label":"B"},
                  {"text":"palms covered with oil","label":"C"},
                  {"text":"palms covered with lotion","label":"D"}
                 ]
    },
    "answerKey":"A"
}

Run the evaluation scripts on this new file using the same commands as above.
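A minimal sketch for producing such a file programmatically: one JSON object per line, each with an id, a question stem, labeled choices, and an answerKey. The helper name is an editor's invention; only the JSONL schema comes from the example above.

```python
# Write questions in the JSONL format the solvers expect:
# one JSON object per line.
import json

def write_questions(path, questions):
    """Serialize each question dict onto its own line."""
    with open(path, "w") as f:
        for q in questions:
            f.write(json.dumps(q) + "\n")

example = {
    "id": "Q1",
    "question": {
        "stem": "Which skin surface will produce the most heat?",
        "choices": [{"text": "dry palms", "label": "A"},
                    {"text": "wet palms", "label": "B"}],
    },
    "answerKey": "A",
}
```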

Running a new Entailment-based model

To run a new entailment model (implemented using AllenNLP), you need to

  1. Create a Predictor that converts the input JSON to an Instance expected by your entailment model. See DecompAttPredictor for an example.

  2. Add your custom predictor to the predictor_overrides. For example, if your new model is registered as my_awesome_model and its predictor is registered as my_awesome_predictor, add "my_awesome_model": "my_awesome_predictor" to predictor_overrides.

  3. Run the evaluate_solver.sh script with your learned model in my_awesome_model/model.tar.gz:

 sh scripts/evaluate_solver.sh \
    data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
    my_awesome_model/
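Step 2 above amounts to extending a name-to-name mapping. A sketch using the placeholder names from the text; the real dictionary and its existing entries live in the arc_solvers code.

```python
# predictor_overrides maps a registered model name to the predictor that
# should wrap it. The entries here are illustrative placeholders.
predictor_overrides = {
    # ... existing model -> predictor entries ...
}

# Register the (hypothetical) new model's predictor:
predictor_overrides["my_awesome_model"] = "my_awesome_predictor"
```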

Running a new Reading Comprehension model

To run a new reading comprehension (RC) model (implemented using AllenNLP), you need to

  1. Create a Predictor that converts the input JSON to an Instance expected by your RC model. See BidafQaPredictor for an example.

  2. Add your custom predictor to the predictor_overrides. For example, if your new model is registered as my_awesome_model and its predictor is registered as my_awesome_predictor, add "my_awesome_model": "my_awesome_predictor" to predictor_overrides.

  3. Run the evaluate_bidaf.sh script with your learned model in my_awesome_model/model.tar.gz:

 sh scripts/evaluate_bidaf.sh \
    data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
    my_awesome_model/

arc-solvers's People

Contributors

dirkgr, tbmihailov, tusharkhot

arc-solvers's Issues

README.md steps are not replicable

Hey there, I'm trying to replicate some of the solvers on my machine, but I am having problems with all of them. More specifically, I am trying to reproduce the BiLSTM Max-out with Question to Choices Max Attention.

I did not train anything; I just tried to evaluate the pre-trained model. However, I receive the following error:

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

Could anyone please help?
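For context (an editor's illustration, not the repo's actual code): this IndexError typically appears when code written for PyTorch < 0.4 indexes a 0-dim tensor, e.g. `loss.data[0]`; `tensor.item()` is the replacement the message suggests.

```python
# A 0-dim (scalar) tensor can no longer be indexed on newer PyTorch.
import torch

loss = torch.tensor(0.5)   # 0-dim scalar tensor
# value = loss[0]          # raises IndexError on PyTorch >= 0.4
value = loss.item()        # returns the plain Python float instead
```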

ImportError: cannot import name 'Dataset'

Since I was using allennlp in a different project, I continued using the pip version rather than the allennlp commit git@5fd28f0f63d8ca96fc0931bebac8224fa071c35f pinned in this repo.

When running the baseline solvers (entailment-based or BiDAF), I get the error ImportError: cannot import name 'Dataset' in entailment_tuple_reader.py; in the current release of allennlp, Dataset seems to have been refactored out of allennlp.data.dataset.

Any ETA on when the arc-solvers codebase will be compatible with the release version of allennlp?

AuthorizationException(403) from ElasticSearch

Hi tusharkhot,

I followed the commands in the README, but when I run "sh scripts/download_data.sh", I get an ElasticSearch error. Here are the details:

Undecodable raw error response from server: Expecting value: line 1 column 1 (char 0) AuthorizationException(403, '<!DOCTYPE HTML>\n<html>\n\n<head>\n

I guess the error happens on this line:
res = es.indices.create(index=index_name, ignore=400, body=mapping)

I am not familiar with ElasticSearch. Could you give me some advice?

Bash files for training BiDAF and other models (except Bi-LSTM maxout)

Hi,

Thanks for providing these models, these are very useful resources.

Are there bash files (similar to the Bi-LSTM maxout model's) for training the BiDAF model on just the ScienceQA datasets? I could find instructions for evaluating test data using the pre-trained models, but no instructions for training BiDAF or Decomp-att on the ScienceQA datasets.

Other instructions suggested replacing the pre-trained BiDAF models with the BiDAF model provided in this repo. Do we have to train the BiDAF model externally (i.e., following the original paper) and then replace the trained model in the "ARC-V1-models/" directory?

Paper available?

Hello,

Nice work, and thank you for taking the time to write a descriptive repo. The ARC results page lists this as having been accepted to EMNLP (congrats!). Since reviews have finished, I was wondering if you have submitted to arxiv or if there would otherwise be a way to take a look at the paper itself to see the experimental design, etc.

Thanks,
Sam

The Dockerfile fails to build

The build errors out at step 11/15. It reports no permission to build the lib.


I installed this on my Mac laptop. After downloading the data, I try to start an ElasticSearch instance and index the ARC corpus:

python scripts/index-corpus.py arc_solvers/data/ARC-V1-Feb2018/ARC_Corpus.txt arc_corpus "localhost"

But I got an error :

ConnectionError(<urllib3.connection.HTTPConnection object at 0x110845080>: Failed to establish a new connection: [Errno 61] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x110845080>: Failed to establish a new connection: [Errno 61] Connection refused)

I also tried another workstation; it has the same error.

Training a BiDAF solver

Hi,

I wanted to know how to train the BiDAF model on a new dataset after converting it to the format specified in solvers/convert_to_para_comprehension.py.

In short, is there a sample JSON file for the BiDAF model that I could use with the dataset format after conversion?

Thank you so much !

ImportError: torch.utils.ffi is deprecated.

I try to run the trained BiLSTM Max-out model, but I get the following error message

Traceback (most recent call last):
  File "arc_solvers/run.py", line 10, in <module>
    from arc_solvers.commands import main  # pylint: disable=wrong-import-position
  File "/home/peter/peter/clone/ARC-Solvers/arc_solvers/commands/__init__.py", line 1, in <module>
    from allennlp.commands import main as main_allennlp
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 4, in <module>
    from allennlp.commands.serve import Serve
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/commands/serve.py", line 28, in <module>
    from allennlp.service import server_sanic
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/service/server_sanic.py", line 20, in <module>
    from allennlp.models.archival import load_archive
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/models/__init__.py", line 7, in <module>
    from allennlp.models.crf_tagger import CrfTagger
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/models/crf_tagger.py", line 10, in <module>
    from allennlp.modules import Seq2SeqEncoder, TimeDistributed, TextFieldEmbedder, ConditionalRandomField
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/modules/__init__.py", line 13, in <module>
    from allennlp.modules.seq2seq_encoders import Seq2SeqEncoder
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/modules/seq2seq_encoders/__init__.py", line 83, in <module>
    from allennlp.modules.alternating_highway_lstm import AlternatingHighwayLSTM
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/modules/alternating_highway_lstm.py", line 10, in <module>
    from allennlp.custom_extensions._ext import highway_lstm_layer
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/custom_extensions/_ext/highway_lstm_layer/__init__.py", line 2, in <module>
    from torch.utils.ffi import _wrap_function
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

which says that torch.utils.ffi is deprecated.

Curious why the Inference based solvers were not used

In the ARC paper, aside from the IR and PMI solver implementations, I am curious why the inference-based solvers from (Clark et al., 2016) (RULE, ILP) or Aristo were not used for the demarcation between easy and challenging questions. Nor do they appear in Table 6, where the scores of the various implementations are listed.
