
arc-solvers's Introduction

ARC-Solvers

Library of baseline solvers for AI2 Reasoning Challenge (ARC) Set (http://data.allenai.org/arc/). These solvers retrieve relevant sentences from a large text corpus (ARC_Corpus.txt in the dataset), and use two types of models to predict the correct answer.

  1. An entailment-based model that computes the entailment score for each (retrieved sentence, question+answer choice as an assertion) pair and scores each answer choice based on the highest-scoring sentence.
  2. A reading comprehension model (BiDAF) that converts the retrieved sentences into a paragraph per question. The model is used to predict the best answer span and each answer choice is scored based on the overlap with the predicted span.
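The entailment-based scoring in step 1 can be sketched as follows. This is an editor's illustration, not the repo's actual code: `entailment_score` and `retrieve` are hypothetical stand-ins for the trained entailment model and the ElasticSearch retrieval step.

```python
# Sketch of step 1: each (retrieved sentence, question+choice-as-assertion)
# pair gets an entailment score, and an answer choice is scored by its
# best-supporting sentence. entailment_score and retrieve are placeholders.

def score_choice(retrieved_sentences, hypothesis, entailment_score):
    """Score one answer choice by its highest-scoring retrieved sentence."""
    return max(entailment_score(sent, hypothesis) for sent in retrieved_sentences)

def pick_answer(question, choices, retrieve, entailment_score):
    """Pick the choice whose best-supporting sentence scores highest."""
    scores = {}
    for choice in choices:
        hypothesis = f"{question} {choice}"  # question + choice as an assertion
        sentences = retrieve(hypothesis)     # e.g. top-k hits from the corpus index
        scores[choice] = score_choice(sentences, hypothesis, entailment_score)
    return max(scores, key=scores.get)
```

In the real solvers the entailment model is DGEM or Decomposable Attention; here any callable returning a score would do.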

Setup environment

  1. Create the arc_solvers environment using Anaconda:
conda create -n arc_solvers python=3.6
  2. Activate the environment:
source activate arc_solvers
  3. Install the requirements in the environment:
sh scripts/install_requirements.sh
  4. Install PyTorch as per the instructions on http://pytorch.org/. Command as of Feb. 26, 2018:
conda install pytorch torchvision -c pytorch

Setup data/models

  1. Download the data and models into the data/ folder. This also builds the ElasticSearch index (assumes ElasticSearch 6+ is running on the ES_HOST machine defined in the script):
sh scripts/download_data.sh
  2. Download and prepare the embeddings. This downloads glove.840B.300d.zip from https://nlp.stanford.edu/projects/glove/ and converts it to glove.840B.300d.txt.gz, which AllenNLP can read:
sh download_and_prepare_glove.sh
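The zip-to-gzip conversion that step 2 performs can be sketched in a few lines. This is a rough illustration of what such a step might do, not the script's actual internals; the file names are the ones from the text.

```python
# Hedged sketch: unpack one text member of the downloaded GloVe zip and
# re-compress it as a single .txt.gz file, streaming in 1 MB chunks so the
# 2+ GB embedding file never has to fit in memory.
import gzip
import zipfile

def zip_txt_to_gz(zip_path: str, member: str, out_path: str) -> None:
    """Copy one text member of a zip archive into a gzip file."""
    with zipfile.ZipFile(zip_path) as zf, zf.open(member) as src:
        with gzip.open(out_path, "wb") as dst:
            for chunk in iter(lambda: src.read(1 << 20), b""):
                dst.write(chunk)

# zip_txt_to_gz("glove.840B.300d.zip", "glove.840B.300d.txt", "glove.840B.300d.txt.gz")
```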

Running baseline models

Run the entailment-based baseline solvers against a question set using scripts/evaluate_solver.sh

Running a pre-trained DGEM model

For example, to evaluate the DGEM model on the Challenge Set, run:

sh scripts/evaluate_solver.sh \
	data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
	data/ARC-V1-Models-Aug2018/dgem/

Change dgem to decompatt to test the Decomposable Attention model.

Running a pre-trained BiDAF model

To evaluate the BiDAF model, use the evaluate_bidaf.sh script:

 sh scripts/evaluate_bidaf.sh \
    data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
    data/ARC-V1-Models-Aug2018/bidaf/

Training and evaluating the BiLSTM Max-out with Question to Choices Max Attention

This model implements an attention interaction between the context-encoded representations of the question and the choices. The model is described here.

To train the model, download the data and word embeddings (see Setup data/models above).

Evaluate the trained model:

python arc_solvers/run.py evaluate \
    --archive_file data/ARC-V1-Models-Aug2018/max_att/model.tar.gz \
    --evaluation_data_file data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl

or

Train a new model:

python arc_solvers/run.py train \
    -s trained_models/qa_multi_question_to_choices/serialization/ \
    arc_solvers/training_config/qa/multi_choice/reader_qa_multi_choice_max_att_ARC_Chellenge_full.json

Running against a new question set

To run the baseline solvers against a new question set, create a file using the JSONL format. For example:

{
    "id":"Mercury_SC_415702",
    "question": {
       "stem":"George wants to warm his hands quickly by rubbing them. Which skin surface will
               produce the most heat?",
       "choices":[
                  {"text":"dry palms","label":"A"},
                  {"text":"wet palms","label":"B"},
                  {"text":"palms covered with oil","label":"C"},
                  {"text":"palms covered with lotion","label":"D"}
                 ]
    },
    "answerKey":"A"
}

Run the evaluation scripts on this new file using the same commands as above.
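A minimal sketch for producing such a file programmatically: one JSON object per line, each with an id, a question stem, labeled choices, and an answerKey. The helper name is an editor's invention; only the JSONL schema comes from the example above.

```python
# Write questions in the JSONL format the solvers expect:
# one JSON object per line.
import json

def write_questions(path, questions):
    """Serialize each question dict onto its own line."""
    with open(path, "w") as f:
        for q in questions:
            f.write(json.dumps(q) + "\n")

example = {
    "id": "Q1",
    "question": {
        "stem": "Which skin surface will produce the most heat?",
        "choices": [{"text": "dry palms", "label": "A"},
                    {"text": "wet palms", "label": "B"}],
    },
    "answerKey": "A",
}
```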

Running a new Entailment-based model

To run a new entailment model (implemented using AllenNLP), you need to

  1. Create a Predictor that converts the input JSON to an Instance expected by your entailment model. See DecompAttPredictor for an example.

  2. Add your custom predictor to the predictor_overrides. For example, if your new model is registered as my_awesome_model and its predictor is registered as my_awesome_predictor, add "my_awesome_model": "my_awesome_predictor" to predictor_overrides.

  3. Run the evaluate_solver.sh script with your learned model in my_awesome_model/model.tar.gz:

 sh scripts/evaluate_solver.sh \
    data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
    my_awesome_model/
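Step 2 above amounts to extending a name-to-name mapping. A sketch using the placeholder names from the text; the real dictionary and its existing entries live in the arc_solvers code.

```python
# predictor_overrides maps a registered model name to the predictor that
# should wrap it. The entries here are illustrative placeholders.
predictor_overrides = {
    # ... existing model -> predictor entries ...
}

# Register the (hypothetical) new model's predictor:
predictor_overrides["my_awesome_model"] = "my_awesome_predictor"
```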

Running a new Reading Comprehension model

To run a new reading comprehension (RC) model (implemented using AllenNLP), you need to

  1. Create a Predictor that converts the input JSON to an Instance expected by your RC model. See BidafQaPredictor for an example.

  2. Add your custom predictor to the predictor_overrides. For example, if your new model is registered as my_awesome_model and its predictor is registered as my_awesome_predictor, add "my_awesome_model": "my_awesome_predictor" to predictor_overrides.

  3. Run the evaluate_bidaf.sh script with your learned model in my_awesome_model/model.tar.gz:

 sh scripts/evaluate_bidaf.sh \
    data/ARC-V1-Feb2018/ARC-Challenge/ARC-Challenge-Test.jsonl \
    my_awesome_model/

arc-solvers's People

Contributors

dirkgr, tbmihailov, tusharkhot

arc-solvers's Issues

README.md steps are not replicable

Hey there, I'm trying to replicate some of the solvers on my machine, but I am having problems with all of them. More specifically, I am trying to reproduce the BiLSTM Max-out with Question to Choices Max Attention.

I did not train anything; I just tried to evaluate the pre-trained model. However, I receive the following error:

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

Could anyone please help?
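For context (an editor's illustration, not the repo's actual code): this IndexError typically appears when code written for PyTorch < 0.4 indexes a 0-dim tensor, e.g. `loss.data[0]`; `tensor.item()` is the replacement the message suggests.

```python
# A 0-dim (scalar) tensor can no longer be indexed on newer PyTorch.
import torch

loss = torch.tensor(0.5)   # 0-dim scalar tensor
# value = loss[0]          # raises IndexError on PyTorch >= 0.4
value = loss.item()        # returns the plain Python float instead
```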

ImportError: cannot import name 'Dataset'

Since I was using allennlp in a different project, I continued using the pip version rather than the allennlp commit git@5fd28f0f63d8ca96fc0931bebac8224fa071c35f pinned in this repo.

When running the baseline solvers (entailment-based or BiDAF), I get the error ImportError: cannot import name 'Dataset' in entailment_tuple_reader.py; in the current release of allennlp, Dataset seems to have been refactored out of allennlp.data.dataset.

Any ETA on when the arc-solvers codebase will be compatible with the release version of allennlp?

AuthorizationException(403) from ElasticSearch

Hi tusharkhot,

I followed the commands in the README, but when I run "sh scripts/download_data.sh", I get an ElasticSearch error. Here are the details:

Undecodable raw error response from server: Expecting value: line 1 column 1 (char 0) AuthorizationException(403, '<!DOCTYPE HTML>\n<html>\n\n<head>\n

I guess the error happens on this line:
res = es.indices.create(index=index_name, ignore=400, body=mapping)

I am not familiar with ElasticSearch. Could you give me some advice?

Bash files for training BiDAF and other models (except Bi-LSTM maxout)

Hi,

Thanks for providing these models, these are very useful resources.

Are there bash files (similar to the Bi-LSTM maxout model's) for training the BiDAF model on just the ScienceQA datasets? I could find instructions for evaluating test data using the pre-trained models, but no instructions for training BiDAF or Decomp-att on the ScienceQA datasets.

Other instructions suggested replacing the pre-trained BiDAF models with the BiDAF model provided in this repo. Do we have to train the BiDAF model externally (i.e., following the original paper) and then replace the trained model in the "ARC-V1-models/" directory?

Paper available?

Hello,

Nice work, and thank you for taking the time to write a descriptive repo. The ARC results page lists this as having been accepted to EMNLP (congrats!). Since reviews have finished, I was wondering if you have submitted to arxiv or if there would otherwise be a way to take a look at the paper itself to see the experimental design, etc.

Thanks,
Sam

The Dockerfile fails to build

The build errors out at step 11/15. It reports no permission to build the lib.


I installed this on my Mac laptop. After downloading the data, I try to start an ElasticSearch instance and index the ARC corpus:

python scripts/index-corpus.py arc_solvers/data/ARC-V1-Feb2018/ARC_Corpus.txt arc_corpus "localhost"

But I got an error :

ConnectionError(<urllib3.connection.HTTPConnection object at 0x110845080>: Failed to establish a new connection: [Errno 61] Connection refused) caused by: NewConnectionError(<urllib3.connection.HTTPConnection object at 0x110845080>: Failed to establish a new connection: [Errno 61] Connection refused)

I also tried another workstation; it has the same error.

Training a BiDAF solver

Hi,

I wanted to know how to train the BiDAF model on a new dataset after converting it to the format specified in solvers/convert_to_para_comprehension.py.

In short, is there a sample JSON file for the BiDAF model that I could use with the dataset format after conversion?

Thank you so much !

ImportError: torch.utils.ffi is deprecated.

I try to run the trained BiLSTM Max-out model, but I get the following error message

Traceback (most recent call last):
  File "arc_solvers/run.py", line 10, in <module>
    from arc_solvers.commands import main  # pylint: disable=wrong-import-position
  File "/home/peter/peter/clone/ARC-Solvers/arc_solvers/commands/__init__.py", line 1, in <module>
    from allennlp.commands import main as main_allennlp
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 4, in <module>
    from allennlp.commands.serve import Serve
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/commands/serve.py", line 28, in <module>
    from allennlp.service import server_sanic
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/service/server_sanic.py", line 20, in <module>
    from allennlp.models.archival import load_archive
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/models/__init__.py", line 7, in <module>
    from allennlp.models.crf_tagger import CrfTagger
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/models/crf_tagger.py", line 10, in <module>
    from allennlp.modules import Seq2SeqEncoder, TimeDistributed, TextFieldEmbedder, ConditionalRandomField
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/modules/__init__.py", line 13, in <module>
    from allennlp.modules.seq2seq_encoders import Seq2SeqEncoder
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/modules/seq2seq_encoders/__init__.py", line 83, in <module>
    from allennlp.modules.alternating_highway_lstm import AlternatingHighwayLSTM
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/modules/alternating_highway_lstm.py", line 10, in <module>
    from allennlp.custom_extensions._ext import highway_lstm_layer
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/allennlp/custom_extensions/_ext/highway_lstm_layer/__init__.py", line 2, in <module>
    from torch.utils.ffi import _wrap_function
  File "/home/peter/peter/clone/ARC-Solvers/arc/lib/python3.6/site-packages/torch/utils/ffi/__init__.py", line 1, in <module>
    raise ImportError("torch.utils.ffi is deprecated. Please use cpp extensions instead.")
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

which says that torch.utils.ffi is deprecated.

Curious why the Inference based solvers were not used

In the ARC paper, aside from the IR and PMI solver implementations, I am curious why the inference-based solvers from (Clark et al., 2016) (RULE, ILP) or Aristo were not used for the demarcation between easy and challenging questions. Nor do they appear in Table 6, where the scores of the various implementations are listed.
