ai2's Introduction

Minimal Code Base For AI2 Commonsense Leaderboard

Dependencies

Install apex if you want to use half precision: https://github.com/NVIDIA/apex. A conda env file is also included for reference. apex might not be compatible with conda directly, so you can remove it from the env file before you create an environment.

pip install -r requirements.txt

Train

Modify config.yaml as you like and run python train.py to train a model. The script loads the config file and writes all logs and checkpoints to outputs.

Eval

Get predictions without evaluation

python eval.py \
    --input_x cache/physicaliqa-train-dev/physicaliqa-train-dev/dev.jsonl \
    --config config.yaml \
    --checkpoint outputs/2020-02-26/20-26-22/lightning_logs/version_6341419/checkpoints/_ckpt_epoch_3_v0.ckpt \
    --output pred.lst

Get predictions with evaluation (accuracy, confidence interval)

python eval.py \
    --input_x cache/physicaliqa-train-dev/physicaliqa-train-dev/dev.jsonl \
    --config config.yaml \
    --checkpoint outputs/2020-02-26/20-26-22/lightning_logs/version_6341419/checkpoints/_ckpt_epoch_3_v0.ckpt \
    --input_y cache/physicaliqa-train-dev/physicaliqa-train-dev/dev-labels.lst \
    --output pred.lst

Results

PIQA

Model                  Bootstrapped Accuracy Mean   Bootstrapped Accuracy CI   Accuracy
Roberta large (V100)   77.4                         75.7 - 79.4                77.3
Roberta large (K80)    74.0                         72.4 - 76.2                74.2
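
The README does not say how the bootstrapped mean and confidence interval are computed. The following is a minimal sketch of one common approach, a percentile bootstrap over per-example correctness. The function names, the number of resamples, and the 95% level are assumptions for illustration, not the code in eval.py.

```python
import random
from typing import Sequence, Tuple


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def bootstrap_accuracy(
    preds: Sequence[int],
    labels: Sequence[int],
    n_resamples: int = 1000,  # assumed value, not taken from the repository
    alpha: float = 0.05,      # assumed 95% interval
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (bootstrap mean, lower bound, upper bound) of the accuracy."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("preds and labels must be non-empty and equal in length")
    rng = random.Random(seed)
    n = len(labels)
    # 1 where the prediction is correct, 0 otherwise.
    correct = [int(p == y) for p, y in zip(preds, labels)]
    scores = []
    for _ in range(n_resamples):
        # Resample n examples with replacement and score the resample.
        resample_hits = sum(correct[rng.randrange(n)] for _ in range(n))
        scores.append(resample_hits / n)
    scores.sort()
    # Percentile interval: cut alpha/2 of the resamples from each tail.
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    mean = sum(scores) / n_resamples
    return mean, lower, upper
```

With predictions and labels loaded as two equal-length lists of integers, `accuracy(preds, labels)` would correspond to the plain Accuracy column and `bootstrap_accuracy(preds, labels)` to the two bootstrapped columns, under the assumptions stated above.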

ai2's People

Contributors

chenghaomou

ai2's Issues

ANLI data distribution makes it hard to create internal dev - so it's temporarily ignored

If you look at the original dev data, you will see that every datapoint is distinct. The training set, however, contains a lot of repetitions. This makes it infeasible to do a 90-10-10 split.

Potential solutions:

  1. one reasonable thing to do would be to separate out a non-overlapping dev set and set up a cross-validation experiment
  2. use a fraction of the original dev set as the internal dev set for ANLI
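
The first option can be sketched as a group-aware fold assignment: examples that repeat the same context are kept in the same fold, so no dev fold overlaps with its training folds. This is a minimal sketch, not code from the repository. The field names `obs1` and `obs2` used in the usage note are assumptions about the data format, and `grouped_folds` and `train_dev_splits` are hypothetical helper names.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, Iterator, List, Sequence, Tuple


def grouped_folds(
    examples: Sequence[dict],
    key: Callable[[dict], Hashable],
    k: int = 5,
    seed: int = 0,
) -> List[List[int]]:
    """Split example indices into k folds, keeping each group in one fold."""
    groups = defaultdict(list)
    for idx, example in enumerate(examples):
        groups[key(example)].append(idx)

    # Shuffle group order deterministically so ties are broken at random.
    group_ids = sorted(groups, key=repr)
    random.Random(seed).shuffle(group_ids)

    folds: List[List[int]] = [[] for _ in range(k)]
    # Greedy balancing: place the largest groups first, always into the
    # fold that currently holds the fewest examples.
    for gid in sorted(group_ids, key=lambda g: -len(groups[g])):
        smallest = min(range(k), key=lambda i: len(folds[i]))
        folds[smallest].extend(groups[gid])
    return folds


def train_dev_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train indices, dev indices), using each fold once as dev."""
    for i, dev in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, dev
```

For example, if each training record is a dict whose repeated context lives in the fields `obs1` and `obs2` (an assumption), the folds could be built with `grouped_folds(examples, key=lambda ex: (ex["obs1"], ex["obs2"]), k=5)`.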

space of values for MODEL_TYPE, MODEL_WEIGHT not clear to newbs

Some more handholding on what the legitimate values for those variables are would be helpful. Perhaps a walkthrough that uses an existing huggingface model would be appropriate. On top of that, a walkthrough with a trivially different model would help: for example, subclassing a huggingface model under a new name, making a small tweak, and showing how to add that model to train.py/test.py.

Wondering more details about finetuning RoBERTa on PhysicalIQA

Hi @ChenghaoMou ,

Thanks for your implementation details on benchmarks from AI2. I'm trying to fine-tune RoBERTa on PhysicalIQA and I'd like to know some more details:

  1. What was the dev accuracy of the model that you submitted to the leaderboard?
  2. What hyperparameters did you use when training the model?

Thanks again!

Best
Tao
