Getting MoRE out of Mixture of Language Model Reasoning Experts (EMNLP 2023 Findings)

This repository contains the code and data for running the experiments in our paper. Please see below for more detailed instructions for running the code.

Data

All the model prediction data can be downloaded from this link. Once you download it, unzip it and put it under the uniqa_predictions_final folder.

It contains two subsets, one for the dev set and one for the test set. All our evaluation results are based on the test sets. Each subset should contain the predictions of the experts (and of the dataset-specific few-shot baseline) on all 12 datasets used in our paper.

Training the Router

You can run python3 feature_classifier.py to train the random forest router and run inference to score all predictions. For ablations, set agreement = False to exclude the inter-expert agreement features, or set qonly = True to train a router that uses only the question features (see the paper for more details).
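To make the two ablation switches concrete, here is a minimal sketch of how such flags could shape the router's input. The function name build_features and the specific features (question length, wh-word indicator, answer length, agreement fraction) are assumptions for illustration only; the features actually used are defined in feature_classifier.py and the paper.

```python
from collections import Counter


def build_features(question, expert_answers, agreement=True, qonly=False):
    """Build one feature vector for a question (hypothetical feature set)."""
    # Question-only features: token count and a wh-word indicator.
    tokens = question.lower().split()
    wh_words = {"what", "who", "when", "where", "which", "why", "how"}
    features = [float(len(tokens)), float(bool(tokens) and tokens[0] in wh_words)]
    if qonly:
        # qonly=True: the router sees nothing about the experts' answers.
        return features

    # Per-expert answer features: here, just the answer length in tokens.
    features += [float(len(answer.split())) for answer in expert_answers]

    if agreement:
        # Inter-expert agreement: for each expert, the fraction of the
        # other experts that produced the same (normalized) answer.
        normalized = [answer.strip().lower() for answer in expert_answers]
        counts = Counter(normalized)
        n = len(normalized)
        for answer in normalized:
            others = counts[answer] - 1
            features.append(others / (n - 1) if n > 1 else 0.0)
    return features


answers = ["Shakespeare", "shakespeare", "Marlowe", "Shakespeare"]
full = build_features("Who wrote Hamlet?", answers)
no_agreement = build_features("Who wrote Hamlet?", answers, agreement=False)
question_only = build_features("Who wrote Hamlet?", answers, qonly=True)
```

Vectors like these would then be passed to a random forest classifier; the sketch stops short of training so that it stays dependency-free.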

Generalizability Evaluation

Once you run inference and save the router scores (which we already provide in feature_classifiers), you can run python3 ensemble.py to reproduce all results reported in Table 1. The default method is classifier, which uses the router classifier's scores for answer selection; you can also set it to other methods for comparison.
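As a rough illustration of score-based answer selection, the sketch below picks the answer of the expert whose prediction the router scores highest. The function select_answer and the "majority" comparison method are hypothetical names introduced here; they are not taken from ensemble.py, whose available methods may differ.

```python
from collections import Counter


def select_answer(expert_answers, router_scores, method="classifier"):
    """Pick one final answer from the experts' predictions (illustrative)."""
    if len(expert_answers) != len(router_scores):
        raise ValueError("need exactly one router score per expert answer")

    if method == "classifier":
        # Trust the expert whose prediction the router scores highest.
        best = max(range(len(expert_answers)), key=lambda i: router_scores[i])
        return expert_answers[best]

    if method == "majority":
        # Assumed comparison baseline: most frequent normalized answer.
        counts = Counter(answer.strip().lower() for answer in expert_answers)
        top = counts.most_common(1)[0][0]
        return next(a for a in expert_answers if a.strip().lower() == top)

    raise ValueError(f"unknown method: {method}")


answers = ["Paris", "Lyon", "lyon", "Nice"]
scores = [0.9, 0.2, 0.3, 0.1]
by_router = select_answer(answers, scores)
by_vote = select_answer(answers, scores, method="majority")
```

In this toy input the two methods disagree: the router prefers the first expert, while the vote follows the two experts that agree with each other.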

Selective QA Evaluation

For the selective QA evaluation, run python3 abstention.py. You can score predictions with either MaxProb or the router's score by setting method accordingly, and choose the metric among AUC, Cov@80, and Cov@90 in the all_metric function. Use the ER_metric function to compute the effective reliability, which involves first searching for a threshold on the dev set.
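The metrics above can be sketched as follows, assuming the common selective-prediction definitions: questions are answered in order of decreasing confidence, Cov@X is the largest coverage at which accuracy stays at or above X%, AUC is the area under the risk-coverage curve (approximated here by the mean risk over coverage levels), and effective reliability scores +1 for a correct answer, -1 for a wrong one, and 0 for an abstention. All function names are hypothetical, and the exact computations in all_metric and ER_metric may differ.

```python
def risk_coverage_points(confidences, correct):
    """(coverage, accuracy) pairs, answering the most confident first."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    points, n_correct = [], 0
    for k, i in enumerate(order, start=1):
        n_correct += int(correct[i])
        points.append((k / len(order), n_correct / k))
    return points


def coverage_at_accuracy(points, target):
    """Largest coverage whose accuracy is at least `target` (0.0 if none)."""
    return max((cov for cov, acc in points if acc >= target), default=0.0)


def risk_auc(points):
    """Mean risk (1 - accuracy) over coverage levels; lower is better."""
    return sum(1.0 - acc for _, acc in points) / len(points)


def effective_reliability(confidences, correct, threshold):
    """+1 per correct answer, -1 per wrong answer, 0 per abstention."""
    total = 0
    for conf, ok in zip(confidences, correct):
        if conf >= threshold:  # below the threshold the system abstains
            total += 1 if ok else -1
    return total / len(confidences)


def best_threshold(dev_confidences, dev_correct):
    """Search the dev set for the threshold that maximizes reliability."""
    candidates = sorted(set(dev_confidences)) + [float("inf")]
    return max(
        candidates,
        key=lambda t: effective_reliability(dev_confidences, dev_correct, t),
    )


conf = [0.9, 0.8, 0.7, 0.6, 0.5]
correct = [1, 1, 0, 1, 0]
points = risk_coverage_points(conf, correct)
cov80 = coverage_at_accuracy(points, 0.80)
cov90 = coverage_at_accuracy(points, 0.90)
auc = risk_auc(points)
threshold = best_threshold(conf, correct)  # would come from the dev set
er = effective_reliability(conf, correct, threshold)
```

In practice the threshold would be selected on dev-set predictions and then applied unchanged to the test set; the toy example reuses one list only to stay short.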

Citation

@article{Si:Shi:Zhao:Zettlemoyer:Boyd-Graber-2023,
	Title = {Getting \underline{MoRE} out of \underline{M}ixture \underline{o}f Language Model \underline{R}easoning \underline{E}xperts},
	Author = {Chenglei Si and Weijia Shi and Chen Zhao and Luke Zettlemoyer and Jordan Lee Boyd-Graber},
	Journal = {Findings of Empirical Methods in Natural Language Processing},
	Year = {2023},
	Location = {Singapore},
}

If you have any questions about the code or paper, feel free to email Chenglei ([email protected]).
