Giter Club home page Giter Club logo

known_examples's Introduction

Known vs Unknown Examples

This is the repository for the paper Crafting In-context Examples according to LMs’ Parametric Knowledge (NAACL 2024 Findings).

Dataset

We use 4 datasets, AmbigQA, QAMPARI, QUEST, and GSM8K. Please place each dataset under dataset directory with designated namings (ambiqa, qampari, quest, grade-school-math).

/jsons

This directory contains preprocessed datasets and experimental results. To preprocess three multi-QA datasets, run process.py in the folder of each dataset name. This will create train.json, dev.json, and test.json (if exists). Furthermore, train_reduced.json will be created, which contains examples that have less than 20 answers.

score_with_simcse.py

As our experiments utilize the SimCSE embeddings, we compute the embeddings for the datasets in prior. For each dataset, run the command below.

python score_with_simcse.py [data_name]

This will result train_embeddings.pt, dev_embeddings.pt, and test_embeddings.pt (if exists), and train_reduced_embeddings.pt.

/web_server

This is a critical folder than contains code for serving the language model for inference. Rather than loading LLM to the GPUs every time we run an experiment, we load them once by running the web server, and query them in our experiment scripts. The web server allows for complex types of queries (i.e. much more than basic text completion), many of which are useful for several of the experiments. We support two LLMs, Llama2 and OPT. You can run the web server by the command below.

flask --app app_run_[model] run -p [PORT]

/answer_set

This directory conducts experiment in Section 3. We first construct in-context example sets, differed by the amount of parametric knowledge the model shows. Please run answer_set.py with the command below.

python answer_shot.py [data_name] [strategy] [split] [PORT]

Once we have obtained answer_sets.json, we now use these to infer on evaluation datasets with answer_shot.py. strategy includes none, some, full, base, which responds to Unknown, HalfKnown, Known, Random, respectively.

python answer_set.py [data_name] [PORT]

/answer_set/math

We also experiment on GSM8K dataset. obtain_f1.py and calc_f1.py evaluate the training examples with intial_prompts and construct in-context example sets. Now we infer on test data with answer_shot.py and compute accuracy with eval.py. For INST, please use none, some, full, base. Set_num ranges from 1 to 4.

python obtain_f1.py [PORT]
python calc_f1.py
python answer_shot.py [PORT] [INST] [Set_num]
python eval.py [INST] [Set_num]

/answer_ordering

We first order the gold answers of train examples with diverse ordering strategies. For Greedy Ordering, please run the command below. reverse, which takes a value among grd and rev, indicates whether its ordering is Greedy or Reverse Greedy.

python greedy_decoding.py [data_name] [PORT] [reverse]

For Perplexity Ordering, please run the command below. Unlike above, we can obtain both Perplexity and Reverse Perplexity orderings with one inference. They are denoted as asc and desc, respectively.

python perplexity.py [data_name] [PORT]

Now that we have the answer orderings of train examples ready, we can infer on the evaluation datasets. order accepts a value among rand, grd, asc, rev, desc, alpha, which represents six answer ordering stratgies introduced in our paper. They are in the order as presented in Table 4.

python answer_ordering.py [data_name] [PORT] [split] [order]

experiment.py

Finally, we evaluate the generated outputs. experiment.py parses the model generations and computes F1/EM scores. The first argument of this script indicates which experiment it is, since we have two independent ones. pk indicates experiments from answer_set and ord indicates experiments from answer_ordering.

python experiment.py [pk] [data_name] [split] [strategy] [Set_num]
python experiment.py [ord] [data_name] [split] [order]

known_examples's People

Contributors

lilys012 avatar

Stargazers

Daehoon Gwak avatar Zhimeng Guo avatar bubble avatar

Watchers

 avatar

known_examples's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.