counterfactual-evaluation's Introduction

Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Evaluations

This repo contains the code, data, and model interactions (prompts and model responses) that we used in our paper, Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Evaluations. Each task has its own README file with more details. You can install the dependencies via pip install -r requirements.txt (we have noticed that this sometimes fails on macOS; if it does, try Linux). We tested with Python 3.11, though other versions will likely also work. Unless otherwise mentioned, all commands in the README files should be run from the root directory of this repo, i.e. here.

Rerunning our experiments from scratch requires access to the relevant APIs (export the corresponding API keys as the environment variables OPENAI_API_KEY, ANTHROPIC_API_KEY, and PALM_API_KEY). If you only want to reproduce our experiments, however, you can convert our provided model interaction files into a cache file, and our query function will then automatically reuse the cached content without making API calls. You should be able to reproduce all of our experiments this way, except for the natural language logic task, where the author shared with us a non-public version of the dataset (see our paper for more details). To do this conversion, run

python create_cache.py {arithmetic,programming/execution,programming/generation,syntax,spatial,drawing,music/chords,music/melodies,chess,SET}/model_interactions
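The command above relies on shell brace expansion (available in bash and zsh), which turns the single braced argument into one model_interactions path per task. For shells without brace expansion, the following is a minimal sketch that builds the same argument list in Python. The function name build_create_cache_command and the extra_task_dirs parameter are hypothetical helpers introduced here for illustration; the sketch assumes only what the expanded command implies, namely that create_cache.py accepts several directory arguments.

```python
import shlex
import sys

# Task directories exactly as listed in the braced command above.
TASK_DIRS = [
    "arithmetic",
    "programming/execution",
    "programming/generation",
    "syntax",
    "spatial",
    "drawing",
    "music/chords",
    "music/melodies",
    "chess",
    "SET",
]


def build_create_cache_command(task_dirs=TASK_DIRS, extra_task_dirs=()):
    """Return the argv list that the brace expansion would produce.

    extra_task_dirs is a hypothetical hook for tasks whose model
    interactions are downloaded separately.
    """
    dirs = list(task_dirs) + list(extra_task_dirs)
    paths = [f"{d}/model_interactions" for d in dirs]
    return [sys.executable, "create_cache.py"] + paths


if __name__ == "__main__":
    # Print the fully expanded command; pass the list to subprocess.run
    # from the repo root to actually execute it.
    print(shlex.join(build_create_cache_command()))
```

Printing the command first makes it easy to check the expanded paths before running anything.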

Using this cache, you should obtain the exact numbers from our paper, except where an individual README file notes that some form of randomness is involved. Note that the model interactions for the logic task need to be downloaded separately; see its README.md for more details. If you choose to do so, you should include the logic task in the above command as well.
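To illustrate why cached interactions reproduce results without API access, here is a minimal sketch of a cache-first query function. It is not this repo's actual implementation: the names cache_key and cached_query, the in-memory dict cache, and the hashing scheme are all assumptions made for illustration.

```python
import hashlib
import json


def cache_key(model, prompt):
    """Derive a stable key from the model name and prompt (assumed scheme)."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_query(cache, model, prompt, call_api=None):
    """Return the cached response if present; otherwise fall back to the API.

    call_api is a hypothetical callable taking (model, prompt). When it is
    None, a cache miss raises instead of silently making a network request.
    """
    key = cache_key(model, prompt)
    if key in cache:
        return cache[key]
    if call_api is None:
        raise KeyError("prompt not found in cache and no API callable given")
    response = call_api(model, prompt)
    cache[key] = response  # store so repeated queries never hit the API again
    return response


if __name__ == "__main__":
    # Pre-populated cache, standing in for the converted interaction files.
    cache = {cache_key("some-model", "2+2="): "4"}
    print(cached_query(cache, "some-model", "2+2="))
```

With a fully populated cache, every lookup is a hit, so the API callable is never needed and the responses are identical on every run.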

A general note: you may see mentions of "controls" in our code/file names. You can mentally substitute "CCC" for it; "controls" was an old name for that.

counterfactual-evaluation's People

Contributors

zhaofengwu

counterfactual-evaluation's Issues

Missing file prompts.py under syntax/

Hi,

I appreciate the excellent work you've done! However, I encountered an error while executing syntax/query.py. Here's the error log:

Traceback (most recent call last):
  File "/Users/Documents/counterfactual-evaluation/syntax/query.py", line 10, in <module>
    from prompts import prompt_templates
ModuleNotFoundError: No module named 'prompts'

It looks like this issue may be due to the absence of the prompts.py file. Thank you.

Async API Requests Communication Issue

There appears to be an issue with the API call. When attempting async requests, it enters an indefinite loop and fails to generate any results. Interestingly, it functions correctly without async requests. Your assistance in investigating this matter would be greatly appreciated. Testing was conducted on an arithmetic dataset.
