FaaF: Facts as a Function

This is the official release accompanying our 2024 paper, FaaF: Facts as a Function for the evaluation of generated text.

If you find FaaF useful, please cite:

@misc{katranidis2024faaf,
      title={FaaF: Facts as a Function for the evaluation of generated text}, 
      author={Vasileios Katranidis and Gabor Barany},
      year={2024},
      eprint={2403.03888},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Prerequisites

Before you start, ensure you have the following:

  • Python 3.11 installed on your system, as specified in the pyproject.toml file.
  • Poetry for Python dependency management and packaging. If Poetry is not installed, you can install it by following the instructions on the official Poetry website.
  • Access to the LLM APIs that you intend to use for fact verification, along with the necessary API keys or authentication credentials for each LLM.

Dataset

We use an augmented version of the WikiEval dataset that includes fact statements for each QA pair and human annotations of their truthfulness against each answer type. The dataset used in the paper is released on Hugging Face as Vaskatr/WikiEvalFacts. See the paper for more details.

The dataset is fetched programmatically from Hugging Face during each evaluation run, so there is no need to download it manually.
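For readers who want to inspect the data outside an evaluation run, the following is a minimal sketch of fetching it with the Hugging Face `datasets` library. The helper name `load_wikieval_facts` is hypothetical and the repository's own loading code may differ; only the dataset name comes from this README.

```python
# Hypothetical helper for illustration; not taken from the FaaF code base.
DATASET_ID = "Vaskatr/WikiEvalFacts"  # dataset name as given in this README


def load_wikieval_facts(dataset_id: str = DATASET_ID):
    """Download the dataset from Hugging Face, or reuse the local cache."""
    # Imported lazily so this module can be imported even when the
    # `datasets` package is not installed.
    from datasets import load_dataset

    return load_dataset(dataset_id)
```

Calling `load_wikieval_facts()` needs network access and the `datasets` package. Printing the returned object lists the available splits and columns.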

Installation

1. Clone the repository:

git clone https://github.com/vasiliskatr/faaf.git
cd faaf

2. Install dependencies: Use Poetry to install the project dependencies and set up the virtual environment:

poetry install

This command reads the pyproject.toml file and installs the dependencies required to run the project.

Usage

1. Activate the virtual environment:

To activate the Poetry-created virtual environment, run:

poetry shell

This command spawns a shell within the virtual environment. Any Python or command-line tool you run in this shell will use the settings and dependencies specified for your project.

2. Add your API keys:

Use the .env_example file as a template and add your API keys for the LLMs you use. Note that only OPENAI_API_KEY and CLAUDE_API are required to reproduce the results in the paper.
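Before starting a long run, it can help to confirm that the two required keys are visible to Python. The sketch below assumes the values from your .env file have already been loaded into the process environment, however the project does that. The function name `missing_api_keys` is hypothetical; only the two variable names come from this README.

```python
import os
from typing import Mapping

# The two variables this README lists as required for the paper's results.
REQUIRED_KEYS = ("OPENAI_API_KEY", "CLAUDE_API")


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

An empty list from `missing_api_keys()` means both keys are set. Any names it returns still need a value.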

3. Reproduce results:

To replicate the results described in the paper, run:

python wiki_eval_factual_recall.py --auto true

Optionally, specify the number of threads for each LLM provider according to the available rate limit to speed up the evaluation:

python wiki_eval_factual_recall.py --auto true --oai_num_threads 40 --anthropic_num_threads 10
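To illustrate why a per-provider thread count affects run time, here is a generic sketch of sending verification calls through a bounded thread pool. It is not the repository's implementation. `verify_in_parallel` and the `verify` callable are hypothetical stand-ins for whatever function calls the LLM API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def verify_in_parallel(
    verify: Callable[[str], bool],
    statements: Iterable[str],
    num_threads: int,
) -> list[bool]:
    """Apply `verify` to each statement using at most `num_threads` workers."""
    # max_workers caps the number of concurrent calls, which is what keeps
    # the request rate within a provider's limit.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(verify, statements))
```

A larger `num_threads` finishes sooner as long as the provider's rate limit is not exceeded, which is why the value is set separately for each provider.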

4. Run the evaluation using a specific LLM:

If you omit the --auto argument (it defaults to False) and specify --llm, the evaluation runs for that LLM only, e.g.:

python wiki_eval_factual_recall.py --llm gpt-4-turbo --oai_num_threads 40 

Arguments

--oai_num_threads: Number of threads for OpenAI models.

--anthropic_num_threads: Number of threads for Anthropic models.

--mistral_num_threads: Number of threads for Mistral models.

--auto: Set to True to automatically run all experiments for reproducing the paper's results.

--llm: The LLM to use for the experiment.
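The flags above could be declared with `argparse` roughly as follows. This is a sketch under stated assumptions, not the script's actual parser. The flag names and the False default for --auto come from this README, while the thread-count defaults of 1 and the `str_to_bool` helper are hypothetical.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse values such as 'true' or 'false', as in `--auto true`."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="FaaF factual recall evaluation")
    # Thread-count defaults of 1 are placeholders, not the script's real defaults.
    parser.add_argument("--oai_num_threads", type=int, default=1,
                        help="Number of threads for OpenAI models.")
    parser.add_argument("--anthropic_num_threads", type=int, default=1,
                        help="Number of threads for Anthropic models.")
    parser.add_argument("--mistral_num_threads", type=int, default=1,
                        help="Number of threads for Mistral models.")
    parser.add_argument("--auto", type=str_to_bool, default=False,
                        help="Run all experiments from the paper.")
    parser.add_argument("--llm", type=str, default=None,
                        help="LLM to use for a single-model run.")
    return parser
```

For example, `build_parser().parse_args(["--llm", "gpt-4-turbo", "--oai_num_threads", "40"])` gives a namespace with `auto` left at False and `llm` set to the requested model.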

Next steps

We may add functionality to apply FaaF to any dataset.
