ANAH

This is the repository for our ANAH series of papers, which comprises ANAH: Analytical Annotation of Hallucinations in Large Language Models and ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models.

The repo contains:

  • The data for training and evaluating LLMs, consisting of sentence-level hallucination annotations.
  • The model for annotating hallucinations.
  • The code for evaluating LLMs' ability to annotate hallucinations.

🚀 What's New

  • [2024.07.08] ANAH-v2 available on arXiv. 🔥🔥🔥
  • [2024.05.31] ANAH available on arXiv. 🔥🔥🔥
  • [2024.05.16] ANAH has been accepted by the main conference of ACL 2024. 🎉🎉🎉

✨ Introduction

ANAH: Analytical Annotation of Hallucinations in Large Language Models

ANAH is a bilingual dataset that offers analytical annotation of hallucinations in LLMs within generative question answering.

Each answer sentence in our dataset undergoes rigorous annotation, involving the retrieval of a reference fragment, the judgment of the hallucination type, and the correction of hallucinated content.

ANAH consists of ~12k sentence-level annotations for ~4.3k LLM responses covering over 700 topics, constructed by a human-in-the-loop pipeline.
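The three annotation steps described above (reference fragment, hallucination type, correction) suggest a simple per-sentence record. The following is a minimal sketch of such a record, not the dataset's actual schema: the class name, field names, and label strings are hypothetical, and the example values are illustrative rather than taken from ANAH.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass
class SentenceAnnotation:
    """One annotated answer sentence. All field names are hypothetical."""
    sentence: str                      # the answer sentence being judged
    reference_fragment: Optional[str]  # retrieved reference text, if any
    hallucination_type: str            # label string; the real label set is defined in the paper
    correction: Optional[str] = None   # corrected sentence when the original is hallucinated


def hallucinated_sentences(annotations, clean_label):
    """Return the annotations whose label differs from the 'no hallucination' label."""
    return [a for a in annotations if a.hallucination_type != clean_label]


# Illustrative values only; the labels "none" and "contradictory" are placeholders.
annotations = [
    SentenceAnnotation(
        sentence="The Eiffel Tower is in Paris.",
        reference_fragment="The Eiffel Tower is a tower in Paris, France.",
        hallucination_type="none",
    ),
    SentenceAnnotation(
        sentence="It was completed in 1999.",
        reference_fragment="The tower was completed in 1889.",
        hallucination_type="contradictory",
        correction="It was completed in 1889.",
    ),
]

flagged = hallucinated_sentences(annotations, clean_label="none")
print(json.dumps([asdict(a) for a in flagged], indent=2))
```

Keeping the reference fragment next to the judgment makes each label traceable to the evidence it was based on.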

ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models

ANAH-v2 is a scalable framework for the oversight of LLM hallucinations.

Through iterative self-training, we simultaneously and progressively scale up the hallucination annotation dataset and improve the accuracy of the hallucination annotator.

The final dataset encompasses over ∼3k topics, ∼196k model responses, and ∼822k annotated sentences in English and Chinese.

🤗 HuggingFace Model & Dataset

Dataset

The ANAH dataset is available on the Hugging Face dataset hub.

| Dataset | Huggingface Repo |
|---------|------------------|
| ANAH    | Dataset Link     |

Model

ANAH can be used for training hallucination annotators.

We have trained the annotators based on InternLM2 series models.

The annotator models are available on the Hugging Face model hub.

| Model    | Huggingface Repo |
|----------|------------------|
| ANAH-7B  | Model Link       |
| ANAH-20B | Model Link       |
| ANAH-v2  | Model Link       |

To annotate hallucinations, you must use the prompts given in our papers. Note that ANAH and ANAH-v2 use completely different prompts.

The models follow the conversation format of InternLM2-chat, with the template protocol as:

dict(role='user', begin='<|im_start|>user\n', end='<|im_end|>\n'),
dict(role='assistant', begin='<|im_start|>assistant\n', end='<|im_end|>\n'),
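As a rough illustration of how this template wraps a conversation, the sketch below concatenates each message between its begin and end markers and then opens an assistant turn for generation. The `render_prompt` helper and the placeholder message are hypothetical. In practice the inference framework typically applies the chat template for you.

```python
# Begin/end markers copied from the template above.
TEMPLATE = {
    "user": dict(begin="<|im_start|>user\n", end="<|im_end|>\n"),
    "assistant": dict(begin="<|im_start|>assistant\n", end="<|im_end|>\n"),
}


def render_prompt(messages):
    """Wrap each message in its role markers, then open an assistant turn.

    Hypothetical helper for illustration; not part of the repository.
    """
    parts = []
    for msg in messages:
        marks = TEMPLATE[msg["role"]]
        parts.append(marks["begin"] + msg["content"] + marks["end"])
    # Leave the assistant turn open so the model generates the annotation.
    parts.append(TEMPLATE["assistant"]["begin"])
    return "".join(parts)


# The content is a placeholder; the real annotation prompt comes from the paper.
prompt = render_prompt(
    [{"role": "user", "content": "<annotation prompt from the paper>"}]
)
print(prompt)
```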

🏗️ Evaluation

ANAH can be used to evaluate the ability of current open-source and closed-source LLMs to generate fine-grained hallucination annotations.

1. Environment Setup

We recommend Python 3.10 and PyTorch 1.13.1.

conda create --name anah python=3.10.13
conda activate anah
pip install -r requirements.txt

2. Inference and Evaluation

We now support the evaluation of the InternLM2, Llama2, Qwen, and Baichuan2 series of open-source models.

We use LMDeploy for model deployment and inference. If you want to test more models, you can refer to LMDeploy for the relevant configuration.

We recommend downloading the Hugging Face model to a local path and replacing {your_hf_model_path} with that path.

Our evaluations are conducted on NVIDIA A100 GPUs; out-of-memory (OOM) errors may occur on other types of machines.

python -u ./eval/eval.py \
    --model_type {your_model_type} \
    --server_addr {your_hf_model_path} \
    --json_path {test_set_path} \
    --output_path {your_inference_results_path} \
    --eval_sorce_path {your_evaluation_result_path}
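If you prefer to launch the evaluation from Python (for example, to loop over several models), the command above can be assembled as an argument list. This is a minimal sketch: the `build_eval_command` helper and all the values passed to it are hypothetical placeholders, and only the flag names come from the command above.

```python
import shlex


def build_eval_command(model_type, server_addr, json_path, output_path, eval_sorce_path):
    """Build the eval.py invocation as an argument list (hypothetical helper)."""
    return [
        "python", "-u", "./eval/eval.py",
        "--model_type", model_type,
        "--server_addr", server_addr,
        "--json_path", json_path,
        "--output_path", output_path,
        "--eval_sorce_path", eval_sorce_path,  # flag spelling as given in the command above
    ]


# Placeholder values; substitute your own model type and paths.
cmd = build_eval_command(
    model_type="your_model_type",
    server_addr="/path/to/your_hf_model",
    json_path="/path/to/test_set.json",
    output_path="/path/to/inference_results.json",
    eval_sorce_path="/path/to/evaluation_result.json",
)
print(shlex.join(cmd))
# To execute: import subprocess; subprocess.run(cmd, check=True)
```

Passing a list rather than a shell string avoids quoting problems when paths contain spaces.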

❤️ Acknowledgements

ANAH is built with InternLM and LMDeploy. Thanks for their awesome work!

🖊️ Citation

If you find this project useful in your research, please consider citing:

@article{ji2024anah,
  title={ANAH: Analytical Annotation of Hallucinations in Large Language Models},
  author={Ji, Ziwei and Gu, Yuzhe and Zhang, Wenwei and Lyu, Chengqi and Lin, Dahua and Chen, Kai},
  journal={arXiv preprint arXiv:2405.20315},
  year={2024}
}

@article{gu2024anah,
  title={ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models},
  author={Gu, Yuzhe and Ji, Ziwei and Zhang, Wenwei and Lyu, Chengqi and Lin, Dahua and Chen, Kai},
  journal={arXiv preprint arXiv:2407.04693},
  year={2024}
}

💳 License

This project is released under the Apache 2.0 license.
