Faithfulness 😇

An easy-to-use library to evaluate faithfulness (factual correctness) of abstractive summaries. Faithfulness is computed by comparing a summary with its original source document.

This library includes multiple faithfulness metrics based on:

  • BERTScore
  • Entailment
  • Question Generation & Question Answering framework (QGQA)
  • Named Entity Recognition (NER)
  • Open Information Extraction (OpenIE)
  • Semantic Role Labeling (SRL)
  • Sentence Similarity (SentSim)

Installation ⚙️

  1. $ conda create -n my_project python=3.8
     This creates a new virtual environment for your project with conda. Activate it with $ conda activate my_project.
  2. $ conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=10.1 -c pytorch
     Please install PyTorch by following the instructions here. Make sure to install the CUDA variant that matches the CUDA version supported by your GPU driver.
  3. $ pip install faithfulness
     This installs the faithfulness library and its dependencies. Read more about the dependencies below.

All faithfulness metrics are model-based. Some models have to be installed manually:

  • Download the SRL model here and save it in your project, e.g. /models/srl_model.tar.gz
  • Download a spaCy model: $ python -m spacy download en_core_web_sm
  • Download CoreNLP: $ python -c "import stanza; stanza.install_corenlp()"
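
Before running a metric, it can help to confirm that the manually installed pieces are in place. The following is a minimal sketch, not part of the faithfulness library: check_setup is a hypothetical helper, and it only checks the SRL model archive and the spaCy model package (spaCy models install as importable Python packages). It does not check CoreNLP.

```python
import importlib.util
from pathlib import Path
from typing import List


def check_setup(srl_model_path: str, spacy_model: str = "en_core_web_sm") -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration only, not part of the faithfulness library.
    """
    problems = []
    # The SRL model is a file you downloaded yourself, so only its existence is checked.
    if not Path(srl_model_path).is_file():
        problems.append(f"SRL model archive not found: {srl_model_path}")
    # spaCy models are installed as Python packages, so find_spec can locate them.
    if importlib.util.find_spec(spacy_model) is None:
        problems.append(f"spaCy model not installed: {spacy_model}")
    return problems


if __name__ == "__main__":
    for problem in check_setup("/models/srl_model.tar.gz"):
        print(problem)
```

An empty result only means the two checks passed. The metrics may still download or load further models at runtime.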

Usage 🔥

from faithfulness.QGQA import QGQA

qgqa = QGQA()
summary = "Lorem ipsum dolor sit amet"
source = "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam ..."
# score() returns a QGQAResult; its "f1" entry holds the faithfulness score
result = qgqa.score(summary, source)
print(f"Faithfulness: {result['f1']}")

More examples can be found here 💯.

Evaluation 📊

We evaluated all faithfulness metrics by correlating their scores with human judgements on the XSUM dataset (link). More details on the evaluation will soon be available in our paper (Master's thesis).

Method          Pearson (r)   Spearman (ρ)
🥇 BERTScore    0.501         0.486
🥈 Entailment   0.366         0.422
🥉 SentSim      0.392         0.389
SRL             0.393         0.377
NER             0.252         0.259
QGQA            0.228         0.258
OpenIE          0.169         0.185
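
To make the two correlation columns concrete, here is a minimal, dependency-free sketch of how Pearson's r and Spearman's ρ can be computed between metric scores and human judgements. It is an illustration only, not the evaluation code used for the table: the function names and the example numbers are hypothetical, and Spearman's ρ is computed as Pearson's r on average ranks.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson's r applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical numbers, only to show the call pattern.
    metric_scores = [0.91, 0.40, 0.75, 0.20, 0.66]
    human_scores = [1.0, 0.5, 0.5, 0.0, 1.0]
    print(f"Pearson r:  {pearson(metric_scores, human_scores):.3f}")
    print(f"Spearman ρ: {spearman(metric_scores, human_scores):.3f}")
```

Pearson's r measures linear agreement, while Spearman's ρ only looks at rank order, so a metric that ranks summaries well but on a skewed scale can score higher on ρ than on r.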

Reproduce results & evaluate custom dataset

You can download the preprocessed XSUM dataset here and the preprocessed SummEval dataset here to reproduce the above results. To evaluate the faithfulness metrics on other datasets, we recommend using the provided Experimentor class. For this, your dataset has to be in the following JSON format:

prepared_dataset.json

[{
    "summary": "the summary text...",
    "source": "the source text...",
    "summary_sentences": ["summary sentence 1", "summary sentence 2", ...],
    "source_sentences": ["source sentence 1", "source sentence 2", ...],
    "faithfulness": 0.0 - 1.0
}, ...]
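
A quick way to catch formatting mistakes before starting a long experiment is to validate each entry against the format above. The sketch below is an assumption-laden illustration, not part of the library: validate_entry and REQUIRED_FIELDS are hypothetical names, and the checks only cover what the format description states (field names, basic types, and a faithfulness value between 0.0 and 1.0).

```python
import json
from typing import Any, Dict, List

# Field names come from the format above; the type checks are an assumption.
REQUIRED_FIELDS = {
    "summary": str,
    "source": str,
    "summary_sentences": list,
    "source_sentences": list,
    "faithfulness": (int, float),
}


def validate_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one dataset entry (empty if valid)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        elif isinstance(entry[field], bool) or not isinstance(entry[field], expected):
            problems.append(f"wrong type for field: {field}")
    for field in ("summary_sentences", "source_sentences"):
        value = entry.get(field)
        if isinstance(value, list) and not all(isinstance(s, str) for s in value):
            problems.append(f"non-string sentence in: {field}")
    score = entry.get("faithfulness")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if is_number and not 0.0 <= score <= 1.0:
        problems.append("faithfulness outside [0.0, 1.0]")
    return problems


if __name__ == "__main__":
    dataset = json.loads("""[{
        "summary": "A cat sat.",
        "source": "A cat sat. It purred.",
        "summary_sentences": ["A cat sat."],
        "source_sentences": ["A cat sat.", "It purred."],
        "faithfulness": 0.9
    }]""")
    for index, entry in enumerate(dataset):
        for problem in validate_entry(entry):
            print(f"entry {index}: {problem}")
```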

You can convert .csv files to the required format using Experimentor.prepare_dataset(). This function uses the spaCy model "en_core_web_lg" to perform sentence splitting and requires a CSV file with the columns summary, source and faithfulness:

dataset.csv:
summary,source,faithfulness
"summary text","source text",0.9

Experimentor.prepare_dataset(Path("dataset.csv"))
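
For readers who want to see what such a conversion involves, the following sketch builds entries in the JSON format from CSV text. It is not the implementation of Experimentor.prepare_dataset(): rows_to_entries and split_sentences are hypothetical names, and a naive regular expression stands in for the spaCy sentence splitter, so the sentence boundaries will differ from the library's output on harder text (abbreviations, quotes, and so on).

```python
import csv
import io
import json
import re
from typing import Any, Dict, List


def split_sentences(text: str) -> List[str]:
    """Naive stand-in for spaCy: split after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def rows_to_entries(csv_text: str) -> List[Dict[str, Any]]:
    """Convert CSV text with summary, source and faithfulness columns to entries."""
    reader = csv.DictReader(io.StringIO(csv_text))
    entries = []
    for row in reader:
        entries.append({
            "summary": row["summary"],
            "source": row["source"],
            "summary_sentences": split_sentences(row["summary"]),
            "source_sentences": split_sentences(row["source"]),
            "faithfulness": float(row["faithfulness"]),
        })
    return entries


if __name__ == "__main__":
    csv_text = (
        "summary,source,faithfulness\n"
        '"A cat sat.","A cat sat. It purred!",0.9\n'
    )
    print(json.dumps(rows_to_entries(csv_text), indent=4))
```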

You can now use the Experimentor:

from pathlib import Path

# QGQA, BERTScore and Experimentor are provided by the faithfulness package
output_path = Path("./experiments/dataset/qgqa")
faithfulness_metric = QGQA(metric=BERTScore, save_path=output_path, batch_mode=True)
Experimentor(data_path=Path("./prepared_dataset.json"),
             out_path=output_path,
             metric=faithfulness_metric,
             experiment_name="qgqa_bertscore").experiment()

In the above example, correlations are written to ./experiments/dataset/qgqa/qgqa_bertscore.csv.

Dependencies 🔗

By running $ pip install faithfulness you will install this library as well as the following dependencies:

Troubleshooting 🛠

There are currently problems when installing allennlp and jsonnet. If you encounter "Building wheel for jsonnet (setup.py) ... error" during the installation, please try:

apt-get install make
apt-get install g++

or install jsonnet before installing this library:

conda install -c conda-forge jsonnet
pip install faithfulness

Contributors

bigabig, dmlls
