AdvFact

This repository contains the trained models, diagnostic test sets, and augmented training data for the paper "Factuality Checker is not Faithful: Adversarial Meta-evaluation of Factuality in Summarization".

Factuality metrics

The paper covers six representative factuality checkers. The table below lists each metric with its model type and training data.

| Model | Type | Training data |
|---|---|---|
| MnliBert | NLI-S | MNLI |
| MnliRoberta | NLI-S | MNLI |
| MnliElectra | NLI-S | MNLI |
| Dae | NLI-A | PARANMT-G |
| FactCC | NLI-S | CNNDM-G |
| Feqa | QA | QA2D, SQuAD |

Model type and training data of the factuality metrics. NLI-A and NLI-S denote NLI-based metrics that define facts as dependency arcs and spans, respectively. PARANMT-G and CNNDM-G denote training data generated automatically from PARANMT and CNN/DailyMail.

Adversarial transformation code

The code for the adversarial transformations is in the adversarial_transformation directory. To apply the transformations, run the following command:

CUDA_VISIBLE_DEVICES=0 python ./adversarial_transformation/main.py -path DATA_PATH -save_dir SAVE_DIR -trans_type all

Replace DATA_PATH and SAVE_DIR with your own data path and output directory.
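When the same command has to be run for several input files, it can help to build it programmatically. The following is a minimal Python sketch that only reassembles the command shown above. The helper names (`build_transform_command`, `as_shell_line`, `run_transform`) and the example paths are hypothetical and not part of this repository; the only flags used are the ones that appear in the command above.

```python
import os
import shlex
import subprocess


def build_transform_command(data_path, save_dir, trans_type="all", gpu="0"):
    """Return (env_overrides, argv) for one run of the transformation script.

    Only the flags shown in the README command are used: -path, -save_dir
    and -trans_type. "all" is the only -trans_type value shown there.
    """
    env = {"CUDA_VISIBLE_DEVICES": gpu}
    argv = [
        "python", "./adversarial_transformation/main.py",
        "-path", data_path,
        "-save_dir", save_dir,
        "-trans_type", trans_type,
    ]
    return env, argv


def as_shell_line(env, argv):
    """Render the command as a single shell line, for logging or copy-paste."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


def run_transform(data_path, save_dir, trans_type="all", gpu="0"):
    """Run the script from the repository root and raise if it fails."""
    env, argv = build_transform_command(data_path, save_dir, trans_type, gpu)
    subprocess.run(argv, env={**os.environ, **env}, check=True)


if __name__ == "__main__":
    # Hypothetical paths, used only to show the resulting command line.
    env, argv = build_transform_command("my_data/test_set", "my_output")
    print(as_shell_line(env, argv))
```

Calling `run_transform` once per input file keeps the GPU selection and flags in one place; it assumes the working directory is the repository root, as the relative script path in the original command suggests.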

Diagnostic evaluation set

The paper includes six base evaluation datasets and four adversarial transformations.

Each adversarial transformation can be applied to each of the six base evaluation datasets, yielding 24 diagnostic evaluation sets. All base evaluation datasets and diagnostic evaluation sets can be found here. The table below gives detailed statistics for the 6 baseline test sets and the 24 diagnostic sets:

| Dataset | Type | Nov. (%) | #Sys. | Origin #Sam. | AntoSub | NumEdit | EntRep | SynPrun |
|---|---|---|---|---|---|---|---|---|
| DocAsClaim | CNNDM | 0.0 | 0 | 11490 | 26487 | 25283 | 6816 | 9533 |
| RefAsClaim | CNNDM | 77.7 | 0 | 10000 | 14131 | 11621 | 28758 | 4572 |
| FaccTe | CNNDM | 54 | 10 | 503 | 670 | 515 | 440 | 245 |
| QagsC | CNNDM | 28.6 | 1 | 504 | 711 | 615 | 539 | 351 |
| RankTe | CNNDM | 52.5 | 3 | 1072 | 1646 | 1310 | 767 | 540 |
| FaithFact | XSum | 99.2 | 5 | 2332 | 363 | 94 | 114 | 118 |
Detailed statistics of the baseline (left) and diagnostic (right) test sets. For the baseline test sets, the dataset type is the dataset that the source documents and summaries come from; CNNDM denotes the CNN/DailyMail dataset. Nov. (%) is the proportion of trigrams in the claims that do not appear in the source documents. #Sys. and #Sam. are the number of summarization systems that produced the output summaries and the test set size, respectively. For the diagnostic test sets (the AntoSub, NumEdit, EntRep, and SynPrun columns), each cell gives the sample size of the corresponding set.
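To make the Nov. (%) column concrete, here is a minimal Python sketch of a novel-trigram proportion for a single claim and source document. It assumes lowercased whitespace tokenization and counts every trigram occurrence in the claim; the tokenizer and the aggregation over a whole test set used in the paper may differ, and the function names are illustrative only.

```python
def trigrams(tokens):
    """Return all consecutive token triples, in order."""
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def novel_trigram_ratio(claim, document):
    """Percentage of claim trigrams that never occur in the document.

    Assumption: lowercased whitespace tokenization. A claim with fewer
    than three tokens has no trigrams and is scored as 0.0.
    """
    claim_trigrams = trigrams(claim.lower().split())
    if not claim_trigrams:
        return 0.0
    document_trigrams = set(trigrams(document.lower().split()))
    novel = sum(1 for t in claim_trigrams if t not in document_trigrams)
    return 100.0 * novel / len(claim_trigrams)


if __name__ == "__main__":
    document = "the cat sat on the mat today"
    # Two of the four claim trigrams ("sat on a", "on a mat") are not in the document.
    print(novel_trigram_ratio("the cat sat on a mat", document))  # 50.0
    # A claim copied verbatim from the document has no novel trigrams.
    print(novel_trigram_ratio("the cat sat on the mat", document))  # 0.0
```

Under this definition, a claim copied verbatim from its source scores 0, which is consistent with the 0.0 listed for DocAsClaim.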

Error analysis samples

The 140 samples misclassified by FactCC are in the data directory.

Augmented training data

The augmented training data can be downloaded here.

