AdvFact

This directory contains the trained models, diagnostic test sets and augmented training data for the paper "Factuality Checker is not Faithful: Adversarial Meta-evaluation of Factuality in Summarization".

Factuality metrics

The six representative factuality checkers used in the paper are:

- FactCC: the code and the original FactCC model can be downloaded from https://github.com/salesforce/factCC. The four FactCC models trained with subsampling and augmented data can be downloaded here.
- Dae: the code and trained model can be downloaded from https://github.com/tagoyal/dae-factuality.
- BertMnli, RobertaMnli, ElectraMnli: the code is included in the baseline directory and the trained models can be downloaded here.
- Feqa: the code and trained model can be downloaded from https://github.com/esdurmus/feqa.

The table below lists the six factuality metrics together with their model types and training data.

Model	Type	Training data
MnliBert	NLI-S	MNLI
MnliRoberta	NLI-S	MNLI
MnliElectra	NLI-S	MNLI
Dae	NLI-A	PARANMT-G
FactCC	NLI-S	CNNDM-G
Feqa	QA	QA2D,SQuAD

Model types and training data of the factuality metrics. NLI-A and NLI-S denote NLI-based metrics that define facts as dependency arcs and spans, respectively. PARANMT-G and CNNDM-G denote training data automatically generated from PARANMT and CNN/DailyMail.

Adversarial transformation codes

The code for the adversarial transformations is in the adversarial_transformation directory. To apply the adversarial transformations, run the following command:

CUDA_VISIBLE_DEVICES=0 python ./adversarial_transformation/main.py -path DATA_PATH -save_dir SAVE_DIR -trans_type all

Replace DATA_PATH and SAVE_DIR with your own data path and save directory.
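To transform several base datasets in one go, the command can be wrapped in a small Python driver. This is a hypothetical helper, not part of the repository: it only uses the flags shown above (-path, -save_dir, -trans_type), while the per-dataset input paths, the out_root/<name> output layout and the helper names are assumptions. With dry_run=True it only builds the commands without executing anything.

```python
import os
import subprocess
from pathlib import Path
from typing import Dict, List

# Script path taken from the command above.
SCRIPT = "./adversarial_transformation/main.py"


def build_command(data_path: str, save_dir: str, trans_type: str = "all") -> List[str]:
    """Argument list equivalent to the documented command (documented flags only)."""
    return ["python", SCRIPT, "-path", data_path, "-save_dir", save_dir, "-trans_type", trans_type]


def run_all(inputs: Dict[str, str], out_root: str, gpu: str = "0", dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) one transformation job per base dataset.

    Hypothetical helper: `inputs` maps a dataset name to its data path, and each
    job writes to out_root/<name>.
    """
    # Same effect as prefixing the shell command with CUDA_VISIBLE_DEVICES=<gpu>.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    commands = []
    for name, data_path in inputs.items():
        save_dir = str(Path(out_root) / name)
        cmd = build_command(data_path, save_dir)
        commands.append(cmd)
        if not dry_run:
            # Creating the directory up front is harmless even if main.py also does it.
            Path(save_dir).mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, env=env, check=True)
    return commands
```

For example, run_all({"FaccTe": "data/faccte", "QagsC": "data/qagsc"}, "out") returns the two commands for inspection; pass dry_run=False to actually launch them one after another.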

Diagnostic evaluation set

Six base evaluation datasets and four adversarial transformations are included in the paper.

Base evaluation datasets
- DocAsClaim: Document sentence as claim.
- RefAsClaim: Reference summary sentence as claim.
- FaccTe: Human-annotated evaluation set from "Evaluating the Factual Consistency of Abstractive Text Summarization"
- QagsC: Human-annotated evaluation set from "Asking and Answering Questions to Evaluate the Factual Consistency of Summaries"
- RankTe: Human-annotated evaluation set from "Ranking Generated Summaries by Correctness: An Interesting but Challenging Application for Natural Language Inference"
- FaithFact: Human-annotated evaluation set from "On Faithfulness and Factuality in Abstractive Summarization"

Adversarial transformations
- Antonym Substitution
- Numerical Editing
- Entity Replacement
- Syntactic Pruning
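Because every transformation is applied to every base dataset, the diagnostic sets are the Cartesian product of the two lists. The sketch below only illustrates that count; the transformation abbreviations are the ones used in the statistics table, and the "<dataset>-<transformation>" naming scheme is hypothetical, not the naming used in the released files.

```python
from itertools import product
from typing import List

# Names as listed in this README (abbreviations from the statistics table).
BASE_DATASETS = ["DocAsClaim", "RefAsClaim", "FaccTe", "QagsC", "RankTe", "FaithFact"]
TRANSFORMATIONS = ["AntoSub", "NumEdit", "EntRep", "SynPrun"]


def diagnostic_set_names() -> List[str]:
    """One diagnostic set per (base dataset, transformation) pair.

    The "<dataset>-<transformation>" naming scheme is hypothetical.
    """
    return [f"{base}-{trans}" for base, trans in product(BASE_DATASETS, TRANSFORMATIONS)]


names = diagnostic_set_names()
print(len(names))  # 24 = 6 base datasets x 4 transformations
print(names[:4])   # the four diagnostic sets derived from DocAsClaim
```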

Each of the four adversarial transformations can be applied to each of the six base evaluation datasets, resulting in 24 (6 × 4) diagnostic evaluation sets. All base evaluation datasets and diagnostic evaluation sets can be found here. Detailed statistics for the 6 base test sets and the 24 diagnostic sets are shown in the table below:

	Origin				Adversarial Transformation
Base Test Set	Dataset type	Nov. (%)	#Sys.	#Sam.	AntoSub	NumEdit	EntRep	SynPrun
DocAsClaim	CNNDM	0.0	0	11490	26487	25283	6816	9533
RefAsClaim	CNNDM	77.7	0	10000	14131	11621	28758	4572
FaccTe	CNNDM	54	10	503	670	515	440	245
QagsC	CNNDM	28.6	1	504	711	615	539	351
RankTe	CNNDM	52.5	3	1072	1646	1310	767	540
FaithFact	XSum	99.2	5	2332	363	94	114	118

Detailed statistics of the base (left) and diagnostic (right) test sets. For the base test sets on the left, Dataset type is the dataset that the source documents and summaries come from; CNNDM denotes the CNN/DailyMail dataset. Nov. (%) is the percentage of trigrams in the claims that do not appear in the source documents. #Sys. is the number of summarization systems that produced the summaries, and #Sam. is the test set size. For the diagnostic test sets on the right, each cell gives the number of samples in that set.
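The novelty statistic can be approximated as follows. This is a minimal sketch under stated assumptions: lowercased whitespace tokenization, claim trigrams counted with repetition, and counts pooled over the whole test set rather than averaged per claim. The paper's exact tokenizer and averaging may differ, and the function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def trigrams(text: str) -> List[Tuple[str, ...]]:
    """Lowercase, split on whitespace (assumed tokenizer), and return word trigrams in order."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def novelty_counts(claim: str, document: str) -> Tuple[int, int]:
    """Return (#claim trigrams absent from the document, #claim trigrams)."""
    doc_trigrams = set(trigrams(document))
    claim_trigrams = trigrams(claim)
    novel = sum(1 for t in claim_trigrams if t not in doc_trigrams)
    return novel, len(claim_trigrams)


def dataset_novelty(pairs: Iterable[Tuple[str, str]]) -> float:
    """Nov. (%) over (claim, document) pairs, pooling trigram counts (assumption)."""
    novel_total = 0
    trigram_total = 0
    for claim, document in pairs:
        novel, total = novelty_counts(claim, document)
        novel_total += novel
        trigram_total += total
    return 100.0 * novel_total / trigram_total if trigram_total else 0.0


doc = "the cat sat on the mat today"
print(novelty_counts("the cat sat on the mat", doc))  # (0, 4): every trigram is copied
print(novelty_counts("the dog sat on the mat", doc))  # (2, 4): "the dog sat", "dog sat on" are new
print(dataset_novelty([("the cat sat on the mat", doc), ("the dog sat on the mat", doc)]))  # 25.0
```

A claim copied verbatim from its source document, as in DocAsClaim, scores 0.0 under this definition, which is consistent with the table.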

Error analysis samples

The 140 samples misclassified by FactCC are in the data directory.

Augmented training data

The augmented training data can be downloaded here.
