
fact-checking's Introduction

Fact-checking

This repo contains a collection of Kaggle notebooks that were used to train the models in our paper.

  • train and test contain skeleton code for training and testing the NLI models; this code was adapted to suit the different models and datasets in our experiments.
  • bm25-crossencoder contains the code used to retrieve the top-100 snippets for each claim and to rerank them to a top-5 using a cross-encoder. The dataset is available at https://www.kaggle.com/datasets/askuiper/quantemp, which also includes the QuanTemp dataset itself and the corpus.
  • claimdecomp contains the code used to train the BART model that decomposes the claims in QuanTemp, and the reranking code that extracts the top-5 snippets from the BM25 top-100. The dataset is available at https://www.kaggle.com/datasets/askuiper/quantemp-decomp-data.
  • temporal-reranking contains the code used to rerank the top-5 snippets from the BM25 top-100 by taking temporal information into account. The dataset is available at https://www.kaggle.com/datasets/lucasvm/quantemp-temporal-rerank.
  • strategyqa contains the code used to investigate using a model trained on StrategyQA to decompose questions and answer them iteratively with a DeBERTa model trained on SQuAD. The dataset is available at https://www.kaggle.com/datasets/lucasvm/strategyqa-decomp-quantemp.
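
The two-stage retrieval described for bm25-crossencoder (BM25 top-100, then a rerank to the top 5) can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code from the notebooks: the function names are hypothetical, the BM25 variant and its `k1`/`b` defaults are assumptions, and `overlap_score` is only a stand-in for a real cross-encoder, which would score each (claim, snippet) pair with a trained model.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; the notebooks may tokenize differently.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a standard BM25 formula."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for t in tokenized for term in set(t))
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def overlap_score(claim, snippet):
    # Hypothetical placeholder for a cross-encoder: Jaccard token overlap.
    c, s = set(tokenize(claim)), set(tokenize(snippet))
    return len(c & s) / len(c | s) if (c | s) else 0.0


def retrieve_then_rerank(claim, docs, rerank_fn, first_k=100, final_k=5):
    """Keep the BM25 top `first_k`, then reorder them with `rerank_fn`."""
    scores = bm25_scores(claim, docs)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:first_k]
    reranked = sorted(candidates, key=lambda i: rerank_fn(claim, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:final_k]]


corpus = [
    "unemployment fell to 5 percent in 2015",
    "the weather was sunny",
    "unemployment rose sharply",
]
top = retrieve_then_rerank("unemployment fell to 5 percent", corpus, overlap_score,
                           first_k=2, final_k=1)
```

Passing the reranker in as a function keeps the first stage unchanged when the placeholder is swapped for a model-based scorer.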

fact-checking's People

Contributors

lucasvanmol, kuipiekuip


fact-checking's Issues

bulletpoint-1

Evaluate different NLI models while keeping the retrieval component frozen. Some examples are BART-large-MNLI, Roberta-Large-MNLI and sileod/deberta-v3-base-tasksource-nli. Also consider generative models such as FlanT5, GPT2, GPT3 (you can use the API, but evaluating on GPT3 is optional) and BART. Analyze the performance across the claim categories given in the dataset: temporal, statistical, comparison and interval claims.

A reference for training NLI models can be found at https://colab.research.google.com/drive/1gZJCakmY28cKGMj8B7wd1GUM3r72pdbi?usp=sharing. That script shows how to use the claim and the justification document as evidence to form the entailment pair; in your experiments you would instead retrieve evidence relevant to the claim and train your classifier on it. You could select the top-k evidence snippets and concatenate them to perform entailment with the claim. Alternatively, you can perform entailment between the claim and each evidence snippet separately, then aggregate the predictions across all snippets for a single claim to obtain a more fine-grained veracity prediction.
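
The two inference options described above (concatenating the top-k evidence versus aggregating per-evidence predictions) might look roughly like this. It is a sketch under stated assumptions, not the reference script: the label set, the `[SEP]` separator string and all function names are hypothetical, and the probability vectors stand in for the output of whichever NLI classifier is used.

```python
from collections import Counter

# Assumed label set, for illustration only; use the labels of the actual dataset.
LABELS = ("True", "False", "Conflicting")


def concat_input(claim, evidences, k=5, sep=" [SEP] "):
    """Option 1: build one input from the claim and its top-k evidence snippets."""
    return claim + sep + " ".join(evidences[:k])


def _argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def mean_aggregate(per_evidence_probs):
    """Option 2a: average the class probabilities over all evidence snippets."""
    n = len(per_evidence_probs)
    mean = [sum(p[i] for p in per_evidence_probs) / n for i in range(len(LABELS))]
    return LABELS[_argmax(mean)]


def majority_vote(per_evidence_probs):
    """Option 2b: each evidence snippet votes for its most likely class."""
    votes = Counter(_argmax(p) for p in per_evidence_probs)
    return LABELS[votes.most_common(1)[0][0]]


# Hypothetical classifier outputs for one claim paired with three snippets.
probs = [
    [0.90, 0.05, 0.05],
    [0.40, 0.50, 0.10],
    [0.30, 0.60, 0.10],
]
mean_label = mean_aggregate(probs)
vote_label = majority_vote(probs)
```

With these example numbers the two aggregators disagree: one very confident snippet dominates the averaged probabilities, while the vote follows the two weaker snippets. That difference is the kind of behavior worth checking when comparing aggregation strategies.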
