awesome-qa

A curated list of resources on Question Answering (QA), a computer science discipline within the fields of information retrieval and natural language processing (NLP).

  • Single-turn QA: answer without considering any context
    • Knowledge-based QA
    • Table/List-based QA
    • Text-based QA
    • Community-based QA
    • Visual QA
  • Conversational QA: use previous conversation turns

Events

  • Wolfram Alpha launched its answer engine in 2009.
  • IBM Watson system defeated top Jeopardy! champions in 2011.
  • Apple's Siri integrated Wolfram Alpha's answer engine in 2011.
  • Google embraced QA by launching its Knowledge Graph, leveraging the Freebase knowledge base, in 2012.
  • Amazon Echo | Alexa (2015), Google Home | Google Assistant (2016), INVOKE | MS Cortana (2017), HomePod (2017)

QA Systems

  • IBM Watson: has state-of-the-art performance.
  • Facebook DrQA: Applied to the SQuAD1.0 dataset. SQuAD2.0 has since been released, but DrQA has not been tested on it yet.

Lectures

Tutorial

Dataset Collections

Datasets

  • AI2 Science Questions v2.1(2017)
  • Children's Book Test
    • It is part of the bAbI project of Facebook AI Research, which is organized towards the goal of automatic text understanding and reasoning. The CBT is designed to measure directly how well language models can exploit wider linguistic context.
  • DeepMind Q&A Dataset; CNN/Daily Mail
    • Hermann et al. (2015) created two datasets using news articles for Q&A research. The datasets contain many documents (90k and 197k respectively), and each document is accompanied by approximately 4 questions on average. Each question is a sentence with one missing word or phrase, which can be found in the accompanying document/context.
    • paper: https://arxiv.org/abs/1506.03340
  • GraphQuestions
    • On generating Characteristic-rich Question sets for QA evaluation.
  • LC-QuAD
    • It is a gold standard KBQA (Question Answering over Knowledge Base) dataset containing 5,000 questions paired with SPARQL queries. LC-QuAD uses DBpedia v04.16 as the target KB.
  • MS MARCO
  • NarrativeQA
  • NewsQA
  • Question-Answer Dataset by CMU
    • This is a corpus of Wikipedia articles, manually-generated factoid questions from them, and manually-generated answers to these questions, for use in academic research. These data were collected by Noah Smith, Michael Heilman, Rebecca Hwa, Shay Cohen, Kevin Gimpel, and many students at Carnegie Mellon University and the University of Pittsburgh between 2008 and 2010.
  • SQuAD1.0
    • Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage. Unanswerable questions were introduced later, in SQuAD2.0.
    • paper: https://arxiv.org/abs/1606.05250
  • SQuAD2.0
    • SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
    • paper: https://arxiv.org/abs/1806.03822
  • Story Cloze Test
    • 'Story Cloze Test' is a new commonsense reasoning framework for evaluating story understanding, story generation, and script learning. This test requires a system to choose the correct ending to a four-sentence story.
    • paper: https://arxiv.org/abs/1604.01696
  • TriviaQA
    • TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions.
    • paper: https://arxiv.org/abs/1705.03551
  • WikiQA
    • A publicly available set of question and sentence pairs for open-domain question answering.
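
Two of the question formats described in the list above, span answers with possible abstention (SQuAD-style) and cloze questions with one missing word or phrase (CNN/Daily Mail-style), can be illustrated with a minimal Python sketch. The `SpanExample` class, its field names, the `make_cloze` helper, and the `@placeholder` token are assumptions made for illustration only; they are not the official schema or tooling of any dataset listed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpanExample:
    """Hypothetical record for a span-based question (not an official schema)."""
    context: str
    question: str
    answer_start: Optional[int]  # character offset into context; None means unanswerable
    answer_text: Optional[str]

    def gold_span(self) -> Optional[str]:
        """Return the answer as a slice of the context, or None to abstain."""
        if self.answer_start is None or self.answer_text is None:
            return None
        end = self.answer_start + len(self.answer_text)
        return self.context[self.answer_start:end]


def make_cloze(sentence: str, answer: str, placeholder: str = "@placeholder") -> str:
    """Build a cloze question by blanking out the first occurrence of the answer."""
    if answer not in sentence:
        raise ValueError("answer must occur in the sentence")
    return sentence.replace(answer, placeholder, 1)


context = "Wolfram Alpha launched its answer engine in 2009."
answerable = SpanExample(
    context,
    "When did Wolfram Alpha launch its answer engine?",
    context.index("2009"),
    "2009",
)
unanswerable = SpanExample(context, "Who founded Wolfram Alpha?", None, None)

print(answerable.gold_span())    # 2009
print(unanswerable.gold_span())  # None
print(make_cloze("IBM Watson defeated top Jeopardy! champions in 2011.", "IBM Watson"))
# @placeholder defeated top Jeopardy! champions in 2011.
```

Storing the answer as a character offset plus text keeps the answer tied to the passage, and a `None` offset gives a system an explicit way to represent "no answer is supported by the paragraph".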

Running Competitions in QA

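|   | Dataset | Since | Top Rank | Model | Over Human Performance |
|---|---------|-------|----------|-------|------------------------|
| 0 | MS MARCO V2 (Microsoft) | 2016~ | Baidu NLP | VNET | partially |
| 1 | TriviaQA (Univ. of Washington) | 2017~ | mingyan | ? | ? |
| 2 | SQuAD 2.0 (Stanford Univ.) | 2018~ | Kangwon National Univ. | VS^3-NET (single model) | x |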

Concluded Competitions in QA

|      | Dataset | Top Rank | Model | Over Human Performance |
|------|---------|----------|-------|------------------------|
| 2016 | Story Cloze Test (LSDSem'17) | Radford et al. | Finetuned Transformer LM | |

Publications

Publications by the DeepQA Research Team at IBM Watson within 5 years

Publications by MS Research within 5 years

Publications by Google AI within 5 years

Books

  • Natural Language Question Answering System - Boris Galitsky (2003)
  • New Directions in Question Answering - Mark T. Maybury (2004)
  • Part 3. 5. Question Answering in The Oxford Handbook of Computational Linguistics - Sanda Harabagiu and Dan Moldovan (2005)
  • Chap.28 Question Answering in Speech and Language Processing - Daniel Jurafsky & James H. Martin (2017)

Links
