awesome-qa

A curated list of resources on Question Answering (QA), a computer science discipline within the fields of information retrieval and natural language processing (NLP).

Single-turn QA: answer without considering any context
- Knowledge-based QA
- Table/List-based QA
- Text-based QA
- Community-based QA
- Visual QA
Conversational QA: use previous conversation turns

Events
QA Systems
Lectures
Tutorial
Datasets
Competitions in QA
Publications
Books
Links

Events

Wolfram Alpha launched its answer engine in 2009.
The IBM Watson system defeated top Jeopardy! champions in 2011.
Apple's Siri integrated Wolfram Alpha's answer engine in 2011.
Google embraced QA by launching its Knowledge Graph, which leveraged the Freebase knowledge base, in 2012.
Amazon Echo | Alexa (2015), Google Home | Google Assistant (2016), INVOKE | MS Cortana (2017), HomePod (2017)

QA Systems

IBM Watson: has state-of-the-art performance.
Facebook DrQA: applied to the SQuAD1.0 dataset. The SQuAD2.0 dataset has been released, but DrQA has not been tested on it yet.

Lectures

Question Answering - Natural Language Processing | by Dragomir Radev, Ph.D. | University of Michigan | 2016

Tutorial

Question Answering with Knowledge Bases, Web and Beyond | by Scott Wen-tau Yih & Hao Ma | Microsoft Research | 2016

Dataset Collections

Datasets

AI2 Science Questions v2.1(2017)
- It consists of questions used in student assessments in the United States across elementary and middle school grade levels. Each question is in a 4-way multiple-choice format and may or may not include a diagram element.
- paper: http://ai2-website.s3.amazonaws.com/publications/AI2ReasoningChallenge2018.pdf
Children's Book Test
- It is part of Facebook AI Research's bAbI project, which is organized towards the goal of automatic text understanding and reasoning. The CBT is designed to measure directly how well language models can exploit wider linguistic context.
DeepMind Q&A Dataset; CNN/Daily Mail
- Hermann et al. (2015) created two datasets from news articles for Q&A research. The datasets contain many documents (90k and 197k, respectively), and each document is accompanied by about 4 questions on average. Each question is a sentence with one missing word or phrase that can be found in the accompanying document/context.
- paper: https://arxiv.org/abs/1506.03340
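As a rough illustration of the cloze format described above, the sketch below masks one phrase in a sentence to form a question whose answer can be found in the accompanying document. The function names `make_cloze` and `answer_in_document`, the `@placeholder` token, and the sample text are assumptions made for this example and are not taken from the dataset's own tooling.

```python
def make_cloze(sentence: str, answer: str, placeholder: str = "@placeholder") -> str:
    """Replace the first occurrence of `answer` in `sentence` with a placeholder."""
    if answer not in sentence:
        raise ValueError("answer phrase must appear in the sentence")
    return sentence.replace(answer, placeholder, 1)


def answer_in_document(document: str, answer: str) -> bool:
    """Check that the missing phrase can be found in the accompanying document."""
    return answer in document


# Hypothetical document/summary pair, used only to show the shape of the data.
document = "Wolfram Alpha launched its answer engine in 2009. Siri later integrated it."
summary = "Siri integrated the answer engine from Wolfram Alpha."

question = make_cloze(summary, "Wolfram Alpha")
print(question)
print(answer_in_document(document, "Wolfram Alpha"))
```

A reader model is then asked to fill the placeholder using only the document as context.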
GraphQuestions
- Introduced in the paper "On Generating Characteristic-rich Question Sets for QA Evaluation".
LC-QuAD
- It is a gold-standard KBQA (Question Answering over Knowledge Base) dataset containing 5,000 pairs of questions and SPARQL queries. LC-QuAD uses DBpedia v04.16 as the target KB.
MS MARCO
- A dataset aimed at real-world question answering.
- paper: https://arxiv.org/abs/1611.09268
NarrativeQA
- It includes the list of documents with Wikipedia summaries, links to full stories, and questions and answers.
- paper: https://arxiv.org/pdf/1712.07040v1.pdf
NewsQA
- A machine comprehension dataset
- paper: https://arxiv.org/pdf/1611.09830.pdf
Question-Answer Dataset by CMU
- This is a corpus of Wikipedia articles, manually-generated factoid questions from them, and manually-generated answers to these questions, for use in academic research. These data were collected by Noah Smith, Michael Heilman, Rebecca Hwa, Shay Cohen, Kevin Gimpel, and many students at Carnegie Mellon University and the University of Pittsburgh between 2008 and 2010.
SQuAD1.0
- Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage.
- paper: https://arxiv.org/abs/1606.05250
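When an answer is a span of the reading passage, it can be stored as the answer text plus its character offset in the passage. The sketch below shows that idea under stated assumptions: the `SpanAnswer` class and the helper names are hypothetical, the field names `text` and `answer_start` are chosen to resemble the published JSON files, and the passage is a made-up example.

```python
from dataclasses import dataclass


@dataclass
class SpanAnswer:
    text: str          # answer string as it appears in the passage
    answer_start: int  # character offset of the answer within the passage


def find_span(passage: str, answer_text: str) -> SpanAnswer:
    """Locate `answer_text` in `passage` and return it as a character span."""
    start = passage.find(answer_text)
    if start == -1:
        raise ValueError("answer is not a span of the passage")
    return SpanAnswer(text=answer_text, answer_start=start)


def extract(passage: str, span: SpanAnswer) -> str:
    """Recover the answer text from the passage using only the span offsets."""
    end = span.answer_start + len(span.text)
    return passage[span.answer_start:end]


# Hypothetical passage and question ("When did Watson win?").
passage = "IBM Watson defeated top Jeopardy! champions in 2011."
span = find_span(passage, "2011")
print(span.answer_start, extract(passage, span))
```

Storing the offset alongside the text removes ambiguity when the same string occurs more than once in a passage.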
SQuAD2.0
- SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 new, unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
- paper: https://arxiv.org/abs/1806.03822
Story Cloze Test
- 'Story Cloze Test' is a new commonsense reasoning framework for evaluating story understanding, story generation, and script learning. This test requires a system to choose the correct ending to a four-sentence story.
- paper: https://arxiv.org/abs/1604.01696
TriviaQA
- TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. TriviaQA includes 95K question-answer pairs authored by trivia enthusiasts and independently gathered evidence documents, six per question on average, that provide high quality distant supervision for answering the questions.
- paper: https://arxiv.org/abs/1705.03551
WikiQA
- A publicly available set of question and sentence pairs for open-domain question answering.

Running Competitions in QA

	Dataset	Organizer	Since	Top Rank	Model	over Human Performance
0	MS MARCO V2	Microsoft	2016~	Baidu NLP	VNET	partially
1	TriviaQA	Univ. of Washington	2017~	mingyan	?	?
2	SQuAD 2.0	Stanford Univ.	2018~	Kangwon National Univ.	VS^3-NET (single model)	x

Concluded Competitions in QA

	Dataset		Top Rank	Model	over Human Performance
2016	Story Cloze Test	LSDSem'17	Radford et al.	Finetuned Transformer LM

Publications

Papers
- "Introduction to 'This is Watson'", IBM Journal of Research and Development, D. A. Ferrucci, 2012.
- "A survey on question answering technology from an information retrieval perspective", Information Sciences, 2011.
- "Question Answering in Restricted Domains: An Overview", Diego Mollá and José Luis Vicedo, Computational Linguistics, 2007.
- "Natural language question answering: the view from here", L. Hirschman, R. Gaizauskas, Natural Language Engineering, 2001.

The DeepQA Research Team in IBM Watson's publications within 5 years

2015
- "Automated Problem List Generation from Electronic Medical Records in IBM Watson", Murthy Devarakonda, Ching-Huei Tsou, IAAI, 2015.
- "Decision Making in IBM Watson Question Answering", J. William Murdock, Ontology summit, 2015.
- "Unsupervised Entity-Relation Analysis in IBM Watson", Aditya Kalyanpur, J William Murdock, ACS, 2015.
- "Commonsense Reasoning: An Event Calculus Based Approach", E T Mueller, Morgan Kaufmann/Elsevier, 2015.
2014
- "Problem-oriented patient record summary: An early report on a Watson application", M. Devarakonda, Dongyang Zhang, Ching-Huei Tsou, M. Bornea, Healthcom, 2014.
- "WatsonPaths: Scenario-based Question Answering and Inference over Unstructured Information", Adam Lally, Sugato Bagchi, Michael A. Barborak, David W. Buchanan, Jennifer Chu-Carroll, David A. Ferrucci, Michael R. Glass, Aditya Kalyanpur, Erik T. Mueller, J. William Murdock, Siddharth Patwardhan, John M. Prager, Christopher A. Welty, IBM Research Report RC25489, 2014.
- "Medical Relation Extraction with Manifold Models", Chang Wang and James Fan, ACL, 2014.

MS Research's publications within 5 years

2018
- "Characterizing and Supporting Question Answering in Human-to-Human Communication", Xiao Yang, Ahmed Hassan Awadallah, Madian Khabsa, Wei Wang, Miaosen Wang, ACM SIGIR, 2018.
- "FigureQA: An Annotated Figure Dataset for Visual Reasoning", Samira Ebrahimi Kahou, Vincent Michalski, Adam Atkinson, Akos Kadar, Adam Trischler, Yoshua Bengio, ICLR, 2018.
2017
- "Multi-level Attention Networks for Visual Question Answering", Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui, CVPR, 2017.
- "A Joint Model for Question Answering and Question Generation", Tong Wang, Xingdi (Eric) Yuan, Adam Trischler, ICML, 2017.
- "Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension", David Golub, Po-Sen Huang, Xiaodong He, Li Deng, EMNLP, 2017.
- "Question-Answering with Grammatically-Interpretable Representations", Hamid Palangi, Paul Smolensky, Xiaodong He, Li Deng, 2017.
- "Search-based Neural Structured Learning for Sequential Question Answering", Mohit Iyyer, Wen-tau Yih, Ming-Wei Chang, ACL, 2017.
2016
- "Stacked Attention Networks for Image Question Answering", Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Smola, CVPR, 2016.
- "Question Answering with Knowledge Base, Web and Beyond", Yih, Scott Wen-tau and Ma, Hao, ACM SIGIR, 2016.
- "NewsQA: A Machine Comprehension Dataset", Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman, RepL4NLP, 2016.
- "Table Cell Search for Question Answering", Sun, Huan and Ma, Hao and He, Xiaodong and Yih, Wen-tau and Su, Yu and Yan, Xifeng, WWW, 2016.
2015
- "WIKIQA: A Challenge Dataset for Open-Domain Question Answering", Yi Yang, Wen-tau Yih, and Christopher Meek, EMNLP, 2015.
- "Web-based Question Answering: Revisiting AskMSR", Chen-Tse Tsai, Wen-tau Yih, and Christopher J.C. Burges, MSR-TR, 2015.
- "Open Domain Question Answering via Semantic Enrichment", Huan Sun, Hao Ma, Wen-tau Yih, Chen-Tse Tsai, Jingjing Liu, and Ming-Wei Chang, WWW, 2015.
2014
- "An Overview of Microsoft Deep QA System on Stanford WebQuestions Benchmark", Zhenghao Wang, Shengquan Yan, Huaming Wang, and Xuedong Huang, MSR-TR, 2014.
- "Semantic Parsing for Single-Relation Question Answering", Wen-tau Yih, Xiaodong He, Christopher Meek, ACL, 2014.

Google AI's publications within 5 years

2018
- "Ask the Right Questions: Active Question Reformulation with Reinforcement Learning", Christian Buck and Jannis Bulian and Massimiliano Ciaramita and Wojciech Paweł Gajewski and Andrea Gesmundo and Neil Houlsby and Wei Wang, ICLR, 2018.
- "Did the model understand the question?", Pramod K. Mudrakarta and Ankur Taly and Mukund Sundararajan and Kedar Dhamdhere, ACL, 2018.
2017
- "Analyzing Language Learned by an Active Question Answering Agent", Christian Buck and Jannis Bulian and Massimiliano Ciaramita and Wojciech Gajewski and Andrea Gesmundo and Neil Houlsby and Wei Wang, NIPS, 2017.
- "Learning Recurrent Span Representations for Extractive Question Answering", Kenton Lee and Shimi Salant and Tom Kwiatkowski and Ankur Parikh and Dipanjan Das and Jonathan Berant, ICLR, 2017.
- "Neural Paraphrase Identification of Questions with Noisy Pretraining", Gaurav Singh Tomar and Thyago Duque and Oscar Täckström and Jakob Uszkoreit and Dipanjan Das, SCLeM, 2017.
2014
- "Great Question! Question Quality in Community Q&A", Sujith Ravi and Bo Pang and Vibhor Rastogi and Ravi Kumar, ICWSM, 2014.

Books

Natural Language Question Answering System - Boris Galitsky (2003)
New Directions in Question Answering - Mark T. Maybury (2004)
Part 3. 5. Question Answering in The Oxford Handbook of Computational Linguistics - Sanda Harabagiu and Dan Moldovan (2005)
Chap.28 Question Answering in Speech and Language Processing - Daniel Jurafsky & James H. Martin (2017)
