The nl2lf from mars-wei

NL2LF

(Continuously updated...)

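Resources for "Natural Language to Logical Form" (NL2LF) research.
At this stage the collection focuses on NL2SQL research and covers public benchmark datasets, related papers with partial code implementations, and related blog posts or WeChat official-account articles.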

NL2SQL
I. Main benchmark datasets (dataset)
II. Main papers, methods, and code implementations (papers&code)
    1. WikiSQL
    2. Spider
III. Extended resources (extend-resources)
    1. RelatedWorks
    2. SQL2Seq
    3. Graph neural networks (GNN)

NL2SQL & Text2SQL

I. Main Benchmark Datasets (DataSet)

Academic, Advising, ATIS, Geography, Restaurants, Scholar, IMDB, Yelp, etc.
- Blog http://jkk.name/text2sql-data/
- GitHub https://github.com/jkkummerfeld/text2sql-data
- Paper Improving Text-to-SQL Evaluation Methodology, Finegan-Dollak C, Kummerfeld J K, Zhang L, et al., ACL 2018

WikiTableQuestions
- Home WikiTableQuestions: a Complex Real-World Question Understanding Dataset

WikiSQL
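WikiSQL dataset characteristics:
1. Single-table, single-column queries;
2. Aggregation operations ('MAX', 'MIN', 'COUNT', 'SUM', 'AVG');
3. Condition conjunction ('AND');
4. Condition comparison ('=', '>', '<')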
- GitHub https://github.com/salesforce/WikiSQL
- Paper Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning, Zhong V, Xiong C, Socher R., 2017.

Spider
Spider dataset characteristics:
1. Complex, Cross-domain and Zero-shot
2. Multi-table, multi-column queries with complex subqueries;
3. Aggregation operations ('MAX', 'MIN', 'COUNT', 'SUM', 'AVG', 'GROUP', 'HAVING', 'LIMIT');
4. JOIN clauses: ('join', 'on', 'as')
5. WHERE conjunctions: ('AND', 'OR');
6. WHERE operators: ('not', 'between', '=', '>', '<', '>=', '<=', '!=', 'in', 'like', 'is', 'exists')
7. Ordering: ('order by', 'desc', 'asc')
8. SQL set operations: ('Intersect', 'Union', 'Except')
- Home https://yale-lily.github.io/spider
- GitHub https://github.com/taoyds/spider
- Paper Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task, Yu T, Zhang R, Yang K, et al., EMNLP 2018.
- PPT Statistical comparison of the Spider/WikiSQL/TableQA datasets, by gibbsxiong
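To make the Spider feature list more concrete, here is a small sketch of the kind of query the dataset targets, run with Python's built-in `sqlite3`. The two-table schema (`singer`, `concert`) and its rows are hypothetical and chosen only for illustration; they are not taken from the dataset itself.

```python
import sqlite3

# Hypothetical two-table database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE concert (concert_id INTEGER PRIMARY KEY, singer_id INTEGER, year INTEGER);
INSERT INTO singer VALUES (1, 'Ann', 'France'), (2, 'Bob', 'Japan'), (3, 'Cid', 'France');
INSERT INTO concert VALUES (10, 1, 2014), (11, 1, 2015), (12, 2, 2015), (13, 3, 2013);
""")

# Multi-table query using JOIN/ON/AS, WHERE, GROUP BY, HAVING, ORDER BY and LIMIT.
query = """
SELECT T1.name, COUNT(*) AS n_concerts
FROM singer AS T1 JOIN concert AS T2 ON T1.singer_id = T2.singer_id
WHERE T2.year >= 2014
GROUP BY T1.singer_id
HAVING COUNT(*) >= 1
ORDER BY n_concerts DESC
LIMIT 2
"""
rows = conn.execute(query).fetchall()

# Set operation (EXCEPT) combining two SELECT statements.
set_query = """
SELECT name FROM singer WHERE country = 'France'
EXCEPT
SELECT T1.name FROM singer AS T1 JOIN concert AS T2
  ON T1.singer_id = T2.singer_id WHERE T2.year = 2015
"""
set_rows = conn.execute(set_query).fetchall()

print(rows)      # singers with concerts since 2014, most concerts first
print(set_rows)  # French singers without a 2015 concert
```

By contrast, a WikiSQL-style query over the same data would touch a single table and use at most one aggregation plus AND-connected conditions.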

SParC
SParC dataset characteristics:
1. Context-dependent and Multi-turn version of the Spider task.
  A context-dependent, multi-turn task that inherits the characteristics of Spider.
- Home https://yale-lily.github.io/sparc
- Paper SParC: Cross-Domain Semantic Parsing in Context, Yu T, Zhang R, Yasunaga M, et al., ACL 2019.

CoSQL
CoSQL dataset characteristics:
1. Cross-domain Conversational, the Dialogue version of the Spider and SParC tasks.
  A multi-turn dialogue task that inherits the characteristics of Spider and involves intent clarification.
- Home https://yale-lily.github.io/cosql
- Paper CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases, Yu T, Zhang R, Er H Y, et al., EMNLP-IJCNLP 2019.

Chinese Spider

Chinese version of Spider
- Home https://taolusi.github.io/CSpider-explorer/
- GitHub https://github.com/taolusi/chisp
- Paper A Pilot Study for Chinese SQL Semantic Parsing, Qingkai Min, Yuefeng Shi and Yue Zhang, EMNLP-IJCNLP 2019.

TableQA
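Data characteristics of the first Chinese NL2SQL Challenge:
1. An enhanced Chinese counterpart of WikiSQL, with data from finance and other general domains
2. Single-table, multi-column (two-column) queries
3. Aggregation operations ('MAX', 'MIN', 'COUNT', 'SUM', 'AVG');
4. Condition conjunction ('AND', 'OR');
5. Condition comparison ('=', '>', '<', '!=')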

II. Main Papers, Methods, and Code Implementations (Papers&Code)

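Most papers are evaluated on WikiSQL and Spider; see each task's home page for the corresponding leaderboard.
Representative methods are collected below, with updates and additions ongoing...
Note: each Score row lists | model | Dev accuracy | Test accuracy |. For WikiSQL the figures are execution accuracy; for Spider they are logical-form accuracy, excluding value prediction.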
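The two metrics named in the note can disagree, which the following minimal sketch illustrates. It is a simplification under stated assumptions: the table and queries are hypothetical, and `logical_form_match` is a crude string comparison written for this example, whereas the official evaluation scripts of each benchmark are more elaborate.

```python
import sqlite3

def execution_match(conn, predicted_sql, gold_sql):
    """Execution accuracy: the two queries return the same rows."""
    predicted = sorted(conn.execute(predicted_sql).fetchall())
    gold = sorted(conn.execute(gold_sql).fetchall())
    return predicted == gold

def logical_form_match(predicted_sql, gold_sql):
    """Crude stand-in for logical-form accuracy: compare normalized text."""
    def normalize(sql):
        return " ".join(sql.lower().split())
    return normalize(predicted_sql) == normalize(gold_sql)

# Hypothetical single-table database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, points INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?)", [("Ann", 30), ("Bob", 12)])

gold = "SELECT name FROM players WHERE points > 20"
predicted = "SELECT name FROM players WHERE points >= 21"

same_result = execution_match(conn, predicted, gold)  # rows agree on this data
same_form = logical_form_match(predicted, gold)       # query text differs
print(same_result, same_form)
```

A prediction can therefore score as correct under execution accuracy while being counted wrong under a form-based metric, which is one reason the two leaderboards are not directly comparable.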

`1. WikiSQL:`

Weakly Supervised

These methods use weak supervision, i.e., the SQL logical form is not used as the supervision signal.

Paper

Min S, Chen D, Hajishirzi H, et al. A discrete hard em approach for weakly supervised question answering[C]. EMNLP-IJCNLP 2019.
Agarwal R, Liang C, Schuurmans D, et al. Learning to Generalize from Sparse and Underspecified Rewards. 2019.
Liang C, Norouzi M, Berant J, et al. Memory augmented policy optimization for program synthesis and semantic parsing[C]. NeurIPS, 2018: 9994-10006.

Code

Score

Hard-EM	84.4	83.9
MeRL	74.9	74.8
MAPO	72.2	72.1

ExecutionGuided

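Execution-Guided (EG) decoding uses execution errors at decoding time to correct parts of the generated SQL, filtering out queries that cannot be valid. Three kinds of execution errors are distinguished:
1) Parsing errors: the generated SQL is syntactically invalid.
2) Execution failures: common run-time errors, e.g., applying SUM() or a comparison to string-typed data.
3) Empty results: assuming the execution result should be non-empty, a query that returns nothing has a wrong condition. For example, the condition value does not actually occur in the predicted column, so beam search moves on to a column that does contain the value.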
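The filtering idea can be sketched in a few lines with Python's built-in `sqlite3`. This is a minimal illustration, not the decoding procedure from the papers below: the function name `execution_guided_pick`, the table, and the candidate list are hypothetical, and the beam is assumed to be already sorted best-first. Note that SQLite is dynamically typed, so some type-related run-time failures that other engines raise will not occur here.

```python
import sqlite3

def execution_guided_pick(conn, candidates):
    """Return the first candidate (best-first) that executes and yields rows."""
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            # Parse errors and run-time failures raised by SQLite: skip candidate.
            continue
        if rows:
            # Assumption from the text: a correct query has a non-empty result.
            return sql, rows
    return None, []

# Hypothetical table and beam of candidate queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany("INSERT INTO players VALUES (?, ?, ?)",
                 [("Ann", "Red", 30), ("Bob", "Blue", 12)])

beam = [
    "SELECT points FROM players WHERE name = 'Red' AND",  # syntax error
    "SELECT points FROM players WHERE name = 'Red'",      # value not in this column
    "SELECT points FROM players WHERE team = 'Red'",      # executes, non-empty
]
best_sql, best_rows = execution_guided_pick(conn, beam)
print(best_sql, best_rows)
```

The non-empty assumption is a heuristic: a question whose true answer is empty would be penalized by it, so it trades some recall for precision.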

Paper

Wang C, Huang P S, Polozov A, et al. Robust Text-to-SQL Generation with Execution-Guided Decoding[J]. 2018.
Wang C, Brockschmidt M, Singh R. Pointing out SQL queries from text[J]. 2018.
Dong L, Lapata M. Coarse-to-fine decoding for neural semantic parsing[J]. 2018.
Huang P S, Wang C, Singh R, et al. Natural language to structured query generation via meta-learning[J]. 2018.

Code

Score

Coarse2Fine + EG	84.0	83.8
Coarse2Fine	79.0	78.5
Pointer-SQL + EG	78.4	78.3
Pointer-SQL	72.5	71.9

SQLNet Framework

A sketch that conforms to SQL syntax is defined, so that the model only needs to predict and fill in the corresponding slots. The sketch is:
SELECT $AGG $COLUMN
WHERE $COLUMN $OP $VALUE
(AND $COLUMN $OP $VALUE)*
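On this basis, the following classification sub-tasks are predicted jointly:

select-column: the selected column

select-aggregation: the aggregation operation type

where-number: the number of WHERE conditions

where-column: the column in each WHERE condition

where-operator: the operator type in each WHERE condition ('<', '=', '>')

where-value: the value in each WHERE condition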
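Once the slots above are predicted, assembling the query is mechanical. The sketch below shows that final step only; the `SketchSlots` class, the `fill_sketch` function, and the explicit `FROM` clause with a table name are assumptions added for this example (the sketch in the text omits `FROM` because each question targets a single table), and the neural classifiers that predict the slots are not shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<"]

@dataclass
class SketchSlots:
    """Hypothetical container for the predicted slot values."""
    select_column: str
    select_aggregation: str = ""
    # One (where-column, where-operator, where-value) triple per condition;
    # where-number is simply the length of this list.
    where_conditions: List[Tuple[str, str, Union[str, int, float]]] = field(default_factory=list)

def fill_sketch(slots: SketchSlots, table: str) -> str:
    """Fill SELECT $AGG $COLUMN WHERE $COLUMN $OP $VALUE (AND ...)*."""
    if slots.select_aggregation not in AGG_OPS:
        raise ValueError(f"unknown aggregation: {slots.select_aggregation}")
    select = slots.select_column
    if slots.select_aggregation:
        select = f"{slots.select_aggregation}({select})"
    sql = f"SELECT {select} FROM {table}"
    conditions = []
    for column, op, value in slots.where_conditions:
        if op not in COND_OPS:
            raise ValueError(f"unknown operator: {op}")
        # repr() quotes strings and leaves numbers bare; fine for a sketch,
        # but real code should use parameterized queries.
        conditions.append(f"{column} {op} {value!r}")
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql

slots = SketchSlots("points", "MAX", [("team", "=", "Red"), ("year", ">", 2000)])
sql = fill_sketch(slots, table="players")
print(sql)
```

Because every output is produced by filling a fixed template, the result is syntactically valid by construction, which is the main appeal of the sketch-based formulation.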

Paper

Xu X, Liu C, Song D. SQLNet: Generating structured queries from natural language without reinforcement learning[J]. 2018.
Hwang W, Yim J, Park S, et al. A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization[J]. 2019.
He P, Mao Y, Chakrabarti K, et al. X-SQL: reinforce schema representation with context[J]. 2019.

Code

Score

BERT-XSQL-Attention + EG	92.3	91.8
BERT-XSQL-Attention	89.5	88.7
BERT-SQLova-LSTM	87.2	86.2
BERT-SQLova-LSTM + EG	90.2	89.6
GloVe-SQLNet-BiLSTM	69.8	68.0

Model Interactive

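Semantic parsing based on user interaction, geared more toward practical deployment. After the SQL is generated, natural language generation is used to ask the user to clarify their intent, and the SQL is corrected accordingly.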
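The clarification loop can be pictured with a toy example. This is a hypothetical sketch, not the MISP implementation: the function `clarify_where_column`, the confidence threshold, and the question template are all assumptions, and only one slot (the WHERE column) is handled.

```python
def clarify_where_column(candidates, ask, threshold=0.8):
    """Pick a WHERE column, asking the user when the model is unsure.

    candidates: list of (column, probability) pairs, best first.
    ask: callable that takes a question string and returns True or False.
    """
    best_column, best_probability = candidates[0]
    if best_probability >= threshold:
        return best_column  # confident enough, no question needed
    for column, _ in candidates:
        # Natural-language question generated from the predicted component.
        if ask(f'Does the condition apply to the column "{column}"?'):
            return column
    return best_column  # the user rejected everything; keep the top guess

# Scripted stand-in for a real user, for demonstration only.
answers = {"name": False, "team": True}

def scripted_user(question):
    return any(ok and f'"{column}"' in question for column, ok in answers.items())

chosen = clarify_where_column([("name", 0.5), ("team", 0.4)], scripted_user)
print(chosen)
```

Asking only below a confidence threshold keeps the number of questions small, which matters in practice because every question costs user effort.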

Blog

Facebook proposes a new interactive semantic parsing framework that raises natural-language-to-SQL accuracy by 10%

Paper

Yao Z, Su Y, Sun H, et al. Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study[C]. EMNLP-IJCNLP 2019.

Code

https://github.com/sunlab-osu/MISP

`2. Spider:`

GNN Encoding Seq2Seq

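Multi-table relational information is used to build a graph whose nodes are table names and column names and whose edges are intra-table and inter-table relations. A GNN computes a hidden state for every node (table item). In the encoding stage of the seq2seq model, each query-word vector attends over the hidden vectors of all table items, and the attention weights are used to form the graph representation of each query word. In the decoding stage, guided by the grammar rules, when the output should be a table item, the output vector is scored against all table-item hidden vectors through a fully connected layer to measure their relevance.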
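The graph construction step can be sketched without any deep-learning library. The snippet below only builds the schema graph as an adjacency map; the function name `build_schema_graph`, the `table.column` node naming, and the example schema are assumptions made for illustration, and the GNN, attention, and decoder stages are not shown.

```python
from collections import defaultdict

def build_schema_graph(tables, foreign_keys):
    """Build an undirected schema graph.

    tables: dict mapping a table name to its list of column names.
    foreign_keys: list of ("table.column", "table.column") pairs.
    Nodes are table names and "table.column" strings.
    """
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    # Intra-table edges: each column is connected to its own table.
    for table, columns in tables.items():
        for column in columns:
            link(table, f"{table}.{column}")

    # Inter-table edges: foreign-key column to the column it references.
    for source, target in foreign_keys:
        link(source, target)

    return dict(graph)

# Hypothetical schema.
tables = {
    "singer": ["singer_id", "name"],
    "concert": ["concert_id", "singer_id"],
}
foreign_keys = [("concert.singer_id", "singer.singer_id")]

graph = build_schema_graph(tables, foreign_keys)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

A GNN would then iteratively update a vector per node from the vectors of its neighbors in this map, so that a column's representation reflects both its table and any table it links to.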

Paper

Krishnamurthy J, Dasigi P, Gardner M. Neural semantic parsing with type constraints for semi-structured tables[C]. EMNLP 2017.
Lin K, Bogin B, Neumann M, et al. Grammar-based Neural Text-to-SQL Generation. 2019.
Bogin B, Gardner M, Berant J. Representing Schema Structure with Graph Neural Networks for Text-to-SQL Parsing[C]. ACL 2019.
Bogin B, Gardner M, Berant J. Global Reasoning over Database Structures for Text-to-SQL Parsing[C]. EMNLP-IJCNLP 2019.
Shaw P, Massey P, Chen A, et al. Generating Logical Forms from Graph Representations of Text and Entities[C]. ACL 2019.

Code

Score

BERTRAND + GNN	57.9	54.6
Global-GNN	52.7	47.4
GNN	40.7	39.4
GNN w/edge vectors	32.1	-

RATSQL

Paper

Wang B, Shin R, Liu X, et al. RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers[C].
ICLR 2020. review on ACL 2020.

Score

RATSQL v2 + BERT (DB content used)	65.8	61.9
RASQL + BERT	60.8	55.7
RAT-SQL	60.6	53.7

IRNet MSRA work
Blog

Intelligent data analysis technology unlocks a new "conversation" feature for Excel (Conversational Data Analysis)

Paper

Guo J, Zhan Z, Gao Y, et al. Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation[C]. ACL 2019.
Dong Z, Sun S, Liu H, et al. Data-Anonymous Encoding for Text-to-SQL Generation[C]. EMNLP-IJCNLP 2019.
Liu H, Fang L, Liu Q, et al. Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL[C]. EMNLP-IJCNLP 2019.
Liu Q, Chen B, Lou J G, et al. FANDA: A Novel Approach to Perform Follow-up Query Analysis[C]. AAAI 2019.
Liu Q, Chen B, Liu H, et al. A Split-and-Recombine Approach for Follow-up Query Analysis[C]. EMNLP-IJCNLP 2019.

Code

https://github.com/neeraj-bhat/IRNet/tree/dev

Score

IRNet-v2 + BERT	63.9	55.0
IRNet + BERT-Base	61.9	54.7
IRNet-v2	55.4	48.5
IRNet	53.2	46.7

EditSQL

Paper

Zhang R, Yu T, Er H Y, et al. Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions[C]. EMNLP-IJCNLP 2019.

Code

https://github.com/ryanzhumich/editsql

Score

EditSQL + BERT	57.6	53.4
EditSQL	36.4	32.9

SQLNet Framework

Paper

Liu H, Fang L, Liu Q, et al. Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL[C]. EMNLP-IJCNLP 2019.
Yu T, Yasunaga M, Yang K, et al. SyntaxSQLNet: Syntax tree networks for complex and cross-domain text-to-SQL task[C]. EMNLP 2018.
Dongjun Lee. Clause-Wise and Recursive Decoding for Complex and Cross-Domain Text-to-SQL Generation[C]. EMNLP 2019.
Lin K, Bogin B, Neumann M, et al. Grammar-based Neural Text-to-SQL Generation. 2019.

Code

https://github.com/taoyds/syntaxSQL

Score

GrammarSQL	34.8	33.8
SyntaxSQLNet + augment	24.8	27.2
RCSQL	28.5	24.3
SyntaxSQLNet	18.9	19.7
SQLNet	10.9	12.4

III. Extended Resources (extend resources)

1. RelatedWorks

Blog

Paper

Dhamdhere K, McCurley K S, Nahmias R, et al. Analyza: Exploring data with conversation[C]//Proceedings of the 22nd International Conference on Intelligent User Interfaces. ACM, 2017.

2. SQL2Seq

Paper

Xu K, Wu L, Wang Z, et al. Graph2seq: Graph to sequence learning with attention-based neural networks. 2018.
Xu K, Wu L, Wang Z, et al. SQL-to-text generation with graph-to-sequence model[C]. EMNLP 2018.

Code

3. Graph Neural Networks (GNN)

Blog

From Graphs to Graph Convolution: An Informal Discussion of Graph Neural Network Models

Paper

Code
