The nl2lf from baeseulki

NL2LF

(持续更新中...)
recently update log:

0. UnifiedSKG, UniSAr
1. GNN works: LGESQL, ShadowGNN, SADGA, S²SQL (SOTA)
2. RatSQL + Pretraining (STRUG, GraPPa, GAP, GP) + NatSQL
3. PICARD, DT-Fixup, RaSaP
4. wikisql: SeaD, SeqGenSQL, BRIDGE^

The Resources for Natural Language to Logical Form Research, Focus on NL2SQL first.
"自然语言转逻辑形式"研究资料收集: 本阶段主要以NL2SQL的研究为主, 主要包括评测公开数据集、相关论文和部分代码实现、相关博客或公众号文章。

NL2SQL
一、主要评测数据集 dataset
二、主要论文方法及代码实现 papers&code
    1. WikiSQL
    2. Spider
    3. UnifiedSKG
三、相关资源扩展 extend-resources
    1. Related Works
        1.1. Pre-training
        1.2. Systems
        1.3. Surveys
        1.4. Blogs
        1.5. Other Papers
        1.6. Tools
    2. SQL2Seq
    3. 图神经网络 GNN

NL2SQL & Text2SQL

一、主要评测数据集(DataSet)

Academic, Advising, ATIS, Geography, Restaurants, Scholar, IMDB, Yelp, etc.
- Blog http://jkk.name/text2sql-data/
- GitHub https://github.com/jkkummerfeld/text2sql-data
- Paper Improving Text-to-SQL Evaluation Methodology, Finegan-Dollak C, Kummerfeld J K, Zhang L, et al., ACL 2018

WikiTableQuestions
- Home WikiTableQuestions: a Complex Real-World Question Understanding Dataset

WikiSQL
WikiSQL数据集特点:
1. 单表单列查询;
2. 聚合操作('MAX', 'MIN', 'COUNT', 'SUM', 'AVG');
3. 条件连接('AND');
4. 条件比较('=', '>', '<')
- GitHub https://github.com/salesforce/WikiSQL
- Paper Seq2sql: Generating structured queries from natural language using reinforcement learning, Zhong V, Xiong C, Socher R. , 2017.

Spider
Spider数据集特点:
1. Complex, Cross-domain and Zero-shot
2. 多表多列查询, 复杂子查询;
3. 聚合操作('MAX', 'MIN', 'COUNT', 'SUM', 'AVG','GROUP', 'HAVING', 'LIMIT');
4. join连接：('join', 'on', 'as')
5. where连接：('AND','OR');
6. where操作：('not', 'between', '=', '>', '<', '>=', '<=', '!=', 'in', 'like', 'is', 'exists')
7. 排序操作：('order by', 'desc', 'asc')
8. sql连接：('Intersect', 'Union', 'Except')
- Home https://yale-lily.github.io/spider
- GitHub https://github.com/taoyds/spider
- Paper Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task Yu T, Zhang R, Yang K, et al. , EMNLP 2018.
- PPT spider/wikisql/tableQA数据集统计对比_by gibbsxiong

SParC
SParC数据集特点:
1. Context-dependent and Multi-turn version of the Spider task.
  继承Spider特点的上下文多轮任务。
- Home https://yale-lily.github.io/sparc
- PaperSParC: Cross-Domain Semantic Parsing in Context, Yu T, Zhang R, Yasunaga M, et al., ACL 2019.

CoSQL
CoSQL数据集特点:
1. Cross-domain Conversational, the Dilaogue version of the Spider and SParC tasks.
  继承Spider特点的多轮对话任务，涉及意图澄清。
- Home https://yale-lily.github.io/cosql
- Paper CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases, Yu T, Zhang R, Er H Y, et al., EMNLP-IJCNLP 2019.

Chinese Spider

中文版Spider
- Home https://taolusi.github.io/CSpider-explorer/
- GitHub https://github.com/taolusi/chisp
- Paper A Pilot Study for Chinese SQL Semantic Parsing, Qingkai Min, Yuefeng Shi and Yue Zhang, EMNLP-IJCNLP 2019.

TableQA
首届中文NL2SQL挑战赛数据特点:
1. 中文加强版WikiSql，金融等泛领域数据
2. 单表多列(两列)查询
3. 聚合操作('MAX', 'MIN', 'COUNT', 'SUM', 'AVG');
4. 条件连接('AND', 'OR');
5. 条件比较('=', '>', '<', '!=')
- Home https://tianchi.aliyun.com/competition/entrance/231716/information
- GitHub https://github.com/ZhuiyiTechnology/nl2sql_baseline
- Paper TableQA: a Large-Scale Chinese Text-to-SQL Dataset for Table-Aware SQL Generation[J]. Sun N, Yang X, Liu Y. 2020.
- Blog
- RANK
  1. 冠军方案_MSQL_Top1.pdf https://github.com/nudtnlp/tianchi-nl2sql-top1
  2. 亚军方案_top2.pdf
  3. 季军方案_top3.pdf 第三名代码实现 https://github.com/beader/tianchi_nl2sql
  4. 第四名方案_top4.pdf
  5. 第五名方案_top5.pdf
  6. 第六名代码实现 https://github.com/eguilg/nl2sql
- Paper
  1. Zhang X, Yin F, Ma G, et al. M-SQL: Multi-Task Representation Learning for Single-Table Text2sql Generation[J]. IEEE Access, 2020, 8: 43156-43167. 🔥

DuSQL
百度2020语言与智能技术竞赛：语义解析任务，大规模开放领域的复杂中文Text-to-SQL数据集数据特点：
1. 包含200个Database以及对应的2.3万对(question, SQL query)，其中18000对用于训练集，2000用于验证集，3000用于测试集。
2. 200个Database来自百科infobox、百科表格数据、以及互联网上存在的表格数据。每个Database包含若干张表格（2-11张，平均4.1张），人工构建了表之间的链接操作（即foreign key）。为了验证解析算法Database无关性及question无关性，在训练集合和测试集合的Database无交叉。
3. 包含复杂的多表join查询和嵌套查询，复杂度和spider类似。评价方法关注每一个组件的精准匹配度，并消除顺序影响。因此对val的准确度要求更高。具体的sql嵌套结构单元分解如下：
```
# 关键词和嵌套规则
select: [(agg_id, val_unit), (agg_id, val_unit), ...]
from: {'table_units': [table_unit, table_unit, ...], 'conds': condition}
where: condition
groupBy: [col_unit, ...]
orderBy: asc/desc, [(agg_id, val_unit), ...]
having: condition
limit: None/number
intersect: None/sql
except: None/sql 
union: None/sql
```
```
# 连接单元
val: number(float)/string(str)/sql(dict)
col_unit: (agg_id, col_id)
val_unit: (calc_op, col_unit1, col_unit2)
table_type: 'table_unit'/'sql'
table_unit: (table_type, table_id/sql)
cond_unit: (agg_id, cond_op, val_unit, val1, val2)
condition: [cond_unit1, 'and'/'or', cond_unit2, ...]
```
```
# op操作符
agg_id: (none, max, min, count, sum, avg)
calc_op: (none, -, +, \*, /)
cond_op: (not_in, between, =, >, <, >=, <=, !=, in, like)
```
- Home https://aistudio.baidu.com/aistudio/competition/detail/30?isFromCcf=true
- GitHub https://github.com/PaddlePaddle/Research/tree/master/NLP/DuSQL-Baseline
- Video 冠军分享 http://mbd.baidu.com/webpage?type=live&action=liveshow&source=h5pre&room_id=4008201814
- Paper
  - Wang L, Zhang A, Wu K, et al. DuSQL: A Large-Scale and Pragmatic Chinese Text-to-SQL Dataset[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020: 6923-6935.
- Dataset https://github.com/luge-ai/luge-ai/blob/master/semantic-parsing/semantic-parsing.md
- Blog

CCKS2022：金融NL2SQL评测

现有NL2SQL数据和方法主要关注“封闭场景指定库/表”设定，这很难满足业务范围动态发展的需求。从领域特性来看，金融数据多为时间序列，包括日频行情、季频财报、年度GDP、不定期股票质押解质押等，这无疑会增大问题转SQL难度。

二、主要论文方法及代码实现（Papers&Code）

论文主要以WikiSQL和Spider为评测数据，相应排行榜详见任务主页。
下面主要整理具有代表性的方法，持续更新补充...
注: Exe_score 表示 | model | Dev accuracy | Test accuracy |，表示执行准确率(Execution accuracy)
Log_score 表示逻辑准确率(Logical accuracy)，且Spider中不包括值预测。

`1. WikiSQL:`

Weakly Supervised

采用弱监督方法，即不使用sql的逻辑形式作为监督信号。

Paper
- Li N, Keller B, Butler M, et al. SeqGenSQL--A Robust Sequence Generation Model for Structured Query Language[J]. 2020. 🔥
- Min S, Chen D, Hajishirzi H, et al. A discrete hard em approach for weakly supervised question answering[C]. EMNLP-IJCNLP 2019.
- Wang B, Titov I, Lapata M. Learning Semantic Parsers from Denotations with Latent Structured Alignments and Abstract Programs. 2019.
- Agarwal R, Liang C, Schuurmans D, et al. Learning to Generalize from Sparse and Underspecified Rewards. 2019.
- Liang C, Norouzi M, Berant J, et al. Memory augmented policy optimization for program synthesis and semantic parsing[C].NeurIPS, 2018: 9994-10006.
- Guo T, Gao H. Using Database Rule for Weak Supervised Text-to-SQL Generation[J]. 2019.
Code
- Hard-EM https://github.com/shmsw25/qa-hard-em 🔥
- LatentAlignment https://github.com/berlino/weaksp_em19
- MeRL / MAPO https://github.com/google-research/google-research/tree/master/meta_reward_learning
- Rule-SQL https://github.com/guotong1988/Rule-SQL
Exe_score

Hard-EM 84.4 83.9

LatentAlignment 79.4 79.3

MeRL 74.9 74.8

MAPO 72.2 72.1

Rule-SQL 61.1 61.0

Hard-EM	84.4	83.9
LatentAlignment	79.4	79.3
MeRL	74.9	74.8
MAPO	72.2	72.1
Rule-SQL	61.1	61.0

ExecutionGuided

Execution Guided (EG) 可以在解码阶段通过执行错误对生成sql的项进行修正,从而过滤了一些不符合实际的sql语句。主要分为三类执行错误：1）句法解析错误，即生成的sql语法错误。2）执行失败。常见的run-time error, 例如SUM( ) 和比较string类型的数据；3）假设执行结果不为空，则空查询的条件错误。例如条件值实际并不存在于预测的列中, 因此会去 Beam Search 实际包含条件值的列。

Paper
- Wang C, Huang P S, Polozov A, et al. Robust Text-to-SQL Generation with Execution-Guided Decoding[J]. 2018.
- Wang C, Brockschmidt M, Singh R. Pointing out SQL queries from text[J]. 2018.
- Dong L, Lapata M. Coarse-to-fine decoding for neural semantic parsing[J]. 2018.
- Huang P S, Wang C, Singh R, et al. Natural language to structured query generation via meta-learning[J]. 2018.
Code
- https://github.com/microsoft/PointerSQL
- https://github.com/donglixp/coarse2fine
Exe_score

Coarse2Fine + EG 84.0 83.8

Coarse2Fine 79.0 78.5

Pointer-SQL + EG 78.4 78.3

Pointer-SQL 72.5 71.9

Coarse2Fine + EG	84.0	83.8
Coarse2Fine	79.0	78.5
Pointer-SQL + EG	78.4	78.3
Pointer-SQL	72.5	71.9

SQLNet Framework

设计了一种满足SQL语法的框架, 在这样的语法框架内，只需要预测并填充相应的槽位。语法框架为：
SELECT $AGG $COLUMN
WHERE $COLUMN $OP $VALUE
(AND $COLUMN $OP $VALUE)*
在这基础上去完成不同的联合任务的分类预测：

select-column, 选择的列

select-aggregation，聚合操作类型

where-number， where条件语句的数量

where-column， where条件中的列

where-operator， where条件操作类型（'<','=','>'）

where-value， where条件值

Paper

Xu X, Liu C, Song D. SQLNet: Generating structured queries from natural language without reinforcement learning[J]. 2018.
Hwang W, Yim J, Park S, et al. A Comprehensive Exploration on WikiSQL with Table-Aware Word Contextualization[J]. 2019.
He P, Mao Y, Chakrabarti K, et al. X-SQL: reinforce schema representation with context[J]. 2019. 🔥
Tong Guo, Huilin Gao. Content Enhanced BERT-based Text-to-SQL Generation .2019.
Qin Lyu, Kaushik Chakrabarti, Shobhit Hathi, Souvik Kundu, Jianwen Zhang, Zheng Chen. Hybrid Ranking Network for Text-to-SQL. 2020 🔥

Code

Exe_score

RoBERTa-Large-HydraNet + EG	92.4	92.2
BERT-Large-HydraNet + EG	92.2	91.8
RoBERTa-Large-HydraNet	89.1	89.2
BERT-Large-HydraNet	88.9	88.6
BERT-XSQL-Attention + EG	92.3	91.8
(Tong) BERT-base-TableContent-used + EG	91.1	90.1
(Tong) BERT-base-TableContent-used	90.3	89.2
BERT-XSQL-Attention	89.5	88.7
BERT-SQLova-LSTM	87.2	86.2
BERT-SQLova-LSTM + EG	90.2	89.6
GloVe-SQLNet-BiLSTM	69.8	68.0

Schema aware Denoising (SeaD) 🔥🔥

在text-to-SQL任务中，由于架构设计的限制，seq2seq模型通常会导致局部最优。在本文中，作者提出了一种简单而有效的方法：采用基于transformer的seq2seq模型来加强文本到SQL生成。使用模式感知去噪（SeaD）对seq2seq模型进行训练：由两个去噪目标组成，训练模型从erosion和随机噪声中恢复输入或预测输出(自回归方式)，而不是对encoder施加约束或将任务重新格式化为槽位填充。这些去噪目标作为辅助任务，用于在seq2seq生成中更好地建模结构数据。此外，作者改进并提出了一种子句敏感执行引导（Execution Guided, EG）解码策略，以克服生成模型EG解码的局限性。

Paper
- [1] Xuan K , Wang Y , Wang Y , et al. SeaD: End-to-end Text-to-SQL Generation with Schema-aware Denoising [J]. 2021.
Exe_score

SeaD + EG 92.9 93.0

SeaD 90.2 90.1

SeaD + EG	92.9	93.0
SeaD	90.2	90.1

Schema Dependency Guided 🔥🔥

结合Question和Schema之间的依存关系来进行多任务学习。

Paper
- Hui B, Shi X, Geng R, et al. Improving Text-to-SQL with Schema Dependency Learning[J]. arXiv preprint arXiv:2103.04399, 2021.
Exe_score

SDSQL + EG 92.5 92.4

SDSQL 88.7 88.8

SDSQL + EG	92.5	92.4
SDSQL	88.7	88.8

BRIDGE^ 🔥

Paper
- Lin X V, Socher R, Xiong C. Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing[C]//EMNLP: Findings. 2020: 4870-4888.
Code
- https://github.com/salesforce/TabularSemanticParsing
- https://github.com/WING-NUS/slsql
Exe_score

BRIDGE^ + EG 92.6 91.9

BRIDGE 91.7 91.1

BRIDGE^ + EG	92.6	91.9
BRIDGE	91.7	91.1

T5 SeqGenSQL 🔥🔥

利用T5预训练语言（文本生成）模型, 将问题直接转换为SQL语句。同时，探索了如何利用表格模式信息对问题进行扩充，生成新的(silver)训练数据集

Paper
- Li N , Keller B , Butler M , et al. SeqGenSQL -- A Robust Sequence Generation Model for Structured Query Language[J]. 2020.
- Youssef M, Abdelkader R, et al. SQL Generation from Natural Language: A Sequence-to-Sequence Model Powered by the Transformers Architecture and Association Rules[J]. 2021
Exe_score

SeqGenSQL + EG 90.8 90.5

SeqGenSQL(T5-base + 250K silver data) 90.6 90.3

T5-large&mT5-large + Association Rules * 91.2 91.0

SeqGenSQL + EG	90.8	90.5
SeqGenSQL(T5-base + 250K silver data)	90.6	90.3
T5-large&mT5-large + Association Rules *	91.2	91.0

Information Extraction Approach

信息抽取的方法: 采用统一的基于BERT的抽取模型来识别query提及的槽位类型，包括序列标注方法、关系抽取和基于文本匹配的链接方法。

Paper
- Ping An Life, AI Team. IE-SQL: Mention Extraction and Linking for SQL Query Generation 2020
Exe_score

BERT-IE-SQL + EG 92.6 92.5

BERT-IE-SQL 88.7 88.8

BERT-IE-SQL + EG	92.6	92.5
BERT-IE-SQL	88.7	88.8

MRC Approach 🔥

阅读理解的方法: 与传统槽位填充方法不同的是，该方法将NL2SQL转化为QA问题,通过统一的MRC框架来预测不同的槽位。

Paper
- Yan Z, Ma J, Zhang Y, et al. SQL Generation via Machine Reading Comprehension[C]//Proceedings of the 28th International Conference on Computational Linguistics. 2020: 350-356.
Code
- https://github.com/nl2sql/QA-SQL
Exe_score

BERT-MRC-SQL + STILTs training + AGG enhancement 87.8 87.4

BERT-MRC-SQL + STILTs training 86.2 86.0

BERT-MRC-SQL 85.9 85.9

BERT-MRC-SQL + STILTs training + AGG enhancement	87.8	87.4
BERT-MRC-SQL + STILTs training	86.2	86.0
BERT-MRC-SQL	85.9	85.9

Model Interactive

基于用户交互的语义解析，更偏向于落地实践。在生成sql后，通过自然语句生成来进一步要求用户进行意图澄清，从而对sql进行修正。

Blog
- Facebook提出全新交互式语义分析框架，自然语言生成SQL语句准确率提升10%
Paper
- Yao Z, Su Y, Sun H, et al. Model-based Interactive Semantic Parsing: A Unified Framework and A Text-to-SQL Case Study[C]. EMNLP-IJCNLP 2019.
Code
- https://github.com/sunlab-osu/MISP

`2. Spider:`

GNN Encoding Seq2Seq 🔥

Schema-GNN: 利用多表关联信息来建立一个表名、列名为节点，表内、表间关系为边的图。通过GNN方法计算每一个节点(table item)的隐藏状态。在seq2seq模型的encoding阶段，每个query word 向量对每个 table item隐藏向量进行attention计算，并将attention权重作为每个query word的图表示。在decoding阶段，结合语法规则，如果输出应为table item,则将输出向量与所有table item隐藏向量进行全连接打分，计算其关联程度。

LGESQL: 以往的建图方式存在问题：1）忽略了边在拓扑结构中丰富的语义信息 2）无法区分每个节点的局部和非局部的关系。本文方法(Line Graph Enhanced Text-toSQL)在不构建元路径的情况挖掘潜在的关系特征.借助Line Graph，消息可以有效的在连接节点之间以及拓扑有向边上进行传播。在图迭代过程中，局部和非局部关系被显著地集成。同时，还设计了图剪枝的辅助任务，来提高编码器的识别能力。

ShadowGNN: 在跨域结构下，传统的语义解析模型难以适应不可见的数据库模式。为了提高稀少且不可见模式的模型泛化能力，我们提出了一种新的架构ShadowGNN，它可以在抽象和语义级别处理schemas。具体地，通过忽略数据库中语义项的名称，抽象schemas利用图映射神经网络来获得question和schema的去符号化表示。在领域无关表示的基础上，利用关系感知转换器进一步提取question和schema之间的逻辑联系。最后，还应用了一个具有上下文无关语法的SQL解码器。

SADGA: Structure-Aware Dual Graph Aggregation Network, 设计了一种基于图结构的聚合方法来学习question图和schema图的映射关系。该聚合方法的特征来源于图的全局链接、局部链接以及双图聚合机制。

S²SQL: 以往的基于图的编码器，没有很好的建模question的句法结构。本文利用句法解析器来抽取question的信息，并将句法信息注入到question-schema图编码器中。同时还使用了解耦约束来引导不同的边关系嵌入，从而提升网络性能。

Paper

Code

Log_score

S²SQL + ELECTRA (DB content used)	76.4	72.1
SADGA + GAP (DB content used)	73.1	70.1
LGESQL + ELECTRA (DB content used)	75.1	72.0
LGESQL + BERT (DB content used	74.1	68.3
LGESQL + Glove (DB content used)	67.6	62.8
ShadowGNN + RoBERTa (DB content used)	72.3	66.1
ShadowGNN (DB content used)	-	64.8
GNN + Bertrand-DR	57.9	54.6
Global-GNN	52.7	47.4
GNN	40.7	39.4
GNN w/edge vectors	32.1	-

RATSQL related works 🔥🔥🔥

Paper

[1] Wang B, Shin R, Liu X, et al.RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers [C]. ACL 2020.
[2] Deng X, Awadallah A H, Meek C, et al. Structure-Grounded Pretraining for Text-to-SQL[C]. NAACL, 2021.
[3] Gan Y , Chen X , Xie J , et al. Natural SQL: Making SQL Easier to Infer from Natural Language Specifications[C]. EMNLP Findings. 2021.
[4] Yu T, Wu C S, Lin X V, et al. GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing[C]. ICLR 2021.
[5] Shi P , Ng P , Wang Z , et al. Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training[C]. AAAI 2021.
[6] Zhao L, Cao H, Zhao Y. GP: Context-free Grammar Pre-training for Text-to-SQL Parsers[J]. arXiv preprint arXiv:2101.09901, 2021.

Code

Exe_score

[3] RATSQL + GAP + NatSQL (DB content used)	73.3

Log_score

RAT-SQL + GraPPa + Adv (DB content used)	75.5	70.5
RATSQL++ + ELECTRA (DB content used)	75.7	70.3
[6] RATSQL + GraPPa + GP (DB content used)	72.8	69.8
[5] RATSQL + GAP (DB content used)	71.8	69.7
[4] RATSQL + GraPPa (DB content used)	73.4	69.6
[3] RATSQL + GAP + NatSQL (DB content used)	-	68.7
[2] RAT-SQL + STRUG (DB content used)	72.6	68.4
[1] RATSQL v3 + BERT (DB content used)	69.7	65.6
[1] RATSQL v2 + BERT (DB content used)	65.8	61.9
[1] RATSQL v2 (DB content used)	62.7	57.2
[1] RATSQL + BERT	60.8	55.7
[1] RATSQL	60.6	53.7

MSRA: IRNet related works 🔥🔥

Blog & Video
- 智能数据分析技术，解锁Excel“对话”新功能 Conversational Data Analysis
- Use Ideas in Excel to get Immediate answers with ONE Click
Paper
- Guo J, Zhan Z, Gao Y, et al. Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation[C]. ACL 2019.
- Dong Z, Sun S, Liu H, et al. Data-Anonymous Encoding for Text-to-SQL Generation[C] EMNLP-IJCNLP 2019.
- Liu H, Fang L, Liu Q, et al. Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL[C]. EMNLP-IJCNLP 2019.
- Liu Q, Chen B, Lou J G, et al. FANDA: A Novel Approach to Perform Follow-up Query Analysis[C]. AAAI 2019.
- Liu Q, Chen B, Liu H, et al. A Split-and-Recombine Approach for Follow-up Query Analysis[C]. EMNLP-IJCNLP 2019.
Code
- https://github.com/microsoft/IRNet
- https://github.com/neeraj-bhat/IRNet/tree/dev
Log_score

IRNet++ + XLNet (DB content used) 65.5 60.1

IRNet++ + XLNet (DB content used) 65.5 60.1

IRNet-v2 + BERT 63.9 55.0

IRNet + BERT-Base 61.9 54.7

IRNet-v2 55.4 48.5

IRNet 53.2 46.7

IRNet++ + XLNet (DB content used)	65.5	60.1
IRNet++ + XLNet (DB content used)	65.5	60.1
IRNet-v2 + BERT	63.9	55.0
IRNet + BERT-Base	61.9	54.7
IRNet-v2	55.4	48.5
IRNet	53.2	46.7

MSRA DKI Group's works 🔥🔥

Paper & Code
- https://github.com/microsoft/ContextualSP
Log_score

ETA + BERT (DB content used) 70.8 65.3

PICARD

Paper
- Scholak T , Schucher N , Bahdanau D . PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models[C]. EMNLP. 2021.
Code
- https://github.com/ElementAI/picard
Log_score

PICARD + T5-3B (DB content used) 75.5 71.9

Exe_score

PICARD + T5-3B (DB content used) - 75.1

DT-Fixup SQL-SP

Paper
- Xu P , Kumar D , Yang W , et al. Optimizing Deeper Transformers on Small Datasets[C]. ACL. 2021.
Code
- https://github.com/BorealisAI/DT-Fixup
Log_score

DT-Fixup SQL-SP + RoBERTa (DB content used) 75.0 70.9

RaSaP

Paper
- Hua Ng J , Wa Ng Y , Wa Ng Y , et al. Relation Aware Semi-autoregressive Semantic Parsing for NL2SQL[J]. 2021.
Log_score

RaSaP + ELECTRA (DB content used) 74.7 69.0

EditSQL

Paper
- Zhang R, Yu T, Er H Y, et al. Editing-Based SQL Query Generation for Cross-Domain Context-Dependent Questions[C]. EMNLP-IJCNLP 2019.
Code
- https://github.com/ryanzhumich/editsql
Log_score

EditSQL + BERT 57.6 53.4

EditSQL 36.4 32.9

EditSQL + BERT	57.6	53.4
EditSQL	36.4	32.9

RYANSQL

Paper
- Choi D H, Shin M C, Kim E G, et al. [RYANSQL: Recursively Applying Sketch-based Slot Fillings for Complex Text-to-SQL in Cross-Domain Databases](RYANSQL: Recursively Applying Sketch-based Slot Fillings for Complex Text-to-SQL in Cross-Domain Databases)[J]. 2020.
Log_score

RYANSQL v2 + BERT 70.6 60.6

RYANSQL + BERT 66.6 58.2

RYANSQL 43.3 -

RYANSQL v2 + BERT	70.6	60.6
RYANSQL + BERT	66.6	58.2
RYANSQL	43.3	-

SmBoP

与自上而下的自回归分析相比，半自回归自底向上解析器具有多种优势。首先，由于每个解码步骤中的子树都是并行生成的，因此理论上的运行时间是对数而不是线性复杂度。其次，自下而上的方法学习在每个步骤上学习语义子程序的表示，而不是语义上模糊的部分树。最后，SMBOP基于Transformer的层将子树相互关联起来，与传统的beam-search不同，以探索过的其他树木为条件为树进行评分。

Paper
- Rubin O, Berant J. SmBoP: Semi-autoregressive Bottom-up Semantic Parsing[C]. NAACL, 2021.
Code
https://github.com/OhadRubin/SmBop

Log_score

SmBoP + GraPPa (DB content used) 74.7 69.5

SmBoP + BART 66.0 60.5

Exe_score

SmBoP + GraPPa (DB content used) - 71.1

SmBoP + GraPPa (DB content used)	74.7	69.5
SmBoP + BART	66.0	60.5

SLSQL

Schema Linking is the crux for the current text-to-SQL task.

Paper
- Lei W, Wang W, Ma Z, et al. Re-examining the Role of Schema Linking in Text-to-SQL[C]. EMNLP 2020: 6943-6954.
Code
- https://github.com/salesforce/TabularSemanticParsing
- https://github.com/WING-NUS/slsql
Log_score

SLSQL + BERT + Data Annotation 60.8 55.7

BRIDGE

Paper
- Lin X V, Socher R, Xiong C. Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing[C]//EMNLP: Findings. 2020: 4870-4888.
Code
- https://github.com/salesforce/TabularSemanticParsing
- https://github.com/WING-NUS/slsql
Log_score

BRIDGE(k = 2) + BERT (DB content used) 65.5 59.2

BRIDGE(k = 1) + BERT (DB content used) 65.3 -

Exe_score

BRIDGE v2 + BERT(ensemble) (DB content used) - 68.3

BRIDGE v2 + BERT (DB content used) - 64.3

BRIDGE(k = 2) + BERT (DB content used) - 59.9

BRIDGE(k = 2) + BERT (DB content used)	65.5	59.2
BRIDGE(k = 1) + BERT (DB content used)	65.3	-

BRIDGE v2 + BERT(ensemble) (DB content used)	-	68.3
BRIDGE v2 + BERT (DB content used)	-	64.3
BRIDGE(k = 2) + BERT (DB content used)	-	59.9

GAZP

GAZP combines a forward semantic parser with a backward utterance generator to synthesize data (e.g. utterances and SQL queries) in the new environment, then selects cycleconsistent examples to adapt the parser. Unlike data-augmentation, which typically synthesizes unverified examples in the training environment, GAZP synthesizes examples in the new environment whose inputoutput consistency are verified.

Paper
- Zhong V, Lewis M, Wang S I, et al. Grounded adaptation for zero-shot executable semantic parsing[C]. EMNLP-2020.
Exe_score

GAZP + BERT - 53.5

SQLNet Framework

Paper
- Leveraging Adjective-Noun Phrasing Knowledge for Comparison Relation Prediction in Text-to-SQL[C]. EMNLP 2019
- Yu T, Yasunaga M, Yang K, et al. Syntaxsqlnet: Syntax tree networks for complex and cross-domaintext-to-sql task[C]. EMNLP 2018.
- Dongjun Lee. Clause-Wise and Recursive Decoding for Complex and Cross-Domain Text-to-SQL Generation[C]. EMNLP 2019.
- Lin K, Bogin B, Neumann M, et al. Grammar-based Neural Text-to-SQL Generation. 2019.
Code
- https://github.com/taoyds/syntaxSQL
Score

GrammarSQL 34.8 33.8

SyntaxSQLNet + augment 24.8 27.2

RCSQL 28.5 24.3

SyntaxSQLNet 18.9 19.7

SQLNet 10.9 12.4

GrammarSQL	34.8	33.8
SyntaxSQLNet + augment	24.8	27.2
RCSQL	28.5	24.3
SyntaxSQLNet	18.9	19.7
SQLNet	10.9	12.4

`3. UnifiedSKG:` 🔥🔥

Blog

结构化知识的统一建模和多任务学习

Code

https://github.com/hkunlp/unifiedskg

Paper

Xie T , Wu C H , Shi P , et al. UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models[J]. arXiv e-prints, 2022.

三、相关资源扩展 (extend resources)

1. Related Works

1.1 `Pre-training` 🔥🔥🔥

jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-train data.

Shi P , Ng P , Wang Z , et al. GAP: Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training[C]. AAAI 2021.

A novel weakly supervised Structure-Grounded pretraining framework (STRUG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus.

[ ]Deng X, Awadallah A H, Meek C, et al. Structure-Grounded Pretraining for Text-to-SQL[C]. NAACL, 2021.

A new method for Text-to-SQL parsing, Grammar Pre-training (GP),is proposed to decode deep relations between question and database.

[ ]Zhao L, Cao H, Zhao Y. GP: Context-free Grammar Pre-training for Text-to-SQL Parsers[J]. arXiv preprint arXiv:2101.09901, 2021.

An effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data.

Yu T, Wu C S, Lin X V, et al. GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing[C]. ICLR 2021.

this paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.

Qin B , Wang L , Hui B , et al. SDCUP: Schema Dependency-Enhanced Curriculum Pre-Training for Table Semantic Parsing[J]. 2021.

table pre-training can be realized by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.

Liu Q , Chen B , Guo J , et al. [TAPEX: Table Pre-training via Learning a Neural SQL Executor(https://arxiv.org/abs/2107.07653)[J]. 2021.

A pretrained language model that jointly learns representations for NL sentences and (semi-)structured tables.

Pengcheng Yin, Graham Neubig, et al. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data[C]. ACL 2020.

this paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.

Bowen Q, LiHan W, et al.Linking-Enhanced Pre-Training for Table Semantic Parsing. 2021

Adapting a semantic parser trained on a single language.

Tom Sherborne, Yumo Xu, Mirella Lapata. Bootstrapping a Crosslingual Semantic Parser.2020.

1.2 `Systems`

Zeng J, Lin X V, Xiong C, et al. Photon: A Robust Cross-Domain Text-to-SQL System[J]. 2020.
Brunner U, Stockinger K. ValueNet: A Neural Text-to-SQL Architecture Incorporating Values[J]. 2020.
Elgohary A, Hosseini S, Awadallah A H. Speak to your Parser: Interactive Text-to-SQL with Natural Language Feedback[J]. 2020.

1.3 `Surveys`

Jovan, Martina, Frosina. Recent Advances in SQL Query Generation: A Survey//Part of the 17th International Conference on Informatics and Information Technologies. Received best paper award. 2020.

1.4 `Blogs`

1.5 `Other Papers`

Dhamdhere K, McCurley K S, Nahmias R, et al. Analyza: Exploring data with conversation[C]//Proceedings of the 22nd International Conference on Intelligent User Interfaces. ACM, 2017.
Chen S, San A, Liu X, et al. A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing[C]. COLING 2020.
Dou L , Gao Y , Pan M , et al. UniSAr: A Unified Structure-Aware Autoregressive Language Model for Text-to-SQL[J]. 2022.

1.6 `Tools`

Test suite for text2sql code: https://github.com/taoyds/test-suite-sql-eval
Test suite for text2sql paper: Zhong R, Yu T, Klein D. Semantic Evaluation for Text-to-SQL with Distilled Test Suites[C]. EMNLP2020.
SQL Parser https://github.com/mozilla/moz-sql-parser

2. SQL2Seq

Paper

Xu K, Wu L, Wang Z, et al. Graph2seq: Graph to sequence learning with attention-based neural networks.2018.
Xu K, Wu L, Wang Z, et al. SQL-to-text generation with graph-to-sequence model[C]. EMNLP 2018.

Code

3. 图神经网络（GNN)

Paper

Code

baeseulki / nl2lf Goto Github PK

nl2lf's Introduction

NL2LF

NL2SQL & Text2SQL

一、主要评测数据集(DataSet)

二、主要论文方法及代码实现（Papers&Code）

1. WikiSQL:

2. Spider:

3. UnifiedSKG: 🔥🔥

三、相关资源扩展 (extend resources)

1. Related Works

1.1 Pre-training 🔥🔥🔥

1.2 Systems

1.3 Surveys

1.4 Blogs

1.5 Other Papers

1.6 Tools

2. SQL2Seq

3. 图神经网络（GNN)

nl2lf's People

Contributors

Stargazers

Watchers

Forkers

Recommend Projects

Recommend Topics

Recommend Org

`1. WikiSQL:`

`2. Spider:`

`3. UnifiedSKG:` 🔥🔥

1.1 `Pre-training` 🔥🔥🔥

1.2 `Systems`

1.3 `Surveys`

1.4 `Blogs`

1.5 `Other Papers`

1.6 `Tools`