NL2LF

(Continuously updated...)
Recent update log:

0. UnifiedSKG, UniSAr
1. GNN works: LGESQL, ShadowGNN, SADGA, S²SQL (SOTA)
2. RatSQL + Pretraining (STRUG, GraPPa, GAP, GP) + NatSQL
3. PICARD, DT-Fixup, RaSaP
4. wikisql: SeaD, SeqGenSQL, BRIDGE^

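Resources for Natural Language to Logical Form (NL2LF) research, focusing on NL2SQL first.
The collection currently centers on NL2SQL and covers public evaluation datasets, related papers with partial code implementations, and related blog posts or WeChat public-account articles.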

NL2SQL
I. Main evaluation datasets (dataset)
II. Main papers, methods, and code (papers&code)
    1. WikiSQL
    2. Spider
    3. UnifiedSKG
III. Extended resources (extend-resources)
    1. Related Works
        1.1. Pre-training
        1.2. Systems
        1.3. Surveys
        1.4. Blogs
        1.5. Other Papers
        1.6. Tools
    2. SQL2Seq
    3. Graph Neural Networks (GNN)

NL2SQL & Text2SQL

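I. Main Evaluation Datasets (DataSet)

  • CCKS2022: Financial NL2SQL Evaluation

    Existing NL2SQL data and methods mainly target a "closed scenario with a designated database/table" setting, which struggles to keep up with a business scope that evolves dynamically. In terms of domain characteristics, financial data is mostly time series, including daily market quotes, quarterly financial reports, annual GDP figures, and irregularly timed stock pledges and pledge releases, which undoubtedly makes converting questions to SQL harder.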

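II. Main Papers, Methods, and Code (Papers&Code)

Papers are mainly evaluated on WikiSQL and Spider; see each task's home page for the corresponding leaderboard.
The list below collects representative methods and is continuously updated...
Note: score tables use the columns | model | Dev accuracy | Test accuracy |. Exe_score denotes execution accuracy.
Log_score denotes logical accuracy; on Spider it does not include value prediction.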
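
The difference between the two metrics can be sketched as follows. This is a minimal illustration rather than the official evaluator of either benchmark: `normalize`, `logical_match`, and `execution_match` are hypothetical helpers, the string comparison is a crude stand-in for logical-form matching (the real evaluators compare structured query components), and the `players` table is made-up toy data.

```python
import sqlite3


def normalize(sql: str) -> str:
    """Crude normalization (an assumption): lowercase, drop ';', collapse whitespace."""
    return " ".join(sql.lower().replace(";", " ").split())


def logical_match(pred: str, gold: str) -> bool:
    """Stand-in for logical accuracy: the two query strings agree after normalization."""
    return normalize(pred) == normalize(gold)


def execution_match(conn: sqlite3.Connection, pred: str, gold: str) -> bool:
    """Stand-in for execution accuracy: both queries return the same rows."""
    try:
        pred_rows = conn.execute(pred).fetchall()
    except sqlite3.Error:
        return False  # a prediction that cannot be executed counts as wrong
    gold_rows = conn.execute(gold).fetchall()
    # Compare as unordered collections of rows.
    return sorted(pred_rows, key=repr) == sorted(gold_rows, key=repr)


# Toy database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [("Ann", "Red", 30), ("Bob", "Blue", 12), ("Cy", "Red", 7)],
)

gold = "SELECT name FROM players WHERE points > 10"
pred = "SELECT name FROM players WHERE points >= 12"

exe_ok = execution_match(conn, pred, gold)  # same rows on this database
log_ok = logical_match(pred, gold)          # different query text
```

On this toy database the prediction counts as correct under execution matching but not under logical matching, which is why the two scores are reported separately.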

1. WikiSQL:







  • Schema aware Denoising (SeaD) 🔥🔥

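    In text-to-SQL, seq2seq models often end up with sub-optimal results because of limitations in their architecture. This paper proposes a simple yet effective approach that uses a transformer-based seq2seq model to strengthen text-to-SQL generation. Rather than imposing constraints on the encoder or recasting the task as slot filling, the seq2seq model is trained with Schema-aware Denoising (SeaD), which consists of two denoising objectives that train the model to either recover the input or predict the output (autoregressively) from erosion and random noise. These denoising objectives serve as auxiliary tasks for better modeling of structured data in seq2seq generation. The authors also improve on and propose a clause-sensitive execution-guided (EG) decoding strategy to overcome the limitations of EG decoding for generative models.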

    Paper

    Exe_score

    | model | Dev accuracy | Test accuracy |
    |---|---|---|
    | SeaD + EG | 92.9 | 93.0 |
    | SeaD | 90.2 | 90.1 |
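
Several entries in this list report a "+ EG" variant. The general idea of execution-guided decoding can be sketched as below: candidate queries are tried in score order and those that fail to execute or return nothing are skipped. This is a simplified, whole-query version written for illustration; it is not SeaD's clause-sensitive strategy, and `execution_guided_pick` and the `cities` table are hypothetical.

```python
import sqlite3
from typing import Optional, Sequence


def execution_guided_pick(
    conn: sqlite3.Connection, candidates: Sequence[str]
) -> Optional[str]:
    """Return the first candidate (best score first) that runs and yields rows."""
    for sql in candidates:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # skip queries that fail to execute
        if rows:
            return sql  # executable and non-empty: accept it
    # Nothing passed the check: fall back to the top-scored candidate, if any.
    return candidates[0] if candidates else None


# Toy database (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [("Paris", "France"), ("Lyon", "France"), ("Rome", "Italy")],
)

# Beam output ordered by model score (made up for the example).
beam = [
    "SELECT name FROM cities WHERE contry = 'France'",   # misspelled column: error
    "SELECT name FROM cities WHERE country = 'france'",  # runs, but matches no rows
    "SELECT name FROM cities WHERE country = 'France'",  # runs and returns rows
]
picked = execution_guided_pick(conn, beam)
```

Treating an empty result as a failure is a heuristic: it helps when gold answers are rarely empty, and it can hurt on questions whose correct answer really is empty.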


  • Schema Dependency Guided 🔥🔥

    Performs multi-task learning that exploits the dependency relations between the question and the schema.

    Paper

    Exe_score

    | model | Dev accuracy | Test accuracy |
    |---|---|---|
    | SDSQL + EG | 92.5 | 92.4 |
    | SDSQL | 88.7 | 88.8 |






  • Information Extraction Approach

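    An information-extraction approach: a unified BERT-based extraction model identifies the slot types mentioned in the query, combining sequence labeling, relation extraction, and text-matching-based linking.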

    Paper

    Exe_score

    | model | Dev accuracy | Test accuracy |
    |---|---|---|
    | BERT-IE-SQL + EG | 92.6 | 92.5 |
    | BERT-IE-SQL | 88.7 | 88.8 |
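
Slot-based WikiSQL methods like the one above predict the parts of a query sketch rather than generating SQL text directly. A minimal sketch of how predicted slots could be assembled into a query is shown below. It assumes the WikiSQL-style encoding of a selected column index, an aggregation index, and a list of (column, operator, value) conditions; `slots_to_sql`, the operator lists, and the table name `t` are illustrative assumptions, not code from any of the listed papers.

```python
from typing import List, Sequence, Tuple, Union

Value = Union[str, int, float]

# Assumed index-to-operator mappings in the style of the WikiSQL logical form.
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<"]


def quote(value: Value) -> str:
    """Render a condition value as an SQL literal."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"  # escape single quotes


def slots_to_sql(
    columns: Sequence[str],
    sel: int,
    agg: int,
    conds: List[Tuple[int, int, Value]],
    table: str = "t",
) -> str:
    """Assemble SELECT [agg](col) FROM table [WHERE col op value AND ...]."""
    select_col = f'"{columns[sel]}"'
    select = f"{AGG_OPS[agg]}({select_col})" if AGG_OPS[agg] else select_col
    sql = f'SELECT {select} FROM "{table}"'
    if conds:
        where = " AND ".join(
            f'"{columns[col]}" {COND_OPS[op]} {quote(val)}' for col, op, val in conds
        )
        sql += f" WHERE {where}"
    return sql


columns = ["Player", "Team", "Points"]
# Predicted slots for "What is the total number of points scored by the Red team?"
sql = slots_to_sql(columns, sel=2, agg=4, conds=[(1, 0, "Red")])
```

Because every slot is a classification or span-extraction decision, the output is always a syntactically valid query of this restricted shape, which is one reason slot-based methods work well on WikiSQL's single-table queries.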


  • MRC Approach 🔥

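    A machine reading comprehension (MRC) approach: unlike traditional slot-filling methods, it casts NL2SQL as a QA problem and predicts the different slots through a unified MRC framework.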

    Paper

    Code

    Exe_score

    | model | Dev accuracy | Test accuracy |
    |---|---|---|
    | BERT-MRC-SQL + STILTs training + AGG enhancement | 87.8 | 87.4 |
    | BERT-MRC-SQL + STILTs training | 86.2 | 86.0 |
    | BERT-MRC-SQL | 85.9 | 85.9 |




2. Spider:



















  • SmBoP

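    Compared with top-down autoregressive parsing, a semi-autoregressive bottom-up parser has several advantages. First, because the sub-trees at each decoding step are generated in parallel, the theoretical runtime is logarithmic rather than linear. Second, the bottom-up approach learns, at each step, representations of semantically meaningful sub-programs rather than semantically vague partial trees. Finally, SmBoP's Transformer-based layers contextualize sub-trees with one another, so that, unlike in traditional beam search, each tree is scored conditioned on the other trees that have been explored.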

    Paper

    Code
    https://github.com/OhadRubin/SmBop

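    Log_score

    | model | Dev accuracy | Test accuracy |
    |---|---|---|
    | SmBoP + GraPPa (DB content used) | 74.7 | 69.5 |
    | SmBoP + BART | 66.0 | 60.5 |

    Exe_score

    | model | Dev accuracy | Test accuracy |
    |---|---|---|
    | SmBoP + GraPPa (DB content used) | - | 71.1 |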
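
The runtime argument in the SmBoP description can be illustrated with a toy step count. The sketch below is not SmBoP's algorithm (which scores and prunes a beam of candidate sub-trees with a learned model); it only shows that merging adjacent sub-trees pairwise, with all merges of one step done "in parallel", builds a balanced tree over n leaves in about log2(n) steps, while emitting one node per step takes a number of steps linear in the tree size. `bottom_up_steps` is a hypothetical helper.

```python
import math
from typing import Any, List, Tuple


def bottom_up_steps(leaves: List[Any]) -> Tuple[Any, int]:
    """Merge adjacent sub-trees pairwise until one tree remains; count the steps."""
    if not leaves:
        raise ValueError("need at least one leaf")
    frontier = list(leaves)
    steps = 0
    while len(frontier) > 1:
        # All pairs in this comprehension belong to the same (parallel) step.
        merged = [
            (frontier[i], frontier[i + 1]) for i in range(0, len(frontier) - 1, 2)
        ]
        if len(frontier) % 2:  # an unpaired sub-tree is carried to the next step
            merged.append(frontier[-1])
        frontier = merged
        steps += 1
    return frontier[0], steps


leaves = list("abcdefgh")             # 8 leaves
tree, parallel_steps = bottom_up_steps(leaves)
sequential_steps = 2 * len(leaves) - 1  # nodes in a full binary tree, one per step
```

For 8 leaves the pairwise merge finishes in 3 steps, whereas a decoder that emits the 15 tree nodes one at a time needs 15.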






  • GAZP

    GAZP combines a forward semantic parser with a backward utterance generator to synthesize data (e.g. utterances and SQL queries) in the new environment, then selects cycle-consistent examples to adapt the parser. Unlike data augmentation, which typically synthesizes unverified examples in the training environment, GAZP synthesizes examples in the new environment whose input-output consistency is verified.

    Paper

    Exe_score

    | model | Dev accuracy | Test accuracy |
    |---|---|---|
    | GAZP + BERT | - | 53.5 |
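
The cycle-consistency filter described for GAZP can be sketched as follows. The function and its arguments are hypothetical: `generate_utterance` stands in for the backward utterance generator, `parse_utterance` for the forward parser, and `same_program` for whatever equivalence check is used (string equality here, purely for illustration). The dictionaries are toy stand-ins for trained models.

```python
from typing import Callable, Iterable, List, Tuple


def cycle_consistent_examples(
    sampled_queries: Iterable[str],
    generate_utterance: Callable[[str], str],
    parse_utterance: Callable[[str], str],
    same_program: Callable[[str, str], bool],
) -> List[Tuple[str, str]]:
    """Keep (utterance, SQL) pairs whose utterance parses back to the sampled SQL."""
    kept = []
    for sql in sampled_queries:
        utterance = generate_utterance(sql)    # backward model: SQL -> utterance
        reparsed = parse_utterance(utterance)  # forward parser: utterance -> SQL
        if same_program(sql, reparsed):        # verified input-output consistency
            kept.append((utterance, sql))
    return kept


# Toy stand-ins for the two models (hypothetical behaviour).
backward = {
    "SELECT name FROM city": "list the city names",
    "SELECT COUNT(*) FROM city": "how many cities are there",
}
forward = {
    "list the city names": "SELECT name FROM city",
    "how many cities are there": "SELECT name FROM city",  # parser gets this one wrong
}

kept = cycle_consistent_examples(
    sampled_queries=list(backward),
    generate_utterance=backward.__getitem__,
    parse_utterance=forward.__getitem__,
    same_program=lambda a, b: a == b,
)
```

Only the first pair survives the filter; the second is dropped because the round trip does not reproduce the sampled query, so it would not be used to adapt the parser.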




3. UnifiedSKG: 🔥🔥

Blog

Code

Paper

III. Extended Resources (extend resources)

1. Related Works
1.1 Pre-training 🔥🔥🔥

Jointly learns representations of natural language utterances and table schemas by leveraging generation models to generate pre-training data.

A novel weakly supervised Structure-Grounded pretraining framework (STRUG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus.

A new method for Text-to-SQL parsing, Grammar Pre-training (GP), is proposed to decode deep relations between the question and the database.

An effective pre-training approach for table semantic parsing that learns a compositional inductive bias in the joint representations of textual and tabular data.

This paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.

Table pre-training can be realized by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries.

A pretrained language model that jointly learns representations for NL sentences and (semi-)structured tables.

This paper designs two novel pre-training objectives to impose the desired inductive bias into the learned representations for table pre-training.

Adapting a semantic parser trained on a single language.

1.2 Systems
1.3 Surveys
1.4 Blogs
1.5 Other Papers
1.6 Tools
2. SQL2Seq

Paper

Code

3. Graph Neural Networks (GNN)

Paper

Code

Contributors

baeseulki
