MAC-SQL

📖Introduction

This is the official repository for the paper "MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL".

In this paper, we propose a multi-agent collaborative Text-to-SQL framework, MAC-SQL, which comprises three agents: the Selector, the Decomposer, and the Refiner.
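Conceptually, the three agents form a pipeline: the Selector prunes the schema, the Decomposer drafts SQL from the (sub-)questions, and the Refiner repairs the draft. The sketch below illustrates that flow with stub logic and hypothetical class/method names; the actual LLM-backed implementations live in core/agents.py.

```python
# Illustrative sketch only: stubs stand in for the LLM-backed agents
# defined in core/agents.py.

class Selector:
    """Prunes the database schema to the tables/columns relevant to the question."""
    def run(self, question: str, schema: dict) -> dict:
        # Stub: keep only tables whose name appears in the question.
        return {t: cols for t, cols in schema.items() if t in question.lower()}

class Decomposer:
    """Breaks the question into simpler sub-questions and drafts SQL."""
    def run(self, question: str, schema: dict) -> str:
        # Stub: a trivial single-step "decomposition".
        table = next(iter(schema))
        return f"SELECT * FROM {table}"

class Refiner:
    """Executes the draft SQL and repairs it if execution fails."""
    def run(self, sql: str) -> str:
        # Stub: a real Refiner would execute the SQL and ask the LLM to fix errors.
        return sql.strip().rstrip(";") + ";"

def mac_sql(question: str, schema: dict) -> str:
    pruned = Selector().run(question, schema)
    draft = Decomposer().run(question, pruned)
    return Refiner().run(draft)

print(mac_sql("How many singers are there?",
              {"singers": ["id", "name"], "concerts": ["id"]}))
# → SELECT * FROM singers;
```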

🔥 Updates

  • [2024-04-23] We have updated sql-llama-instruct-v0.5.jsonl and the training scripts in the training_scripts dir of this project. Please check them out.
  • [2024-04-22] We have updated the SQL-Llama-v0.5 model and data.zip (updated dev_gold_schema.json in bird and spider). The download links for the updated data are available on Baidu Disk and Google Drive.
  • [2024-02-18] We have updated the paper; the updates mainly focus on experiments and framework details. Check it out: link.
  • [2023-12-26] We have updated the paper; the updates mainly focus on the title, abstract, introduction, some details, and the appendix. In addition, we provide some bad-case examples in the bad_cases folder, check it out!
  • [2023-12-19] We released the first version of our paper and code. Check it out!

⚡Environment

  1. Configure your local environment.
conda create -n macsql python=3.9 -y
conda activate macsql
pip install -r requirements.txt
python -c "import nltk; nltk.download('punkt')"

Note: we use openai==0.28.1, which uses openai.ChatCompletion.create to call the API.
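The actual API wrapper lives in core/llm.py; for orientation, the legacy 0.28.x call pattern might look like the sketch below (function names here are illustrative, not the project's own).

```python
import os

def build_messages(prompt: str) -> list:
    # Single-turn chat payload in the openai 0.28.x message format.
    return [{"role": "user", "content": prompt}]

def ask_llm(prompt: str, model: str = "gpt-4-1106-preview") -> str:
    # Legacy API: openai>=1.0 removed openai.ChatCompletion, so this
    # pattern only works on the 0.28.x line pinned in requirements.txt.
    import openai  # openai==0.28.1
    openai.api_base = os.environ.get("OPENAI_API_BASE", "https://api.openai.com/v1")
    openai.api_key = os.environ["OPENAI_API_KEY"]
    response = openai.ChatCompletion.create(
        model=model,
        messages=build_messages(prompt),
        temperature=0,
    )
    return response["choices"][0]["message"]["content"]

# Example (requires a valid key and endpoint):
# print(ask_llm("Write a SQL query that counts the rows of table t."))
```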

  2. Edit the OpenAI config in core/api_config.py, and set the related environment variables for the Azure OpenAI API.

Currently, we use gpt-4-1106-preview (the 128k version) by default, which is about 2.5 times cheaper than gpt-4 (8k) on average.

export OPENAI_API_BASE="YOUR_OPENAI_API_BASE"
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

🔧 Data Preparation

To make data preparation quicker, we have packaged the files, including the databases of the BIRD and Spider datasets, into data.zip and uploaded it. All files were downloaded on December 19, 2023, so they were the latest versions at that time. The download links are available on Baidu Disk and Google Drive (updated on 2024-04-22).

After downloading data.zip, delete the existing data folder in the project directory and replace it with the unzipped data folder from data.zip.

🚀 Run

The run script first runs 5 Spider examples to check the environment. Uncomment the relevant lines in the script for different usages.

  • run.sh for Linux/macOS
  • run.bat for Windows

For a SQL execution demo, you can use app_bird.py or app_spider.py to get the execution result of a SQL query.

cd ./scripts
python app_bird.py
python app_spider.py
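Under the hood, these demos run queries against the SQLite databases shipped in data.zip. A minimal sketch of that execution step (using a throwaway database for illustration; the helper name is hypothetical):

```python
import os
import sqlite3
import tempfile

def execute_sql(db_path: str, sql: str):
    """Run a query against a SQLite file and return all result rows."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(sql).fetchall()

# Illustration with a throwaway database file; the real demos point at
# files under ./data, e.g. the BIRD dev databases.
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
with sqlite3.connect(path) as conn:
    conn.execute("CREATE TABLE singer (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO singer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])

print(execute_sql(path, "SELECT COUNT(*) FROM singer"))  # → [(2,)]
```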

If the error /bin/bash^M: bad interpreter occurs on Linux, run sed -i -e 's/\r$//' run.sh to fix it.

📝Evaluation Dataset

We evaluate our method on both the BIRD and Spider datasets.

EX: Execution Accuracy (%)

VES: Valid Efficiency Score (%)

Refer to our paper for the details.
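EX checks whether the predicted SQL's execution result matches the gold SQL's result; VES additionally weights correct predictions by execution efficiency (see the paper and the scripts in evaluation/). A minimal, order-insensitive sketch of the EX check (not the project's actual evaluation code):

```python
import os
import sqlite3
import tempfile

def execution_match(db_path: str, pred_sql: str, gold_sql: str) -> bool:
    """EX sketch: a prediction counts as correct when executing it yields
    the same result set as the gold SQL (ignoring row order)."""
    with sqlite3.connect(db_path) as conn:
        pred = conn.execute(pred_sql).fetchall()
        gold = conn.execute(gold_sql).fetchall()
    return set(pred) == set(gold)

# Toy database for illustration
path = os.path.join(tempfile.mkdtemp(), "toy.sqlite")
with sqlite3.connect(path) as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

print(execution_match(path, "SELECT x FROM t WHERE x > 1",
                            "SELECT x FROM t WHERE x >= 2"))  # → True
```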

🫡Run SQL-Llama

Download SQL-Llama (currently v0.5) and follow SQL-Llama-deployment.md to deploy it.

Uncomment MODEL_NAME = 'CodeLlama-7b-hf' in core/api_config.py to set the global model, and comment out the other MODEL_NAME = xxx lines.

Uncomment export OPENAI_API_BASE='http://0.0.0.0:8000/v1' in run.sh to set the local model API base.

Then, run run.sh to start your local inference.

🌟 Project Structure

├─data               # datasets and databases
|  ├─spider
|  ├─bird
├─core
|  ├─agents.py       # the three agent classes
|  ├─api_config.py   # OpenAI API env config
|  ├─chat_manager.py # manages communication between agents
|  ├─const.py        # prompt templates and constant values
|  ├─llm.py          # API call functions and log printing
|  ├─utils.py        # utility functions
├─scripts            # SQLite execution Flask demo
|  ├─app_bird.py
|  ├─app_spider.py
|  ├─templates
├─evaluation         # evaluation scripts
|  ├─evaluation_bird_ex.py
|  ├─evaluation_bird_ves.py
|  ├─evaluation_spider.py
├─bad_cases
|  ├─badcase_BIRD(dev)_examples.xlsx
|  └─badcase_Spider(dev)_examples.xlsx
├─evaluation_bird_ex_ves.sh # BIRD evaluation script
├─README.md
├─requirements.txt
├─run.py # main run script
├─run.sh # generation and evaluation script

💬Citation

If you find our work helpful, please cite it as:

@misc{wang2024macsql,
      title={MAC-SQL: A Multi-Agent Collaborative Framework for Text-to-SQL}, 
      author={Bing Wang and Changyu Ren and Jian Yang and Xinnian Liang and Jiaqi Bai and Linzheng Chai and Zhao Yan and Qian-Wen Zhang and Di Yin and Xing Sun and Zhoujun Li},
      year={2024},
      eprint={2312.11242},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

👍Contributing

We welcome contributions and suggestions!


mac-sql's Issues

Some questions about the fine-tuning data

Hello, author. The project's code performs inference through three agents. My question is about the fine-tuning data used for the three agents when training SQL-Llama at the end of the paper. For the Selector, the relevant columns and table names can be inferred from the gold SQL, so constructing fine-tuning data for that step is no problem. But for the second agent, the Decomposer, my existing business data has no decomposition annotations. How did you obtain the fine-tuning data for that step when training?

Using MAC-SQL for a specific database

Hello everybody,

I am working with a group of students on a text-to-SQL tool for a specific database, and given your amazing work and results we decided to try your framework. Our idea is to generate a question-query dataset from the single database (which contains about 1,000 tables) to use with your framework. For the moment we only have GPT-3.5 Turbo available but, since the queries to produce are in principle very simple (not even involving JOINs), we believe it will be good enough for us. We are also working on creating table embeddings (from the natural-language table descriptions) as a selection method, so the Selector agent could be omitted if that speeds up the process.

We would appreciate some basic guidelines and/or corrections/advice on our approach.

Thank you for your great work!

Decomposition on Spider

Thank you so much for your great work.
I notice that decompose_template_spider is just one-shot, without any form of decomposition like decompose_template_bird. Is there a reason for that? Perhaps because Spider is much simpler than BIRD?
Thank you so much for your help.

Clarification on EX and VES Scores without Oracle Schema in Table 1

In Table 1, the EX and VES scores are provided with the Oracle Schema, indicating that the Selector is not activated during the test.

However, I am interested in understanding how these scores would change if the Selector were included in the test. Could you provide information on the EX and VES scores without the Oracle Schema when the Selector is active? This would help in comparing the performance with and without the Selector.


Code flow

Hello, excellent researcher.

After reading your code, I want to understand your workflow better, but I have encountered difficulties when running it.

Could you briefly introduce your overall code logic to me? I know this is a lot of trouble for you, and I'm very sorry.

Selector few-shot prompts

In your paper you use 2 shots for some experiments, e.g. in Table 4, but your code and the paper only show 1 shot for the Selector. Did you use exactly 1 shot for all experiments with the Selector?

Help wanted! I can't reproduce your results on Spider dev as shown in the paper

I have followed the code in your script, but the results I get are not quite right. I wonder if there are other parameters? Is there any way to improve this? Can you help me solve this problem? Thank you.

=====================   EXECUTION ACCURACY     =====================
execution            0.911                0.814                0.759                0.651                0.802               

====================== EXACT MATCHING ACCURACY =====================
exact match          0.069                0.007                0.000                0.000                0.019               

---------------------PARTIAL MATCHING ACCURACY----------------------
select               1.000                1.000                0.000                0.000                0.952               
select(no AGG)       1.000                1.000                0.000                0.000                0.952               
where                0.000                1.000                0.000                1.000                1.000               
where(no OP)         0.000                1.000                0.000                1.000                1.000               
group(no Having)     0.000                0.000                0.000                0.000                0.000               
group                0.000                0.000                0.000                0.000                0.000               
order                0.000                0.000                0.000                0.000                0.000               
and/or               1.000                0.899                0.897                0.880                0.920               
IUEN                 0.000                0.000                0.000                0.000                0.000               
keywords             0.000                1.000                0.000                1.000                1.000               
---------------------- PARTIAL MATCHING RECALL ----------------------
select               0.069                0.007                0.000                0.000                0.019               
select(no AGG)       0.069                0.007                0.000                0.000                0.019               
where                0.000                0.016                0.000                0.011                0.008               
where(no OP)         0.000                0.016                0.000                0.011                0.008               
group(no Having)     0.000                0.000                0.000                0.000                0.000               
group                0.000                0.000                0.000                0.000                0.000               
order                0.000                0.000                0.000                0.000                0.000               
and/or               1.000                1.000                1.000                1.000                1.000               
IUEN                 0.000                0.000                0.000                0.000                0.000               
keywords             0.000                0.008                0.000                0.006                0.005               
---------------------- PARTIAL MATCHING F1 --------------------------
select               0.128                0.013                1.000                1.000                0.038               
select(no AGG)       0.128                0.013                1.000                1.000                0.038               
where                1.000                0.032                1.000                0.021                0.017               
where(no OP)         1.000                0.032                1.000                0.021                0.017               
group(no Having)     1.000                1.000                1.000                1.000                1.000               
group                1.000                1.000                1.000                1.000                1.000               
order                1.000                1.000                1.000                1.000                1.000               
and/or               1.000                0.947                0.945                0.936                0.958               
IUEN                 1.000                1.000                1.000                1.000                1.000               
keywords             1.000                0.016                1.000                0.012                0.009

result in spider-realistic

I was wondering if you could kindly provide me with the results of MAC-SQL in Spider-Realistic? I would greatly appreciate it. Thank you very much!

Improving Dataset Accessibility on Google Drive

Hello,

I appreciate the work on this project and recently requested access to the dataset on Google Drive. To facilitate smoother and faster access, could we consider simplifying the current request-based access method?

Understanding the balance between ease of access and data security, I'd be interested in hearing your thoughts on this suggestion.

Thank you for considering this enhancement.

Best regards,
Mahamadi

Cannot Reproduce SQL-Llama Result

Hello, thank you, authors, for open-sourcing the code.

I am having trouble reproducing the results of SQL-Llama (7B) in the paper.

I followed the instructions of setting up SQL-Llama.
Here is the result of MAC-SQL + SQL-Llama:

Evaluate BIRD EX begin!

                     simple               moderate             challenging          total               
count                925                  465                  144                  1534                
======================================    ACCURACY    =====================================
accuracy             43.78                27.53                23.61                36.96               
===========================================================================================
Finished evaluation
Evaluate EX done!

Here is the result of MAC-SQL + SQL-Llama + Oracle Schema

Evaluate BIRD EX begin!

                     simple               moderate             challenging          total               
count                925                  465                  144                  1534                
======================================    ACCURACY    =====================================
accuracy             50.92                33.98                25.00                43.35               
===========================================================================================
Finished evaluation
Evaluate EX done!

The result 36.96/43.35 is different from 43.94/51.43 as reported in the paper.

This is the command I run the gold schema condition:

export OPENAI_API_BASE='http://0.0.0.0:8000/v1'

...
python ./run.py --dataset_name="bird" \
   --dataset_mode="dev" \
   --input_file="./data/bird/dev.json" \
   --db_path="./data/bird/dev_databases/" \
   --tables_json_path "./data/bird/dev_tables.json" \
   --output_file=$output_file \
   --log_file=$log_file \
   --use_gold_schema

In api_config.py I uncommented MODEL_NAME = 'CodeLlama-7b-hf'.

Did I miss anything?

Thank you very much!

gold error

Hello, when I was reading your paper, I saw that you mentioned "gold error", but I don't quite understand what it means.
Could you please explain it, ideally with an example to help me understand?

