
Comments (6)

lihaoyang-ruc commented on July 30, 2024

"Evaluating..." is a placeholder indicating that this checkpoint (i.e., checkpoint-61642) is currently under evaluation, so other evaluation processes skip it. This is designed for multi-process parallel evaluation. For more details, please see evaluate_text2sql_ckpts.py, lines 54-82.

However, if the evaluation process is terminated unexpectedly, this placeholder file is left behind. When the evaluation is restarted, the new process sees the placeholder and skips this checkpoint.

Solution:
Delete this file (i.e., checkpoint-61642.txt) and run evaluate_on_spider_realistic.sh again.
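
If several checkpoints were interrupted, the cleanup can be scripted. The following is only a minimal sketch, not code from the repository: it assumes the placeholder files are named checkpoint-*.txt, live in the eval_results_path directory, and contain nothing but the text "Evaluating...". The function names are hypothetical. Run it only when no evaluation process is active, otherwise it would delete a placeholder that is still in use.

```python
from pathlib import Path

# Assumption: the placeholder file contains exactly this marker text.
PLACEHOLDER = "Evaluating..."


def find_stale_placeholders(results_dir):
    """Return checkpoint result files whose only content is the placeholder."""
    stale = []
    for path in sorted(Path(results_dir).glob("checkpoint-*.txt")):
        if path.read_text(encoding="utf-8").strip() == PLACEHOLDER:
            stale.append(path)
    return stale


def remove_stale_placeholders(results_dir):
    """Delete leftover placeholders and return the paths that were removed."""
    removed = find_stale_placeholders(results_dir)
    for path in removed:
        path.unlink()
    return removed
```

For example, remove_stale_placeholders("./eval_results/text2natsql-t5-3b-spider-realistic") would clear the leftover files before evaluate_on_spider_realistic.sh is run again. Files that already hold real scores are left untouched.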

Please let me know if this problem persists. :)

from resdsql.

oemd001 commented on July 30, 2024

Hello! Thank you so much for your fast reply!

I attempted to re-run this, but for some reason I still get the same results (as in the image I posted above).

Also, I noticed something interesting when running the code on my end: when I tried to print the variables "em" and "exec", nothing was printed to the console. I'm not sure whether this issue is specific to my machine or whether you have encountered it as well.

Again, thank you so much for your fast reply!

lihaoyang-ruc commented on July 30, 2024

Unfortunately, I have not encountered this problem.

As far as I know, evaluating each checkpoint takes a long time with the 3B-scale model. If your script finished quickly and without any exceptions, something must have gone wrong.

I recommend checking the script line by line; if you find any bugs, please feel free to contact me.

lihaoyang-ruc commented on July 30, 2024

Alternatively, we provide an inference script in scripts/inference, but it can only evaluate one checkpoint at a time.

Running sh ./scripts/inference/infer_text2natsql.sh 3b spider-realistic reproduces our results on spider-realistic.

lihaoyang-ruc commented on July 30, 2024

I ran sh ./scripts/evaluate_robustness/evaluate_on_spider_realistic.sh and obtained the following output:

ckpt_names: ['checkpoint-61642', 'checkpoint-78302']
Start evaluating ckpt: checkpoint-61642
Namespace(batch_size=1, db_path='./database', dev_filepath='./data/preprocessed_data/resdsql_spider_realistic_natsql.json', device='0', eval_results_path='./eval_results/text2natsql-t5-3b-spider-realistic', mode='eval', num_beams=8, num_return_sequences=8, original_dev_filepath='./data/spider-realistic/spider-realistic.json', output='predicted_sql.txt', save_path='./models/text2natsql-t5-3b/checkpoint-61642', seed=42, tables_for_natsql='./data/preprocessed_data/spider_realistic_tables_for_natsql.json', target_type='natsql')
 19%|███████████████████▉                                                                                        | 94/508 [03:37<15:49,  2.29s/it]select sum(*) from cars_data where  cars_data.year = 1980
wrong number of arguments to function sum()
 21%|██████████████████████▌                                                                                    | 107/508 [04:09<17:03,  2.55s/it]Before fix: select cars_data.model from cars_data where cars_data.cylinders = 4 order by cars_data.horsepower desc limit 1
After fix: select car_names.model from cars_data where cars_data.cylinders = 4 order by cars_data.horsepower desc limit 1
---------------
 28%|█████████████████████████████▉                                                                             | 142/508 [05:24<13:09,  2.16s/it]Before fix: select count ( flights.* ) from flights where airports.airport = 'ASY' and airlines.airline = 'United Airlines'
After fix: select count ( flights.* ) from flights where airports.airportcode = 'ASY' and airlines.airline = 'United Airlines'
---------------
 42%|████████████████████████████████████████████▊                                                              | 213/508 [07:53<11:58,  2.44s/it]Before fix: select visitor.id , visitor.name , visitor.level_of_membership from visitor group by visit.id order by sum ( visit.total_spent ) desc limit 1
After fix: select visitor.id , visitor.name , visitor.level_of_membership from visitor group by visitor.id order by sum ( visit.total_spent ) desc limit 1
---------------
 43%|█████████████████████████████████████████████▋                                                             | 217/508 [08:04<12:16,  2.53s/it]Before fix: select sum ( visit.total_spent ) from visit where visit.level_of_membership = 1
After fix: select sum ( visit.total_spent ) from visit where visitor.level_of_membership = 1
---------------
 52%|███████████████████████████████████████████████████████▌                                                   | 264/508 [10:09<13:01,  3.20s/it]Before fix: select students.first_name , students.middle_name , students.last_name from student_enrolment order by student_enrolment.date_first_registered asc limit 1
After fix: select students.first_name , students.middle_name , students.last_name from student_enrolment order by students.date_first_registered asc limit 1
---------------
 59%|██████████████████████████████████████████████████████████████▊                                            | 298/508 [11:33<08:25,  2.41s/it]Before fix: select tv_series.series_name from tv_series where tv_series.episode = 'A Love of a Lifetime'
After fix: select tv_channel.series_name from tv_series where tv_series.episode = 'A Love of a Lifetime'
---------------
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 508/508 [19:42<00:00,  2.33s/it]
Text-to-SQL inference spends 1214.2015063762665s.
exact_match score: 0.7736220472440944
exec score: 0.8188976377952756
Start evaluating ckpt: checkpoint-78302
Namespace(batch_size=1, db_path='./database', dev_filepath='./data/preprocessed_data/resdsql_spider_realistic_natsql.json', device='0', eval_results_path='./eval_results/text2natsql-t5-3b-spider-realistic', mode='eval', num_beams=8, num_return_sequences=8, original_dev_filepath='./data/spider-realistic/spider-realistic.json', output='predicted_sql.txt', save_path='./models/text2natsql-t5-3b/checkpoint-78302', seed=42, tables_for_natsql='./data/preprocessed_data/spider_realistic_tables_for_natsql.json', target_type='natsql')
  7%|███████▋                                                                                                    | 36/508 [01:21<16:36,  2.11s/it]Before fix: select pets.weight from pets where pets.pet_type = 'dog' order by pets.pet_age asc limit 1
After fix: select pets.weight from pets where pets.pet_age = 'dog' order by pets.pet_age asc limit 1
---------------
 11%|███████████▋                                                                                                | 55/508 [02:04<13:48,  1.83s/it]Before fix: select student.lname from student where @.@ join has_pet.* and pets.pet_age = 3 and has_pet.pettype = 'cat'
After fix: select student.lname from student where @.@ join has_pet.* and pets.pet_age = 3 and pets.pettype = 'cat'
---------------
 21%|██████████████████████▌                                                                                    | 107/508 [04:06<16:56,  2.53s/it]Before fix: select cars_data.model from cars_data where cars_data.cylinders = 4 order by cars_data.horsepower desc limit 1
After fix: select car_names.model from cars_data where cars_data.cylinders = 4 order by cars_data.horsepower desc limit 1
---------------
 31%|█████████████████████████████████▋                                                                         | 160/508 [05:53<10:06,  1.74s/it]Before fix: select flights.flightno from flights where airports.destairport = 'APG'
After fix: select flights.flightno from flights where flights.destairport = 'APG'
---------------
 42%|████████████████████████████████████████████▊                                                              | 213/508 [07:47<11:42,  2.38s/it]Before fix: select visitor.id , visitor.name , visitor.level_of_membership from visitor group by visit.id order by sum ( visit.total_spent ) desc limit 1
After fix: select visitor.id , visitor.name , visitor.level_of_membership from visitor group by visitor.id order by sum ( visit.total_spent ) desc limit 1
---------------
 59%|██████████████████████████████████████████████████████████████▊                                            | 298/508 [11:14<08:25,  2.41s/it]Before fix: select tv_series.series_name from tv_series where tv_series.episode like '%A Love of a Lifetime%'
After fix: select tv_channel.series_name from tv_series where tv_series.episode like '%A Love of a Lifetime%'
---------------
 75%|████████████████████████████████████████████████████████████████████████████████▍                          | 382/508 [14:25<03:45,  1.79s/it]Before fix: select countrylanguage.country from countrylanguage where countrylanguage.isofficial = 'English' or countrylanguage.isofficial = 'Dutch'
After fix: select countrylanguage.countrycode from countrylanguage where countrylanguage.isofficial = 'English' or countrylanguage.isofficial = 'Dutch'
---------------
 76%|█████████████████████████████████████████████████████████████████████████████████▎                         | 386/508 [14:35<04:58,  2.45s/it]Before fix: select countrylanguage.language from countrylanguage where country.governmentform = 'Republic' and count ( countrylanguage.* ) = 1 group by country.language.language
After fix: select countrylanguage.language from countrylanguage where country.governmentform = 'Republic' and count ( countrylanguage.* ) = 1 group by countrylanguage.language
---------------
 90%|████████████████████████████████████████████████████████████████████████████████████████████████           | 456/508 [17:09<01:46,  2.05s/it]Before fix: select highschooler.friend_name from highschooler where highschooler.name = 'Kyle'
After fix: select highschooler.name from highschooler where highschooler.name = 'Kyle'
---------------
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 508/508 [19:10<00:00,  2.27s/it]
Text-to-SQL inference spends 1177.4357948303223s.
exact_match score: 0.765748031496063
exec score: 0.8070866141732284
ckpt name: ./models/text2natsql-t5-3b/checkpoint-61642
EM: 0.7736220472440944
EXEC: 0.8188976377952756
-----------
ckpt name: ./models/text2natsql-t5-3b/checkpoint-78302
EM: 0.765748031496063
EXEC: 0.8070866141732284
-----------
Best EM ckpt: {'ckpt': './models/text2natsql-t5-3b/checkpoint-61642', 'EM': 0.7736220472440944, 'EXEC': 0.8188976377952756}
Best EXEC ckpt: {'ckpt': './models/text2natsql-t5-3b/checkpoint-61642', 'EM': 0.7736220472440944, 'EXEC': 0.8188976377952756}
Best EM+EXEC ckpt: {'ckpt': './models/text2natsql-t5-3b/checkpoint-61642', 'EM': 0.7736220472440944, 'EXEC': 0.8188976377952756}

I hope this will help you.
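
For anyone comparing their own numbers, the final summary in the log can be reproduced from the per-checkpoint scores. This is a rough sketch rather than the repository's code: the scores are copied from the log above, the function name is hypothetical, and it assumes that "Best EM+EXEC" ranks checkpoints by the sum of the two scores.

```python
# Per-checkpoint scores copied from the evaluation log above.
results = [
    {"ckpt": "./models/text2natsql-t5-3b/checkpoint-61642",
     "EM": 0.7736220472440944, "EXEC": 0.8188976377952756},
    {"ckpt": "./models/text2natsql-t5-3b/checkpoint-78302",
     "EM": 0.765748031496063, "EXEC": 0.8070866141732284},
]


def best_checkpoint(entries, metric):
    """Return the entry with the highest value of metric(entry)."""
    return max(entries, key=metric)


best_em = best_checkpoint(results, lambda r: r["EM"])
best_exec = best_checkpoint(results, lambda r: r["EXEC"])
# Assumption: "EM+EXEC" means the sum of both scores.
best_both = best_checkpoint(results, lambda r: r["EM"] + r["EXEC"])

print("Best EM ckpt:", best_em)
print("Best EXEC ckpt:", best_exec)
print("Best EM+EXEC ckpt:", best_both)
```

If "em" and "exec" print nothing on your machine, the list of scores never gets filled, which would point to the checkpoints being skipped before evaluation starts.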

oemd001 commented on July 30, 2024

Yes, this helps me a lot! Thank you so much for the information that you've provided!

I think this issue might have something to do with my machine. Thank you for your efforts and response. I really appreciate it!
