debadityapal / roberta-nl2sql

A Data Blind Approach to the popular Semantic Parsing task NL2SQL

License: MIT License

Python 100.00%

roberta-nl2sql's Introduction

Hello 👋, I'm Debaditya!

I'm a Graduate Student 👨🏽‍💼 @USC in Los Angeles, USA. I am a huge admirer of Artificial Intelligence and have recently finished writing my third Research Paper in Natural Language Processing. I strongly encourage my fellow classmates to get into Open Source 📢. Besides academics, I'm the Lead Guitarist 🎸 of a Progressive Rock band and have played a couple of concerts here and there.

📖 I’m currently working as a Research Assistant at the USC Institute for Creative Technologies under the supervision of Dr. David Traum
🤹🏽 Fields I enjoy the most include 📈 Probability and Statistics, 🎛 Natural Language Processing, 🖼 Computer Vision, 📊 Data Science
📈 I’m fluent in C/C++ and Python. I also dabble in Julia, JavaScript and R.
💬 I am fast to respond and would love to grow my network.
📫 How to reach me: [email protected];

roberta-nl2sql's Issues

Assertion error in training: AttributeError: 'bool' object has no attribute 'tolist'

I followed the steps in the notebook, but the code failed at the model training step. This is the output from that step:

Keyword arguments {'is_pretokenized': True} not recognized.
<message repeats 100 times>
Keyword arguments {'is_pretokenized': True} not recognized.

AttributeError                            Traceback (most recent call last)
<ipython-input-12-88f6aa8b6960> in <module>()
      2 epoch_best = 0
      3 for epoch in range(EPOCHS):
----> 4     acc_train = dev_function.train( seq2sql_model, roberta_model, model_optimizer, roberta_optimizer, tokenizer, configuration, path_wikisql, train_loader)
      5     acc_dev, results_dev, cnt_list = dev_function.test(seq2sql_model, roberta_model, model_optimizer, tokenizer, configuration, path_wikisql, dev_loader, mode="dev")
      6     print_result(epoch, acc_train, 'train')

2 frames
/content/RoBERTa-NL2SQL/dev_function.py in train(seq2sql_model, roberta_model, model_optimizer, roberta_optimizer, roberta_tokenizer, roberta_config, path_wikisql, train_loader)
     61             = roberta_training.get_wemb_roberta(roberta_config, roberta_model, roberta_tokenizer, 
     62                                         natural_lang_utterance_tokenized, headers,max_seq_length= 222,
---> 63                                         num_out_layers_n=2, num_out_layers_h=2)
     64         # natural_lang_embeddings: natural language embedding
     65         # header_embeddings: header embedding

/content/RoBERTa-NL2SQL/roberta_training.py in get_wemb_roberta(roberta_config, model_roberta, tokenizer, nlu_t, hds, max_seq_length, num_out_layers_n, num_out_layers_h)
     76     all_encoder_layer, i_nlu, i_headers,\
     77     l_n, l_hpu, l_hs, \
---> 78     nlu_tt, t_to_tt_idx, tt_to_t_idx = get_roberta_output(model_roberta, tokenizer, nlu_t, hds, max_seq_length)
     79     # all_encoder_layer: RoBERTa outputs from all layers.
     80     # i_nlu: start and end indices of question in tokens

/content/RoBERTa-NL2SQL/roberta_training.py in get_roberta_output(model_roberta, tokenizer, nlu_t, headers, max_seq_length)
    176     all_encoder_layer = list(all_encoder_layer)
    177 
--> 178     assert all((check == all_encoder_layer[-1]).tolist())
    179 
    180     # 5. generate l_hpu from i_headers

AttributeError: 'bool' object has no attribute 'tolist'

I tried modifying the assertion line to a direct check:

assert check == all_encoder_layer[-1]

It ran but resulted in the assertion triggering and printing out the values yielded:

check: last_hidden_state, all_encoder_layer[-1]: s

I also tried removing the assertion entirely but, as expected, the code failed later.
Has anybody executed the notebook successfully or found a fix so it can complete training?
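
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the two values printed above, `last_hidden_state` and `s`, are exactly what tuple-style unpacking produces when the model returns a dict-like output instead of a plain tuple. Unpacking a dict-like object yields its keys, and the last character of the key `hidden_states` is `s`. Recent releases of the transformers library return dict-like outputs by default. The sketch below reproduces the symptom with a plain `OrderedDict` stand-in; the key order and the unpacking statement are hypothetical, since the relevant line of `roberta_training.py` is not shown in this issue.

```python
from collections import OrderedDict

# Stand-in for a dict-like model output (placeholder strings, not real tensors).
# The field names mirror the values printed in the issue; the layout is assumed.
outputs = OrderedDict(
    last_hidden_state="<final layer>",
    pooler_output="<pooled>",
    hidden_states=("<layer 0>", "<layer 1>", "<final layer>"),
)

# Tuple-style unpacking of a dict-like object iterates over its KEYS.
check, pooled, all_encoder_layer = outputs
all_encoder_layer = list(all_encoder_layer)  # list("hidden_states")

print(check)                  # last_hidden_state
print(all_encoder_layer[-1])  # s

# Comparing two strings gives a plain bool, which has no .tolist().
result = (check == all_encoder_layer[-1])
print(type(result).__name__)  # bool

# Reading the fields by name avoids the problem.
check_fixed = outputs["last_hidden_state"]
layers_fixed = list(outputs["hidden_states"])
assert check_fixed == layers_fixed[-1]
```

If this is indeed the cause, passing `return_dict=False` to the model call or reading the output fields by name are the usual ways to restore the old behaviour. The repeated `is_pretokenized` warning probably has a similar origin: newer tokenizers expect `is_split_into_words` instead. Pinning the library version the notebook was written against is another option, although that version is not stated here.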

Error while running using CPU

Hi,
I was trying to run the code on my personal machine (without a GPU), so I changed every
device = torch.device("cuda") to device = torch.device("cpu").
After doing so, inference with the existing model.pt works,
but I got the error below when trying to train the model from scratch:

D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\python.exe D:/MySpace/PycharmProjects/RoBERTa-NL2SQL/Train.py
0%| | 0/7045 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:/MySpace/PycharmProjects/RoBERTa-NL2SQL/Train.py", line 42, in <module>
    acc_train = dev_function.train( seq2sql_model, roberta_model, model_optimizer, roberta_optimizer, tokenizer, configuration, path_wikisql, train_loader)
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\dev_function.py", line 36, in train
    for batch_index, batch in enumerate(tqdm(train_loader)):
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\site-packages\tqdm\notebook.py", line 253, in __iter__
    for obj in super(tqdm_notebook, self).__iter__(*args, **kwargs):
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\site-packages\tqdm\std.py", line 1166, in __iter__
    for obj in iterable:
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\site-packages\torch\utils\data\dataloader.py", line 352, in __iter__
    return self._get_iterator()
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\site-packages\torch\utils\data\dataloader.py", line 801, in __init__
    w.start()
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_data..'
Process finished with exit code 1

Can you tell me whether this is related to GPU/CPU or something else?
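
The traceback passes through `_MultiProcessingDataLoaderIter` and `popen_spawn_win32.py`, which suggests the failure comes from starting DataLoader worker processes on Windows rather than from the switch to CPU. On Windows, worker processes are spawned, so every object handed to them must be picklable, and a function defined inside another function (as the truncated name `'get_data..'` hints) is not. The sketch below reproduces that behaviour with only the standard library. `build_collate_fn`, `collate_identity` and `can_pickle` are hypothetical names for illustration, not code from the repository.

```python
import pickle


def build_collate_fn():
    # Hypothetical stand-in for a loader factory that defines its
    # collate function locally, inside another function.
    collate_local = lambda batch: batch
    return collate_local


def collate_identity(batch):
    """Module-level equivalent; picklable by reference when run as a script."""
    return batch


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


local_ok = can_pickle(build_collate_fn())  # local lambda: not picklable
module_ok = can_pickle(collate_identity)   # module-level function
print("local function picklable:", local_ok)
print("module-level function picklable:", module_ok)
```

Under that assumption, two common workarounds are to move the inner function in `load_data.py` to module level, or to construct the `DataLoader` with `num_workers=0` so that no worker processes are started.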

File not Found.!

I was running the code in Colab and hit a FileNotFoundError. Can you please help me with this issue? Below is the error message from Colab.

FileNotFoundError                         Traceback (most recent call last)
in ()
      6
      7 train_data, train_table, dev_data, dev_table, train_loader, dev_loader = load_data.get_data(path_wikisql, batch_size = BATCH_SIZE)
----> 8 test_data,test_table,test_loader = load_data.get_test_data(path_wikisql, batch_size = BATCH_SIZE)
      9 zero_data,zero_table,zero_loader = load_data.get_zero_data(path_wikisql, batch_size = BATCH_SIZE) # Data to test Zero Shot Learning

/content/RoBERTa-NL2SQL/load_data.py in get_test_data(file_path, batch_size)
     71     test_table = {}
     72
---> 73     with open(file_path + '/test_knowledge.jsonl') as test_data_file:
     74         for idx, line in enumerate(test_data_file):
     75             current_line = json.loads(line.strip())

FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/NL2SQL/test_knowledge.jsonl'
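
The error message says that `test_knowledge.jsonl` is not present in the Drive folder passed as `path_wikisql`, so it may help to check which expected files are missing before calling the loaders. The following is a minimal sketch under that assumption. `missing_files` is a hypothetical helper, and only `test_knowledge.jsonl` is known from the traceback; any other file names the loaders need would have to be added to the list.

```python
from pathlib import Path


def missing_files(data_dir, required):
    """Return the names in `required` that are not regular files in data_dir."""
    root = Path(data_dir)
    return [name for name in required if not (root / name).is_file()]


# Path taken from the error message above.
path_wikisql = "/content/drive/MyDrive/NL2SQL"
# Only this file name appears in the traceback; extend the list as needed.
required = ["test_knowledge.jsonl"]

missing = missing_files(path_wikisql, required)
if missing:
    print("Missing from", path_wikisql, ":", ", ".join(missing))
else:
    print("All required files found.")
```

Running this in a Colab cell after mounting Drive shows whether the file was never uploaded or whether `path_wikisql` points at the wrong folder.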
