roberta-nl2sql's Introduction

Hello 👋, I'm Debaditya!


I'm a Graduate Student 👨🏽‍💼 @USC in Los Angeles, USA. I am a huge admirer of Artificial Intelligence and have recently finished writing my third research paper in Natural Language Processing. I strongly encourage my fellow classmates to get into Open Source 📢. Besides academics, I'm the Lead Guitarist 🎸 of a Progressive Rock band and have played a couple of concerts here and there.

  • 📖 I'm currently working as a Research Assistant at the USC Institute for Creative Technologies under the supervision of Dr. David Traum.
  • 🤹🏽 The fields I enjoy most include 📈 Probability and Statistics, 🎛 Natural Language Processing, 🖼 Computer Vision, and 📊 Data Science.
  • 📈 I'm fluent in C/C++ and Python. I also dabble in Julia, JavaScript, and R.
  • 💬 I am quick to respond and would love to grow my network.
  • 📫 How to reach me: [email protected]

roberta-nl2sql's People

Contributors

debadityapal

roberta-nl2sql's Issues

Assertion error in training: AttributeError: 'bool' object has no attribute 'tolist'

I followed the steps in the notebook, but the code failed at the model training step. This is the output from that step:

Keyword arguments {'is_pretokenized': True} not recognized.
<message repeats 100 times>
Keyword arguments {'is_pretokenized': True} not recognized.

AttributeError                            Traceback (most recent call last)
<ipython-input-12-88f6aa8b6960> in <module>()
      2 epoch_best = 0
      3 for epoch in range(EPOCHS):
----> 4     acc_train = dev_function.train( seq2sql_model, roberta_model, model_optimizer, roberta_optimizer, tokenizer, configuration, path_wikisql, train_loader)
      5     acc_dev, results_dev, cnt_list = dev_function.test(seq2sql_model, roberta_model, model_optimizer, tokenizer, configuration, path_wikisql, dev_loader, mode="dev")
      6     print_result(epoch, acc_train, 'train')

2 frames
/content/RoBERTa-NL2SQL/dev_function.py in train(seq2sql_model, roberta_model, model_optimizer, roberta_optimizer, roberta_tokenizer, roberta_config, path_wikisql, train_loader)
     61             = roberta_training.get_wemb_roberta(roberta_config, roberta_model, roberta_tokenizer, 
     62                                         natural_lang_utterance_tokenized, headers,max_seq_length= 222,
---> 63                                         num_out_layers_n=2, num_out_layers_h=2)
     64         # natural_lang_embeddings: natural language embedding
     65         # header_embeddings: header embedding

/content/RoBERTa-NL2SQL/roberta_training.py in get_wemb_roberta(roberta_config, model_roberta, tokenizer, nlu_t, hds, max_seq_length, num_out_layers_n, num_out_layers_h)
     76     all_encoder_layer, i_nlu, i_headers,\
     77     l_n, l_hpu, l_hs, \
---> 78     nlu_tt, t_to_tt_idx, tt_to_t_idx = get_roberta_output(model_roberta, tokenizer, nlu_t, hds, max_seq_length)
     79     # all_encoder_layer: RoBERTa outputs from all layers.
     80     # i_nlu: start and end indices of question in tokens

/content/RoBERTa-NL2SQL/roberta_training.py in get_roberta_output(model_roberta, tokenizer, nlu_t, headers, max_seq_length)
    176     all_encoder_layer = list(all_encoder_layer)
    177 
--> 178     assert all((check == all_encoder_layer[-1]).tolist())
    179 
    180     # 5. generate l_hpu from i_headers

AttributeError: 'bool' object has no attribute 'tolist'

I tried modifying the assertion line to a direct check:

assert check == all_encoder_layer[-1]

It ran, but the assertion failed. Printing the two values being compared gave:

check: last_hidden_state, all_encoder_layer[-1]: s

I also tried removing the assertion entirely but, as expected, the code failed later.
Has anybody executed the notebook successfully or found a fix so it can complete training?
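
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the two printed values look like what happens when a dict-like model output is unpacked as if it were a tuple. Recent releases of Hugging Face transformers return such an object by default, and unpacking a mapping yields its keys, so check would become the string last_hidden_state and list("hidden_states")[-1] would be s. Comparing two strings gives a plain bool, which has no tolist(). The sketch below reproduces this with an OrderedDict standing in for the model output; fake_output and its placeholder values are hypothetical, not code from the repository.

```python
from collections import OrderedDict

# Stand-in for a dict-like model output (hypothetical; real outputs hold tensors).
fake_output = OrderedDict(
    last_hidden_state="<tensor>",
    pooler_output="<tensor>",
    hidden_states=("<layer 0>", "<layer 1>"),
)

# Tuple-style unpacking iterates over the KEYS of a mapping, not its values.
check, pooled_output, all_encoder_layer = fake_output
all_encoder_layer = list(all_encoder_layer)  # splits the key name into characters

print(check)                                 # the key name, not a tensor
print(all_encoder_layer[-1])                 # last character of "hidden_states"
print(type(check == all_encoder_layer[-1]))  # a plain bool, so .tolist() fails

# Reading by key returns the values the assertion was written for.
check_fixed = fake_output["last_hidden_state"]
layers_fixed = list(fake_output["hidden_states"])
```

If this is the cause, reading the fields by key, or passing return_dict=False when calling the model so that it returns a tuple again, should restore the values the assertion expects. The repeated is_pretokenized warning would fit the same version mismatch, since newer tokenizers name that argument is_split_into_words.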

Error while running using CPU

Hi,
I was trying to run the code on my personal machine (without a GPU), so I changed every
device = torch.device("cuda") to device = torch.device("cpu").
After that, inference with the existing model.pt works,
but I got the error below when trying to train the model from scratch:

D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\python.exe D:/MySpace/PycharmProjects/RoBERTa-NL2SQL/Train.py
0%| | 0/7045 [00:00<?, ?it/s]
Traceback (most recent call last):
File "D:/MySpace/PycharmProjects/RoBERTa-NL2SQL/Train.py", line 42, in
acc_train = dev_function.train( seq2sql_model, roberta_model, model_optimizer, roberta_optimizer, tokenizer, configuration, path_wikisql, train_loader)
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\dev_function.py", line 36, in train
for batch_index, batch in enumerate(tqdm(train_loader)):
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\site-packages\tqdm\notebook.py", line 253, in iter
for obj in super(tqdm_notebook, self).iter(*args, **kwargs):
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\site-packages\tqdm\std.py", line 1166, in iter
for obj in iterable:
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\site-packages\torch\utils\data\dataloader.py", line 352, in iter
return self._get_iterator()
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\site-packages\torch\utils\data\dataloader.py", line 294, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\site-packages\torch\utils\data\dataloader.py", line 801, in init
w.start()
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\multiprocessing\popen_spawn_win32.py", line 65, in init
reduction.dump(process_obj, to_child)
File "D:\MySpace\PycharmProjects\RoBERTa-NL2SQL\envs\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'get_data..'
Process finished with exit code 1

Can you tell me whether this is related to GPU/CPU or something else?
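
The last frames of the traceback are in multiprocessing, not in any CUDA code, which suggests the failure is not about CPU versus GPU. On Windows, worker processes are started with spawn, so a DataLoader with num_workers greater than 0 has to pickle what it hands to each worker, and a function or lambda defined inside another function (a "local object", here something inside get_data) cannot be pickled. The sketch below shows that behaviour with plain pickle; make_local_collate and is_picklable are hypothetical names, not part of the repository.

```python
import pickle


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, which spawn-based workers rely on."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_local_collate():
    # Defined inside a function, so pickle cannot look it up by name.
    return lambda batch: batch


local_ok = is_picklable(make_local_collate())
builtin_ok = is_picklable(list)  # an importable callable, for contrast
print(local_ok, builtin_ok)
```

Under that assumption, two common workarounds are setting num_workers=0 on the DataLoader, or moving the local function to module level so that it can be pickled by name.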

File not Found.!

I was running the code in Colab and hit a FileNotFoundError. Can you please help me with this issue? Below is the error message from Colab.

`FileNotFoundError Traceback (most recent call last)
in ()
6
7 train_data, train_table, dev_data, dev_table, train_loader, dev_loader = load_data.get_data(path_wikisql, batch_size = BATCH_SIZE)
----> 8 test_data,test_table,test_loader = load_data.get_test_data(path_wikisql, batch_size = BATCH_SIZE)
9 zero_data,zero_table,zero_loader = load_data.get_zero_data(path_wikisql, batch_size = BATCH_SIZE) # Data to test Zero Shot Learning

/content/RoBERTa-NL2SQL/load_data.py in get_test_data(file_path, batch_size)
71 test_table = {}
72
---> 73 with open(file_path + '/test_knowledge.jsonl') as test_data_file:
74 for idx, line in enumerate(test_data_file):
75 current_line = json.loads(line.strip())

FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/MyDrive/NL2SQL/test_knowledge.jsonl'`
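
The traceback shows that get_test_data opens test_knowledge.jsonl under the data path and that the file is absent from /content/drive/MyDrive/NL2SQL. A small pre-flight check can list what is missing before any loader runs. This is only a sketch: missing_files is a hypothetical helper, and test_knowledge.jsonl is the only file name taken from the error above.

```python
from pathlib import Path


def missing_files(data_dir, expected_names):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected_names if not (root / name).is_file()]
```

Calling missing_files(path_wikisql, ["test_knowledge.jsonl"]) before load_data.get_test_data would return a non-empty list in the situation above, which points to a missing upload or a wrong Drive path rather than a bug in the loader.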
