Giter Club home page Giter Club logo

ln2sql's People

Contributors

0xharib avatar ferrerojeremy avatar rhnvrm avatar rupinder1133 avatar shashank-smarkets avatar shashank88 avatar theparadoxer02 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

ln2sql's Issues

pip3 package doesn't work

When installing the package for python3 it installs successfully

$ pip3 install ln2sql
Collecting ln2sql
Installing collected packages: ln2sql
Successfully installed ln2sql-0.1

but module can't be loaded

$ python -m ln2sql
venv/bin/python: No module named ln2sq

and the package is empty

$ ls -a venv/lib/python3.7/site-packages/ln2sql-0.1.dist-info/
.		INSTALLER	RECORD		top_level.txt
..		METADATA	WHEEL

Is this expected to be working? If so how is it meant to be used?

Thesaurus Support

I am having trouble getting the thesaurus support working. Here's a command I'm running in order to test the pupil -> student rule in th_english.dat:

python2.7 ln2sql.py -d database/school.sql -l lang/english.csv -j output.json -t thesaurus/th_english.dat -i "Count how many pupil are there with firstname is John"

Although it works for the word "student", it outputs this error with the word "pupil" (as shown above):

ParsingException.ParsingException: Error: No table name found in sentence!

Am I invoking it incorrectly? I'm guessing this is simply a configuration mistake on my part, but I can't figure out what's wrong with the above command.

Group tests into chunks

It's not a good practice to loop all different test cases in one tests. Grouping into categories like:
test_select()
test_select_where()

gives more context and allows adding tests more easily and detecting issues more easier. As I have already moved to pytest, just need to break it to chunks

Inconsistant generated SQL Statement

My question was : "what is name in emp with the highest score?"
and the answer was :

SELECT emp.name
FROM emp
WHERE MAX(emp.score) = OOV;

Seems that subqueries are not taking into account. Am I wrong ?

The last punctuation character is kept in value extraction

A punctuation sign in end of sentence (like in english) can distort the detection of the last value of the WHERE.

This sentence:
Count how many city there are where the score is greater than 2?

give:

SELECT COUNT(*)
FROM city
INNER JOIN emp
ON emp.cityId = city.id
WHERE emp.score > '2?';

case sensitive issue how to solve it

#58 i am facing some problem to solve this issue when i run this code it accepts only exact table name for example we have table whose name is 'class' and when i write table name like (Class,CLASS,classes) it is not giving me the output although i am using [flags=re.IGNORRECASE] how to solve this issue please help me.

def create_table(self, table_string):
    lines = table_string.split("\n")
    table = Table()
    for line in lines:
        if 'TABLE' in line:
            table_name = re.search("`(\w+)`", line,flags=re.IGNORECASE)
            table.name = table_name.group(1)
            if self.thesaurus_object is not None:
                table.equivalences = self.thesaurus_object.get_synonyms_of_a_word(table.name)
        elif 'PRIMARY KEY' in line:
            primary_key_columns = re.findall("`(\w+)`", line,flags=re.IGNORECASE)
            for primary_key_column in primary_key_columns:
                table.add_primary_key(primary_key_column)
        else:
            column_name = re.search("`(\w+)`", line,flags=re.IGNORECASE)
            if column_name is not None:
                column_type = self.predict_type(line)
                if self.thesaurus_object is not None:
                    equivalences = self.thesaurus_object.get_synonyms_of_a_word(column_name.group(1))
                else:
                    equivalences = []
                table.add_column(column_name.group(1), column_type, equivalences)
    return table

def alter_table(self, alter_string):
    lines = alter_string.replace('\n', ' ').split(';')
    for line in lines:
        if 'PRIMARY KEY' in line:
            table_name = re.search("TABLE `(\w+)`", line,flags=re.IGNORECASE).group(1)
            table = self.get_table_by_name(table_name)
            primary_key_columns = re.findall("PRIMARY KEY \(`(\w+)`\)", line,flags=re.IGNORECASE)
            for primary_key_column in primary_key_columns:
                table.add_primary_key(primary_key_column)
        elif 'FOREIGN KEY' in line:
            table_name = re.search("TABLE `(\w+)`", line,flags=re.IGNORECASE).group(1)
            table = self.get_table_by_name(table_name)
            foreign_keys_list = re.findall("FOREIGN KEY \(`(\w+)`\) REFERENCES `(\w+)` \(`(\w+)`\)", line,flags=re.IGNORECASE)
            for column, foreign_table, foreign_column in foreign_keys_list:
                table.add_foreign_key(column, foreign_table, **foreign_column)**

Start Up Error

There is an error while starting the project in python3

  • On running the file ln2sql.py

Primary key and Foreign key undparsed in ALTER TABLE

In tables.sql.zip, the following lines are ignored during the parsing.

ALTER TABLE `city`
  ADD PRIMARY KEY (`id`),
  ADD KEY `id` (`id`);

ALTER TABLE `emp`
  ADD PRIMARY KEY (`id`),
  ADD KEY `cityId` (`cityId`);

ALTER TABLE `emp`
  MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=31;

ALTER TABLE `emp`
  ADD CONSTRAINT `emp_ibfk_1` FOREIGN KEY (`cityId`) REFERENCES `city` (`id`);

Break tests into sensible groups

It's not a good practice to loop all different test cases in one tests. Grouping into categories like:
test_select()
test_select_where()

gives more context and allows adding tests more easily and detecting issues more easier. As I have already moved to pytest, just need to break it to chunks

Query churned out seems to be wrong for the example in README.md

The out put should be a simple query which should not be using a join. The example uses the city data model. Please help me out with this.

Command:

python3 -m ln2sql.main -d database_store/city.sql -l lang_store/english.csv -j output.json -i "Count how many city there are with the name blob?"

Generated sql:

SELECT COUNT(*)
FROM city
INNER JOIN emp
ON city.id = emp.cityId
WHERE emp.name = 'with';

Not working to perform joins

The english statement 'What is the name of the professor who is teaching the field Algorithm ?' does not work. There comes an error saying 'there is at least one column 'field' which is not reachable from 'professor'.

Provide options to use alias

Hi Team,

In normal world, we usually use business names while writing.
These names may or may not be same as technical name of columns in databases.

For e.g. "idStudent" is usually written as "student id"

Need to have feature to use alias for such colums.

Regards,
Dhaval

how to use Thesaurus

Hi there
I did not understand where and how you are using Thesaurus in ln2sql . Plz suggest
Thanks in advance

Make package more usable

This has multiple parts:

a) It should be easy to pip install to use.

i.e I just want to setup the sql file to use once (generally you don't have to change that once setup) and just query as:

query = Ln2sql.to_query(sentence)

b) Thinking about reasonable defaults. eg. english as default, specifying the db file once when initialising the class and then doing as in a)

need help

I have problems dealing with the between command. And also there is a problem when you use spaces in the equivalences of a word can some one help me

find a bug

query.orderby when write to json has a bug like this

image

Steps to use ln2sql

I am sorry, I am not getting how to use this tool.
Can anybody tell me the uses or steps from scratch so that I can understand and use this feature.

Question about accuracy/efficiency

Hi,

I have read the paper and checked out the library, and I am considering using a similar method to the one used here to enable users to query a database with a natural language.

I saw in the other issue that you are no longer continuing this project, could you tell me what is the reason?
Is it due to performance or accuracy reason?
Or maybe you have find some specific limitations that make this "non machine learning" approach not useable in real life?

One of the limitation I could see would be that some words that are not a noun actually dictate the table/colmun values should be selected from (for example, "who takes the biology class?" -> "SELECT * FROM student WHERE ...").

Based on your experience with this approach, would you advice for or against using a similar technique in production?

Thank you in advance for the reply!

Multiple Group by

Unable to form query containing multiple Group by. Is it supported?
If yes, what should be the input sentence syntax?

Join on pk and fk columns with different name unsupported

Exemple of correct declarations of pk and fk but unsupported by ln2sql :

ALTER TABLE `city`
  ADD PRIMARY KEY (`id`),
  ADD KEY `id` (`id`);

ALTER TABLE `emp`
  ADD CONSTRAINT `emp_ibfk_1` FOREIGN KEY (`cityId`) REFERENCES `city` (`id`);

Value detection

All the values in NLP is detected as OOV in sql query. Is there any technique to identify the values in SQL generation

Stopword filtering doesn't work.

The query given in the Readme works fine, but if stopwords path is provided the query output is wrong?
The English stopword file has words like 'name' and 'eighty' which will definitely remove essential words from the sentence.
But using the nltk.corpus.stopwords also makes no difference and gives the same wrong query output.

Understanding parser

Hi everyone,
I'm facing difficulty in understanding how exactly key word parsing to map Natural language(NL) query words and database keywords is done.It would be greate if you can breifly explain me how exactly In2sql modue searches and maps words from NL query to database values like column names and cell values. Meanwhile I wish to know whether we can use a data structure like hash table for this problem as defined below

To create a map between columns and table
Columns | Table_names
Column1 | Table_name1
Column2 | Table_name2

So to check if any word from NLP querry is a column name just check whether word is a key of hash table defined above

And for column value or cell value create a hash table like
Values | Column_names
Value1 | Column1
Value2 | Column1
.
.
Valuen | Column_m
where there will be m columns and n values in total in database. So to check whether a word from NLP query is a cell value, check if the key exists in above hash table. Furthermore, in case of cell value repeating in more than one columns then use chaining concept for hash table's key.

Join between tables

Hi everyone,
I am facing difficulty in forming a question based on joins. How to write a question to access multiple tables using joins?Is there any chance?

Can you please provide an example with school.sql file available in the database_store

Use timestamp column in where clause

Unable to use timestamp, for example:
"what is minimum monoxide in pollution_data when time_stamp equals 2017-03-23"

my timestamp has both time and date.

Also tried using:
"what is minimum monoxide in pollution_data when extract(month from time_stamp) equals 3"

datatypes are:-
time_stamp timestamp with time zone,
monoxide double precision,

P.S.: not using real names of columns and table for privacy reasons

Parser not defined error

Hi Need one help πŸ‘
I am getting parser not defined after submitting the GUI form . PFA
parser_not_defined_error

Unable to use thesaurus file

Issue

I am unable to replicate the example from README but with -t thesaurus_store/th_english.dat flag (works fine without it):

python -m ln2sql.main -d database_store/city.sql -l lang_store/english.csv -t thesaurus_store/th_english.dat -i "Count how many city there are with the name blob?"

Error Message

Traceback (most recent call last):
  File "c:\users\user\appdata\local\continuum\anaconda3\Lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\user\appdata\local\continuum\anaconda3\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\users\user\desktop\projects\natural_language_sql\ln2sql-master\ln2sql\main.py", line 26, in <module>
    main()
  File "c:\users\user\desktop\projects\natural_language_sql\ln2sql-master\ln2sql\main.py", line 23, in main
    ).get_query(args.sentence)
  File "c:\users\user\desktop\projects\natural_language_sql\ln2sql-master\ln2sql\ln2sql.py", line 44, in get_query
    queries = self.parser.parse_sentence(input_sentence, self.stopwordsFilter)
  File "c:\users\user\desktop\projects\natural_language_sql\ln2sql-master\ln2sql\parser.py", line 711, in parse_sentence
    input_word_list[i] in self.database_object.get_table_by_name(table_name).equivalences):
TypeError: argument of type 'NoneType' is not iterable

System info and versions

Python 3.6.1
Windows 10
ln2sql version: 0.2

Spanish language.

I have not completely inspected this library, it looks very good and interesting, I would like to know some very simple instructions on how to add the Spanish language.
I guess it's the same way you add other languages, how could you do it.

'NoneType' object has no attribute 'group'

The following issue raises when I try to load the sql dump file.

if 'TABLE' in line:
111 table_name = re.search("(\w+)", line)
--> 112 table.name = table_name.group(1)
113 if self.thesaurus_object is not None:
114 table.equivalences = self.thesaurus_object.get_synonyms_of_a_word(table.name)

AttributeError: 'NoneType' object has no attribute 'group'

Kindly suggest a fix to this issue.

Parent module '' not loaded

When running gui this error came.
Traceback (most recent call last):
File "ln2sql/ln2sql_gui.py", line 6, in
from .ln2sql import Ln2sql
SystemError: Parent module '' not loaded, cannot perform relative import

Any Solution?

Descending order in ORDER BY

  • Add keywords in lang/*.csv
  • Identify DESC keyword(s) in order_by_phrase during parsing
  • Set to 1 the order attribut of the order_by object that is identified in descending order

Default column for table

Hello everyone,

Is there a way to fix a default column for a table when no column is specified.

For example, if I have a table "Bus", its default colum its "plate", so I can ask "Number of routes of bus 'BR1234'" instead of "Number of routes of bus with plate 'BR1234'".

Thanks to all people that worked on this amazin project.

Caveat Empetor: Postgresql

I think a huge sign should be placed in the readme file to inform users that this doesn't work with Postgres databases.
I spent hours trying to figure out why I was getting this traceback:
"AttributeError: 'NoneType' object has no attribute 'group'"

before going through the issues to discover that Postgres isn't supported.

French paper

Is there an English version of the french scientific paper? I don't speak french but I want to read that paper.
thank you

Python2 and 3 compatibility

When I moved the project to python3, I missed adding some backwards compatibility checks. Need to convert so it supports both python2.7+ and python3.5+

COUNT keyword undetected

The following entry python ln2sql.py -d database/tal.sql -l lang/french.csv -i "combien y a t'il d'élève dont le nom est EMMAR ?" returns :

SELECT COUNT(*)
FROM eleve
WHERE eleve.nom = OOV;

but python ln2sql.py -d database/tal.sql -l lang/french.csv -i "Combien y a t'il d'élève dont le nom est EMMAR ?" returns :

SELECT *
FROM eleve
WHERE eleve.nom = OOV;

JOIN statements

As mentioned in the documentation joins are supported but I can't seem to generate a SQL with JOIN. I have tried with a sql dump with 2 tables with proper foreign key and primary keys.

SQL dump : tables.sql.zip

Please provide an example natural language query for JOINs

Module not found Error

  1. Git cloned the package
  2. Ran setup.py install
  3. Tried python ln2sql_gui.py

Error:-

D:\work_parts\j\ln2sql\ln2sql>python ln2sql_gui.py
Traceback (most recent call last):
  File "ln2sql_gui.py", line 6, in <module>
    from .ln2sql import Ln2sql
ModuleNotFoundError: No module named '__main__.ln2sql'; '__main__' is not a package

Limitations of this projects + Some example tests

If you've just arrived and don't want to spend too much time setting up and testing this project for your specific use case, take a look at the following example cases where it didn't work as expected.

Question SQL Query
What is the id of the client with the username jdoe? SELECT client.id FROM client WHERE client.username = 'OOV';
How many clients have made a reservation with a rating of 5? SELECT COUNT(*) FROM client;
What is the name of the client who made the most reservations? SELECT client.name FROM client;
How many clients have made a reservation at more than 1 hotel? SELECT COUNT(*) FROM client;
How many reservations were made by clients whose mobile number contains the digits 555? SELECT COUNT(*) FROM reservation;
What is the highest-rated hotel in the database? SELECT * FROM hotel;
What is the lowest-rated hotel in the database? SELECT * FROM hotel;
How many clients are over the age of 30? SELECT COUNT(*) FROM client WHERE client.age = 'the';
What is the username of the oldest client in the database? SELECT client.username FROM client;
What is the username of the youngest client in the database? SELECT client.username FROM client;
How many reservations were made by clients under the age of 25? SELECT COUNT(*) FROM reservation INNER JOIN client ON reservation.clientId = client.id WHERE client.age = OOV;

Few more with relations:

Question SQL Query
What are the name of all the hotel where the client with id is 123 made reservation? SELECT hotel.name FROM hotel WHERE hotel.id = β€˜123’;
How many client have the same name as their username? SELECT COUNT(*) FROM client WHERE client.name = OOV AND client.username = OOV;
What is the username of the client with the most reservation? SELECT client.username FROM client;
What is the name of the hotel with the most reservation? SELECT hotel.name FROM hotel;
How many client have made reservation on the same date as another client? SELECT COUNT() FROM client; SELECT COUNT() FROM client;
What is the total number of reservation made at hotel with id is 777? SELECT COUNT(*) FROM reservation WHERE reservation.id = β€˜777’;
What are the names of all the hotel where the client with id is 999 made reservation with a rating of 4 or higher? SELECT * FROM hotel WHERE hotel.id = β€˜999’;
What is the age distribution of client who made reservation at hotel with id is 333? SELECT client.age FROM client WHERE client.id = β€˜333’;
What is the name of the client who made the highest-rated reservation? SELECT client.name FROM client;
How many client have a username that contains the word β€œhotel”? SELECT COUNT(*) FROM client WHERE client.username = OOV;
What is the id of the client with the highest-rated reservation at hotel with id is 222? SELECT client.id FROM client WHERE MAX(client.id) = β€˜222’;
How many reservation were made by client whose usernames contain the letter β€œa”? SELECT COUNT(*) FROM reservation;
What is the average rating for reservation made by client over the age of 40? SELECT AVG(reservation.rating) FROM reservation INNER JOIN client ON reservation.clientId = client.id WHERE client.age = β€˜the’;

Credits: @harshadk-sourcefuse for testing these out.

No table name found in sentence

Hi Team,

While running sample, "What is the average age of students whose name is Doe or age over 25?" it gives error "No table name found in sentence".

python -m ln2sql.main -d database_store/school.sql -l lang_store/english.csv -j output.json -i "What is the average age of students whose name is Doe or age over 25?"

It seems keyword "students" is not matching in database "school.sql".

Regards,
Dhaval

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.