palashkulsh / dbcrawler
Crawls the database and makes insert statements for all the tables so that some data can be taken from all tables.
actor_info.actor_id=actor.actor_id,film.film_id=film_actor.film_id is valid
but
actor_info.actor_id=actor.actor_id, film.film_id=film_actor.film_id is not valid. This is because white space is not trimmed while parsing.
Column-wise filtered dump creation may be supported.
The main reason dbcrawler gets stuck is that each generated input is checked/deep-equaled against every other input generated so far. This acts as a failsafe against circular queries and prevents the same insert statement/data from being generated twice or more in the final output.
For example, if row 1 of table a is generated twice because of some dependency, two identical insert statements would be generated, and that would cause a conflict (duplicate keys) when inserting.
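A cheaper alternative might be to key each generated row by table and primary-key value and track those keys in a Set instead of deep-equaling everything; a minimal sketch, assuming every crawled row exposes a primary key (seen, rowKey and shouldEmit are hypothetical names, not dbcrawler's current code):

// Minimal sketch: de-duplicate by a table + primary-key string instead of
// deep-equaling every generated input against every other one.
const seen = new Set();

function rowKey(table, primaryKeyValue) {
  return table + ':' + String(primaryKeyValue);
}

function shouldEmit(table, primaryKeyValue) {
  const key = rowKey(table, primaryKeyValue);
  if (seen.has(key)) {
    return false; // this row was already dumped, skip the duplicate insert
  }
  seen.add(key);
  return true;
}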
If a single select retrieves too many values, the generated insert statement becomes too long, since it is built as "insert into table (columns...) values (first set of values), (2nd set of values), (3rd set of values), ..." and so on. A long enough list of values makes the SQL oversized and throws an error.
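One way around this could be to cap the number of value tuples per statement and emit several smaller inserts; a minimal sketch, assuming the mysql package is available for escaping (buildInserts and the batch size of 1000 are made up for illustration):

// Minimal sketch: split one oversized multi-row INSERT into several smaller ones.
const mysql = require('mysql');

function buildInserts(table, columns, rows, batchSize) {
  batchSize = batchSize || 1000; // arbitrary cap on value tuples per statement
  const statements = [];
  for (let i = 0; i < rows.length; i += batchSize) {
    const values = rows
      .slice(i, i + batchSize)
      .map(row => '(' + row.map(v => mysql.escape(v)).join(', ') + ')')
      .join(', ');
    statements.push('insert into ' + table + ' (' + columns.join(', ') + ') values ' + values + ';');
  }
  return statements;
}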
The crawler is not able to read the generated schema file from the /tmp directory because the sql module is not present there; it can read the schema file only from inside a scope where sql is available.
[Error: not able to read generated schema file Error: Cannot find module 'sql']
There can be issues with dbcrawler not working on Windows systems; the apparent reasons still have to be pinned down.
Scout for more Windows-related issues.
Running dbcrawler on one of the servers gave a "not able to parse generated schema file" error.
When the auto-gen folder is inaccessible, there should be a fallback location where the schema can be generated; one possible location is the /tmp/ directory.
dbcrawler breaks if there is @ or ) somewhere in the password
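One possible cause, assuming the credentials are currently packed into a user:password@host style connection string, is that characters like @ break the string parsing; passing a plain config object to the driver sidesteps that (a sketch only, not dbcrawler's current code, with placeholder values):

// Minimal sketch: pass the credentials as an object so characters like @ or )
// in the password are never parsed out of a connection string.
const mysql = require('mysql');

const config = {
  host: 'localhost',        // placeholder values
  port: 3306,
  user: 'root',
  password: 'p@ss)word',    // special characters are fine here, taken verbatim
  database: 'sakila'
};

const connection = mysql.createConnection(config);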
White space in seed data such as "actor.actor_id=3;actor.actor_id=4;film_actor.actor_id=5" throws an error in dbcrawler. For example, actor.actor_id=3;actor.actor_id=4; film_actor.actor_id = 5 will throw an error because white space is not considered while parsing the seed data.
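A minimal sketch of whitespace-tolerant seed parsing, trimming each token before splitting it further (parseSeedData is a hypothetical name, not an existing dbcrawler function):

// Minimal sketch: tolerate spaces around ';', '=' and '.' in the seed string.
function parseSeedData(seed) {
  return seed
    .split(';')
    .map(part => part.trim())
    .filter(part => part.length > 0)
    .map(part => {
      const [column, value] = part.split('=').map(s => s.trim());
      const [table, field] = column.split('.').map(s => s.trim());
      return { table, field, value };
    });
}

// parseSeedData('actor.actor_id=3;actor.actor_id=4; film_actor.actor_id = 5')
// => [ { table: 'actor', field: 'actor_id', value: '3' }, ... ]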
The version in commander should be taken from the actual version inside the package.json file; only then would the version have some meaning.
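A minimal sketch of wiring the real version through, assuming a recent commander release that exposes program:

// Minimal sketch: read the version from package.json instead of hard-coding it.
const { program } = require('commander'); // assumes commander >= 7
const pkg = require('./package.json');

program.version(pkg.version);
program.parse(process.argv);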
Defaults in commander and other places should be removed, since they won't run on any system except mine (the database they point to isn't installed elsewhere), and all the config parameters must be made compulsory.
Paths to the grammar files present in lib are being resolved incorrectly, as lib is being resolved to the currently available lib path. Maybe using ./lib/grammar files will solve the problem.
there must be some sort of logging to show progress.
Something was choking the crawler when given the input
'sales_data.order_id=350465135112'
Look into it.
Along with the host and other database information, provide a port option too, so that if the database is available at a non-standard port like 3307 instead of 3306, the user can supply it.
dbcrawler should be able to give a full dump of a table when a full dump of that table is requested.
The generated sql file is not runnable because it lacks the semicolon after individual insert queries.
Showing work progress on the screen assures the user that something is being done and that the program has not hung.
Should dbcrawler support providing data or not? This is crucial to the future of the crawler.
Provide CLI support for the following items.
Giving input on the command line as [-c "" ] gives the error:
message: 'Parse error on line 1:\n\n^\nExpecting 'STRING_LIT', got 'EOF'',
hash:
{ text: '',
token: 'EOF',
line: 0,
loc: { first_line: 1, first_column: 0, last_line: 1, last_column: 0 },
expected: [ "'STRING_LIT'" ] } }
In some places util.log is used, in other places console.log. What a mess. One of them is better than the other, but either way logging should be done consistently.
For powering collection APIs, dbcrawler will need collection-like support, that is, joining a table and filtering the data that is being retrieved from that table.
when running dbcrawler the output must be the dump of only the given seed data.
Two things should be kept in mind.
Currently, to run the crawler we have to pass/change the parameters in the file itself; a feature to pass changeable parameters from the command line would make running the program easier.
So database parameters like host, password, database, and user can be passed from the command line.
Crawling constraints can also be allowed from the command line, though passing them there would require an extra parsing step or some other way to specify them.
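A minimal sketch of what those command-line parameters could look like with commander (option names are illustrative, not dbcrawler's existing flags, and requiredOption assumes a recent commander release):

// Minimal sketch: all database parameters become compulsory CLI options,
// crawling constraints stay optional since they need an extra parsing step.
const { program } = require('commander'); // assumes commander >= 7

program
  .requiredOption('--host <host>', 'database host')
  .option('--port <port>', 'database port, e.g. 3307 for a non-standard setup', '3306')
  .requiredOption('--user <user>', 'database user')
  .requiredOption('--password <password>', 'database password')
  .requiredOption('--database <database>', 'database name')
  .option('-c, --constraints <constraints>', 'crawling constraints, e.g. "film.film_id=film_actor.film_id"')
  .parse(process.argv);

const options = program.opts();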
Currently all the queries are stored in the finaldata variable in memory. This hogs memory considerably, so it is better to write each query as it is encountered, or push the queries into a queue that pops each one and writes it into the file as it comes.
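A minimal sketch of streaming each statement to disk as it is produced instead of buffering it all in finaldata (writeQuery is a hypothetical helper, dump.sql a placeholder path):

const fs = require('fs');

// Minimal sketch: append each query to the dump file the moment it is generated.
const out = fs.createWriteStream('dump.sql', { flags: 'a' });

function writeQuery(query) {
  out.write(query + ';\n'); // the trailing semicolon also keeps the dump runnable
}

// call out.end() once crawling finishes so the file is flushed before the process exits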
Currently there are multiple duplicate entries for a single row; ideally there should not be.
process.exit is not called at the end, so dbcrawler does not exit after finishing successfully.