erdrecognition's Issues
Associate start and endpoints with entities and relationships
Using the coordinates of the endpoints, associate each endpoint (whether it is a start or an end) with its nearest entity or relationship. This can be done with a plain distance comparison between every entity and a given point, which takes O(N^2) time overall.
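The distance comparison above can be sketched as follows. This is a minimal sketch, not the project's implementation: it assumes each entity or relationship is a bounding box given as (x_min, y_min, x_max, y_max) and measures distance to the box center. The names nearest_entity and associate are hypothetical.

```python
import math


def nearest_entity(point, boxes):
    """Return the name of the box whose center is closest to `point`."""
    def center(box):
        x_min, y_min, x_max, y_max = box
        return ((x_min + x_max) / 2, (y_min + y_max) / 2)

    return min(boxes, key=lambda name: math.dist(point, center(boxes[name])))


def associate(lines, boxes):
    """Attach the nearest box to the start and end point of every line."""
    return [
        {
            "start": nearest_entity(line["start"], boxes),
            "end": nearest_entity(line["end"], boxes),
        }
        for line in lines
    ]


# Assumed sample data: two boxes and one line running between them.
boxes = {"Food Item": (0, 0, 100, 50), "Usual Side": (200, 0, 300, 50)}
lines = [{"start": (105, 25), "end": (195, 25)}]
print(associate(lines, boxes))
```

Every point is compared against every box, which is where the quadratic cost comes from. For wide boxes, distance to the nearest box edge may be a better measure than distance to the center.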
Install TensorFlow and deps locally
Recognizing Attributes on ERD
Extract images from bounding boxes specified in JSON using PIL
Extract entities, relationships, and attributes, into separate images for OCR with pytesseract.
End goal: the extracted images are saved in a subdirectory named after the source image, located in the same directory as that image.
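A possible shape for the extraction step, assuming Pillow is installed and that each bounding box in the JSON is a dict with the keys label, x_min, y_min, x_max, and y_max. The key names and both function names are assumptions, since the issue does not give the JSON layout.

```python
from pathlib import Path


def output_dir_for(image_path):
    """Subdirectory named after the image, next to the image itself."""
    image_path = Path(image_path)
    return image_path.parent / image_path.stem


def crop_boxes(image_path, boxes):
    """Crop every bounding box out of the image and save it as a PNG."""
    from PIL import Image  # third-party: pip install Pillow

    out_dir = output_dir_for(image_path)
    out_dir.mkdir(exist_ok=True)
    saved = []
    with Image.open(image_path) as image:
        for index, box in enumerate(boxes):
            # Image.crop takes (left, upper, right, lower) in pixels.
            crop = image.crop(
                (box["x_min"], box["y_min"], box["x_max"], box["y_max"])
            )
            target = out_dir / f"{box['label']}_{index}.png"
            crop.save(target)
            saved.append(target)
    return saved
```

Calling crop_boxes("data/0001.jpg", boxes) would write the crops into data/0001/, one file per box, ready to be passed to OCR.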
Write clustering algorithm based on number of entities, relationships, and attributes
POST each picture to the GCP endpoint
Get the number of entities, attributes, and relationships
Associate the image name with the file
Perform K-means clustering algorithm on all images
End goal: Obtain a graph of all the clusters of images based on entities, attributes, and relationships.
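The clustering step could look roughly like this. It is a plain-Python sketch of the standard K-means loop (Lloyd's algorithm) over (numEntities, numRelationships, numAttributes) tuples. In practice a library such as scikit-learn's KMeans would be the usual choice. The sample counts are made up for illustration.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Cluster tuples of counts; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k of the data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(
                range(k), key=lambda i: squared_distance(point, centroids[i])
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters


# Assumed counts: (entities, relationships, attributes) per image.
points = [(5, 4, 2), (6, 4, 3), (5, 5, 2), (20, 18, 30), (22, 19, 28), (21, 20, 31)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```

The resulting clusters can then be plotted with any charting library, for example one color per cluster on entity and attribute axes.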
Convert each image to a JSON file formatted as below
{
  "filepath": "PATH_TO_FILE",
  "filename": "0001.jpg",
  "numEntities": 5,
  "numRelationships": 4,
  "numAttributes": 2,
  "entities": [
    {
      "name": "Food Item",
      "primaryKey": "Name",
      "numEntityAttributes": 2,
      "entityAttributes": [
        "Price",
        "Number of Calories"
      ]
    }
  ],
  "relationships": [
    "Usual Side",
    "Supervises",
    etc.
  ],
  "attributes": [
    "String 1",
    "String 2",
    etc.
  ]
}
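One way to produce a record in this format is to derive every count from the lists themselves, so the numbers cannot drift out of sync with the contents. A minimal sketch: build_record is a hypothetical helper, and the input shape (entities as dicts with name, primaryKey, and entityAttributes) is an assumption taken from the example above.

```python
import json
import os


def build_record(filepath, entities, relationships, attributes):
    """Build the per-image record; all counts are computed from the lists."""
    return {
        "filepath": filepath,
        "filename": os.path.basename(filepath),
        "numEntities": len(entities),
        "numRelationships": len(relationships),
        "numAttributes": len(attributes),
        "entities": [
            {
                "name": entity["name"],
                "primaryKey": entity["primaryKey"],
                "numEntityAttributes": len(entity["entityAttributes"]),
                "entityAttributes": list(entity["entityAttributes"]),
            }
            for entity in entities
        ],
        "relationships": list(relationships),
        "attributes": list(attributes),
    }


record = build_record(
    "dataset/0001.jpg",
    entities=[
        {
            "name": "Food Item",
            "primaryKey": "Name",
            "entityAttributes": ["Price", "Number of Calories"],
        }
    ],
    relationships=["Usual Side", "Supervises"],
    attributes=["String 1", "String 2"],
)
print(json.dumps(record, indent=2))
```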
Label dataset for model
Get the dataset, and split it into training and testing
Use TensorFlow to split the dataset. Also, add the dataset to the repository.
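For reference, the split itself is simple enough to sketch without TensorFlow. The example below deliberately uses only the standard library rather than a TensorFlow API, and the 80/20 ratio, the seed, and the function name are assumptions, not values from the issue.

```python
import random


def split_dataset(filenames, test_fraction=0.2, seed=42):
    """Shuffle deterministically, then split into (train, test) lists."""
    shuffled = sorted(filenames)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Assumed file naming, following the "0001.jpg" pattern used elsewhere.
files = [f"{i:04d}.jpg" for i in range(1, 11)]
train, test = split_dataset(files)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so the same images stay in the test set between training runs.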
Make POST request to GCP endpoint for each image
Send each image to the endpoint, retrieve the JSON, and count entities and images. Store the result in the format described in the other issue.
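A rough sketch of the request and counting steps using only the standard library. The request schema, authentication, and response format of the real GCP endpoint are not described in the issue, so everything here is assumed: the raw-bytes POST body, the "predictions" list, and the label names "entity", "relationship", and "attribute" are all hypothetical.

```python
import json
import urllib.request
from collections import Counter


def post_image(endpoint_url, image_path):
    """POST the raw image bytes and return the decoded JSON response."""
    with open(image_path, "rb") as handle:
        request = urllib.request.Request(
            endpoint_url,
            data=handle.read(),
            headers={"Content-Type": "application/octet-stream"},
            method="POST",
        )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_labels(response):
    """Count predictions per label in the assumed response shape."""
    counts = Counter(p["label"] for p in response.get("predictions", []))
    return {
        "numEntities": counts["entity"],
        "numRelationships": counts["relationship"],
        "numAttributes": counts["attribute"],
    }


# Assumed example response, used here instead of a live request.
sample = {
    "predictions": [
        {"label": "entity"},
        {"label": "entity"},
        {"label": "relationship"},
    ]
}
print(count_labels(sample))
```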
Recognizing Entities on ERD
The goal is to recognize entities in ERD diagrams. Read up on TensorFlow to see what is necessary to recognize the "boxes" in each diagram.
Use OpenCV to connect the dots
Mask out all white from the image, and take a starting point and an ending point.
Is there a path between the two points?
If not, take another ending point. Repeat until the starting point has been paired with an ending point. Add this pairing to a JSON file formatted like so:
[
  {
    "start": *COORDINATES FOR STARTING POINT*,
    "end": *COORDINATES FOR ENDING POINT*
  },
  etc.
]
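The path check can be sketched as a breadth-first search over a binary mask. This assumes the mask has already been produced (for example by thresholding the image with OpenCV) and is a grid where truthy cells are line pixels. Points are (row, column) tuples, and has_path and pair_points are hypothetical names.

```python
from collections import deque


def has_path(mask, start, end):
    """True if `end` is reachable from `start` through non-zero mask cells."""
    rows, cols = len(mask), len(mask[0])
    if not (mask[start[0]][start[1]] and mask[end[0]][end[1]]):
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == end:
            return True
        # Visit all 8 neighbours so diagonal line segments stay connected.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (
                    0 <= nr < rows
                    and 0 <= nc < cols
                    and mask[nr][nc]
                    and (nr, nc) not in seen
                ):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return False


def pair_points(mask, starts, ends):
    """Pair each start with the first unused end it can reach."""
    pairs = []
    remaining = list(ends)
    for start in starts:
        for end in remaining:
            if has_path(mask, start, end):
                pairs.append({"start": list(start), "end": list(end)})
                remaining.remove(end)
                break
    return pairs


# Assumed toy mask: one line on the top row, one on the bottom row.
mask = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
print(pair_points(mask, [(0, 0), (2, 2)], [(2, 4), (0, 2)]))
```

The list returned by pair_points can be written out with json.dump to get the file format shown above.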
Remove text from dataset images
Annotate images for starting and ending points of lines
Annotate approximately 25 images on GCP and train the model on them
After training, evaluate the performance of the model by deploying it
If performance is not up to the benchmark, annotate 25 more images and retrain, repeating until the target accuracy is reached
Update README
Fill out README with problem statement and overarching goals of the project
Use PyTesseract to read cut-out entities, relationships, and attributes
Run the PyTesseract code on each file in the directory created by the earlier extraction task, and read the text in each image. Return a JSON file in the format given in the earlier issue.
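A sketch of the OCR pass, assuming pytesseract, Pillow, and the Tesseract binary are installed and that the cut-outs are PNG or JPEG files. The helper names and the whitespace cleanup are assumptions. Mapping the recognized strings into the JSON format is left to the caller.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def clean_text(raw):
    """Collapse newlines and repeated whitespace in OCR output."""
    return " ".join(raw.split())


def read_directory(directory):
    """Run OCR on every image in `directory`; returns {filename: text}."""
    import pytesseract  # third-party: pip install pytesseract
    from PIL import Image  # third-party: pip install Pillow

    results = {}
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            with Image.open(path) as image:
                results[path.name] = clean_text(pytesseract.image_to_string(image))
    return results


print(clean_text("Food\n Item\n"))
```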
Recognizing Relationships on ERD