georgia-tech-db / evadb
Database system for AI-powered apps
Home Page: https://evadb.ai/docs
License: Apache License 2.0
Need to report ROI, mAP for Faster RCNN.
We can utilize specialized neural networks since we only have 4 classes to detect.
In the current setup, we cannot run multiple queries even though execute_async returns control to the user immediately. We need execute_async to return a future or an id which the user can use to specify the query they want to spin on.
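A minimal sketch of that shape, using a thread pool so execute_async hands back a future the caller can later resolve. All class and method names here are assumptions for illustration, not EVA's actual API:

```python
import concurrent.futures

class Connection:
    """Illustrative only: execute_async returns a future identifying
    the submitted query; fetch_all blocks on that specific future."""

    def __init__(self):
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def execute_async(self, query):
        # Returns control immediately; the future identifies this query.
        return self._pool.submit(self._run, query)

    def _run(self, query):
        # Placeholder for real execution: pretend each query yields one row.
        return [f"row for {query}"]

    def fetch_all(self, future):
        # Blocks until the query behind this specific future has finished.
        return future.result()
```

With this shape, two queries can be in flight at once, and the user picks which result to wait on by passing the matching future to fetch_all.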
Add support for this workflow using the following API changes:
.execute_async()
- returns a future
.fetch_all(future)
- returns all the rows

In our current code in FastRCNNObjectDetector, we have on line 101:
pred_t = [pred_score.index(x) for x in pred_score if x > self.threshold][-1]
But it is possible that, for some video, none of the pred_score values is greater than self.threshold. The list comprehension is then empty, so the [-1] index raises an IndexError. This breaks the code.
Suggested fix:
while no score exceeds the threshold:
    reduce the threshold by 0.05 and repeat the step
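A sketch of that fix. The helper name and the min_threshold floor are assumptions; it also uses enumerate instead of the original pred_score.index, which misbehaves when scores repeat:

```python
def last_index_above_threshold(pred_score, threshold, step=0.05, min_threshold=0.0):
    """Return the index of the last score above threshold, lowering the
    threshold in `step` decrements when no score qualifies."""
    while threshold >= min_threshold:
        candidates = [i for i, x in enumerate(pred_score) if x > threshold]
        if candidates:
            return candidates[-1]
        threshold -= step
    return None  # no detection even at the lowest threshold
```

Callers still need to handle the None case (no detections at all), which the current line 101 silently assumes cannot happen.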
We can utilize a multi-layer perceptron for textual data.
We can use a convolutional neural network to determine the color of the bounding box.
Pip automatically installs the latest pyarrow library, in which some functions are deprecated. To suppress those warnings, users can pin an older pyarrow version.
Rename template -> abstract*
e.g., loader_template -> abstract_loader
uadetrac_loader would be correct as-is.
Docker needs to start EVA server and export the port. @jiashenC
Right now, Eva does not support using custom-trained PyTorch models for inference.
To add this feature, we will need a few enhancements.
Need to be able to interpret complex queries such as ones with parentheses.
Maybe I need to use an external logic library at this point?
Currently, the EVA server and client do not provide explicit messages to indicate that the connection is successfully established. Add these messages to the server and the client code.
Processing 10 frames uses around 9 GB of GPU memory.
The output formats of the current master branch UDF, eva-reuse, and my SSD object detector are all different. We need to agree on a single format.
One metric we should support first is latency. Collect the end-to-end execution time and a detailed latency breakdown for each component (e.g., optimizer, data access, data transformation) when executing a query.
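A sketch of how per-component timings could be collected. The class name and API are illustrative, not EVA's actual instrumentation; the component names come from the note above:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class LatencyProfile:
    """Accumulates wall-clock time per named component."""

    def __init__(self):
        self.timings = defaultdict(float)

    @contextmanager
    def track(self, component):
        # Time the enclosed block and charge it to `component`.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[component] += time.perf_counter() - start
```

Wrapping each pipeline stage (optimizer, data access, data transformation) in `track(...)` yields the per-component breakdown, and summing the values approximates the end-to-end execution time.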
Motivation:
The confidence level needs to be adjustable by users. In order to do so, filters need to report statistics that are along a curve instead of one static value.
Need the manually labelled keypoint dict from Siddharth
In /data/ua_detrac/download.sh,
wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=12xJc8S0Z7lYaAadsi2CoSK3WqH2OkUBu' -O DETRAC-Train_Annotations-XML.zip
fails for me. It is not able to download the XML file.
Color detection needs more work; possibly use some other library?
Currently, the client does not use the EVA parser for reading the UPLOAD statement. As a result, we need to manually handle syntax and semantic errors in the statement. Currently, any error in the syntax leads to the client shutting down. We need to either allow the client to use the EVA parser or add syntax checks within the client as a special case for the UPLOAD statement.
To reproduce, issue a malformed UPLOAD statement, e.g.:
UPLOAD IMFILE 'data/ua_detrac/ua_detrac.mp4' PATH 'test_video.mp4';
We shouldn't allow concurrent select queries on the same cursor to ensure correctness.
cursor.execute('query1')
cursor.execute('query2')
The above code shouldn't be allowed. The user has to clear the cursor buffer before running the second query. Throw the following error:
InternalError: Unread result found
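A minimal sketch of that guard. The class shape is illustrative; EVA's real cursor differs:

```python
class InternalError(Exception):
    pass

class Cursor:
    """Rejects a second execute() while a prior result set is unread."""

    def __init__(self):
        self._unread = False

    def execute(self, query):
        if self._unread:
            raise InternalError("Unread result found")
        self._unread = True  # pretend the query produced buffered rows

    def fetch_all(self):
        # Draining the buffer clears the guard for the next query.
        self._unread = False
        return []  # placeholder for the buffered rows
```

After fetch_all drains the buffer, the next execute is allowed again, which matches the "clear the cursor buffer first" rule above.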
Use a database like Postgres to easily read and load the necessary data, or maybe use pickle to load it. Currently, loading takes far too long, and we shouldn't keep redoing it.
We can use an object tracker to determine speed.
In the current conda environment of EVA, the db_abi connect method fails (with an asyncio error) when run from a Jupyter notebook. This issue only arises in the EVA environment and not in a fresh conda environment. Possible fixes include upgrading the Python version, checking for package conflicts, etc.
Yao's paper mentions a way of making PPs on the fly. Need to look into that and implement.
The file src/storage/__init__.py executes code when creating a storage object. This causes problems with the documentation engine, which builds itself by importing each package (from the __init__.py files). It also causes problems when building the documentation for some of the src/executor/ files, such as plan_executor.py and storage_executor, as those import the storage package as well.
Currently, the documentation skips this sub-directory, but we discussed in the weekly meeting a potential quick fix: move the code out of this init file into abstract_storage_engine.py and petastorm_storage_engine.py.
We already check whether a GPU is available through the torch API: https://github.com/georgia-tech-db/eva/blob/09e8a98ca0d80a03d6563a268a2281d26f714819/src/utils/generic_utils.py#L79
Can we simply populate the GPU list from the torch API as well, instead of asking for manual config from the user? https://github.com/georgia-tech-db/eva/blob/09e8a98ca0d80a03d6563a268a2281d26f714819/src/executor/execution_context.py#L70
It initially gave me some trouble because the required config is hidden in the code.
The LOAD DATA command currently takes a significant amount of time to load a video into the database. Integrate optimizations from the eva-reuse project into EVA to improve loading time.
Bigger datasets might not work on ordinary machines. We need to support reading from / writing to h5py.
UNNEST: #143
JOIN: TBD
ARRAY_FUNCTIONs: TBD (https://www.postgresql.org/docs/8.4/functions-array.html)
-- GET frames with pedestrians
SELECT id, frame
FROM DETRAC
WHERE ['pedestrian'] <@ ObjDet(frame).labels;
-- GET frames with a pedestrian and a car
SELECT id, frame
FROM DETRAC
WHERE ['pedestrian', 'car'] <@ ObjDet(frame).labels;
-- GET frames with more than 5 cars
SELECT id, frame
FROM DETRAC
WHERE array_count(ObjDet(frame).labels, 'car') > 5;
-- GET frames with 2 pedestrians and 5 cars
SELECT id, frame
FROM DETRAC
WHERE array_count(ObjDet(frame).labels, 'car') = 5
and array_count(ObjDet(frame).labels, 'pedestrian') = 2;
-- GET frames with red cars
SELECT id, frame
FROM DETRAC, UNNEST(ObjDet(frame)) as T(label, bbox)
WHERE label = 'car' and COLOR(frame, bbox) = 'red';
-- GET frames with cars masking 50% frame area
SELECT id, frame
FROM DETRAC, UNNEST(ObjDet(frame)) as T(label, bbox)
WHERE label = 'car' and AREA(frame, bbox) > 0.5;
-- GET bboxes of all red cars
SELECT id, frame, bbox
FROM DETRAC, UNNEST(ObjDet(frame)) as T(label, bbox)
WHERE label = 'car' and COLOR(frame, bbox) = 'red'
GROUP BY id;
-- GET first 100 frames with red car
SELECT id, frame
FROM DETRAC, UNNEST(ObjDet(frame)) as T(label, bbox)
WHERE label = 'car' and COLOR(frame, bbox) = 'red'
LIMIT 100;
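For reference, the AREA predicate used in the queries above could be sketched like this. The signature and the (x1, y1, x2, y2) bbox convention are assumptions for illustration:

```python
def area(frame_shape, bbox):
    """Fraction of the frame covered by a bounding box (x1, y1, x2, y2)."""
    height, width = frame_shape[:2]
    x1, y1, x2, y2 = bbox
    return ((x2 - x1) * (y2 - y1)) / (width * height)
```

A query like `AREA(frame, bbox) > 0.5` would then select frames where a single detection covers more than half the frame.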
Right now, when the client issues a command, the server does not provide a human-readable response to the request. This affects the usability of the system. We need to add request-dependent messages to the response.
The pytest import setup is broken. Error I got on the latest master branch:
ModuleNotFoundError: No module named 'test.util'; 'test' is not a package
The only solution I have found so far is to delete the __init__.py file in the root directory. The official pytest documentation states that it is not appropriate to put an __init__.py at the project root directory.
In the create_executor.py, we have
if (self.node.if_not_exists):
# check catalog if we already have this table
return
Due to this condition, we are not able to actually create tables for metadata. I am temporarily disabling this for my tasks, but do we actually need it? If yes, how do we handle creation of new tables?
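One way the condition could work instead: consult the catalog first, and skip creation only when the table actually exists. The catalog API names here are assumptions, not EVA's real interface:

```python
def handle_create(node, catalog):
    """Skip creation only when the table already exists; otherwise create it."""
    if catalog.table_exists(node.table_name):
        if node.if_not_exists:
            return  # IF NOT EXISTS: silently skip the duplicate
        raise RuntimeError(f"Table {node.table_name} already exists")
    catalog.create_table(node.table_name)
```

This preserves the intent of IF NOT EXISTS (no error on duplicates) while still allowing genuinely new tables, such as the metadata tables, to be created.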
Steps to reproduce: run the object_detection notebook (specifically the upload cell). It fails with:
[Errno 13] Permission denied: '/tmp/test_video.mp4'
Performance difference for CV and PIL: https://www.kaggle.com/vfdev5/pil-vs-opencv
Option: https://discuss.pytorch.org/t/i-wonder-why-pytorch-uses-pil-not-the-cv2/19482
As an initial step for cost-based optimizer, I want to add SSD model along with FasterRCNN.
I wonder why some labels are marked as N/A for FasterRCNN like here?
The current version specified in the conda environment file is outdated, and it doesn't support many pretrained detection models, including video classification models.
Not really an urgent bug. Just keep a note here.
Index out of range error when the Faster R-CNN model does not detect any object.