evadb's People

Contributors

abdullahshah, akhileshsiddhanti, albert-hen, alekhyam94, americast, anirudh58, aryan-rajoria, chitti-ankith, dependabot[bot], gaurav274, hershd23, ishsiva, jaehobang, jarulraj, jiashenc, jineetd, karan-sarkar, kaushikravichandran, lorddarkula, pchunduri6, pgluss, rodrigodlpontes, saiprashanth173, sanjanag, sanmathik, snd96, suryatejreddy, swati21, xzdandy, yulaicui

evadb's Issues

UDFs - Object Detection

We need to report ROI and mAP for Faster RCNN.
We can use specialized neural networks, since we only have 4 classes to detect.

Add support for concurrent queries

In the current setup, we cannot run multiple queries concurrently, even though execute_async returns control to the user immediately. We need execute_async to return a future or an ID that the user can use to specify which query to wait on.
Add support for this workflow with the following API changes:

  1. .execute_async() - returns a future
  2. .fetch_all(future) - returns all the rows
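A minimal sketch of the proposed API, assuming a thread pool on the client side. `AsyncCursor` and the `run_query` callable (a stand-in for the round trip to the EVA server) are hypothetical names, not EVA's actual classes:

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List


class AsyncCursor:
    """Hypothetical cursor sketching the execute_async / fetch_all API."""

    def __init__(self, run_query: Callable[[str], List[tuple]], max_workers: int = 4):
        # run_query stands in for sending a query to the server and
        # returning its rows (an assumption of this sketch).
        self._run_query = run_query
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def execute_async(self, query: str) -> Future:
        # Submit the query and return immediately with a handle to it.
        return self._pool.submit(self._run_query, query)

    def fetch_all(self, future: Future) -> List[tuple]:
        # Block only until the requested query has finished.
        return future.result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    cursor = AsyncCursor(lambda q: [(q,)])
    first = cursor.execute_async("SELECT id FROM DETRAC;")
    second = cursor.execute_async("SELECT frame FROM DETRAC;")
    print(cursor.fetch_all(second))
    print(cursor.fetch_all(first))
    cursor.close()
```

Because each call returns its own future, the caller can wait on the queries in any order.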

Potential bug in FastRCNNObjectDetector

In our current code in FastRCNNObjectDetector, we have in line 101:

pred_t = [pred_score.index(x) for x in pred_score if x > self.threshold][-1]

However, for some videos none of the values in pred_score exceeds self.threshold. The list comprehension is then empty, and indexing it with [-1] raises an IndexError, which breaks the code.

Suggested fix:
while the list of scores above the threshold is empty (and the threshold is still above 0):
    reduce the threshold by 0.05 and repeat the step
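The suggested fix could look like the following sketch. `last_index_above` is a hypothetical helper, and the `min_threshold` floor is an added assumption that keeps the loop from running forever when every score is zero. It also uses `enumerate` rather than `pred_score.index(x)`, because `list.index` returns the first matching position when two scores are equal:

```python
from typing import List, Optional, Tuple


def last_index_above(
    pred_score: List[float],
    threshold: float,
    step: float = 0.05,
    min_threshold: float = 0.0,
) -> Tuple[Optional[int], float]:
    """Return (index of the last score above the threshold, threshold used).

    Lowers the threshold by `step` until at least one score passes, but
    never below `min_threshold`, so the loop always terminates.
    """
    t = threshold
    while True:
        # enumerate gives the true position even when scores repeat.
        above = [i for i, score in enumerate(pred_score) if score > t]
        if above:
            return above[-1], t
        if t <= min_threshold:
            return None, t  # nothing passes: caller treats it as "no detections"
        t = max(min_threshold, t - step)
```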

UDF - Textual Data

We can use a multi-layer perceptron for textual data.
We can use a convolutional neural network to determine the color of a bounding box.

Query Template for Filters

The purpose of this issue is to discuss how filters (i.e. specialized NNs used as stand-ins for more expensive object detection models, as described in papers like NoScope and BlazeIt) should fit into EVA's queries.

Naming Convention

Rename *_template modules to abstract_*, e.g. loader_template -> abstract_loader.
Concrete implementations keep their descriptive names, e.g. uadetrac_loader is already correct.

Expression Tree Enhancement

  1. Make the ExpressionType Enum use auto() and add a DELIMITER between different families of expression types.
    Add a few basic utility functions:
  2. Convert the expression tree to a list representation that the optimizer can use to reorder predicates.
  3. Extract the predicate constant from a Comparison predicate.
  4. Simplify predicate expressions using sympy.
  5. Build an expression tree from a list of predicates.
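A sketch of items 1, 2 and 5, assuming simple dataclass nodes. The enum members, `Comparison`, `Logical`, `to_predicate_list` and `from_predicate_list` are hypothetical and not EVA's real expression classes; sympy-based simplification (item 4) is not shown:

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Union


class ExpressionType(Enum):
    # auto() numbers members in declaration order; each *_DELIMITER marks
    # the end of one family of expression types.
    COMPARE_EQUAL = auto()
    COMPARE_GREATER = auto()
    COMPARE_LESSER = auto()
    COMPARISON_DELIMITER = auto()
    LOGICAL_AND = auto()
    LOGICAL_OR = auto()
    LOGICAL_DELIMITER = auto()

    def is_comparison(self) -> bool:
        return self.value < ExpressionType.COMPARISON_DELIMITER.value


@dataclass
class Comparison:
    etype: ExpressionType
    column: str
    constant: float


@dataclass
class Logical:
    etype: ExpressionType
    left: "Expr"
    right: "Expr"


Expr = Union[Comparison, Logical]


def to_predicate_list(expr: Expr) -> List[Expr]:
    """Flatten nested ANDs into a list the optimizer can reorder."""
    if isinstance(expr, Logical) and expr.etype is ExpressionType.LOGICAL_AND:
        return to_predicate_list(expr.left) + to_predicate_list(expr.right)
    return [expr]  # comparisons and ORs are kept as single predicates


def from_predicate_list(preds: List[Expr]) -> Expr:
    """Rebuild a left-deep AND tree from a non-empty list of predicates."""
    tree = preds[0]
    for pred in preds[1:]:
        tree = Logical(ExpressionType.LOGICAL_AND, tree, pred)
    return tree
```

Round-tripping a reordered list through `from_predicate_list` and `to_predicate_list` preserves the new order, which is what a predicate-reordering optimizer needs.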

Uploading and using a custom model in Eva

Right now, Eva does not support using custom-trained PyTorch models for inference.
To add this feature, we will need a few enhancements:

  1. Add support for uploading a custom model to the server.
  2. Add functionality for uploading the class labels supported by the model, and the mapping between the model output and the labels.
  3. Add functionality for loading a custom model in PyTorch.

Query Optimizer - Extensibility

We need to be able to interpret complex queries, such as ones with parentheses.
Maybe I need to use an external logic library at this point?
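Before reaching for an external library, a small recursive-descent parser is one option. This sketch assumes predicates are whitespace-free tokens such as `id>5`, with `NOT` binding tighter than `AND`, and `AND` tighter than `OR`; it returns nested tuples rather than EVA expression objects, and `PredicateParser` is a hypothetical name:

```python
import re
from typing import List, Optional, Tuple, Union

Node = Union[str, Tuple]

# Parentheses are tokens on their own; anything else up to whitespace or a
# parenthesis is one token (a keyword or an atomic predicate).
TOKEN_RE = re.compile(r"\s*(\(|\)|[^\s()]+)")
KEYWORDS = {"AND", "OR", "NOT"}


class PredicateParser:
    """Recursive descent: OR binds loosest, then AND, then NOT, then ( )."""

    def __init__(self, text: str) -> None:
        self.tokens: List[str] = TOKEN_RE.findall(text)
        self.pos = 0

    def _peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def _accept(self, keyword: str) -> bool:
        tok = self._peek()
        if tok is not None and tok.upper() == keyword:
            self.pos += 1
            return True
        return False

    def parse(self) -> Node:
        node = self._parse_or()
        if self._peek() is not None:
            raise SyntaxError(f"unexpected token {self._peek()!r}")
        return node

    def _parse_or(self) -> Node:
        node = self._parse_and()
        while self._accept("OR"):
            node = ("OR", node, self._parse_and())
        return node

    def _parse_and(self) -> Node:
        node = self._parse_not()
        while self._accept("AND"):
            node = ("AND", node, self._parse_not())
        return node

    def _parse_not(self) -> Node:
        if self._accept("NOT"):
            return ("NOT", self._parse_not())
        return self._parse_atom()

    def _parse_atom(self) -> Node:
        tok = self._peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        if tok == "(":
            node = self._parse_or()
            if self._peek() != ")":
                raise SyntaxError("missing ')'")
            self.pos += 1
            return node
        if tok == ")" or tok.upper() in KEYWORDS:
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok
```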

Connection response on EVA server

Currently, the EVA server and client do not provide explicit messages to indicate that the connection is successfully established. Add these messages to the server and the client code.

Different UDF output format

The UDF output formats of the current master branch, eva-reuse, and my SSD object detector differ. We need a common format.

System metrics collecting support

The first metric we should support is latency: collect the end-to-end execution time and a per-component latency breakdown (e.g., optimizer, data access, data transformation) when executing a query.
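One way to collect per-component latency is a timing context manager. `LatencyRecorder` and the component names are hypothetical; the sketch measures wall-clock time with `time.perf_counter` and accumulates it per component:

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator


class LatencyRecorder:
    """Hypothetical helper that accumulates wall-clock time per component."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)

    @contextmanager
    def measure(self, component: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record even if the component raises, so failures still show up.
            self.totals[component] += time.perf_counter() - start

    def report(self) -> Dict[str, float]:
        return dict(self.totals)


if __name__ == "__main__":
    metrics = LatencyRecorder()
    with metrics.measure("end_to_end"):
        with metrics.measure("optimizer"):
            time.sleep(0.01)
        with metrics.measure("data_access"):
            time.sleep(0.02)
    print(metrics.report())
```

Nesting the component timers inside an end-to-end timer makes the unaccounted time (end-to-end minus the sum of components) visible as well.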

Replace pandas DataFrame with PySpark DataFrame

Motivation:

  1. Reduce dependencies and code complexity.
  2. For some operations (e.g., join), the current pandas-based implementation assumes that the data fits in memory.

Filters - Curve instead of static values

The confidence level needs to be adjustable by users. To support this, filters need to report statistics along a curve instead of a single static value.
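A sketch of what a curve instead of one static value could mean, assuming each filter emits a confidence score per frame and the expensive detector's output serves as ground truth. For each candidate threshold it reports the fraction of frames passed and the recall; `filter_curve`, `pick_threshold`, and treating the user's confidence level as a minimum recall are all assumptions:

```python
from typing import List, Sequence, Tuple


def filter_curve(
    scores: Sequence[float],
    labels: Sequence[bool],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, pass_rate, recall) for each threshold.

    scores: filter confidence per frame.
    labels: whether the expensive model found the object in that frame.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    positives = sum(labels)
    curve = []
    for t in thresholds:
        passed = [s >= t for s in scores]
        pass_rate = sum(passed) / len(scores) if scores else 0.0
        kept = sum(p and y for p, y in zip(passed, labels))
        recall = kept / positives if positives else 1.0
        curve.append((t, pass_rate, recall))
    return curve


def pick_threshold(curve: List[Tuple[float, float, float]], min_recall: float) -> float:
    """Highest threshold (fewest frames passed) that still meets min_recall."""
    ok = [t for t, _, recall in curve if recall >= min_recall]
    if not ok:
        raise ValueError("no threshold reaches the requested recall")
    return max(ok)
```

Reporting the whole curve lets the query choose its own trade-off between skipped frames and missed objects at run time.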

download.sh link for downloading xml file is broken

In /data/ua_detrac/download.sh,

wget --no-check-certificate 'https://drive.google.com/uc?export=download&id=12xJc8S0Z7lYaAadsi2CoSK3WqH2OkUBu' -O DETRAC-Train_Annotations-XML.zip

fails for me: it does not download the XML file.

Client does not use EVA parser to parse UPLOAD

Currently, the client does not use the EVA parser to read the UPLOAD statement. As a result, we need to handle syntax and semantic errors in the statement manually. At present, any syntax error causes the client to shut down. We need to either let the client use the EVA parser or add syntax checks for the UPLOAD statement as a special case within the client.

To reproduce:

  1. Run the server and connect the client.
  2. Provide an invalid command beginning with the keyword UPLOAD, e.g.:
UPLOAD IMFILE 'data/ua_detrac/ua_detrac.mp4' PATH 'test_video.mp4';
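Until the client uses the EVA parser, a client-side check could at least report the error instead of exiting. The grammar below (`UPLOAD INFILE '<file>' PATH '<path>';`) is an assumption inferred from the invalid example above, and `parse_upload` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple

# Assumed grammar: UPLOAD INFILE '<local file>' PATH '<server path>';
UPLOAD_RE = re.compile(
    r"^\s*UPLOAD\s+INFILE\s+'([^']+)'\s+PATH\s+'([^']+)'\s*;?\s*$",
    re.IGNORECASE,
)


def parse_upload(command: str) -> Tuple[Optional[Tuple[str, str]], Optional[str]]:
    """Return ((infile, path), None) on success, or (None, error message).

    The caller prints the error and keeps the client running.
    """
    match = UPLOAD_RE.match(command)
    if match is None:
        return None, (
            "Syntax error in UPLOAD statement; expected: "
            "UPLOAD INFILE '<file>' PATH '<path>';"
        )
    return (match.group(1), match.group(2)), None
```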

Disallow multiple queries on the same cursor to ensure cursor correctness

We shouldn't allow concurrent select queries on the same cursor to ensure correctness.

cursor.execute('query1')
cursor.execute('query2')

The above code shouldn't be allowed. The user has to read (clear) the cursor buffer before running the second query; otherwise, raise the following error:
InternalError: Unread result found
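A sketch of the check, assuming the cursor buffers one result set at a time. `Cursor`, `fetch_all` and the `run_query` callable are hypothetical stand-ins, and `InternalError` is defined locally here:

```python
from typing import Callable, List, Optional


class InternalError(Exception):
    """Raised when a cursor is reused before its results are read."""


class Cursor:
    """Hypothetical cursor that refuses a new query while results are pending."""

    def __init__(self, run_query: Callable[[str], List[tuple]]) -> None:
        self._run_query = run_query  # stand-in for the client/server round trip
        self._pending: Optional[List[tuple]] = None

    def execute(self, query: str) -> None:
        if self._pending is not None:
            raise InternalError("Unread result found")
        self._pending = self._run_query(query)

    def fetch_all(self) -> List[tuple]:
        # Reading the results clears the buffer so the cursor can be reused.
        rows = self._pending if self._pending is not None else []
        self._pending = None
        return rows
```

Even an empty result set counts as unread until `fetch_all` is called, which keeps the rule simple.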

Loader - database

Use a database such as Postgres to read and load the necessary data easily, or cache the loaded data with pickle. Loading currently takes far too long, and we shouldn't repeat it on every run.
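A minimal caching sketch with `pickle`, assuming the slow loader can be wrapped in a zero-argument function; `load_with_cache` is a hypothetical name. Unpickling untrusted files can execute arbitrary code, so the cache should only hold files the system wrote itself:

```python
import os
import pickle
from typing import Any, Callable


def load_with_cache(cache_path: str, load_fn: Callable[[], Any]) -> Any:
    """Return cached data from cache_path, or run load_fn once and cache it.

    load_fn stands in for the slow loader (e.g. a dataset loader).
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = load_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

The cache must be deleted (or keyed by dataset version) when the underlying data changes; this sketch does not detect staleness.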

Incompatibility of eva environment with Jupyter notebook

In the current conda environment of Eva, the db_api connect method fails (with an asyncio error) when run from a Jupyter notebook. The issue only arises in the Eva environment, not in a fresh conda environment. Possible fixes include upgrading the Python version and checking for package conflicts.

Documentation Bug: src.storage.__init__.py executes code

The file src/storage/__init__.py executes code to create a storage object when the package is imported. This causes problems for the documentation engine, which builds the documentation by importing each package (through its __init__.py file). It also breaks the documentation for some src/executor/ files, such as plan_executor.py and storage_executor.py, because they import the storage package as well.

For now, the documentation skips this subdirectory. In the weekly meeting, we discussed a quick fix: move the code out of this __init__.py file into abstract_storage_engine.py and petastorm_storage_engine.py.
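Besides moving the code into the engine modules, a complementary pattern is to construct the storage engine lazily, so importing the package has no side effects. `get_storage_engine` and the stub class are hypothetical placeholders for the real Petastorm engine:

```python
from functools import lru_cache


class PetastormStorageEngineStub:
    """Placeholder for the real engine; constructing it may be expensive."""

    instances = 0

    def __init__(self) -> None:
        PetastormStorageEngineStub.instances += 1


@lru_cache(maxsize=None)
def get_storage_engine() -> PetastormStorageEngineStub:
    # Built on the first call instead of at import time, so importing the
    # package (as the documentation builder does) creates nothing.
    return PetastormStorageEngineStub()
```

Executors would then call `get_storage_engine()` where they need storage, and repeated calls return the same instance.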

Populate GPU from torch cuda API

We already check whether a GPU is available through the torch API: https://github.com/georgia-tech-db/eva/blob/09e8a98ca0d80a03d6563a268a2281d26f714819/src/utils/generic_utils.py#L79

Can we simply populate the GPU setting from the torch API as well, instead of asking the user for manual configuration? https://github.com/georgia-tech-db/eva/blob/09e8a98ca0d80a03d6563a268a2281d26f714819/src/executor/execution_context.py#L70

This initially gave me some trouble because the required configuration is hidden in the code.
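A sketch of populating the GPU list from PyTorch instead of the config file. `torch.cuda.is_available()` and `torch.cuda.device_count()` are real PyTorch calls; `detect_gpus`, and how its result would feed the execution context, are assumptions:

```python
from typing import List


def detect_gpus() -> List[int]:
    """Return the CUDA device ids visible to PyTorch, or [] if none.

    Also returns [] when PyTorch is not installed, so CPU-only setups work.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return list(range(torch.cuda.device_count()))
```

A user-supplied config value could still override this list, keeping manual configuration as an opt-in rather than a requirement.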

Reduce video loading time

The LOAD DATA command currently takes a significant amount of time to load the video into the database. Integrate optimizations from the eva-reuse project into eva to improve loading time.

h5py support for dataset

Larger datasets might not work on ordinary machines. We need to support reading and writing datasets with h5py.

Filters

  1. Filters need to output a curve, not a scalar value.

High priority Queries involving Object detection to be supported

UNNEST: #143
JOIN: TBD
ARRAY_FUNCTIONs: TBD (https://www.postgresql.org/docs/8.4/functions-array.html)

 
-- GET frames with pedestrians
SELECT id, frame
FROM DETRAC
WHERE ['pedestrian'] <@ ObjDet(frame).labels;

-- GET frames with a pedestrian and a car
SELECT id, frame
FROM DETRAC
WHERE ['pedestrian', 'car'] <@ ObjDet(frame).labels;

-- GET frames with more than 5 cars
SELECT id, frame
FROM DETRAC
WHERE array_count(ObjDet(frame).labels, 'car') > 5;

-- GET frames with 2 pedestrians and 5 cars
SELECT id, frame
FROM DETRAC
WHERE array_count(ObjDet(frame).labels, 'car') = 5 
       and array_count(ObjDet(frame).labels, 'pedestrian') = 2;

-- GET frames with red cars
SELECT id, frame
FROM DETRAC, UNNEST(ObjDet(frame)) as T(label, bbox) 
WHERE label = 'car' and COLOR(frame, bbox) = 'red';

-- GET frames with cars covering more than 50% of the frame area
SELECT id, frame
FROM DETRAC, UNNEST(ObjDet(frame)) as T(label, bbox) 
WHERE label = 'car' and AREA(frame, bbox) > 0.5;

-- GET bboxes of all red cars
SELECT id, frame, ARRAY_AGG(bbox) AS bboxes
FROM DETRAC, UNNEST(ObjDet(frame)) as T(label, bbox)
WHERE label = 'car' and COLOR(frame, bbox) = 'red'
GROUP BY id;

-- GET first 100 frames with red car
SELECT id, frame
FROM DETRAC, UNNEST(ObjDet(frame)) as T(label, bbox) 
WHERE label = 'car' and COLOR(frame, bbox) = 'red'
LIMIT 100;
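For reference, the intended semantics of the two array operators in these queries can be sketched in Python. `contained_by` mirrors PostgreSQL's `<@` (is contained by) operator, which ignores duplicate elements; `array_count` is the proposed function, not a PostgreSQL built-in, so its behavior here (count the occurrences of a value) is an assumption:

```python
from collections import Counter
from typing import Hashable, Sequence


def contained_by(needles: Sequence[Hashable], haystack: Sequence[Hashable]) -> bool:
    """Python analogue of PostgreSQL's `needles <@ haystack`.

    Duplicates are ignored: every element of needles only has to appear
    somewhere in haystack.
    """
    return set(needles) <= set(haystack)


def array_count(labels: Sequence[Hashable], value: Hashable) -> int:
    """Proposed ARRAY_COUNT: number of occurrences of value in labels."""
    return Counter(labels)[value]
```

Because `<@` ignores duplicates, a query like "frames with more than 5 cars" needs a counting function such as `array_count` rather than containment.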

Human-readable messages from server

Right now, when the client issues a command, the server does not provide a human-readable response to the request. This hurts the usability of the system. We need to add request-dependent messages to the response.

  1. Connection established - Messages on server and client when a connection is established.
  2. Upload - Message on the server with the name and size of the video being uploaded. Message on the client with the size of the uploaded video.
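A sketch of the message formatting for both cases. The helper names and wording are hypothetical, and using binary units (KiB, MiB) for sizes is an assumption:

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            break
        size /= 1024
    else:
        unit = "TiB"  # anything larger is reported in TiB
    return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"


def connection_message(host: str, port: int) -> str:
    return f"Connection established with EVA server at {host}:{port}"


def upload_server_message(name: str, num_bytes: int) -> str:
    return f"Receiving upload {name} ({human_size(num_bytes)})"


def upload_client_message(num_bytes: int) -> str:
    return f"Uploaded {human_size(num_bytes)}"
```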

Check return condition in the CreateExecutor

In create_executor.py, we have:

if self.node.if_not_exists:
    # check catalog if we already have this table
    return

Because this condition returns whenever IF NOT EXISTS is set, without actually checking the catalog, we cannot create tables for metadata. I am temporarily disabling it for my tasks, but do we actually need this check? If so, how should we handle the creation of new tables?
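If the check is kept, the early return should only happen when the catalog already contains the table, as the original comment suggests. A sketch with a hypothetical in-memory `Catalog`, plan node, and `execute_create` function (none of these are EVA's real classes):

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class CreatePlan:
    """Hypothetical stand-in for the plan node the executor receives."""
    table_name: str
    if_not_exists: bool


class Catalog:
    """Hypothetical in-memory catalog used only for this sketch."""

    def __init__(self) -> None:
        self.tables: Set[str] = set()

    def table_exists(self, name: str) -> bool:
        return name in self.tables

    def create_table(self, name: str) -> None:
        self.tables.add(name)


def execute_create(node: CreatePlan, catalog: Catalog) -> bool:
    """Create the table; return False if it was skipped because it exists."""
    if catalog.table_exists(node.table_name):
        if node.if_not_exists:
            return False  # IF NOT EXISTS: skip silently
        raise ValueError(f"table {node.table_name!r} already exists")
    catalog.create_table(node.table_name)
    return True
```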

Upload fails when using with python notebook

Steps to reproduce:

  1. Checkout the tutorial branch or PR #161
  2. Start the EVA server and a Jupyter notebook on ada1.
  3. Run the object_detection notebook (specifically the upload cell)
  4. The command fails with the following error:
    [Errno 13] Permission denied: '/tmp/test_video.mp4'

Optimizing data access overhead

  • Replace Batch.append (backed by pandas append) with Batch.concat (backed by pandas.concat) where possible
  • Reduce conversions between pandas and Spark DataFrames?
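For the first item, the usual pandas pattern is to collect the batches and call `pd.concat` once, rather than appending one frame at a time, which copies the accumulated data on every step (`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0). `combine_batches` is a hypothetical helper, not EVA's `Batch` API:

```python
from typing import Iterable, List

import pandas as pd


def combine_batches(frames: Iterable[pd.DataFrame]) -> pd.DataFrame:
    """Concatenate all batches in a single pandas call."""
    frames_list: List[pd.DataFrame] = list(frames)
    if not frames_list:
        return pd.DataFrame()  # pd.concat raises on an empty list
    # ignore_index renumbers rows 0..n-1 instead of keeping each batch's index.
    return pd.concat(frames_list, ignore_index=True)
```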

Some labels are marked as N/A. Why?

As an initial step toward a cost-based optimizer, I want to add an SSD model alongside FasterRCNN.

Why are some labels marked as N/A for FasterRCNN, like here?