
This project forked from ml-lab/deepvideoanalytics

Analyze videos & images, perform detections, index frames & detected objects, search by examples.



# Deep Video Analytics

Author: Akshay Bhat, Cornell University.

Deep Video Analytics provides a platform for indexing and extracting information from videos and images. Deep learning detection and recognition algorithms are used to index individual frames and images along with the objects detected in them. The goal of Deep Video Analytics is to become a quickly customizable platform for developing visual and video analytics applications, while benefiting from seamless integration with state-of-the-art models released by the vision research community.

Self-promotion: If you are interested in Healthcare & Machine Learning, please take a look at my other open source project, Computational Healthcare.

Features

Visual Search using Nearest Neighbors algorithm as a primary interface
Upload videos, multiple images (zip file with folder names as labels)
Provide a YouTube URL to be automatically downloaded by youtube-dl and processed
Metadata stored in Postgres
Operations (Querying, Frame extraction & Indexing) performed using celery tasks and RabbitMQ
Separate queues and workers for selection of machines with different specifications (GPU vs RAM)
Videos, frames, indexes, numpy vectors stored in media directory, served through nginx
Explore data, manually run code & tasks without UI via a jupyter notebook explore.ipynb
Some documentation on design decisions, architecture and deployment.
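
To illustrate the "Visual Search using Nearest Neighbors" feature listed above, here is a minimal sketch of an exact nearest-neighbor lookup over feature vectors. It is not the project's implementation: the function names, the frame identifiers, the three-dimensional vectors and the choice of Euclidean distance are all assumptions made for illustration, and plain Python lists stand in for the numpy vectors the platform stores.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, index, k=3):
    """Return the k (distance, frame_id) pairs closest to the query vector.

    `index` maps a frame identifier to its feature vector. This is a
    brute-force scan, which is only practical for small collections.
    """
    scored = [(l2_distance(query, vec), frame_id) for frame_id, vec in index.items()]
    return sorted(scored)[:k]


# Hypothetical index: frame identifier -> feature vector.
index = {
    "video1_frame_0": [0.0, 0.0, 1.0],
    "video1_frame_30": [0.0, 1.0, 0.0],
    "video2_frame_0": [1.0, 0.0, 0.0],
}

# The query vector would come from running the same model on the query image.
results = nearest_neighbors([0.1, 0.0, 0.9], index, k=2)
for distance, frame_id in results:
    print(frame_id, round(distance, 3))
```

A brute-force scan like this compares the query against every indexed vector, which is why an approximate search library becomes attractive as the number of indexed frames grows.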

Several models included out of the box

We take significant effort to ensure that the following models (code and weights included) work without you having to write any code.

Indexing using Google Inception V3 trained on ImageNet
Single Shot Detector (SSD) Multibox 300 trained on VOC
Alexnet using Pytorch (disabled by default; set ALEX_ENABLE=1 in environment variable to use)
YOLO 9000 (disabled by default; set YOLO_ENABLE=1 in environment variable to use)
Face detection/alignment/recognition using MTCNN and Facenet
Facebook FAISS for fast approximate similarity search (Coming very soon!)

Potential algorithms/models

Text detection models
Soundnet (requires extracting mp3 audio)
Pytorch Squeezenet
Mapnet (requires converting models from Marvin)
Keras-js which uses Keras inception for client side indexing

Open Issues & To Do

Installation

Pre-built Docker images (corresponding to the alpha version) for both the CPU and GPU versions are now available on Docker Hub.

On Mac, Windows and Linux machines without NVidia GPUs

You need to have the latest version of Docker installed.

git clone https://github.com/AKSHAYUBHAT/DeepVideoAnalytics && cd DeepVideoAnalytics/docker && docker-compose up

On machines with an NVidia GPU, with Docker and nvidia-docker installed

Replace docker-compose with nvidia-docker-compose. The Dockerfile uses the TensorFlow GPU base image and an appropriate version of PyTorch. The Makefile for Darknet is also modified accordingly. This code was tested using an older NVidia Titan GPU and nvidia-docker.

pip install --upgrade nvidia-docker-compose
git clone https://github.com/AKSHAYUBHAT/DeepVideoAnalytics && cd DeepVideoAnalytics/docker_GPU && ./rebuild.sh && nvidia-docker-compose up

On AWS EC2 with a GPU enabled P2 instance

We provide an AMI with all dependencies such as Docker and drivers. Start a P2.xlarge instance with ami-b3cc1fa5 (N. Virginia) and ports 8000, 6006 and 8888 open (preferably only to your IP), then run the following command after ssh'ing into the machine.

cd deepvideoanalytics && git pull && cd docker_GPU && ./rebuild.sh && nvidia-docker-compose up

You can optionally specify "-d" at the end to detach it, but the very first time it is useful to read how each container is started. After approximately 3 to 5 minutes the user interface will appear on port 8000 of the instance IP. The process used for AMI creation is here.

Security warning! The current GPU container uses an nginx <-> uwsgi <-> django setup to ensure smooth playback of videos. However, it runs nginx as root (though within the container). Considering that you can now modify AWS security rules on the fly, I highly recommend allowing inbound traffic only from your own IP address.

On multiple machines with/without GPUs

Other than the shared media folder (ideally a mounted EFS or NFS), configuring Postgres and RabbitMQ is straightforward. Please read this regarding trade offs.

Options specified via environment variable

The following options can be specified in docker-compose.yml or in your environment.

ALEX_ENABLE=1 (to use Alexnet with PyTorch. Otherwise disabled by default)
YOLO_ENABLE=1 (to use YOLO 9000. Otherwise disabled by default)
SCENEDETECT_DISABLE=1 (to disable scene detection. Otherwise enabled by default)
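
As a rough sketch of how flags like these could be interpreted at startup, the snippet below reads the three variables and applies the defaults described above. The helper names `flag_enabled` and `enabled_features` are hypothetical, and treating only the exact value "1" as set is an assumption; the project's own code may check these variables differently.

```python
import os


def flag_enabled(name, environ=None):
    """Return True when the variable is set to "1" (assumed convention)."""
    environ = os.environ if environ is None else environ
    return environ.get(name, "") == "1"


def enabled_features(environ=None):
    """Map the documented variables to on/off switches.

    Alexnet and YOLO are opt-in, scene detection is opt-out,
    matching the defaults listed in the options above.
    """
    return {
        "alexnet": flag_enabled("ALEX_ENABLE", environ),
        "yolo": flag_enabled("YOLO_ENABLE", environ),
        "scenedetect": not flag_enabled("SCENEDETECT_DISABLE", environ),
    }


# With no variables set, only scene detection is active.
print(enabled_features({}))

# Enabling YOLO and disabling scene detection.
print(enabled_features({"YOLO_ENABLE": "1", "SCENEDETECT_DISABLE": "1"}))
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to try out without changing the real environment.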

Architecture

User Interface

Search

Past queries

Video list / detail

Frame detail

Libraries & Code used

Pytorch License
Darknet License
AdminLTE2 License
FabricJS License
Modified PySceneDetect License
Modified SSD-Tensorflow Individual files are marked as Apache
FAISS License (Non Commercial)
Facenet License
MTCNN TensorFlow port of MTCNN for face detection/alignment
Docker
Nvidia-docker
OpenCV
Numpy
FFMPEG
Tensorflow