🗻 Sisyphus

Scene similarity for weak object discovery & classification. Labelling images to build an object classification model is a labor-intensive task (a.k.a. rolling a ball uphill). This repository leverages image similarity to generate a weakly labeled dataset, which can be cleaned up an order of magnitude faster than going through the whole image/video corpus. The expected speedup is inversely proportional to the rarity of the object of interest. We also provide a blockages/construction detection model trained on drive data from Berlin with the tools contained in this repository.

💻 Installation

Create a self-contained, reproducible development environment and enter it.

Example for running on CPUs:

make install dockerfile=Dockerfile.cpu dockerimage=moabitcoin/sfi-cpu
make run dockerimage=moabitcoin/sfi-cpu

Example for running on GPUs via nvidia-docker:

make install dockerfile=Dockerfile.gpu dockerimage=moabitcoin/sfi-gpu
make run dockerimage=moabitcoin/sfi-gpu runtime=nvidia

The Python source code directory is mounted into the container: if you modify it on the host it will get modified in the container, so you don't need to rebuild the image. To make data visible in the container set the datadir env var, e.g. to make your /tmp directory show up in /data inside the container run

make run datadir=/tmp

See the Makefile for options and more advanced targets.

🎉 Usage

All tools can be invoked via

./bin/sfi --help
usage: sficmd [-h]  ...

optional arguments:
  -h, --help         show this help message and exit

commands:

    frames-extract   Extract video key frames w/intra frame similarity
    feature-extract  Extract image features w/ pre-trained resnet50
    feature-extract-vid
                     Extract features from videos wth 2(D+1) video model
    build-index      Builds a faiss index
    serve-index      Starts up the index http server
    query-index      Queries the index server for nearest neighbour
    model-train      Trains a classification model with a resnet50 backbone
    model-infer      Runs inference with a classification model
    model-export     Export a classification model to onnx

📷 Frames vs. 📹 videos

Sisyphus works with images or videos. For an image-only corpus you can skip this step. For videos, the recommended option is to extract key frames first, using either of the two methods below. Keyframe extraction removes frames with little to no motion, which vastly reduces the corpus size and the number of duplicates in retrieval results. To work directly with videos instead, use the video feature extractor and skip this step.

FFMPEG keyframes

Keyframe extraction using ffmpeg. As a sanity check, you can reconstruct a video from the extracted keyframes.

  # Video to keyframes
  ./scripts/video-to-key-frames /path/to/video /tmp/frames/
  # Keyframes to video
  ./scripts/key-frames-to-video /tmp/result/ nearest.mp4

Image features

We also include an experimental (and slow) keyframe extractor built on intra-frame feature similarity.

  ./bin/sfi frames-extract --help
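The idea behind similarity-based keyframe selection can be sketched as follows. This is a minimal illustration and not the frames-extract implementation: it assumes one feature vector per frame is already available, and the function name select_keyframes and the threshold value are hypothetical. A frame is kept only when its cosine similarity to the last kept frame drops below the threshold, so near-static stretches collapse to a single frame.

```python
import numpy as np


def select_keyframes(features, threshold=0.9):
    """Return indices of frames that differ enough from the last kept frame.

    features: array of shape (num_frames, dim), one vector per frame.
    threshold: hypothetical cosine-similarity cutoff; lower keeps fewer frames.
    """
    features = np.asarray(features, dtype=np.float64)
    if len(features) == 0:
        return []

    # Normalise rows so a dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)

    kept = [0]  # always keep the first frame
    for i in range(1, len(unit)):
        similarity = float(unit[i] @ unit[kept[-1]])
        if similarity < threshold:
            kept.append(i)
    return kept


# Three near-identical frames followed by a scene change.
frames = np.array([
    [1.0, 0.0],
    [0.99, 0.01],
    [0.98, 0.02],
    [0.0, 1.0],
])
keyframes = select_keyframes(frames)
print(keyframes)
```

Comparing against the last kept frame rather than the immediately preceding one avoids slow drifts slipping through unnoticed, at the cost of being sensitive to the choice of threshold.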

🚀 Feature extraction

  ./bin/sfi feature-extract --help

Extracts high-level MAC feature maps for all image frames from a pre-trained convolutional neural net (ResNet-50 trained on ILSVRC2012). The extracted feature maps are saved in individual .npy files alongside the image frames. We recommend running this step on GPUs.
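MAC (maximum activations of convolutions) pooling reduces a convolutional feature map to one value per channel by taking the maximum over the spatial positions. The sketch below illustrates that pooling step only, under stated assumptions: a random array stands in for ResNet-50 activations, the layout is assumed to be (channels, height, width), the L2 normalisation is a common companion step rather than something the tool is known to do, and the function and file names are hypothetical.

```python
import os
import tempfile

import numpy as np


def mac_descriptor(feature_map):
    """Global max over the spatial axes of a (channels, height, width) map."""
    descriptor = feature_map.max(axis=(1, 2))
    # L2-normalise so descriptors are comparable by L2 distance (assumption).
    norm = np.linalg.norm(descriptor)
    return descriptor / norm if norm > 0 else descriptor


# Stand-in for the activations of a late convolutional layer.
rng = np.random.default_rng(0)
feature_map = rng.random((2048, 7, 7), dtype=np.float32)

descriptor = mac_descriptor(feature_map)

# Store the descriptor as .npy under a hypothetical per-frame file name.
out_path = os.path.join(tempfile.mkdtemp(), "frame-000001.npy")
np.save(out_path, descriptor)
restored = np.load(out_path)
```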

🔭 Feature extraction (Video)

If you prefer to work directly with video(s), please use a 3D video classification model for feature extraction. Follow the instructions here. The 3D video classification feature extraction tools within Sisyphus are experimental.

  ./bin/sfi feature-extract-vid --help

๐Ÿค Building index

  ./bin/sfi index-build --help

Builds an index from the .npy feature maps for fast and efficient approximate nearest neighbour queries based on L2 distance. The quantizer for the index needs to get trained on a small subset of the feature maps to approximate the dataset's centroids. Depending on the feature map's spatial resolution (pooled vs. unpooled) we build and save multiple indices (one per depthwise feature map axis).
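The role of the trained quantizer can be illustrated with a small NumPy sketch. This is not faiss and not the code in this repository; it is a toy inverted-file index under stated assumptions: k-means centroids are trained on a subset, every vector is assigned to its nearest centroid, and a query scans only the few cells closest to it. All function names, the number of cells, and the number of probes are hypothetical.

```python
import numpy as np


def squared_l2(a, b):
    """Pairwise squared L2 distances between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


def train_centroids(sample, num_cells, num_iters=10, seed=0):
    """Plain k-means on a training subset; returns (num_cells, dim) centroids."""
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(sample), size=num_cells, replace=False)
    centroids = sample[picks].copy()
    for _ in range(num_iters):
        nearest = squared_l2(sample, centroids).argmin(axis=1)
        for cell in range(num_cells):
            members = sample[nearest == cell]
            if len(members) > 0:
                centroids[cell] = members.mean(axis=0)
    return centroids


def build_inverted_lists(vectors, centroids):
    """Assign every vector to its nearest centroid (one id list per cell)."""
    nearest = squared_l2(vectors, centroids).argmin(axis=1)
    return [np.flatnonzero(nearest == cell) for cell in range(len(centroids))]


def search(query, vectors, centroids, lists, k=3, num_probes=2):
    """Scan only the num_probes cells closest to the query."""
    cell_dist = squared_l2(query[None, :], centroids)[0]
    probes = np.argsort(cell_dist)[:num_probes]
    candidates = np.concatenate([lists[cell] for cell in probes])
    dist = squared_l2(query[None, :], vectors[candidates])[0]
    order = np.argsort(dist)[:k]
    return candidates[order], dist[order]


rng = np.random.default_rng(42)
vectors = rng.random((1000, 16))

# Train the quantizer on a small subset, then index the full set.
centroids = train_centroids(vectors[:200], num_cells=8)
lists = build_inverted_lists(vectors, centroids)

ids, dists = search(vectors[5], vectors, centroids, lists)
```

Probing fewer cells makes queries faster but can miss true neighbours that fall into a cell that was not scanned; probing all cells degenerates to exact brute-force search.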

📼 Load index

  ./bin/sfi index-serve --help

Loads up the index (slow) and keeps it in memory to handle nearest neighbour queries (fast). Responds to queries by searching the index, aggregating results, and re-ranking them.
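How results from several indices might be aggregated and re-ranked can be sketched as follows. The actual strategy used by the server is not described here, so this is only one plausible scheme under stated assumptions: each index returns (image_id, distance) pairs, an image keeps its best (smallest) distance, and ties are broken by how many indices retrieved it. The function name and the hit format are hypothetical.

```python
from collections import defaultdict


def aggregate_and_rerank(per_index_hits, top_k=3):
    """Merge hits from several indices and rank images by their best distance.

    per_index_hits: list of hit lists, each hit an (image_id, distance) pair.
    Ties on distance are broken by how often an image was retrieved.
    """
    best = {}
    votes = defaultdict(int)
    for hits in per_index_hits:
        for image_id, distance in hits:
            votes[image_id] += 1
            if image_id not in best or distance < best[image_id]:
                best[image_id] = distance

    ranked = sorted(best, key=lambda image_id: (best[image_id], -votes[image_id]))
    return [(image_id, best[image_id], votes[image_id]) for image_id in ranked[:top_k]]


# Hypothetical hits from two indices for the same query.
hits_a = [("img-7", 0.10), ("img-2", 0.40)]
hits_b = [("img-2", 0.25), ("img-9", 0.30)]
results = aggregate_and_rerank([hits_a, hits_b])
```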

🔮 Query index

  ./bin/sfi index-query --help

Sends nearest neighbour requests against the query server and reports results to the user. The query and results are based on the .npy feature maps on which the index was built. The mapping from .npy files to images is saved in <index_file>.json.

🚄 Build classifier

The previous step yields a quasi-clean dataset that still needs manual cleaning before it is ready for training. Once you have cleaned the dataset, you can train a classifier with the tools below. The resulting weakly supervised model can then run prediction on your image dataset. The --dataset option expects training/validation samples for each class partitioned under path/to/dataset/, for example:

tree -d -L 2 /data/experiments/blockages/construction
├── train
│   ├── background
│   └── construction
└── val
    ├── background
    └── construction
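Before training, it can help to check that a dataset directory follows the layout shown above. The sketch below is a hypothetical helper, not part of the sfi tools: it assumes the split names train and val and verifies that both splits contain the same class directories.

```python
import tempfile
from pathlib import Path


def class_names(dataset_root, splits=("train", "val")):
    """Return the sorted class names, checking every split has the same ones."""
    root = Path(dataset_root)
    per_split = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        per_split[split] = sorted(p.name for p in split_dir.iterdir() if p.is_dir())

    reference = per_split[splits[0]]
    for split, names in per_split.items():
        if names != reference:
            raise ValueError(f"class mismatch in {split}: {names} != {reference}")
    return reference


# Build a throwaway copy of the layout in a temporary directory and validate it.
root = Path(tempfile.mkdtemp()) / "construction"
for split in ("train", "val"):
    for name in ("background", "construction"):
        (root / split / name).mkdir(parents=True)

classes = class_names(root)
print(classes)
```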

Training/Inference/Exporting tools

  # training
  ./bin/sfi model-train --help
  # inference
  ./bin/sfi model-infer --help
  # export
  ./bin/sfi model-export --help

Contributors

daniel-j-h, sandhawalia
