Giter Club home page Giter Club logo

eco-efficient-video-understanding's Introduction

Code and models of paper. " ECO: Efficient Convolutional Network for Online Video Understanding, European Conference on Computer Vision (ECCV), 2018."

By Mohammadreza Zolfaghari, Kamaljeet Singh, Thomas Brox

NEW

๐Ÿ PyTorch implementation for ECO paper now is available here. Many thanks to @zhang-can.

Update

  • 2018.9.06: Providing PyTorch implementation
  • 2018.8.01: Scripts for online recognition and video captioning
  • 2018.7.30: Adding codes and models
  • 2018.4.17: Repository for ECO.

Introduction

This repository will contains all the required models and scripts for the paper ECO: Efficient Convolutional Network for Online Video Understanding.

In this work, we introduce a network architecture that takes long-term content into account and enables fast per-video processing at the same time. The architecture is based on merging long-term content already in the network rather than in a post-hoc fusion. Together with a sampling strategy, which exploits that neighboring frames are largely redundant, this yields high-quality action classification and video captioning at up to 230 videos per second, where each video can consist of a few hundred frames. The approach achieves competitive performance across all datasets while being 10x to 80x faster than state-of-the-art methods.

Results

Action Recognition on UCF101 and HMDB51 Video Captioning on MSVD dataset

Online Video Understanding Results

Model trained on UCF101 dataset Model trained on Something-Something dataset

Requirements

  1. Requirements for Python
  2. Requirements for Caffe (see: Caffe installation instructions)

Installation

Build Caffe

We used the following configurations with cmake:

  • Cuda 8

  • Python 3

  • Google protobuf 3.1

  • Opencv 3.2

    cd $caffe_3d/
    mkdir build && cd build
    cmake .. 
    make && make install

Usage

After successfully completing the installation, you are ready to run all the following experiments.

Data list format

```
    /path_to_video_folder number_of_frames video_label
```

Our script for creating kinetics data list.

Training

  1. Download the initialization and trained models:

        sh download_models.sh

This will download the following models:

  • Initialization models for 2D and 3D networks (bn_inception_kinetics and 112_c3d_resnet_18_kinetics)

  • Pre-trained models of ECO Lite and ECO Full on the following datasets:

    • Kinetics (400)
    • UCF101
    • HMDB51
    • SomethingSomething (v1)

    *We will provide the results and pre-trained models on Kinetics 600 and SomethingSomething V2 soon.

  1. Train ECO Lite on kinetics dataset:

     sh models_ECO_Lite/kinetics/run.sh
    

Different number of segments

Here, we explain how to modify ".prototxt" to train network with 8 segments.

  1. In the "VideoData" layer change we set num_segments: 8
  2. For transform_param of "VideoData" layer copy the following mean values 8 times:
    mean_value: [104]
    mean_value: [117]
    mean_value: [123]
    
  3. In the "r2Dto3D" layer set the shape as shape { dim: -1 dim: 8 dim: 96 dim: 28 dim: 28 }
  4. Set the kernel_size of "global_pool" layer as kernel_size: [2, 7, 7]

TODO

  1. Data
  2. Tables and Results
  3. Demo

Citation

If you use this code or ideas from the paper for your research, please cite our paper:

@inproceedings{ECO_eccv18,
author={Mohammadreza Zolfaghari and
               Kamaljeet Singh and
               Thomas Brox},
title={{ECO:} Efficient Convolutional Network for Online Video Understanding},	       
booktitle={ECCV},
year={2018}
}

Contact

Mohammadreza Zolfaghari

Questions can also be left as issues in the repository. We will be happy to answer them.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.