
action.sr_cnn


Prerequisites

Caffe

  • Clone and build Caffe from here. This Caffe version is based on Limin Wang's fork [1] and contains the merge_batch and weighted_sum layers. In addition, it exposes some protected Caffe functions in the MATLAB interface so that iter_size can be emulated in MATLAB.
  • Modify caffe_mex.m so that it points to the corresponding Caffe MATLAB interface directory.
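Caffe's iter_size solver setting accumulates gradients over several forward/backward passes and then applies a single update with the averaged gradient, which emulates a larger batch. The sketch below illustrates only that accumulation pattern. It is written in Python rather than this repository's MATLAB and uses a hypothetical 1-D least-squares model; the function names are illustrative and are not part of this code base or of the Caffe API.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of the 1-D model y = w * x over a batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def minibatches(data, batch_size):
    """Yield consecutive mini-batches forever, wrapping around the data."""
    i = 0
    while True:
        yield [data[(i + k) % len(data)] for k in range(batch_size)]
        i = (i + batch_size) % len(data)


def train_with_iter_size(data, w, lr, batch_size, iter_size, steps):
    """Plain SGD where each update averages the gradients of `iter_size` mini-batches."""
    batches = minibatches(data, batch_size)
    for _ in range(steps):
        acc = 0.0
        for _ in range(iter_size):
            # forward/backward only: accumulate, do not update yet
            acc += grad_mse(w, next(batches))
        # one weight update with the gradient averaged over iter_size passes
        w -= lr * acc / iter_size
    return w
```

With batch_size=2 and iter_size=4, each update uses the same averaged gradient as a single batch of 8 samples, which is the point of emulating iter_size when a large batch does not fit in memory at once.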

Optical Flow

Bounding Boxes

  • We extracted bounding boxes of 118 object categories in all video frames using Faster R-CNN [2] (retraining is required), then filtered them, taking temporal coherency and motion saliency into account.
  • The extracted and processed bounding boxes for UCF-101 can be downloaded here. Place the downloaded .mat files under imdb/cache.
  • If you wish to extract the bounding boxes yourself, you need to be able to run Ren Shaoqing's Faster R-CNN (most of the code has been migrated into this repository with minor modifications and additional comments):
    • First, generate raw object detections using faster_rcnn_{dataset}.m.
    • Then, use action/prepare_rois_context.m to process the bounding boxes as described in the paper.
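The exact filtering performed by action/prepare_rois_context.m follows the paper and is not reproduced here. As a rough, hypothetical illustration of what a temporal-coherency check can look like, the Python sketch below keeps a detection only if some detection in an adjacent frame overlaps it with an IoU at or above a threshold. It ignores motion saliency and class labels, and it assumes boxes are (x1, y1, x2, y2) in continuous coordinates, whereas the MATLAB Faster R-CNN code may use an inclusive (+1) pixel convention.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def temporally_coherent(frames, thresh=0.5):
    """Hypothetical filter: `frames` is a list of per-frame box lists.

    A box survives only if a box in frame t-1 or t+1 overlaps it with
    IoU >= thresh, so isolated one-frame detections are dropped.
    """
    kept = []
    for t, boxes in enumerate(frames):
        neighbours = []
        if t > 0:
            neighbours += frames[t - 1]
        if t + 1 < len(frames):
            neighbours += frames[t + 1]
        kept.append([b for b in boxes
                     if any(iou(b, n) >= thresh for n in neighbours)])
    return kept
```

A box that drifts slowly across frames passes this check, while a spurious detection that appears in a single frame is removed.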

Test

datasets

Create dataset.mat using imdb/get_{name}_dataset.m (directories may need to be adjusted). An example of a generated ucf_dataset.mat is available.

models

  • models/srcnn/{stream} contains the model prototxt files.

  • Model weights can be downloaded from the following links:

    Stream person+scene (the final proposed model in the paper)
    spatial split1 split2 split3
    flow split1 split2 split3
  • The two-stream results reported in the paper are obtained by summing the spatial and temporal classification scores with weights 1 : 3.

  • Other models used in the paper's experiments can be provided if there is sufficient demand.
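A minimal sketch of the 1 : 3 late fusion, written in Python for illustration. It assumes each stream has already produced one per-class score vector for a video (how those scores are computed and normalized is not specified here), and the function names are hypothetical rather than taken from test_spatial or test_flow.

```python
def fuse_scores(spatial, temporal, w_spatial=1.0, w_temporal=3.0):
    """Weighted sum of per-class scores from the spatial and temporal streams."""
    if len(spatial) != len(temporal):
        raise ValueError("both streams must score the same set of classes")
    return [w_spatial * s + w_temporal * t for s, t in zip(spatial, temporal)]


def predict(spatial, temporal):
    """Index of the class with the highest fused score."""
    fused = fuse_scores(spatial, temporal)
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, spatial scores [0.7, 0.2, 0.1] and temporal scores [0.2, 0.5, 0.3] fuse to roughly [1.3, 1.7, 1.0], so class 1 is predicted even though the spatial stream alone prefers class 0. The heavier temporal weight lets the flow stream dominate when the two disagree.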

run

In MATLAB:

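% path_to_weights: path to the downloaded weights of the stream being tested
% test the spatial stream on split 1
test_spatial('model_path', path_to_weights, 'split', 1)
% test the flow (temporal) stream on split 1
test_flow('model_path', path_to_weights, 'split', 1)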

[1] Wang, L., Xiong, Y., Wang, Z., & Qiao, Y. (2015). Towards good practices for very deep two-stream ConvNets. arXiv preprint arXiv:1507.02159.

[2] Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems (pp. 91-99).


Citation

Please cite the following if you find the code useful.

@inproceedings{wang2016two,
  title={Two-Stream SR-CNNs for Action Recognition in Videos},
  author={Wang, Yifan and Song, Jie and Wang, Limin and Van Gool, Luc and Hilliges, Otmar},
  year={2016},
  booktitle={BMVC}
}

Contact

Yifan Wang: [email protected]
