Giter Club home page Giter Club logo

videolstm's Introduction

=======================================================================

Requirements

Latest Python2 (python2.7.*)
numpy + scipy
Theano
HDF5

How to Install

Official Installation Guide For Related Packages
    
    Numpy & Scipy:
        http://docs.scipy.org/doc/numpy/user/install.html
    Theano:
        http://deeplearning.net/software/theano/install.html
    HDF5:
        https://hdfgroup.org/HDF5/

How to Config Theano

Theano is the backbone of this project. To configure theano, view theano-config for more detailed help. You need to write the configuration to ~/.theanorc. The followingTheano configuration is recommended

For CPU users:

[global]
floatX = float32
device = cpu
mode = FAST_RUN
warn_float64 = warn

For GPU users(here the device can be any other GPU):

[global]
floatX = float32
device = gpu0
mode = FAST_RUN
warn_float64 = warn

=======================================================================

Library

4 main components:

iterator: data handler

layer: network layers, to construct a network, you have to have 3 kinds of layers

interface layer: declare input, mask, output
middle layer: construct the main network layers (a list of layers)
cost layer: construct the network cost

model: network model

optimizer: optimizer to optimize the model

Data Format

The data_file is an folder path with a list of hdf5 files for videos:

v_ApplyEyeMakeup_g08_c01.h5
v_ApplyEyeMakeup_g08_c02.h5
v_ApplyEyeMakeup_g08_c03.h5
v_ApplyEyeMakeup_g08_c04.h5
v_ApplyEyeMakeup_g08_c05.h5

Each hdf5 file stores all the frame features for this video row by row, i.e., a matrix with size (#frames, #featureDim)

The train_framenum.txt file contains number of frames for each video:

89
123
22
136

The train_filenames.txt file contains the video filenames relative to the root video directory:

v_ApplyEyeMakeup_g08_c01
v_ApplyEyeMakeup_g08_c02
v_ApplyEyeMakeup_g08_c03
v_ApplyEyeMakeup_g08_c04
v_ApplyEyeMakeup_g08_c05

The train_labels.txtfile for single-label datasets looks like

0
7
43

and for multi-label datasets:

0,0,0,0,0,0,0,1,0,0,0,0
0,0,0,0,0,0,0,1,0,0,0,0
0,0,0,0,0,0,1,1,0,0,0,0
0,0,0,0,0,0,0,0,0,0,0,1

The same format is required for the validation and test files too.

=======================================================================

Network Architectures

LSTM: LSTM

ALSTM: Attention LSTM

convLSTM: Convolutional LSTM

convALSTM: Convolutional Attention LSTM

motion ALSTM: Attention LSTM with motion-based attention

motion convALSTM: Convolutional Attention LSTM with motion-based attention

Example to run the scripts:

THEANO_FLAGS='floatX=float32,device=gpu0,mode=FAST_RUN,nvcc.fastmath=True' python evaluate_ucf101_rgb_LSTM.py
THEANO_FLAGS='floatX=float32,device=gpu1,mode=FAST_RUN,nvcc.fastmath=True' python evaluate_ucf101_flow_ALSTM.py

=======================================================================

Feature Extraction

We use extract_rgbcnn.py and extract_flowcnn.py scripts to extract feature maps (e.g. pool5 features) for rgb and flow input. While extract_rgbcnn_fc.py and extract_rgbcnn_fc.py are used to extract fc features.

Caffe network definition files and models:

rgb

prototxt: ucf101_action_rgb_vgg_16_deploy_features_fc7.prototxt, ucf101_action_rgb_vgg_16_deploy_features_pool5.prototxt

model: ucf101_action_rgb_vgg_16_split1.caffemodel

flow (single flow)

prototxt: ucf101_action_singleflow_vgg_16_deploy_features_fc7.prototxt, ucf101_action_singleflow_vgg_16_deploy_features_pool5.prototxt

model: ucf101_action_singleflow_vgg_16_split1.caffemodel

Example to run the scripts:

python extract_rgbcnn.py --model_def ucf101_action_rgb_vgg_16_deploy_features_pool5.prototxt --model ucf101_action_rgb_vgg_16_split1.caffemodel --gpu_id 0

videolstm's People

Contributors

zhenyangli avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.