Pseudo-3D Residual Networks

This repo implements the P3D [1] network structure in PyTorch. The pre-trained model weights are converted from the caffemodel provided in the author's repo.

NEWS!!!

First,

The weights provided in the section below are converted from the author's. However, because the pooling operation differs between Caffe and PyTorch, the same weights produce feature maps of different sizes. Anyone using this repo should know that this difference has no effect if you use P3D199 for fine-tuning. Alternatively, you can change the padding value of the pooling layer so that direct inference also works (the code has already been updated).
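The size mismatch comes from how each framework rounds the pooled output size: Caffe rounds up, while PyTorch pooling layers round down unless ceil_mode=True. The sketch below computes both along one dimension. The kernel, stride, and input sizes are illustrative values, not taken from this repo's layers.

```python
import math


def pool_out_floor(size, kernel, stride, pad=0):
    """Output length with floor rounding (PyTorch default, ceil_mode=False)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out_ceil(size, kernel, stride, pad=0):
    """Output length with ceil rounding (Caffe pooling)."""
    out = math.ceil((size + 2 * pad - kernel) / stride) + 1
    # Caffe drops the last window if it would start beyond the padded input.
    if pad > 0 and (out - 1) * stride >= size + pad:
        out -= 1
    return out


# Illustrative sizes only: compare both rounding rules for a 3-wide, stride-2 pool.
for size in (56, 57, 112, 160):
    print(size, pool_out_floor(size, 3, 2), pool_out_ceil(size, 3, 2))
```

With these illustrative values, the two rules disagree by one whenever (size - kernel) is not divisible by the stride, and adding padding of 1 to the floor version recovers the unpadded ceil result for size 112.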

Second,

Recently I got the opportunity to train on the whole Kinetics dataset, so I am training a more powerful P3D model with an input size of 3x16x224x224. I will share the weights after the Anet18 deadline. Please wait.

Requirements:

  • pytorch
  • numpy

Structure details

The author's official repo releases only P3D-199. Besides this deepest model, I also implement P3D-63 and P3D-131, modified from ResNet50-3D and ResNet101-3D respectively. These two networks may be more convenient for users with memory-limited GPUs.
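For orientation, [1] factorizes a 3x3x3 convolution into a 1x3x3 spatial convolution S and a 3x1x1 temporal convolution T, and combines them in three ways: P3D-A (serial), P3D-B (parallel), and P3D-C (serial with a shortcut from S). The sketch below is a minimal illustration of those three combinations, not the code in p3d_model. `P3DCore` is a hypothetical name, and the 1x1x1 bottleneck convolutions, batch normalization, activations, and strides of the real blocks are left out.

```python
import torch
import torch.nn as nn


class P3DCore(nn.Module):
    """Hypothetical, simplified core of a P3D block (variants A, B, C)."""

    def __init__(self, channels, variant="A"):
        super().__init__()
        if variant not in ("A", "B", "C"):
            raise ValueError("variant must be 'A', 'B' or 'C'")
        self.variant = variant
        # S: 1x3x3 convolution over height and width only.
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        # T: 3x1x1 convolution over the frame axis only.
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)

    def forward(self, x):
        s = self.spatial(x)
        if self.variant == "A":
            out = self.temporal(s)        # T(S(x))
        elif self.variant == "B":
            out = s + self.temporal(x)    # S(x) + T(x)
        else:
            out = s + self.temporal(s)    # S(x) + T(S(x))
        return x + out                    # residual connection


# Input layout: (batch, channels, frames, height, width).
x = torch.rand(2, 8, 4, 16, 16)
for variant in "ABC":
    print(variant, tuple(P3DCore(8, variant)(x).shape))
```

All three variants keep the input shape, which is what lets them be mixed in sequence through the ST_struc option.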

Pretrained weights

(Pretrained weights of P3D63 and P3D131 are not yet supported)

(Note: I am sorry that I had to withdraw the download URLs of the pretrained weights for private reasons. For more information, you can email me.) (Update: the model weights are now available.)

1. P3D-199 trained on the Kinetics dataset:

BaiduYun url

2. P3D-199 trained on Kinetics optical flow (TVL1):

BaiduYun url

Example Code

from __future__ import print_function
import torch
from p3d_model import P3D199

# Build P3D-199 with pretrained weights and 400 output classes.
model = P3D199(pretrained=True, num_classes=400)
model = model.cuda()

# Input layout: (batch, channels, frames, height, width).
# If modality == 'Flow', change the 2nd dimension from 3 to 2.
data = torch.rand(10, 3, 16, 160, 160).cuda()
out = model(data)
print(out.size(), out)

Ablation settings

  1. ST-Structures:

    All P3D models in this repo support several combinations of ST-Structures, such as ('A','B','C'), ('A','B') and ('A'). For example:

    model = P3D63(ST_struc=('A','B'))
    model = P3D131(ST_struc=('C'))
    
  2. Flow and RGB models:

    Set modality='RGB' for the RGB model or modality='Flow' for the flow model. The flow model is trained on TVL1 optical flow images.

    model = P3D199(pretrained=True, modality='Flow')
    
  3. Finetune the model

    When fine-tuning the models on your custom dataset, use get_optim_policies() to set different learning rates for different layers. For example, when the dataset is small, only the deepest few layers need training: set slow_rate=0.8 in the code and adjust the lr_mult and decay_mult values that follow it.
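One generic way to get layer-wise learning rates in PyTorch is to pass parameter groups to the optimizer. The sketch below is not the repo's get_optim_policies(). It uses a hypothetical helper, `split_param_groups`, which treats the first `slow_rate` fraction of parameter tensors as the early layers and gives them a smaller learning rate. The toy model, the multiplier value, and the reading of slow_rate as a fraction of layers are all assumptions.

```python
import torch
import torch.nn as nn


def split_param_groups(model, base_lr, slow_rate=0.8, slow_mult=0.1):
    """Hypothetical helper: early parameters learn slowly, late ones at base_lr."""
    params = [p for p in model.parameters() if p.requires_grad]
    n_slow = int(len(params) * slow_rate)
    return [
        {"params": params[:n_slow], "lr": base_lr * slow_mult},  # early layers
        {"params": params[n_slow:], "lr": base_lr},              # deepest layers
    ]


# Toy stand-in for a video model (assumption, not P3D).
toy = nn.Sequential(
    nn.Conv3d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool3d(1),
    nn.Flatten(),
    nn.Linear(4, 5),
)

groups = split_param_groups(toy, base_lr=0.01, slow_rate=0.5)
# Per-group "lr" entries override the optimizer-level default.
optimizer = torch.optim.SGD(groups, lr=0.01, momentum=0.9, weight_decay=5e-4)
for group in optimizer.param_groups:
    print(len(group["params"]), group["lr"])
```

Setting slow_mult=0.0 in this sketch would effectively freeze the early layers, which matches the small-dataset case where only the deepest layers are trained.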


Please cite this repo if you use it.

Experiment Results (not reported in the paper)

(All of the following results were obtained end to end.)

Some of them outperform the state of the art.

  • Action recognition (mean accuracy on UCF101):

    modality/model        RGB      Flow     Fusion
    P3D199 (Sports-1M)    88.5%    -        -
    P3D199 (Kinetics)     91.2%    92.4%    98.3%

  • Action localization (mAP on Thumos14):

    steps: per-frame + watershed

    Step                  per-frame        localization
    P3D199 (Sports-1M)    0.451            0.25
    P3D199 (Kinetics)     0.569 (fused)    0.307
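The results above do not say how the Fusion and fused numbers were produced. A common approach for two-stream models is late fusion, a weighted average of the per-class softmax scores from the RGB and Flow models. The sketch below shows that idea only. The logits are made up, and the equal weighting and the `late_fusion` helper are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(rgb_logits, flow_logits, flow_weight=0.5):
    """Weighted average of RGB and Flow class probabilities (hypothetical helper)."""
    rgb = softmax(rgb_logits)
    flow = softmax(flow_logits)
    return [(1.0 - flow_weight) * r + flow_weight * f for r, f in zip(rgb, flow)]


# Made-up logits for one clip and three classes.
rgb_logits = [2.0, 1.0, 0.1]
flow_logits = [0.5, 2.5, 0.3]

fused = late_fusion(rgb_logits, flow_logits)
predicted = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted)
```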

Reference:

[1] Zhaofan Qiu, Ting Yao, Tao Mei. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. ICCV 2017.
