Pseudo-3D Residual Networks
This repo implements the network structure of P3D[1] with PyTorch. The pre-trained model weights are converted from the caffemodel provided in the author's repo.
NEWS!!!
First,
The prepared weights in the following section are transferred from the author's. However, due to a difference in the pooling operation between Caffe and PyTorch, the same weights will generate feature maps of different sizes. Anyone using this repo should know: this difference has no effect if you use P3D199 for finetuning. Alternatively, you can change the padding value of the pooling layer, after which direct inference also works (the code is already updated).
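The size mismatch comes from Caffe computing pooled output sizes with ceiling division while PyTorch defaults to floor division. A minimal standalone illustration (not code from this repo):

```python
import torch
import torch.nn as nn

x = torch.rand(1, 1, 5, 5, 5)  # (batch, channel, time, height, width)

floor_pool = nn.MaxPool3d(kernel_size=2, stride=2)                 # PyTorch default: floor
ceil_pool = nn.MaxPool3d(kernel_size=2, stride=2, ceil_mode=True)  # Caffe-style: ceil

print(floor_pool(x).shape)  # torch.Size([1, 1, 2, 2, 2])
print(ceil_pool(x).shape)   # torch.Size([1, 1, 3, 3, 3])
```

Setting `ceil_mode=True` (or adjusting the pooling padding, as this repo does) makes PyTorch feature-map sizes match Caffe's.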
Second,
Recently, I got the opportunity to train on the whole Kinetics dataset, so I am trying to train a more powerful P3D model weight based on an input size of 3x16x224x224. I will share the weights after the Anet18 deadline. Please wait.
Requirements:
- pytorch
- numpy
Structure details
In the author's official repo, only P3D-199 is released. Besides this deepest P3D-199, I also implement P3D-63 and P3D-131, modified from ResNet50-3D and ResNet101-3D respectively; these two nets may be more convenient for users with memory-limited GPUs.
Pretrained weights
(Pretrained weights of P3D63 and P3D131 are not yet supported)
(Tips: I am sorry that the download URLs of the pretrained weights were removed for some private reasons. For more information you can email me.) (New tips: model weights are now available.)
1. P3D-199 trained on the Kinetics dataset:
2. P3D-199 trained on Kinetics Optical Flow (TVL1):
Example Code
```python
from __future__ import print_function
import torch
from p3d_model import *

model = P3D199(pretrained=True, num_classes=400)
model = model.cuda()

# if modality=='Flow', change the 2nd dimension 3 ==> 2
data = torch.autograd.Variable(torch.rand(10, 3, 16, 160, 160)).cuda()
out = model(data)
print(out.size(), out)
```
Ablation settings
- ST-Structures:
All P3D models in this repo support various forms of ST-structures, such as `('A','B','C')`, `('A','B')`, and `('A',)`:
```python
model = P3D63(ST_struc=('A', 'B'))
model = P3D131(ST_struc=('C',))
```
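For reference, the three ST-structures correspond to the P3D-A/B/C blocks of the paper: a spatial (1x3x3) and a temporal (3x1x1) convolution combined in series, in parallel, or in series with a spatial skip. A simplified sketch, with the channel counts and the surrounding bottleneck/residual wiring of the real blocks omitted:

```python
import torch
import torch.nn as nn

channels = 8
S = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1))  # spatial 1x3x3
T = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))  # temporal 3x1x1

x = torch.rand(1, channels, 4, 8, 8)  # (batch, channel, time, height, width)
a = T(S(x))          # P3D-A: S and T in series
b = S(x) + T(x)      # P3D-B: S and T in parallel
c = S(x) + T(S(x))   # P3D-C: serial plus a spatial skip
print(a.shape, b.shape, c.shape)  # each torch.Size([1, 8, 4, 8, 8])
```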
- Flow and RGB models:
Set the parameter modality='RGB' for an RGB model, or modality='Flow' for a flow model. The flow model is trained on TVL1 optical flow images.
```python
model = P3D199(pretrained=True, modality='Flow')
```
- Finetune the model:
When finetuning the model on your custom dataset, use `get_optim_policies()` to set different learning rates for different layers. For example, when your dataset is small, you only need to train the several deepest layers: set `slow_rate=0.8` in the code and adjust the corresponding `lr_mult` and `decay_mult`.
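A sketch of how such per-layer policies can feed `torch.optim.SGD`, using a toy model and hand-written policy dicts in place of the real `get_optim_policies()` output (the `lr_mult`/`decay_mult` keys mirror the repo's convention, but treat the exact values here as assumptions):

```python
import torch
import torch.nn as nn

# Toy stand-in for a P3D backbone; the real model comes from p3d_model.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Linear(8, 101))

# get_optim_policies() returns groups shaped roughly like these (values assumed):
policies = [
    {'params': list(model[0].parameters()), 'lr_mult': 1,  'decay_mult': 1},
    {'params': list(model[1].parameters()), 'lr_mult': 10, 'decay_mult': 1},  # new classifier learns faster
]

base_lr, base_wd = 0.001, 5e-4
optimizer = torch.optim.SGD(
    [{'params': g['params'],
      'lr': base_lr * g['lr_mult'],
      'weight_decay': base_wd * g['decay_mult']} for g in policies],
    momentum=0.9)

print([pg['lr'] for pg in optimizer.param_groups])
```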
Please cite this repo if you make use of it.
Experiment Result (Out of the paper)
(All of the following results are produced in an end-to-end manner; some of them outperform the state of the art.)
- Action recognition(mean accuracy on UCF101):
Model / Modality | RGB | Flow | Fusion
---|---|---|---
P3D199 (Sports-1M) | 88.5% | - | -
P3D199 (Kinetics) | 91.2% | 92.4% | 98.3%
- Action localization(mAP on Thumos14):
Steps: per-frame scoring + watershed grouping.
Model | per-frame | localization
---|---|---
P3D199 (Sports-1M) | 0.451 | 0.25
P3D199 (Kinetics) | 0.569 (fused) | 0.307
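To make the two-step pipeline concrete, here is a deliberately simplified stand-in for the grouping stage: threshold per-frame action scores and merge contiguous frames into candidate segments. The actual pipeline uses a watershed-style temporal grouping rather than plain thresholding; this is only an illustration.

```python
def frames_to_segments(scores, thresh=0.5):
    """Merge contiguous frames whose score >= thresh into (start, end) segments."""
    segments, start = [], None
    for i, s in enumerate(scores):
        if s >= thresh and start is None:
            start = i                          # segment opens
        elif s < thresh and start is not None:
            segments.append((start, i - 1))    # segment closes
            start = None
    if start is not None:                      # segment runs to the last frame
        segments.append((start, len(scores) - 1))
    return segments

print(frames_to_segments([0.1, 0.8, 0.9, 0.2, 0.7, 0.6]))  # [(1, 2), (4, 5)]
```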
Reference:
[1] Z. Qiu, T. Yao, and T. Mei. Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks. ICCV 2017.