
real-time-action-recognition's Introduction

Non Local Network implementation on the UCF-101 dataset

This repository is a modification of the Two-Stream network, based on Jeffrey Huang's work. It also utilizes AlexHex7's PyTorch implementation of the Non-Local Block to enhance the spatial CNN of the Two-Stream network.

The main feature added in this repository is an inference mode for the networks, so you can see the model's predictions (top-5 classes and their scores) in real time on a webcam feed.

Demo

Link to demo video (click the image to view the demo)

Usage

Prerequisites

Please note that this repository was built on Python 2.7. Unfortunately, at the time of creating this repo I did not follow the best Git practices and never made a proper requirements.txt. My apologies.

Training

If you want to train the model from scratch, you need to download the UCF-101 data. I recommend visiting Jeffrey Huang's repository linked above and following his detailed instructions.

Inference

If you just want to run inference, download the pre-trained model here:
Link to ResNet101 trained on UCF-101

Then run

python spatial_cnn_gpu.py --resume /PATH/TO/model_best.pth.tar --demo

You can run a CPU-only version simply by changing the script's name to spatial_cnn_cpu.py. The best real-time results come from running only the Spatial CNN, without the Temporal Stream, on a GPU.
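The demo overlays the top-5 classes and their softmax scores on each webcam frame. A minimal pure-Python sketch of that post-processing step (the class names and logits are illustrative; the repo's actual loop uses OpenCV and the trained ResNet-101):

```python
import math

def softmax(logits):
    """Convert raw network outputs into a probability distribution."""
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top5(class_names, logits):
    """Return the five (class, probability) pairs with the highest scores."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda p: p[1], reverse=True)
    return ranked[:5]
```

In the real demo this would run once per captured frame, with the 101 UCF-101 class names and the spatial CNN's output logits.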

Pre-trained Weights

We did not include pre-trained weights for the Non-Local Network version because we did not observe any performance improvement from adding the Non-Local Blocks (NLBs). We believe very large batch sizes are required for NLBs to improve accuracy, and we did not have the resources for that.

Reference Papers

real-time-action-recognition's People

Contributors

danbochman

real-time-action-recognition's Issues

AttributeError: 'module' object has no attribute 'float32'

Hello Sir,
I am trying to use your code but I get this error:

  File "/home/mab73/anaconda2/lib/python2.7/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
    img = t(img)
  File "/home/mab73/anaconda2/lib/python2.7/site-packages/torchvision/transforms/transforms.py", line 163, in __call__
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/home/mab73/anaconda2/lib/python2.7/site-packages/torchvision/transforms/functional.py", line 206, in normalize
    mean = torch.tensor(mean, dtype=torch.float32)
AttributeError: 'module' object has no attribute 'float32'

I am using torch 0.3.1 with CUDA 8.0. Any idea what is going on? Are you using a different torch version?
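For context: `torch.float32` and `torch.tensor` first appeared in PyTorch 0.4.0, so a recent torchvision calling `torch.tensor(mean, dtype=torch.float32)` will fail under torch 0.3.1. A small stdlib sketch of the version comparison that is effectively being violated here (the helper and the `MIN_TORCH` constant are illustrative, not part of the repo or of torchvision):

```python
MIN_TORCH = '0.4.0'  # first release with torch.float32 / torch.tensor

def version_tuple(v):
    """Turn '0.3.1.post2' into (0, 3, 1), keeping only leading numeric parts."""
    parts = []
    for p in v.split('+')[0].split('.'):
        if not p.isdigit():
            break
        parts.append(int(p))
    return tuple(parts)

def torch_supports_dtypes(installed_version):
    """True if the installed torch exposes dtype objects like torch.float32."""
    return version_tuple(installed_version) >= version_tuple(MIN_TORCH)
```

So the practical fixes are either upgrading torch to >= 0.4.0 or pinning torchvision to a release that predates the dtype API.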

IOError: [Errno 2] No such file or directory: '/hdd/UCF-101/Data/jpegs_256/v_Swing_g09_c02/frame000031.jpg'

Hi sir! You suggested that I change my Python version, so I switched to a Linux operating system, which already ships that version of Python, and I already have that OS on my laptop.
The pickle error is solved, but now I get another error:

salmaucp@salmaucp-ThinkPad-T430s:~/Real-Time-Action-Recognition$ python spatial_cnn_cpu.py --resume model_best.pth.tar --demo
Namespace(batch_size=8, demo=True, epochs=500, evaluate=False, lr=0.0005, resume='model_best.pth.tar', start_epoch=0)
==> (Training video, Validation video):( 9537 3783 )
==> sampling testing frames
==> Training data : 9537 frames
Traceback (most recent call last):
  File "spatial_cnn_cpu.py", line 333, in <module>
    main()
  File "spatial_cnn_cpu.py", line 37, in main
    train_loader, test_loader, test_video = data_loader.run()
  File "/home/salmaucp/Real-Time-Action-Recognition/dataloader/spatial_dataloader.py", line 98, in run
    train_loader = self.train()
  File "/home/salmaucp/Real-Time-Action-Recognition/dataloader/spatial_dataloader.py", line 131, in train
    print training_set[1][0]['img1'].size()
  File "/home/salmaucp/Real-Time-Action-Recognition/dataloader/spatial_dataloader.py", line 59, in __getitem__
    data[key] = self.load_ucf_image(video_name, index)
  File "/home/salmaucp/Real-Time-Action-Recognition/dataloader/spatial_dataloader.py", line 29, in load_ucf_image
    img = Image.open(path + 'frame{}.jpg'.format(str(index).zfill(6)))
  File "/home/salmaucp/.local/lib/python2.7/site-packages/PIL/Image.py", line 2766, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: '/hdd/UCF-101/Data/jpegs_256/v_Swing_g09_c02/frame000031.jpg'

I tried to solve it by downloading the UCF jpeg-256 parts, but I still can't get the required output.
Please guide me on this problem. Thank you so much.

Import error

  File "spatial_cnn_cpu.py", line 5, in <module>
    from dataloader import UCF101_splitter
ImportError: cannot import name 'UCF101_splitter'

_pickle.UnpicklingError: the STRING opcode argument must be quoted

Hey, while running the following command:
python spatial_cnn_cpu.py --resume model_best.pth.tar --demo

I got this pickle error:

  File "spatial_cnn_cpu.py", line 38, in main
    train_loader, test_loader, test_video = data_loader.run()
  File "C:\Users\LeNoVo T430\Desktop\Real-Time-Action-Recognition\dataloader\spatial_dataloader.py", line 96, in run
    self.load_frame_count()
  File "C:\Users\LeNoVo T430\Desktop\Real-Time-Action-Recognition\dataloader\spatial_dataloader.py", line 85, in load_frame_count
    dic_frame = pickle.load(file)
_pickle.UnpicklingError: the STRING opcode argument must be quoted

Please guide me on how I can solve this problem.

Accuracy problem

Hi
Thank you for your great work first!
However, I have a question about the accuracy of the real-time recognition.
I cloned your repo, modified some code to adapt it to Python 3.5, and deleted all the training and evaluation code.
After downloading your pretrained model and running spatial_cnn_cpu.py, I get recognition results, but the accuracy is low and the confidence score peaks at only about 0.45, quite different from the demo video in the repo.
In the demo video, were you running on a GPU, using both the spatial CNN and the motion CNN, and fusing the results?
Or does spatial_cnn_cpu.py use only RGB as input, while spatial_cnn_gpu also uses optical flow?
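For background: two-stream models typically combine their streams by late fusion, averaging the per-class scores of the spatial (RGB) and temporal (optical-flow) networks, which usually lifts both confidence and accuracy over the spatial stream alone. A minimal pure-Python sketch of that averaging, assuming each stream produces a dict of per-class scores (the class names, scores, and function names are made up for illustration):

```python
def fuse_streams(spatial_scores, temporal_scores, w_spatial=0.5):
    """Weighted average of per-class scores from the two streams."""
    assert spatial_scores.keys() == temporal_scores.keys()
    w_temporal = 1.0 - w_spatial
    return {cls: w_spatial * spatial_scores[cls] + w_temporal * temporal_scores[cls]
            for cls in spatial_scores}

def ranked_classes(scores):
    """Classes sorted by fused score, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Running the spatial stream alone, as spatial_cnn_cpu.py does, skips this fusion step entirely, which is consistent with lower peak confidence than a fused demo.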
