
real-time-action-recognition's Introduction

Non Local Network implementation on the UCF-101 dataset

This repository is a modification of the Two-Stream network, based on Jeffrey Huang's work. It also utilizes AlexHex7's PyTorch implementation of the Non-Local Block to enhance the spatial CNN of the Two-Stream network.

The main feature added in this repository is an inference mode for the networks, so you can see the model's predictions (top-5 classes and their scores) in real time on a webcam feed.

Demo

Link to demo video (click the image to view the demo)

Usage

Prerequisites

Please note that this repository was built on Python 2.7. Unfortunately, at the time of creating this repo I did not follow the best Git practices and never made a proper requirements.txt. My apologies.

Training

If you want to train the model from scratch, you need to download the UCF-101 data. I recommend visiting Jeffrey Huang's repository linked above and following his detailed instructions.

Inference

If you just want to run inference, download the pre-trained model here:
Link to ResNet101 trained on UCF-101

Then run

python spatial_cnn_gpu.py --resume /PATH/TO/model_best.pth.tar --demo

You can run a CPU-only version simply by changing the script's name to spatial_cnn_cpu.py. The best real-time results come from running only the Spatial CNN, without the Temporal Stream, on a GPU.
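The demo overlays the top-5 classes and their softmax scores on each webcam frame. A minimal pure-Python sketch of that post-processing step (the class names and logits are illustrative; the repo's actual loop uses OpenCV and the trained ResNet-101):

```python
import math

def softmax(logits):
    """Convert raw network outputs into a probability distribution."""
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top5(class_names, logits):
    """Return the five (class, probability) pairs with the highest scores."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda p: p[1], reverse=True)
    return ranked[:5]
```

In the real demo this would run once per captured frame, with the 101 UCF-101 class names and the spatial CNN's output logits.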

Pre-trained Weights

We did not include pre-trained weights for the Non-Local Network version because we did not observe any performance improvement from adding the Non-Local Blocks (NLBs). We believe very large batch sizes are required for NLBs to improve accuracy, and we did not have the resources for that.

Reference Papers

real-time-action-recognition's People

Contributors

danbochman

real-time-action-recognition's Issues

AttributeError: 'module' object has no attribute 'float32'

Hello Sir,
I am trying to use your code but I get this error:

  File "/home/mab73/anaconda2/lib/python2.7/site-packages/torchvision/transforms/transforms.py", line 60, in __call__
    img = t(img)
  File "/home/mab73/anaconda2/lib/python2.7/site-packages/torchvision/transforms/transforms.py", line 163, in __call__
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "/home/mab73/anaconda2/lib/python2.7/site-packages/torchvision/transforms/functional.py", line 206, in normalize
    mean = torch.tensor(mean, dtype=torch.float32)
AttributeError: 'module' object has no attribute 'float32'

I am using torch 0.3.1 with CUDA 8.0. Any idea what is going on? Are you using a different torch version?
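For context: `torch.float32` and `torch.tensor` first appeared in PyTorch 0.4.0, so a recent torchvision calling `torch.tensor(mean, dtype=torch.float32)` will fail under torch 0.3.1. A small stdlib sketch of the version comparison that is effectively being violated here (the helper and the `MIN_TORCH` constant are illustrative, not part of the repo or of torchvision):

```python
MIN_TORCH = '0.4.0'  # first release with torch.float32 / torch.tensor

def version_tuple(v):
    """Turn '0.3.1.post2' into (0, 3, 1), keeping only leading numeric parts."""
    parts = []
    for p in v.split('+')[0].split('.'):
        if not p.isdigit():
            break
        parts.append(int(p))
    return tuple(parts)

def torch_supports_dtypes(installed_version):
    """True if the installed torch exposes dtype objects like torch.float32."""
    return version_tuple(installed_version) >= version_tuple(MIN_TORCH)
```

So the practical fixes are either upgrading torch to >= 0.4.0 or pinning torchvision to a release that predates the dtype API.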

IOError: [Errno 2] No such file or directory: '/hdd/UCF-101/Data/jpegs_256/v_Swing_g09_c02/frame000031.jpg'

Hi sir! You suggested that I change my Python version, so I switched to a Linux operating system, which already ships that version of Python, and I already have that OS on my laptop.
The pickle error is solved, but now I get another error:

salmaucp@salmaucp-ThinkPad-T430s:~/Real-Time-Action-Recognition$ python spatial_cnn_cpu.py --resume model_best.pth.tar --demo
Namespace(batch_size=8, demo=True, epochs=500, evaluate=False, lr=0.0005, resume='model_best.pth.tar', start_epoch=0)
==> (Training video, Validation video):( 9537 3783 )
==> sampling testing frames
==> Training data : 9537 frames
Traceback (most recent call last):
  File "spatial_cnn_cpu.py", line 333, in <module>
    main()
  File "spatial_cnn_cpu.py", line 37, in main
    train_loader, test_loader, test_video = data_loader.run()
  File "/home/salmaucp/Real-Time-Action-Recognition/dataloader/spatial_dataloader.py", line 98, in run
    train_loader = self.train()
  File "/home/salmaucp/Real-Time-Action-Recognition/dataloader/spatial_dataloader.py", line 131, in train
    print training_set[1][0]['img1'].size()
  File "/home/salmaucp/Real-Time-Action-Recognition/dataloader/spatial_dataloader.py", line 59, in __getitem__
    data[key] = self.load_ucf_image(video_name, index)
  File "/home/salmaucp/Real-Time-Action-Recognition/dataloader/spatial_dataloader.py", line 29, in load_ucf_image
    img = Image.open(path + 'frame{}.jpg'.format(str(index).zfill(6)))
  File "/home/salmaucp/.local/lib/python2.7/site-packages/PIL/Image.py", line 2766, in open
    fp = builtins.open(filename, "rb")
IOError: [Errno 2] No such file or directory: '/hdd/UCF-101/Data/jpegs_256/v_Swing_g09_c02/frame000031.jpg'

I tried to solve it by downloading the UCF jpeg-256 parts, but I still can't get the required output.
Please guide me on this problem. Thank you so much.

Import error

  File "spatial_cnn_cpu.py", line 5, in <module>
    from dataloader import UCF101_splitter
ImportError: cannot import name 'UCF101_splitter'

_pickle.UnpicklingError: the STRING opcode argument must be quoted

Hey, while running the following command:
python spatial_cnn_cpu.py --resume model_best.pth.tar --demo

I got this pickle error:

  File "spatial_cnn_cpu.py", line 38, in main
    train_loader, test_loader, test_video = data_loader.run()
  File "C:\Users\LeNoVo T430\Desktop\Real-Time-Action-Recognition\dataloader\spatial_dataloader.py", line 96, in run
    self.load_frame_count()
  File "C:\Users\LeNoVo T430\Desktop\Real-Time-Action-Recognition\dataloader\spatial_dataloader.py", line 85, in load_frame_count
    dic_frame = pickle.load(file)
_pickle.UnpicklingError: the STRING opcode argument must be quoted

Please guide me on how I can solve this problem.

Accuracy problem

Hi
Thank you for your great work first!
However, I have a question about the accuracy of the real-time recognition.
I cloned your repo, modified some code to adapt it to Python 3.5, and deleted all the training and evaluation code.
After downloading your pretrained model and running spatial_cnn_cpu.py, I get recognition results, but the accuracy is low and the confidence score peaks at only about 0.45, quite different from the demo video in the repo.
In the demo video, were you running on a GPU, using both the spatial CNN and the motion CNN, and fusing the results?
Or does spatial_cnn_cpu.py use only RGB as input, while spatial_cnn_gpu also uses optical flow?
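For background: two-stream models typically combine their streams by late fusion, averaging the per-class scores of the spatial (RGB) and temporal (optical-flow) networks, which usually lifts both confidence and accuracy over the spatial stream alone. A minimal pure-Python sketch of that averaging, assuming each stream produces a dict of per-class scores (the class names, scores, and function names are made up for illustration):

```python
def fuse_streams(spatial_scores, temporal_scores, w_spatial=0.5):
    """Weighted average of per-class scores from the two streams."""
    assert spatial_scores.keys() == temporal_scores.keys()
    w_temporal = 1.0 - w_spatial
    return {cls: w_spatial * spatial_scores[cls] + w_temporal * temporal_scores[cls]
            for cls in spatial_scores}

def ranked_classes(scores):
    """Classes sorted by fused score, highest first."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Running the spatial stream alone, as spatial_cnn_cpu.py does, skips this fusion step entirely, which is consistent with lower peak confidence than a fused demo.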
