
pytorch-action-recognition's Introduction

Action Recognition in Video Sequences using Deep Bi-directional LSTM with CNN Features

https://ieeexplore.ieee.org/document/8121994/

Features Extraction for YouTube Dataset

This Python script extracts features from videos in the YouTube Dataset using a pre-trained ResNet18 model. The extracted features can be used for training and testing LSTM models to perform action recognition.

Getting Started

Prerequisites

  • Python 3
  • PyTorch
  • scikit-learn
  • OpenCV

Dataset

Download the YouTube Dataset from the following link: https://www.crcv.ucf.edu/data/UCF_YouTube_Action.php

Usage

featrues_extraction.py

  • Set the file path to your dataset in data_dir.
  • Define the number of frames to extract features from in num_frames.
  • Define the pre-processing steps for the images in transform.
  • Load the pre-trained ResNet18 model using models.resnet18(pretrained=True).cuda().
  • Remove the last layer of the ResNet18 model to obtain the feature extractor using torch.nn.Sequential(*list(resnet.children())[:-1]).
  • Loop over the videos in the dataset folder and extract features using the pre-trained ResNet18 model.
  • Split the samples into training and testing sets and convert the labels to numerical labels using a LabelEncoder.
  • Save the features and labels to numpy arrays.

LSTM Action Recognition Model

train_LSTM.py

This Python script implements a Long Short-Term Memory (LSTM) neural network for action recognition using features extracted from videos in the YouTube Dataset.

  • Load the features and labels from numpy arrays using torch.from_numpy(np.load('train_features.npy')).float() and torch.from_numpy(np.load('train_labels.npy')).
  • Define the LSTM model using the LSTMClassifier or MultiLayerBiLSTMClassifier class.
  • Define the loss function and optimizer using nn.CrossEntropyLoss() and optim.Adam() respectively.
  • Train the LSTM model using a for loop over the desired number of epochs and batches.

LSTM Model

  • LSTMClassifier: a simple LSTM classifier that takes as input a tensor of shape (batch_size, num_frames, input_size) and outputs a tensor of shape (batch_size, num_classes).
  • MultiLayerBiLSTMClassifier: a multi-layer bidirectional LSTM classifier that takes as input a tensor of shape (batch_size, num_frames, input_size) and outputs a tensor of shape (batch_size, num_classes).
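As a rough sketch of how two classifiers with these input and output shapes might look, the code below reuses the class names from the description, but the constructor arguments, hidden sizes, the choice of the final hidden state as the sequence summary, and the train_one_epoch helper are assumptions rather than the repository's actual code.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class LSTMClassifier(nn.Module):
    """Single-layer LSTM; classifies from the final hidden state."""

    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):                      # x: (batch, num_frames, input_size)
        _, (h_n, _) = self.lstm(x)             # h_n: (1, batch, hidden_size)
        return self.fc(h_n[-1])                # (batch, num_classes)


class MultiLayerBiLSTMClassifier(nn.Module):
    """Stacked bidirectional LSTM; concatenates both directions of the top layer."""

    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=num_layers,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)             # h_n: (2 * num_layers, batch, hidden)
        top = torch.cat([h_n[-2], h_n[-1]], dim=1)  # forward + backward, last layer
        return self.fc(top)


def train_one_epoch(model, features, labels, optimizer, criterion, batch_size=16):
    """Run one pass over the data in mini-batches and return the mean loss."""
    model.train()
    total = 0.0
    for start in range(0, len(features), batch_size):
        x = features[start:start + batch_size]
        y = labels[start:start + batch_size].long()  # CrossEntropyLoss expects int64
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        total += loss.item() * len(x)
    return total / len(features)
```

With ResNet18 features, input_size would be 512, so a batch has shape (batch_size, num_frames, 512). Calling train_one_epoch inside a loop over epochs, with optim.Adam(model.parameters()) and nn.CrossEntropyLoss(), corresponds to the training steps listed above.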

Please cite the following paper

@article{ullah2017action,
  title={Action recognition in video sequences using deep bi-directional LSTM with CNN features},
  author={Ullah, Amin and Ahmad, Jamil and Muhammad, Khan and Sajjad, Muhammad and Baik, Sung Wook},
  journal={IEEE access},
  volume={6},
  pages={1155--1166},
  year={2017},
  publisher={IEEE}
}

pytorch-action-recognition's People

Contributors

aminullah6264

pytorch-action-recognition's Issues

LSTM CNN shape

Hello authors,

Your paper is a great motivation for LSTM research.

  • You apply the LSTM models after extracting the CNN features. Can you tell me the input size of the LSTM model?
    Ex: (bs, frame_number, feature_size)

  • Is this different from the conv2dlstm provided by TensorFlow, which applies an LSTM for each conv operation?

Thank you

Poorly documented operations make this repo hard to follow

There are five MATLAB files to process the data, but how does the data flow from one to the next?
For example, how are the labels made? In 'onehotLabeling.m', what is the TrainLables variable? How is the TotalFeaturesY variable stored or passed to the following step?
In the Python code, how is the data loaded?...
...

Too many poorly documented operations make this hard to follow, even after I spent a lot of time installing Caffe to reproduce the amazing results in the paper.
If I missed something, please let me know.

High training accuracy, low testing accuracy

Dear author, thank you very much for your open source work. I ran your code and found that the training accuracy is very high, but the testing accuracy is quite low. Is there any issue with the code?
