action-recognition's Introduction

Action Recognition in Video

This repo will serve as a playground where I investigate different approaches to solving the problem of action recognition in video.

I will mainly use the UCF-101 dataset.

Setup

$ cd data/              
$ bash download_ucf101.sh     # Downloads the UCF-101 dataset (~7.2 GB)
$ unrar x UCF101.rar          # Unrars dataset
$ unzip ucfTrainTestlist.zip  # Unzip train / test split
$ python3 extract_frames.py   # Extracts frames from the videos (~26.2 GB, go grab a coffee for this)

ConvLSTM

The only approach investigated so far. It performs action recognition in video with a bi-directional LSTM operating on frame embeddings extracted by a ResNet-152 pre-trained on ImageNet.

The model is composed of:

  • A convolutional feature extractor (ResNet-152) which provides a latent representation of video frames
  • A bi-directional LSTM classifier which predicts the depicted activity from the latent representations of the video frames

I have made a trained model available here.

Train

$ python3 train.py  --dataset_path data/UCF-101-frames/ \
                    --split_path data/ucfTrainTestlist \
                    --num_epochs 200 \
                    --sequence_length 40 \
                    --img_dim 112 \
                    --latent_dim 512

Test on Video

$ python3 test_on_video.py  --video_path data/UCF-101/SoccerPenalty/v_SoccerPenalty_g01_c01.avi \
                            --checkpoint_model model_checkpoints/ConvLSTM_150.pth

Results

The model reaches a classification accuracy of 91.27% on a randomly sampled test set composed of 20% of the video sequences in UCF-101. I will re-train the model on the official train / test splits and post the results as soon as I have time.

action-recognition's People

Contributors

eriklindernoren

action-recognition's Issues

Running test_on_video.py encountered "unexpected keyword argument 'input_shape'" error

python3 test_on_video.py --video_path data/UCF-101/SoccerPenalty/v_SoccerPenalty_g01_c01.avi --checkpoint_model model_checkpoints/ConvLSTM_150.pth

Namespace(channels=3, checkpoint_model='model_checkpoints/ConvLSTM_150.pth', dataset_path='data/UCF-101-frames', image_dim=112, latent_dim=512, video_path='data/UCF-101/SoccerPenalty/v_SoccerPenalty_g01_c01.avi')
Traceback (most recent call last):
File "test_on_video.py", line 41, in <module>
model = ConvLSTM(input_shape=input_shape, num_classes=len(labels), latent_dim=opt.latent_dim)
TypeError: __init__() got an unexpected keyword argument 'input_shape'

Paper link for this repository!!!

Hey, I am going to publish my work pretty soon and I want to cite your work. How can I cite it? Is there a paper link for this repository?

How can I solve it?

RuntimeError: Error(s) in loading state_dict for ConvLSTM:
Missing key(s) in state_dict: "lstm.lstm.weight_ih_l0_reverse", "lstm.lstm.weight_hh_l0_reverse", "lstm.lstm.bias_ih_l0_reverse", "lstm.lstm.bias_hh_l0_reverse", "output_layers.0.weight", "output_layers.0.bias", "output_layers.1.weight", "output_layers.1.bias", "output_layers.1.running_mean", "output_layers.1.running_var", "output_layers.3.weight", "output_layers.3.bias", "attention_layer.weight", "attention_layer.bias".
Unexpected key(s) in state_dict: "lstm.final.0.weight", "lstm.final.0.bias", "lstm.final.1.weight", "lstm.final.1.bias", "lstm.final.1.running_mean", "lstm.final.1.running_var", "lstm.final.1.num_batches_tracked", "lstm.final.3.weight", "lstm.final.3.bias"
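
The two key lists in this error can be compared directly to see how far the checkpoint and the current model definition have drifted apart. Below is a minimal sketch, assuming the key names are available as plain lists (in PyTorch they would come from the model's and the checkpoint's `state_dict` keys). `diff_state_dict_keys` is a hypothetical helper, not part of this repository, and the shared key `lstm.lstm.weight_ih_l0` is included only for illustration.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring the load_state_dict error."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)      # expected by the model, absent from the checkpoint
    unexpected = sorted(ckpt_set - model_set)   # present in the checkpoint, unknown to the model
    return missing, unexpected


# A few key names in the style of the error above (illustrative subset).
model_keys = [
    "lstm.lstm.weight_ih_l0",
    "lstm.lstm.weight_ih_l0_reverse",
    "output_layers.0.weight",
]
checkpoint_keys = ["lstm.lstm.weight_ih_l0", "lstm.final.0.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

Missing `_reverse` keys mean the checkpoint holds no weights for the reverse LSTM direction, which suggests it was saved from a different version of the model than the one being constructed.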

list index out of range when starting training

Hello, I'm trying to start training with the UCF-101 dataset.

I've done a few adaptations on your code to get where I am now.

I downloaded the UCF-101 dataset in .avi format and then extracted all of the frames using extract_frames.py.

After that, I downloaded the train and test split files for the dataset from here:

yjxiong/temporal-segment-networks#177

For the split_path argument I'm passing the path to the folder (named ucfTrainTestlist) containing classInd.txt, testlist01.txt and trainlist01.txt.

Here are the args I'm passing to start training:

python train.py --dataset_path data/frames/-frames/data/frames --split_path ucfTrainTestlist/ --split_number 1

and here is the error I'm getting:

--- Epoch 0 ---
Traceback (most recent call last):
File "train.py", line 115, in <module>
for batch_i, (X, y) in enumerate(train_dataloader):
File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __next__
return self._process_data(data)
File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 846, in _process_data
data.reraise()
File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\_utils.py", line 369, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "C:\Users\Windows\Documents\Action-Recognition\dataset.py", line 78, in __getitem__
image_paths = self._pad_to_length(image_paths)
File "C:\Users\Windows\Documents\Action-Recognition\dataset.py", line 67, in _pad_to_length
left_pad = sequence[0]
IndexError: list index out of range

I point the dataset_path to the folder called frames and inside this folder the video frames are divided in sub folders. These sub folders are named after the names of each video.
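
The traceback fails at `sequence[0]`, which means `_pad_to_length` received an empty list: no frame paths were found for that sequence. A minimal sketch of a padding function that reports this case explicitly is shown here. `pad_to_length` is a hypothetical stand-in, and the left-padding behaviour (repeating the first frame) is an assumption based on the `left_pad` name in the traceback, not a copy of the repository's code.

```python
def pad_to_length(sequence, length):
    """Left-pad a list of frame paths to `length` by repeating the first frame."""
    if not sequence:
        # An empty list means the frame lookup matched nothing for this video.
        raise ValueError(
            "no frames found for this sequence; check that dataset_path "
            "matches the directory layout produced by frame extraction"
        )
    padding = [sequence[0]] * max(0, length - len(sequence))
    return padding + list(sequence)


print(pad_to_length(["0.jpg", "1.jpg"], 4))
```

With a guard like this, a wrong `--dataset_path` or an unexpected folder layout surfaces as a descriptive message instead of a bare IndexError inside a DataLoader worker.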

Hello Mr.Linder-Norén

Hello Mr.Linder-Norén, I am very sorry to bother you.
I have downloaded your Action-Recognition code and have learned a lot.
But I still have a question: can you teach me how to use the attention module in your model?
Thank you very much for your reply.

When I try to use your pretrained model, it gives an error

Missing key(s) in state_dict: "lstm.lstm.weight_ih_l0_reverse", "lstm.lstm.weight_hh_l0_reverse", "lstm.lstm.bias_ih_l0_reverse", "lstm.lstm.bias_hh_l0_reverse", "output_layers.0.weight", "output_layers.0.bias", "output_layers.1.weight", "output_layers.1.bias", "output_layers.1.running_mean", "output_layers.1.running_var", "output_layers.3.weight", "output_layers.3.bias", "attention_layer.weight", "attention_layer.bias".
Unexpected key(s) in state_dict: "lstm.final.0.weight", "lstm.final.0.bias", "lstm.final.1.weight", "lstm.final.1.bias", "lstm.final.1.running_mean", "lstm.final.1.running_var", "lstm.final.1.num_batches_tracked", "lstm.final.3.weight", "lstm.final.3.bias".

ValueError: not enough values to unpack (expected 2, got 1)

Namespace(dataset_path='UCF-101')
Traceback (most recent call last):
File "extract_frames.py", line 31, in <module>
sequence_type, sequence_name = video_path.split(".avi")[0].split("/")[-2:]
ValueError: not enough values to unpack (expected 2, got 1)
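
The failing line splits the path on "/" and expects at least a class folder and a file name. One way it can end up with a single element is a path that uses backslashes (as on Windows) or that has no parent folder at all. A minimal sketch using `pathlib`, which handles both separators, could look like this. `split_sequence_path` is a hypothetical helper, and the `<class>/<video>.avi` layout is assumed from the README's example paths.

```python
from pathlib import PureWindowsPath


def split_sequence_path(video_path):
    """Return (sequence_type, sequence_name) from '<...>/<class>/<video>.avi'."""
    # PureWindowsPath treats both '/' and '\\' as separators on any OS.
    path = PureWindowsPath(video_path)
    if len(path.parts) < 2:
        raise ValueError(f"expected '<class>/<video>' in path, got: {video_path!r}")
    return path.parent.name, path.stem


print(split_sequence_path("UCF-101/SoccerPenalty/v_SoccerPenalty_g01_c01.avi"))
print(split_sequence_path("UCF-101\\SoccerPenalty\\v_SoccerPenalty_g01_c01.avi"))
```

Note that this treats a backslash as a separator even on POSIX systems, which is only a problem for file names that contain a literal backslash.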

Softmax in Model Output, then using CE Loss

Thank you for the interesting work here.

I've just encountered one issue with the code. The ConvLSTM model outputs a softmax as its last layer, but the training script then applies CrossEntropyLoss. CrossEntropyLoss already applies a (log-)softmax to its input, so the softmax ends up being applied twice. Instead, the ConvLSTM should output the raw scores of the classification (Linear) layer, before the softmax, and pass those to CrossEntropyLoss. The softmax probabilities can be computed later, in the test set evaluation step, to determine the test accuracy.
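
To make the effect concrete, here is a small pure-Python sketch (no PyTorch, all names are illustrative) comparing the loss computed from raw scores with the loss computed from scores that have already been passed through a softmax. It also computes the lowest loss that can be reached in the double-softmax case, assuming the loss is defined as the negative log of the softmax probability of the target class.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, target):
    """-log(softmax(scores)[target]), i.e. the softmax is applied inside the loss."""
    return -math.log(softmax(scores)[target])


def double_softmax_loss_floor(num_classes):
    """Lowest loss reachable when the loss is fed probabilities instead of raw scores.

    The best case is a one-hot input [1, 0, ..., 0]; the softmax inside the
    loss then still assigns only e / (e + num_classes - 1) to the target class.
    """
    return -math.log(math.e / (math.e + num_classes - 1))


logits = [4.0, 1.0, 0.0]   # confident, correct prediction for class 0
target = 0

loss_from_logits = cross_entropy(logits, target)           # intended usage
loss_from_probs = cross_entropy(softmax(logits), target)   # softmax applied twice

print(f"loss from logits:        {loss_from_logits:.4f}")
print(f"loss from probabilities: {loss_from_probs:.4f}")
print(f"floor for 101 classes:   {double_softmax_loss_floor(101):.4f}")
```

Because probabilities are confined to [0, 1], the second softmax flattens them, so the loss stays well above zero even for a perfect prediction and the gradients are correspondingly weaker.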

Please let me know if others agree with this small change to the code.

Also, what type of Attention is being used? Is it the dot-product?

Train issue

This project is really interesting.

I tried to train the model, but I always get a "list index out of range" error at a random point during the training phase.

I tried torch 1.2 through 1.3.1 with CUDA 10.1, and always get the same error.

Does anyone have an idea how to fix this?

python3 train.py --dataset_path data/UCF-101-frames/ --split_path data/ucfTrainTestlist --num_epochs 200 --sequence_length 20 --img_dim 112 --latent_dim 512 --batch_size 64
Namespace(batch_size=64, channels=3, checkpoint_interval=5, checkpoint_model='', dataset_path='data/UCF-101-frames/', img_dim=112, latent_dim=512, num_epochs=200, sequence_length=20, split_number=1, split_path='data/ucfTrainTestlist')
cuda
--- Epoch 0 ---
[Epoch 0/200] [Batch 22/150] [Loss: 4.612639 (4.613988), Acc: 4.69% (2.31%)] ETA: 8:49:23.620145
Traceback (most recent call last):
File "train.py", line 116, in <module>
for batch_i, (X, y) in enumerate(train_dataloader):
File "/home/gary/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 801, in __next__
return self._process_data(data)
File "/home/gary/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
data.reraise()
File "/home/gary/.local/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 3.
Original Traceback (most recent call last):
File "/home/gary/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/gary/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gary/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/opt/4tbdrive1/experiments/Action-Recognition/dataset.py", line 83, in __getitem__
image_paths = self._pad_to_length(image_paths)
File "/opt/4tbdrive1/experiments/Action-Recognition/dataset.py", line 67, in _pad_to_length
left_pad = sequence[0]
IndexError: list index out of range
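
Since training here runs for a number of batches before failing, a plausible cause is that a few sequence folders contain no extracted frames. The following sketch scans a frames directory for such folders before training. `find_empty_frame_dirs` is a hypothetical helper, and the `<class>/<video>/<frame>.jpg` layout in the demo is an assumption, with made-up names.

```python
import os
import tempfile


def find_empty_frame_dirs(frames_root):
    """Return leaf directories under frames_root that contain no files."""
    empty = []
    for dirpath, dirnames, filenames in os.walk(frames_root):
        if not dirnames and not filenames:
            empty.append(os.path.relpath(dirpath, frames_root))
    return sorted(empty)


# Demo on a throwaway directory tree: one sequence with a frame, one without.
with tempfile.TemporaryDirectory() as root:
    ok_dir = os.path.join(root, "Surfing", "v_Surfing_g01_c01")
    bad_dir = os.path.join(root, "Surfing", "v_Surfing_g01_c02")
    os.makedirs(ok_dir)
    os.makedirs(bad_dir)
    open(os.path.join(ok_dir, "0.jpg"), "w").close()
    empty_dirs = find_empty_frame_dirs(root)
    print(empty_dirs)
```

Any folder reported this way would need its frames re-extracted, or the corresponding video removed from the split lists.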

size mismatch for output_layers.3.bias: copying a param with shape torch.Size([101]) from checkpoint, the shape in current model is torch.Size([105]).

I get this error when I run test_on_video.py file.


RuntimeError Traceback (most recent call last)
in ()
----> 1 model.load_state_dict(torch.load(checkpoint_model))
2 model.eval()

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
828 if len(error_msgs) > 0:
829 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 830 self.__class__.__name__, "\n\t".join(error_msgs)))
831 return _IncompatibleKeys(missing_keys, unexpected_keys)
832

RuntimeError: Error(s) in loading state_dict for ConvLSTM:
size mismatch for output_layers.3.weight: copying a param with shape torch.Size([101, 1024]) from checkpoint, the shape in current model is torch.Size([105, 1024]).
size mismatch for output_layers.3.bias: copying a param with shape torch.Size([101]) from checkpoint, the shape in current model is torch.Size([105]).
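
The checkpoint was trained with 101 classes, while the model being built here has 105 outputs. If the number of classes is taken from a directory listing, extra entries in that directory (stray files, archives, additional folders) would inflate it. This is an assumption about the cause, not something confirmed in the thread. A minimal sketch that counts only sub-directories, with a hypothetical helper name and made-up folder names:

```python
import os
import tempfile


def count_class_dirs(frames_root):
    """Count only sub-directories, ignoring stray files such as archives or logs."""
    return sum(
        os.path.isdir(os.path.join(frames_root, entry))
        for entry in os.listdir(frames_root)
    )


# Demo: two class folders plus one stray file.
with tempfile.TemporaryDirectory() as root:
    for name in ("Archery", "Surfing"):
        os.mkdir(os.path.join(root, name))
    open(os.path.join(root, "notes.txt"), "w").close()
    raw_count = len(os.listdir(root))
    class_count = count_class_dirs(root)
    print(raw_count, class_count)
```

If the directory count still differs from the 101 classes in the checkpoint, the dataset folder itself contains extra class folders and the model would have to be retrained or the folder cleaned up.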

test error

Hello, I have all the requirements to run test_on_video.py but I keep getting a path_to_video error.

Here it is:

Traceback (most recent call last):
File "test_on_video.py", line 38, in <module>
labels = sorted(list(set(os.listdir(opt.video_path))))
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:/Users/Windows/Documents/Action-Recognition/test/v_Surfing_g03_c04.avi'

I have tried everything but it is still not working.

What can be done?
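
The traceback shows `os.listdir` being called on the video file itself, which cannot work because the class labels have to come from a directory of class folders or from a label file. One option is to read them from `classInd.txt` in the split folder. The sketch below assumes each line has the form `<index> <ClassName>`; `read_class_labels` is a hypothetical helper, and the resulting label order has to match the order used when the model was trained.

```python
import os
import tempfile


def read_class_labels(class_index_path):
    """Parse lines of the form '<index> <ClassName>' into a list of class names."""
    labels = []
    with open(class_index_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:   # skip blank or malformed lines
                labels.append(parts[1])
    return labels


# Demo with a three-line stand-in for classInd.txt.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "classInd.txt")
    with open(path, "w") as f:
        f.write("1 ApplyEyeMakeup\n2 ApplyLipstick\n3 Archery\n")
    labels = read_class_labels(path)
    print(labels)
```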

AttributeError: 'Namespace' object has no attribute 'sequence_length'

While testing with test.py, I get the error "'Namespace' object has no attribute 'sequence_length'". I trained my own model and am not using the default one. If I specify sequence_length explicitly, for example 40, I get "IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)" instead. Please help.
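
The first error means the script reads `opt.sequence_length` although its argument parser never defines that option. A minimal sketch of a parser that does define it is given below. The default of 40 is taken from the training command in the README, and `build_parser` is a hypothetical name rather than the repository's code. This addresses only the AttributeError, not the later "Dimension out of range" error.

```python
import argparse


def build_parser():
    """Parser with the option that the failing script expects on `opt`."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dataset_path", type=str, default="data/UCF-101-frames")
    parser.add_argument("--sequence_length", type=int, default=40,
                        help="number of frames per sequence")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
opt = build_parser().parse_args(["--sequence_length", "40"])
print(opt.sequence_length)
```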

Terminology mistake

Your model is different from ConvLSTM proposed in this paper: https://arxiv.org/abs/1506.04214, where 2D-LSTM is applied to output of each convolution layer in a CNN, usually used for pixel-level video prediction.

This implementation should be called CNN-LSTM.

Error: Regarding Loading Pre-Trained Weights

I am using the pre-trained weights you provided, but I am facing the problem below. Could you please guide me in resolving it? Thank you.

Namespace(channels=3, checkpoint_model='ConvLSTM_150.pth', dataset_path='data/UCF-101-frames', image_dim=224, latent_dim=512, video_path='1.mp4')
Traceback (most recent call last):
File "test_on_video.py", line 49, in <module>
model.load_state_dict(torch.load(opt.checkpoint_model))
File "/home/naeem/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ConvLSTM:
Missing key(s) in state_dict: "lstm.lstm.weight_ih_l0_reverse", "lstm.lstm.weight_hh_l0_reverse", "lstm.lstm.bias_ih_l0_reverse", "lstm.lstm.bias_hh_l0_reverse", "lstm.output_layers.0.weight", "lstm.output_layers.0.bias", "lstm.output_layers.1.weight", "lstm.output_layers.1.bias", "lstm.output_layers.1.running_mean", "lstm.output_layers.1.running_var", "lstm.output_layers.3.weight", "lstm.output_layers.3.bias".
Unexpected key(s) in state_dict: "lstm.final.0.weight", "lstm.final.0.bias", "lstm.final.1.weight", "lstm.final.1.bias", "lstm.final.1.running_mean", "lstm.final.1.running_var", "lstm.final.1.num_batches_tracked", "lstm.final.3.weight", "lstm.final.3.bias".
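
In this error the unexpected `lstm.final.N` keys line up index by index with the missing `lstm.output_layers.N` keys, so the layer group may simply have been renamed between versions. That correspondence is an untested guess. The sketch below shows how such a prefix rename could be applied to a checkpoint dictionary before loading, with placeholder strings standing in for tensors and `rename_key_prefix` as a hypothetical helper.

```python
def rename_key_prefix(state_dict, old_prefix, new_prefix):
    """Return a copy of state_dict with old_prefix replaced by new_prefix on matching keys."""
    renamed = {}
    for key, value in state_dict.items():
        if key.startswith(old_prefix):
            key = new_prefix + key[len(old_prefix):]
        renamed[key] = value
    return renamed


# Placeholder values stand in for tensors; key names follow the error above.
checkpoint = {"lstm.final.0.weight": "W0", "lstm.final.3.bias": "b3"}
remapped = rename_key_prefix(checkpoint, "lstm.final.", "lstm.output_layers.")
print(sorted(remapped))
```

Even if the rename is right, it would not supply the `_reverse` LSTM weights, which are absent from the checkpoint, so the checkpoint would still not load into a bi-directional model without changes.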
