eriklindernoren / action-recognition

Exploration of different solutions to action recognition in video, using neural networks implemented in PyTorch.

Shell 0.76% Python 99.24%

action-recognition video-classification pytorch

action-recognition's Introduction

Action Recognition in Video

This repo will serve as a playground where I investigate different approaches to solving the problem of action recognition in video.

I will mainly use the UCF-101 dataset.

Setup

$ cd data/
$ bash download_ucf101.sh     # Downloads the UCF-101 dataset (~7.2 GB)
$ unrar x UCF101.rar          # Extracts the dataset
$ unzip ucfTrainTestlist.zip  # Unzips the train / test splits
$ python3 extract_frames.py   # Extracts frames from the videos (~26.2 GB, go grab a coffee for this)
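The split files unzipped from ucfTrainTestlist.zip are plain text. A minimal sketch of reading them, assuming the standard UCF-101 format (classInd.txt lines like "1 ApplyEyeMakeup", trainlistNN.txt lines like "ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1", testlistNN.txt lines with only the path). The helper names are hypothetical and not taken from this repo's dataset.py.

```python
import os


def read_class_index(split_path):
    """Map class name -> 0-based label using classInd.txt ("1 ApplyEyeMakeup")."""
    labels = {}
    with open(os.path.join(split_path, "classInd.txt")) as f:
        for line in f:
            if line.strip():
                index, name = line.split()
                labels[name] = int(index) - 1  # the file numbers classes from 1
    return labels


def read_split(split_path, split_number=1, train=True):
    """Return (class_name, video_name) pairs listed in trainlistNN.txt or testlistNN.txt."""
    prefix = "trainlist" if train else "testlist"
    fname = os.path.join(split_path, f"{prefix}{split_number:02d}.txt")
    entries = []
    with open(fname) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Train lines end with a class index; test lines contain only the path.
            rel_path = line.split()[0]
            class_name, video_file = rel_path.split("/")
            entries.append((class_name, os.path.splitext(video_file)[0]))
    return entries
```

Stripping the .avi extension gives names that match the per-video frame folders, assuming extract_frames.py names them after the video file.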

ConvLSTM

This is the only approach investigated so far. It performs action recognition in video with a bi-directional LSTM operating on frame embeddings extracted by a ResNet-152 pre-trained on ImageNet.

The model is composed of:

A convolutional feature extractor (ResNet-152) which provides a latent representation of video frames
A bi-directional LSTM classifier which, based on the latent representation of the video, predicts the depicted activity

I have made a trained model available here.

Train

$ python3 train.py  --dataset_path data/UCF-101-frames/ \
                    --split_path data/ucfTrainTestlist \
                    --num_epochs 200 \
                    --sequence_length 40 \
                    --img_dim 112 \
                    --latent_dim 512

Test on Video

$ python3 test_on_video.py  --video_path data/UCF-101/SoccerPenalty/v_SoccerPenalty_g01_c01.avi \
                            --checkpoint_model model_checkpoints/ConvLSTM_150.pth
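The internals of test_on_video.py are not described here. As a hedged sketch of the final prediction step, assuming a model that maps a batch of clips shaped (batch, seq_len, C, H, W) to class logits and a labels list in training order (predict_clip is a hypothetical name):

```python
import torch


def predict_clip(model, clip, labels):
    """Return (label, probability) for one clip shaped (seq_len, C, H, W)."""
    model.eval()
    with torch.no_grad():
        logits = model(clip.unsqueeze(0))        # add a batch dimension -> (1, num_classes)
        probs = torch.softmax(logits, dim=1)[0]  # probabilities for the single clip
    index = int(probs.argmax())
    return labels[index], float(probs[index])
```

If a model already ends in a softmax, the extra softmax here keeps the predicted class (softmax preserves the ordering) but flattens the reported probability.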

Results

The model reaches a classification accuracy of 91.27% on a randomly sampled test set composed of 20% of the video sequences in UCF-101. I will re-train this model on the official train / test splits and post the results as soon as I have time.
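UCF-101 clips are organised in 25 groups per class (the _gXX_ part of names such as v_SoccerPenalty_g01_c01), and clips from the same group can share background and viewpoint. A random 20% split can therefore put near-duplicate clips on both sides, which may make the 91.27% optimistic; the official splits keep each group on one side (split 1 tests on groups 1 to 7). A minimal sketch of a group-aware split, with hypothetical helper names; for reported numbers, the official testlistNN.txt files remain the reference.

```python
import re

# UCF-101 names look like v_<Class>_g<group>_c<clip>, e.g. v_SoccerPenalty_g01_c01.
GROUP_RE = re.compile(r"_g(\d+)_c\d+")


def group_of(video_name):
    """Extract the group number from a UCF-101 video name."""
    match = GROUP_RE.search(video_name)
    if match is None:
        raise ValueError(f"no group id in {video_name!r}")
    return int(match.group(1))


def split_by_group(video_names, test_groups=range(1, 8)):
    """Keep whole groups on one side; groups 1-7 mirror the split 1 test set."""
    test_groups = set(test_groups)
    train, test = [], []
    for name in video_names:
        (test if group_of(name) in test_groups else train).append(name)
    return train, test
```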


action-recognition's Issues

How long does it take to predict one video on a CPU machine?

Can anybody share the prediction time?
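No CPU timings are given in this README; the cost per video depends on the processor, the number of frames and the image size, and the ResNet-152 pass over every frame is likely to dominate (an assumption, not a measurement). A minimal sketch for measuring it yourself, where predict stands for any callable that runs the model on one preprocessed clip under torch.no_grad():

```python
import time


def time_prediction(predict, clip, warmup=1, runs=5):
    """Average wall-clock seconds per call of predict(clip), after warm-up calls."""
    for _ in range(warmup):
        predict(clip)  # first calls may include one-off allocation costs
    start = time.perf_counter()
    for _ in range(runs):
        predict(clip)
    return (time.perf_counter() - start) / runs
```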

When I try to use your pretrained model, it gives an error:

Missing key(s) in state_dict: "lstm.lstm.weight_ih_l0_reverse", "lstm.lstm.weight_hh_l0_reverse", "lstm.lstm.bias_ih_l0_reverse", "lstm.lstm.bias_hh_l0_reverse", "output_layers.0.weight", "output_layers.0.bias", "output_layers.1.weight", "output_layers.1.bias", "output_layers.1.running_mean", "output_layers.1.running_var", "output_layers.3.weight", "output_layers.3.bias", "attention_layer.weight", "attention_layer.bias".
Unexpected key(s) in state_dict: "lstm.final.0.weight", "lstm.final.0.bias", "lstm.final.1.weight", "lstm.final.1.bias", "lstm.final.1.running_mean", "lstm.final.1.running_var", "lstm.final.1.num_batches_tracked", "lstm.final.3.weight", "lstm.final.3.bias".
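The key names hint at the cause: the checkpoint has no *_reverse LSTM weights, names its head lstm.final.* and lacks attention_layer.*, so it appears to have been saved from an earlier, unidirectional version of models.py than the one being loaded. A minimal sketch (hypothetical helpers, plain Python) that lists the mismatched keys before calling load_state_dict, for example with diff_state_dict_keys(model.state_dict(), torch.load(path, map_location="cpu")):

```python
def diff_state_dict_keys(model_state, checkpoint_state):
    """Return (missing, unexpected) key lists for two state-dict-like mappings.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys stored in the checkpoint that the model does not define
    """
    model_keys, ckpt_keys = set(model_state), set(checkpoint_state)
    return sorted(model_keys - ckpt_keys), sorted(ckpt_keys - model_keys)


def group_by_prefix(keys, depth=2):
    """Collapse keys such as 'lstm.final.0.weight' to 'lstm.final' for a compact summary."""
    return sorted({".".join(key.split(".")[:depth]) for key in keys})
```

Loading with model.load_state_dict(state, strict=False) would silence the error, but the missing layers (reverse LSTM direction, output layers, attention) would keep their random initialization, so using the model code that matches the checkpoint, or retraining, is the safer route.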

ValueError: not enough values to unpack (expected 2, got 1)

Namespace(dataset_path='UCF-101')
Traceback (most recent call last):
  File "extract_frames.py", line 31, in <module>
    sequence_type, sequence_name = video_path.split(".avi")[0].split("/")[-2:]
ValueError: not enough values to unpack (expected 2, got 1)
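The failing line splits the path on "/", but on Windows glob returns paths joined with "\", so the split yields a single element (the traceback does not say which OS was used, so this is an assumption). pathlib splits on the platform's separator. A minimal sketch of a replacement, with parse_video_path as a hypothetical helper used as sequence_type, sequence_name = parse_video_path(video_path):

```python
from pathlib import Path


def parse_video_path(video_path):
    """Split '<root>/<class>/<video>.avi' into (class, video) using the OS separator."""
    path = Path(video_path)
    return path.parent.name, path.stem  # stem drops the .avi extension
```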

test error

Hello, I have all the requirements to run test_on_video.py but I keep getting a path_to_video error.

Here it is:

Traceback (most recent call last):
  File "test_on_video.py", line 38, in <module>
    labels = sorted(list(set(os.listdir(opt.video_path))))
NotADirectoryError: [WinError 267] The directory name is invalid: 'C:/Users/Windows/Documents/Action-Recognition/test/v_Surfing_g03_c04.avi'

I have tried everything but it is still not working.

What can be done?
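Line 38 lists opt.video_path, which points at a single .avi file, hence NotADirectoryError. The class names should presumably come from the directory of extracted frames instead, i.e. opt.dataset_path (default data/UCF-101-frames, per the Namespace printed in a later issue), whose subfolders are the class names. A minimal sketch, with load_labels as a hypothetical helper used as labels = load_labels(opt.dataset_path); it also skips plain files so stray entries do not inflate the class count:

```python
import os


def load_labels(dataset_path):
    """Class names are the subdirectories of the extracted-frame dataset, sorted."""
    return sorted(
        entry
        for entry in os.listdir(dataset_path)
        if os.path.isdir(os.path.join(dataset_path, entry))
    )
```

Sorting keeps the label order deterministic, which matters because the index predicted by the model must map to the same class name used during training.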

size mismatch for output_layers.3.bias: copying a param with shape torch.Size([101]) from checkpoint, the shape in current model is torch.Size([105]).

I get this error when I run test_on_video.py file.

RuntimeError Traceback (most recent call last)
in <module>()
----> 1 model.load_state_dict(torch.load(checkpoint_model))
2 model.eval()

~/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in load_state_dict(self, state_dict, strict)
828 if len(error_msgs) > 0:
829 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
--> 830 self.__class__.__name__, "\n\t".join(error_msgs)))
831 return _IncompatibleKeys(missing_keys, unexpected_keys)
832

RuntimeError: Error(s) in loading state_dict for ConvLSTM:
size mismatch for output_layers.3.weight: copying a param with shape torch.Size([101, 1024]) from checkpoint, the shape in current model is torch.Size([105, 1024]).
size mismatch for output_layers.3.bias: copying a param with shape torch.Size([101]) from checkpoint, the shape in current model is torch.Size([105]).
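The checkpoint's last layer has 101 outputs (one per UCF-101 class), while the model was built with 105, so the label list came out with 105 entries. Since test_on_video.py derives the labels from a directory listing, extra entries in that directory (files or folders that are not UCF-101 classes) are a likely cause, though that is an assumption. A small hypothetical check that fails early with a readable message; the key name output_layers.3.bias is taken from the error above:

```python
def check_label_count(state_dict, labels, key="output_layers.3.bias"):
    """Compare the classifier size stored in a checkpoint with the label list.

    Works with any mapping whose values expose .shape, e.g. the dict returned
    by torch.load(path, map_location="cpu").
    """
    num_ckpt_classes = int(state_dict[key].shape[0])
    if num_ckpt_classes != len(labels):
        raise ValueError(
            f"checkpoint has {num_ckpt_classes} output classes but {len(labels)} "
            "labels were found; check the class directory for extra entries"
        )
    return num_ckpt_classes
```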

Hello Mr. Linder-Norén

Hello Mr. Linder-Norén, I am very sorry to bother you.
I have downloaded your Action-Recognition code and have learned a lot.
But I still have a question: could you explain how to use the attention module in your model?
Thank you very much for your reply.

Paper link for this repository!!!

Hey, I am going to publish my work pretty soon and I want to cite your work. How can I cite it? Is there a paper for this repository?

How can I solve this?

RuntimeError: Error(s) in loading state_dict for ConvLSTM:
Missing key(s) in state_dict: "lstm.lstm.weight_ih_l0_reverse", "lstm.lstm.weight_hh_l0_reverse", "lstm.lstm.bias_ih_l0_reverse", "lstm.lstm.bias_hh_l0_reverse", "output_layers.0.weight", "output_layers.0.bias", "output_layers.1.weight", "output_layers.1.bias", "output_layers.1.running_mean", "output_layers.1.running_var", "output_layers.3.weight", "output_layers.3.bias", "attention_layer.weight", "attention_layer.bias".
Unexpected key(s) in state_dict: "lstm.final.0.weight", "lstm.final.0.bias", "lstm.final.1.weight", "lstm.final.1.bias", "lstm.final.1.running_mean", "lstm.final.1.running_var", "lstm.final.1.num_batches_tracked", "lstm.final.3.weight", "lstm.final.3.bias"

latent_att is used before it is defined.

Action-Recognition/models.py

Line 68 in b43ec09

latent_att = self.latent_attention(latent_att)
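As quoted, the right-hand side reads latent_att before anything has assigned it, so that branch would fail unless an earlier line defines it. The keys the current model expects (listed as missing in the errors above) include attention_layer.weight and attention_layer.bias, which suggests a single linear scoring layer. A minimal sketch of that kind of soft temporal attention (a learned score per time step, softmax over time, weighted sum), assuming it is applied to the bi-LSTM outputs; it is not necessarily identical to models.py, and it scores each step with a linear layer rather than using dot-product attention between two sets of vectors:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalAttentionPool(nn.Module):
    """Score each time step with a linear layer, softmax over time, weighted sum."""

    def __init__(self, feature_dim):
        super().__init__()
        self.attention_layer = nn.Linear(feature_dim, 1)

    def forward(self, x):
        # x: (batch, seq_len, feature_dim), e.g. the bi-LSTM outputs
        scores = self.attention_layer(x).squeeze(-1)           # (batch, seq_len)
        weights = F.softmax(scores, dim=-1)                    # sums to 1 over time
        pooled = torch.sum(weights.unsqueeze(-1) * x, dim=1)   # (batch, feature_dim)
        return pooled, weights
```

The returned weights show which frames the pooled representation relies on, which can help when inspecting predictions.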

Error: Regarding Loading Pre-Trained Weights

I am using the pre-trained weights you provided, but I am facing this problem. Could you please guide me to resolve this issue? Thank you.

Namespace(channels=3, checkpoint_model='ConvLSTM_150.pth', dataset_path='data/UCF-101-frames', image_dim=224, latent_dim=512, video_path='1.mp4')
Traceback (most recent call last):
  File "test_on_video.py", line 49, in <module>
    model.load_state_dict(torch.load(opt.checkpoint_model))
  File "/home/naeem/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ConvLSTM:
Missing key(s) in state_dict: "lstm.lstm.weight_ih_l0_reverse", "lstm.lstm.weight_hh_l0_reverse", "lstm.lstm.bias_ih_l0_reverse", "lstm.lstm.bias_hh_l0_reverse", "lstm.output_layers.0.weight", "lstm.output_layers.0.bias", "lstm.output_layers.1.weight", "lstm.output_layers.1.bias", "lstm.output_layers.1.running_mean", "lstm.output_layers.1.running_var", "lstm.output_layers.3.weight", "lstm.output_layers.3.bias".
Unexpected key(s) in state_dict: "lstm.final.0.weight", "lstm.final.0.bias", "lstm.final.1.weight", "lstm.final.1.bias", "lstm.final.1.running_mean", "lstm.final.1.running_var", "lstm.final.1.num_batches_tracked", "lstm.final.3.weight", "lstm.final.3.bias".

Small bug fix

Thanks for sharing the repo!

extract_frames(video_path, time_left)
should be
extract_frames(video_path)

Action-Recognition/data/extract_frames.py

Line 42 in b43ec09

extract_frames(video_path, time_left),

train.py: error: ambiguous option: --img_dim could match --img_dim_H, --img_dim_W

I get this error when I run train.py file.
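argparse accepts any unambiguous prefix of a long option by default, so --img_dim matches both --img_dim_H and --img_dim_W (the option names shown in the error, presumably from a newer train.py than the README command) and argparse exits instead of guessing. A minimal sketch reproducing this with a hypothetical parser:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--img_dim_H", type=int, default=112, help="frame height")
    parser.add_argument("--img_dim_W", type=int, default=112, help="frame width")
    return parser


parser = build_parser()
# "--img_dim" is a prefix of both options, so this would exit with
# "ambiguous option: --img_dim could match --img_dim_H, --img_dim_W":
# parser.parse_args(["--img_dim", "112"])
opt = parser.parse_args(["--img_dim_H", "112", "--img_dim_W", "112"])
print(opt.img_dim_H, opt.img_dim_W)
```

With such a version of train.py, the README's --img_dim 112 would become --img_dim_H 112 --img_dim_W 112.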

    print(sequence[0])
IndexError: list index out of range

AttributeError: 'Namespace' object has no attribute 'sequence_length'

While testing with test.py, I get the error "'Namespace' object has no attribute 'sequence_length'". I trained my own model rather than using the default one. If I specify sequence_length (for example, 40), I get the error "IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)". Please help.

Testing performance on official UCF-101 split 1

I can only get about 76% accuracy on the UCF-101 split 1 test set, and the model seems to be overfitting.
How can I fix the overfitting problem?

Train issue

This project is really interesting.

I tried to train the model, but I always get a "list index out of range" error at a random point during training.

I tried torch 1.2 through 1.3.1 with CUDA 10.1 and always got the same error.

Does anyone have an idea how to fix this?

python3 train.py --dataset_path data/UCF-101-frames/ --split_path data/ucfTrainTestlist --num_epochs 200 --sequence_length 20 --img_dim 112 --latent_dim 512 --batch_size 64
Namespace(batch_size=64, channels=3, checkpoint_interval=5, checkpoint_model='', dataset_path='data/UCF-101-frames/', img_dim=112, latent_dim=512, num_epochs=200, sequence_length=20, split_number=1, split_path='data/ucfTrainTestlist')
cuda
--- Epoch 0 ---
[Epoch 0/200] [Batch 22/150] [Loss: 4.612639 (4.613988), Acc: 4.69% (2.31%)] ETA: 8:49:23.620145
Traceback (most recent call last):
  File "train.py", line 116, in <module>
    for batch_i, (X, y) in enumerate(train_dataloader):
  File "/home/gary/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 801, in __next__
    return self._process_data(data)
  File "/home/gary/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/gary/.local/lib/python3.6/site-packages/torch/_utils.py", line 385, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/home/gary/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/gary/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/gary/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/4tbdrive1/experiments/Action-Recognition/dataset.py", line 83, in __getitem__
    image_paths = self._pad_to_length(image_paths)
  File "/opt/4tbdrive1/experiments/Action-Recognition/dataset.py", line 67, in _pad_to_length
    left_pad = sequence[0]
IndexError: list index out of range
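The failing line, left_pad = sequence[0], only raises when image_paths is empty, i.e. no frames were found for that video. If the training DataLoader shuffles (as training loaders usually do), the empty item surfaces at a random batch, which would explain why the error looks random. A hypothetical padding helper that left-pads by repeating the first frame, as the name left_pad suggests, but fails with the offending video's name:

```python
def pad_to_length(sequence, sequence_length, source=""):
    """Left-pad a list of frame paths to sequence_length by repeating its first frame."""
    if not sequence:
        # An empty list is what makes sequence[0] raise IndexError: no frames were
        # found for this video (wrong dataset_path layout or failed extraction).
        raise ValueError(f"no frames found for {source or 'this sequence'}")
    padding = [sequence[0]] * max(0, sequence_length - len(sequence))
    return padding + list(sequence)
```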

Inspired by your work, I updated this repo to use fastai2

You can have a look here:
https://github.com/tcapelle/action_recognition
Thank you!

Running test_on_video.py encountered "unexpected keyword argument 'input_shape'" error

python3 test_on_video.py --video_path data/UCF-101/SoccerPenalty/v_SoccerPenalty_g01_c01.avi --checkpoint_model model_checkpoints/ConvLSTM_150.pth

Namespace(channels=3, checkpoint_model='model_checkpoints/ConvLSTM_150.pth', dataset_path='data/UCF-101-frames', image_dim=112, latent_dim=512, video_path='data/UCF-101/SoccerPenalty/v_SoccerPenalty_g01_c01.avi')
Traceback (most recent call last):
  File "test_on_video.py", line 41, in <module>
    model = ConvLSTM(input_shape=input_shape, num_classes=len(labels), latent_dim=opt.latent_dim)
TypeError: __init__() got an unexpected keyword argument 'input_shape'

list index out of range when starting training

Hello, I'm trying to start training with the UCF-101 dataset.

I've made a few adaptations to your code to get where I am now.

I downloaded the UCF-101 dataset in .avi format, then extracted all of the frames using extract_frames.py.

After that, I downloaded the train and test split files for the dataset from here:

yjxiong/temporal-segment-networks#177

For the split_path argument, I'm passing the path to the folder (named ucfTrainTestlist) containing classInd.txt, testlist01.txt and trainlist01.txt.

Here are the arguments I'm passing to start training:

python train.py --dataset_path data/frames/-frames/data/frames --split_path ucfTrainTestlist/ --split_number 1

and here is the error I'm getting:

--- Epoch 0 ---
Traceback (most recent call last):
  File "train.py", line 115, in <module>
    for batch_i, (X, y) in enumerate(train_dataloader):
  File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 846, in _process_data
    data.reraise()
  File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\_utils.py", line 369, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\utils\data\_utils\worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Windows\Anaconda3\lib\site-packages\torch\utils\data\_utils\fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "C:\Users\Windows\Documents\Action-Recognition\dataset.py", line 78, in __getitem__
    image_paths = self._pad_to_length(image_paths)
  File "C:\Users\Windows\Documents\Action-Recognition\dataset.py", line 67, in _pad_to_length
    left_pad = sequence[0]
IndexError: list index out of range

I point dataset_path to a folder called frames; inside it, the video frames are divided into subfolders, each named after its video.
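Judging from line 31 of extract_frames.py (quoted in an earlier issue), which splits each video path into a class and a video name, frames are written to <class>/<video>/ folders, so dataset_path is expected to contain one folder per class, each holding one folder per video. A flat frames/<video>/ layout has one level too few, which would leave image_paths empty. A hypothetical checker for that layout; the *.jpg frame pattern is an assumption:

```python
import glob
import os


def check_frame_layout(dataset_path, pattern="*.jpg"):
    """Return the <class>/<video> folders under dataset_path that hold no frames.

    Raises ValueError if there are no <class>/<video> folders at all, as happens
    with a flat frames/<video>/ layout.
    """
    video_dirs = [
        d for d in sorted(glob.glob(os.path.join(dataset_path, "*", "*")))
        if os.path.isdir(d)
    ]
    if not video_dirs:
        raise ValueError(f"expected {dataset_path}/<class>/<video>/ folders, found none")
    return [
        os.path.relpath(d, dataset_path)
        for d in video_dirs
        if not glob.glob(os.path.join(d, pattern))
    ]
```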

Softmax in Model Output, then using CE Loss

Thank you for the interesting work here.

I've just encountered one issue with the code. The ConvLSTM model applies a softmax as its last layer, but the training script then uses CrossEntropyLoss. CE loss already applies a (log-)softmax to its input, so the softmax ends up being applied twice. Instead, ConvLSTM should output the logits of the classification (Linear) layer, before the softmax, and pass those to CE loss. The softmax probabilities can be computed later, during test-set evaluation, to determine the test accuracy.
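The point can be checked directly: nn.CrossEntropyLoss combines log_softmax and nll_loss, so it expects raw logits. A minimal sketch with made-up logits for a 3-class case:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[4.0, 1.0, 0.0]])  # raw scores from the final Linear layer
target = torch.tensor([0])                 # the true class is index 0
criterion = nn.CrossEntropyLoss()

# CrossEntropyLoss applies log_softmax itself, then the negative log-likelihood.
loss_from_logits = criterion(logits, target)
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
assert torch.allclose(loss_from_logits, manual)

# Passing softmax outputs applies softmax a second time: the confident, correct
# prediction above now receives a much larger loss.
loss_from_probs = criterion(F.softmax(logits, dim=1), target)
print(f"logits: {loss_from_logits.item():.3f}  softmax first: {loss_from_probs.item():.3f}")

# At evaluation time, probabilities and predictions come from a single softmax.
probs = F.softmax(logits, dim=1)
prediction = probs.argmax(dim=1)
```

Because the inner softmax outputs lie in [0, 1], the double-softmax loss with C classes cannot fall below ln(e + C - 1) - 1; for UCF-101 (C = 101) that floor is about 3.63, compared with ln(101) ≈ 4.62 for a uniform prediction.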

Please let me know if others agree with this small change to the code.

Also, what type of Attention is being used? Is it the dot-product?

Terminology mistake

Your model is different from the ConvLSTM proposed in this paper: https://arxiv.org/abs/1506.04214, where the LSTM's input-to-state and state-to-state transitions are convolutions, so the LSTM operates directly on 2D feature maps; it is usually used for pixel-level prediction tasks such as video prediction.

This implementation should be called CNN-LSTM.
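To make the distinction concrete: in this repo's model, each frame is reduced to a vector before the LSTM, whereas in a ConvLSTM the hidden and cell states are feature maps and the gates are computed by convolutions. A minimal sketch of a ConvLSTM cell, simplified by omitting the peephole terms of the linked paper; the class name and defaults are choices made here:

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """LSTM cell whose gates are 2D convolutions over feature maps (no peepholes)."""

    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        self.hidden_channels = hidden_channels
        # One convolution produces all four gates from [input, previous hidden state];
        # padding keeps the spatial size unchanged for odd kernel sizes.
        self.gates = nn.Conv2d(in_channels + hidden_channels, 4 * hidden_channels,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state=None):
        # x: (batch, in_channels, H, W); h and c are (batch, hidden_channels, H, W).
        if state is None:
            zeros = x.new_zeros(x.size(0), self.hidden_channels, x.size(2), x.size(3))
            state = (zeros, zeros)
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```

Because the states keep their H x W layout, the recurrence can model where things move in the frame, which is what makes ConvLSTM suited to pixel-level prediction.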