
trn-pytorch's Introduction

Temporal Relation Networks

NEW (April 20, 2020): Check out our recent CVPR'20 work, Temporal Pyramid Networks (TPN) for action recognition, which outperforms TRN by a large margin and achieves close to state-of-the-art results on many video benchmarks using the RGB stream only. Bonus: bolei_juggling_v2.mp4 is attached in that repo!

We release the code for Temporal Relation Networks, built on top of the TSN-pytorch codebase.

NEW (July 29, 2018): This work has been accepted to ECCV'18; check the paper for the latest results. We also release the state-of-the-art model trained on Something-Something V2; see the instructions below.

Note: always use git clone --recursive https://github.com/metalbubble/TRN-pytorch to clone this project. Otherwise you will not be able to use the Inception-series CNN architectures.

(framework figure)

Data preparation

Download the Something-Something, Jester, or Charades dataset and decompress it into a folder. Use process_dataset.py to generate the index files for the train, val, and test splits. Finally, properly set up the train, validation, and category meta files in datasets_video.py.

For Something-Something-V2, we provide a utility script extract_frames.py for converting the downloaded .webm videos into directories containing the extracted frames. Additionally, the corresponding optical flow images can be downloaded from here.
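As a rough illustration, the train/val index files in TSN-style codebases typically contain one line per video of the form "<frame_folder> <num_frames> <label>". The exact format expected by datasets_video.py may differ; the sketch below, with hypothetical paths and file names, only shows how such an index could be generated from folders of extracted frames.

import os

def write_index(frame_root, video_labels, out_file):
    # video_labels maps a video's frame-folder name to an integer class id (hypothetical helper)
    with open(out_file, 'w') as f:
        for video_id, label in video_labels.items():
            frame_dir = os.path.join(frame_root, video_id)
            num_frames = len(os.listdir(frame_dir))  # one image file per extracted frame
            f.write('%s %d %d\n' % (video_id, num_frames, label))

# Example (hypothetical paths):
# write_index('something_frames', {'53907': 12}, 'train_videofolder.txt')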

Code

The core code implementing the Temporal Relation Network module is in TRNmodule.py. It is plug-and-play on top of TSN.
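To give a feel for what a relation module does (this is an illustrative sketch, not the code in TRNmodule.py): frame-level features from the base CNN are concatenated and fused by a small MLP (the g_theta/h_phi MLPs in the paper) to produce class scores.

import torch
import torch.nn as nn

class SimpleRelation(nn.Module):
    """Single-scale relation over a fixed number of frames (illustrative only)."""
    def __init__(self, img_feature_dim, num_frames, num_class, hidden=256):
        super(SimpleRelation, self).__init__()
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(num_frames * img_feature_dim, hidden),  # fuse all frames jointly
            nn.ReLU(),
            nn.Linear(hidden, num_class),
        )

    def forward(self, x):
        # x: (batch, num_frames, img_feature_dim) frame-level features
        return self.classifier(x.view(x.size(0), -1))

# logits = SimpleRelation(256, 3, 174)(torch.randn(4, 3, 256))  # e.g. 174 Something-Something classes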

Training and Testing

  • The command to train a single-scale TRN:
CUDA_VISIBLE_DEVICES=0,1 python main.py something RGB \
                     --arch BNInception --num_segments 3 \
                     --consensus_type TRN --batch-size 64
  • The command to train a multi-scale TRN:
CUDA_VISIBLE_DEVICES=0,1 python main.py something RGB \
                     --arch BNInception --num_segments 8 \
                     --consensus_type TRNmultiscale --batch-size 64
  • The command to test the single-scale TRN:
python test_models.py something RGB model/TRN_something_RGB_BNInception_TRN_segment3_best.pth.tar \
   --arch BNInception --crop_fusion_type TRN --test_segments 3
  • The command to test the multi-scale TRN:
python test_models.py something RGB model/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar \
   --arch BNInception --crop_fusion_type TRNmultiscale --test_segments 8

Pretrained models and demo code

  • Download the pretrained models:
./download.sh
  • Download the sample video and extracted frames. There will be an mp4 video file and a folder containing the RGB frames for that video.
cd sample_data
./download_sample_data.sh

The sample video is sample_data/bolei_juggling.mp4.


  • Test the pretrained model trained on Something-Something-V2:
python test_video.py --arch BNInception --dataset somethingv2 \
    --weights pretrain/TRN_somethingv2_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar \
    --frame_folder sample_data/bolei_juggling

RESULT ON sample_data/bolei_juggling
0.500 -> Throwing something in the air and catching it
0.141 -> Throwing something in the air and letting it fall
0.072 -> Pretending to throw something
0.024 -> Throwing something
0.024 -> Hitting something with something
  • Test the pretrained model trained on Moments in Time:
python test_video.py --arch InceptionV3 --dataset moments \
    --weights pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar \
    --frame_folder sample_data/bolei_juggling

RESULT ON sample_data/bolei_juggling

0.982 -> juggling
0.003 -> flipping
0.003 -> spinning
0.003 -> smoking
0.002 -> whistling
  • Test the pretrained model on an mp4 video file:
python test_video.py --arch InceptionV3 --dataset moments \
    --weights pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar \
    --video_file sample_data/bolei_juggling.mp4 --rendered_output sample_data/predicted_video.mp4

The command above uses ffmpeg to extract frames from the supplied video (--video_file) and optionally generates a new video (--rendered_output) from the frames used to make the prediction, with the predicted category overlaid in the top-left corner.
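For reference, here is a minimal sketch of how frames can be pulled out of a video with ffmpeg. It is an illustration, not the repo's exact extract_frames function, and it assumes ffmpeg is on the PATH.

import os
import subprocess
import tempfile

def extract_frames(video_file, num_frames=8):
    # Dump all frames as JPEGs into a temp folder, then keep num_frames evenly spaced ones.
    out_dir = tempfile.mkdtemp()
    subprocess.run(['ffmpeg', '-i', video_file, '-q:v', '2',
                    os.path.join(out_dir, 'frame_%06d.jpg')],
                   check=True, capture_output=True)
    frames = sorted(os.listdir(out_dir))
    step = max(len(frames) // num_frames, 1)
    return [os.path.join(out_dir, f) for f in frames[::step]][:num_frames]

# frame_paths = extract_frames('sample_data/bolei_juggling.mp4', num_frames=8)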

  • Gesture recognition web-cam demo script: python fps_dem_trn.py

TODO

  • TODO: Web-cam demo script
  • TODO: Visualization script
  • TODO: class-aware data augmentation

Reference:

B. Zhou, A. Andonian, and A. Torralba. Temporal Relational Reasoning in Videos. European Conference on Computer Vision (ECCV), 2018. PDF

@article{zhou2017temporalrelation,
    title = {Temporal Relational Reasoning in Videos},
    author = {Zhou, Bolei and Andonian, Alex and Oliva, Aude and Torralba, Antonio},
    journal={European Conference on Computer Vision},
    year={2018}
}

Acknowledgement

Our Temporal Relation Network is plug-and-play on top of TSN-PyTorch, but it could easily be extended to other network architectures. We thank Yuanjun Xiong for releasing the TSN-PyTorch codebase. The Something-Something and Jester datasets are from TwentyBN; we really appreciate their effort to build such nice video datasets. Please refer to their dataset websites for the proper usage of the data.


trn-pytorch's Issues

RuntimeError: loading_state_dict for BNInception

size mismatch for conv1_7x7_s2_bn.weight: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([1, 64]) in current model.
size mismatch for conv1_7x7_s2_bn.bias: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([1, 64]) in current model.

Data prep for Moments in Time or AVA datasets

Thank you for releasing the code. The data preparation guidelines and scripts are especially helpful. Could you point me to similar guidelines/scripts/repos to process the Moments in Time or AVA datasets?

(so scripts similar to
https://github.com/metalbubble/TRN-pytorch/blob/master/process_dataset.py , https://github.com/metalbubble/TRN-pytorch/blob/master/datasets_video.py and https://github.com/metalbubble/TRN-pytorch/blob/master/extract_frames.py
for the Moments in Time or AVA datasets?)

Thank you again for setting up this repo.

How long does the Something-Something-V2 dataset take?

I have been training this network for about 5 days, but it gets stuck in dataset.py. It keeps calling getitem(), and a single video is repeated more than ten times. Has anyone had the same problem? Could this be caused by a path error? I store the dataset on an external hard drive. Thanks a lot.
code:
code:

def __getitem__(self, index):
    record = self.video_list[index]
    # check this is a legit video folder
    print('getitem program beginning...')

    # if the frame folder does not exist, resample another random video
    while not os.path.exists(os.path.join(self.root_path, record.path, self.image_tmpl.format(1))):
        print('Building index: path, num_frames and label, %s' % os.path.join(self.root_path, record.path, self.image_tmpl.format(1)))
        index = np.random.randint(len(self.video_list))
        record = self.video_list[index]
    print("getitem while has ended...")

    if not self.test_mode:
        segment_indices = self._sample_indices(record) if self.random_shift else self._get_val_indices(record)
        print('train/val mode')
    else:
        segment_indices = self._get_test_indices(record)
        print('test mode')

    return self.get(record, segment_indices)

Terminal output (repeating):

building index: path,num_frames, label /media/zhuc/jessica/zhuc_folder/something_frames/53907/000001.jpg

Testing with a webcam?

@metalbubble Is it possible to test this model with a webcam? If it is possible, what changes do I have to make?

Convergence of the training

Hi, I have used the following script to run training on Jester, but it looks like the accuracy doesn't improve after several epochs. Could you provide the hyperparameters you used for the provided pretrained model?

CUDA_VISIBLE_DEVICES=0,1 python main.py jester RGB \
--arch BNInception --num_segments 8 \
--consensus_type TRNmultiscale --batch-size 64

In addition, I could not resume training from the pretrained model using

CUDA_VISIBLE_DEVICES=0,1 python main.py jester RGB \
--arch BNInception --num_segments 8 \
--consensus_type TRNmultiscale --batch-size 64 \
--resume pretrain/TRN_jester_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar

It printed out the following message and then quit:
=> loading checkpoint 'pretrain/TRN_jester_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar'
=> loaded checkpoint 'False' (epoch 120)
video number:118562
video number:14787
group: first_conv_weight has 1 params, lr_mult: 1, decay_mult: 1
group: first_conv_bias has 1 params, lr_mult: 2, decay_mult: 0
group: normal_weight has 83 params, lr_mult: 1, decay_mult: 1
group: normal_bias has 83 params, lr_mult: 2, decay_mult: 0

But I have successfully resumed training from my own checkpoint.

Training on a new dataset

How would you set up your input data for process_dataset.py if you wanted to train on your own dataset?

AttributeError: module 'torch' has no attribute 'no_grad'

root@GPU-K2-U16-c8n5:/ai/trn-pytorch# python3 test_video.py --arch InceptionV3 --dataset something \
    --weights pretrain/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar \
    --video_file sample_data/juggling.mp4 --rendered_output sample_data/predicted_video.mp4

Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
Freezing BatchNorm2D except the first one.
Extracting frames using ffmpeg...
Traceback (most recent call last):
File "test_video.py", line 136, in
with torch.no_grad():
AttributeError: module 'torch' has no attribute 'no_grad'

I changed the mode, and when I run it, I get this error. Can you help me? Many thanks.

About custom datasets

Can this model be trained on custom datasets? Please share any ideas. Thanks in advance.

out of memory when training or testing

I have changed test_segments and test_crops to 2, but I still get the "out of memory" error. Hope you can help me, thanks!

Hardware configuration:
GPU: GTX 1080 Ti (11 GB)
RAM: 16 GB

Initializing TSN with base model: BNInception.
TSN Configurations:
input_modality: RGB
num_segments: 2
new_length: 1
consensus_module: TRNmultiscale
dropout_ratio: 0.8
img_feature_dim: 256

/home/jesse/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py:482: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior.
own_state[name].copy_(param)
('Multi-Scale Temporal Relation Network Module in use', ['2-frame relation'])
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1513363039688/work/torch/lib/THC/generic/THCStorage.c line=82 error=2 : out of memory
./test_rgb_something.sh: line 2: 16617 Segmentation fault (core dumped) python test_models.py something RGB model/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar --arch BNInception --crop_fusion_type TRNmultiscale --test_segments 2 --test_crops 2

size mismatch while testing pretrained model

I tried to test the pretrained model. While running this command:

python3 test_video.py --arch InceptionV3 --dataset moments \
    --weights pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar \
    --frame_folder sample_data/juggling_frames

I got this error:

Traceback (most recent call last):
  File "test_video.py", line 105, in <module>
    img_feature_dim=args.img_feature_dim, print_spec=False)
  File "/content/drive/My Drive/TRN-pytorch/models.py", line 43, in __init__
    self._prepare_base_model(base_model)
  File "/content/drive/My Drive/TRN-pytorch/models.py", line 120, in _prepare_base_model
    self.base_model = getattr(model_zoo, base_model)()
  File "/content/drive/My Drive/TRN-pytorch/model_zoo/bninception/pytorch_load.py", line 67, in __init__
    super(InceptionV3, self).__init__(model_path=model_path, weight_url=weight_url, num_classes=num_classes)
  File "/content/drive/My Drive/TRN-pytorch/model_zoo/bninception/pytorch_load.py", line 35, in __init__
    self.load_state_dict(torch.utils.model_zoo.load_url(weight_url))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for InceptionV3:
  size mismatch for conv_batchnorm.weight: copying a param of torch.Size([32]) from checkpoint, where the shape is torch.Size([1, 32]) in current model.
  size mismatch for conv_batchnorm.bias: copying a param of torch.Size([32]) from checkpoint, where the shape is torch.Size([1, 32]) in current model.
  size mismatch for conv_batchnorm.running_mean: copying a param of torch.Size([32]) from checkpoint, where the shape is torch.Size([1, 32]) in current model.
  size mismatch for conv_batchnorm.running_var: copying a param of torch.Size([32]) from checkpoint, where the shape is torch.Size([1, 32]) in current model.
  size mismatch for conv_1_batchnorm.weight: copying a param of torch.Size([32]) from checkpoint, where the shape is torch.Size([1, 32]) in current model.
  size mismatch for conv_1_batchnorm.bias: copying a param of torch.Size([32]) from checkpoint, where the shape is torch.Size([1, 32]) in current model.
  size mismatch for conv_1_batchnorm.running_mean: copying a param of torch.Size([32]) from checkpoint, where the shape is torch.Size([1, 32]) in current model.
  size mismatch for conv_1_batchnorm.running_var: copying a param of torch.Size([32]) from checkpoint, where the shape is torch.Size([1, 32]) in current model.
  size mismatch for conv_2_batchnorm.weight: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([1, 64]) in current model.

Can anyone suggest how to overcome this issue?

Help:Reg Action Recognition Dataset

Hello @metalbubble

Can you please let me know if we can use 30 distinct videos to train an action (for example, using 30 different shoplifting videos to train a shoplifting class), or do I need to duplicate the same video multiple times across all 30 videos?

Please clarify, as I am new to action recognition.

compare with RNN

Hi,

Thanks for such solid work. The code reliably reproduces the results in the paper.

May I ask whether you have tried replacing the TRN module with RNN units like LSTM/GRU?

Error when testing

I tried to download the sample data. However, after downloading there is no file named bolei_juggling.mp4, just one file named juggling.mp4.

How to get the BN-Inception pretrained on ImageNet?

Hi,
Paying attention to the relationship between frames is a good idea for handling temporal reasoning; your work is excellent and the code is awesome!
I want to train the model on my own, but I don't know where to get the BN-Inception model pretrained on ImageNet for PyTorch. Could you give me some clues?
Thanks~

Can we have different segment numbers for training and testing?

In TSN, we can train with 5 segments but test with 25 segments for a fair comparison with other state-of-the-art approaches. Can we do the same thing for TRN? It seems that training and testing need to use the same number of segments. If so, how can we compare with other methods?

Thank you.

Optical flow for Something-Something V1?

Can you provide the optical flow for Something-Something V1? Or can you tell me how to extract it, since the dataset contains only JPEGs (not video files)? Thanks~
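(Not an official answer, just a generic recipe.) Dense optical flow between consecutive JPEG frames can be computed with OpenCV; TSN-style pipelines usually clip the flow and save the x/y components as separate grayscale images. A minimal sketch, assuming cv2 is available and a pair of frame paths:

import cv2
import numpy as np

def flow_between(prev_path, next_path, bound=20.0):
    prev = cv2.imread(prev_path, cv2.IMREAD_GRAYSCALE)
    nxt = cv2.imread(next_path, cv2.IMREAD_GRAYSCALE)
    # Farneback dense optical flow: returns an (H, W, 2) array of x/y displacements
    flow = cv2.calcOpticalFlowFarneback(prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    flow = np.clip(flow, -bound, bound)
    # rescale to 0..255 so each component can be stored as a grayscale JPEG
    return ((flow + bound) / (2 * bound) * 255).astype(np.uint8)

# flow = flow_between('000001.jpg', '000002.jpg')
# cv2.imwrite('flow_x_000001.jpg', flow[:, :, 0]); cv2.imwrite('flow_y_000001.jpg', flow[:, :, 1])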

Extracting lower level features

Thanks for sharing this code. It works great as advertised for testing.
So this is not an issue but a question.
What simple modification could be made (to test_video.py) to extract some lower-level features (before the "logits", say each of the vectors for each N-frame relation right after the g_theta MLP)? (I've not coded in PyTorch, so I have no idea where to start; now if it were Caffe, that's a different story.) I'm curious how well such features might do in transfer learning for other videos not part of the training set. Or do you think the final layer (akin to an fc8 layer) is the best for that? You did a t-SNE plot in your paper; I assume those were from right after the h_phi MLP?
Thanks!
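(A general PyTorch pattern, not a change the authors describe.) One way to grab intermediate activations without editing the model code is a forward hook registered on whichever submodule produces the features of interest; the submodule name below is hypothetical, so inspect net.named_modules() to find the real one.

import torch

features = {}

def save_output(name):
    def hook(module, inputs, output):
        features[name] = output.detach().cpu()  # stash the activation for later use
    return hook

# handle = dict(net.named_modules())['consensus'].register_forward_hook(save_output('relation_feats'))
# _ = net(input_var)        # the forward pass fills features['relation_feats']
# handle.remove()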

Does this program only handle 2-second short videos?

python3 test_video.py --arch InceptionV3 --dataset something \
--weights pretrain/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar \
--video_file sample_data/001.mp4 --rendered_output sample_data/001_output.mp4

I uploaded a 17-second video named 001.mp4; after running, the output video 001_output.mp4 is only 2 seconds long, and the other 15 seconds are missing.
Does this program only handle 2-second short videos?

test_video.py - AttributeError: 'NoneType' object has no attribute 'groups'

Hi,

I tried to run the pre-trained example and got the following error. I am running on Windows 10, Anaconda, Python 3.6, PyTorch 0.3.0, etc. Any idea why?

(momentsintime) C:\Users\Pablo\TRN-pytorch>python test_video.py --arch InceptionV3 --dataset moments --weight pretrain\TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar --video_file 'C:\Users\Pablo\TRN-pytorch\sample_data\bolei_juggling.mp4' --rendered_output 'C:\Users\Pablo\TRN-pytorch\sample_data\predicted_video.mp4'
Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
Freezing BatchNorm2D except the first one.
Extracting frames using ffmpeg...
Traceback (most recent call last):
File "test_video.py", line 133, in
frames = extract_frames(args.video_file, args.test_segments)
File "test_video.py", line 33, in extract_frames
duration = re_duration.search(str(output[1])).groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'

Moments dataset

Hello, thank you for sharing your excellent code.
How is the model trained on the Moments dataset?
I downloaded the Moments Mini dataset, which contains only moments_categories.txt, trainSet.csv, and validationSet.csv.
How do I get the files needed in return_moments?
Could you provide download links for the Moments dataset files?

RuntimeError: CUDA out of memory.

Hi. Does anyone know how to solve this problem?

CUDA_VISIBLE_DEVICES=0 python test_video.py --arch InceptionV3 --dataset moments --weight pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best_v0.4.pth.tar --frame_folder sample_data/bolei_juggling
/home/kaiiuk/action_recognition/TRN-pytorch/models.py:87: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
normal(self.new_fc.weight, 0, std)
/home/kaiiuk/action_recognition/TRN-pytorch/models.py:88: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
constant(self.new_fc.bias, 0)
Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
Freezing BatchNorm2D except the first one.
/home/kaiiuk/anaconda3/envs/deep-person-reid/lib/python3.7/site-packages/torchvision/transforms/transforms.py:187: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
warnings.warn("The use of the transforms.Scale transform is deprecated, " +
Loading frames in sample_data/bolei_juggling
Traceback (most recent call last):
File "test_video.py", line 140, in
logits = net(input_var)
File "/home/kaiiuk/anaconda3/envs/deep-person-reid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/kaiiuk/action_recognition/TRN-pytorch/models.py", line 220, in forward
base_out = self.base_model(input.view((-1, sample_len) + input.size()[-2:]))
File "/home/kaiiuk/anaconda3/envs/deep-person-reid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/kaiiuk/action_recognition/TRN-pytorch/model_zoo/bninception/pytorch_load.py", line 49, in forward
data_dict[op[2]] = getattr(self, op[0])(data_dict[op[-1]])
File "/home/kaiiuk/anaconda3/envs/deep-person-reid/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/kaiiuk/anaconda3/envs/deep-person-reid/lib/python3.7/site-packages/torch/nn/modules/pooling.py", line 148, in forward
self.return_indices)
File "/home/kaiiuk/anaconda3/envs/deep-person-reid/lib/python3.7/site-packages/torch/_jit_internal.py", line 132, in fn
return if_false(*args, **kwargs)
File "/home/kaiiuk/anaconda3/envs/deep-person-reid/lib/python3.7/site-packages/torch/nn/functional.py", line 425, in _max_pool2d
input, kernel_size, stride, padding, dilation, ceil_mode)[0]
File "/home/kaiiuk/anaconda3/envs/deep-person-reid/lib/python3.7/site-packages/torch/nn/functional.py", line 417, in max_pool2d_with_indices
return torch._C._nn.max_pool2d_with_indices(input, kernel_size, _stride, padding, dilation, ceil_mode)
RuntimeError: CUDA out of memory. Tried to allocate 143.62 MiB (GPU 0; 7.92 GiB total capacity; 2.93 GiB already allocated; 94.38 MiB free; 14.45 MiB cached)

Training runs infinitely

I tried to train on my own dataset. When I run this training command:

!python main.py something RGB \
--arch BNInception --num_segments 3 \
--consensus_type TRN --batch-size 2

the training runs indefinitely; please see the screenshot below (screenshot not reproduced here).

Can anyone clarify the training process?

Thanks in advance

process_dataset.py of Charades for action recognition

Firstly, thank you so much for your excellent code and paper.

I'm trying to adapt TRN for action recognition on the Charades dataset (Charades_v1). However, the data preprocessing for Charades_v1 does not seem as easy as for Something-Something, because there are multiple class labels in one video.

- actions:  
Semicolon-separated list of "class start end" triplets for each actions in the video, such as c092 11.90 21.20;c147 0.00 12.60

Should I divide one video into several segments?
Could you please provide the process_dataset.py for Charades that you used for action recognition, which can generate the index files for the train, val, and test splits (charades/category.txt, charades/train_segments.txt, and charades/test_segments.txt)?

Thank you.

list_file for moments

Hi,
Thanks a lot for providing the open-source code!

Could you provide the .csv list file for the Moments dataset, as referenced here: https://github.com/metalbubble/TRN-pytorch/blob/master/datasets_video.py

Highly appreciate your time and help!

RuntimeError: cuda runtime error (2) : out of memory at /pytorch

Hi,

I am trying to run against the example video or frame_folder; both give the same error message. Any ideas?

ubuntu@ubuntu-pc:~/TRN-pytorch$ python3 test_video.py --arch BNInception --dataset something --weight pretrain/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar --video_file sample_data/bolei_juggling.mp4
Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
Freezing BatchNorm2D except the first one.
Extracting frames using ffmpeg...
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "test_video.py", line 140, in
logits = net(input_var)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "/home/ubuntu/TRN-pytorch/models.py", line 220, in forward
base_out = self.base_model(input.view((-1, sample_len) + input.size()[-2:]))
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "/home/ubuntu/TRN-pytorch/model_zoo/bninception/pytorch_load.py", line 49, in forward
data_dict[op[2]] = getattr(self, op[0])(data_dict[op[-1]])
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 325, in call
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/batchnorm.py", line 37, in forward
self.training, self.momentum, self.eps)
File "/usr/local/lib/python3.5/dist-packages/torch/nn/functional.py", line 1013, in batch_norm
return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

Hi, is there some code that can be commented out?

Thank you for presenting your code.
Recently, I have been studying your code, and I have a question: can I comment out this code?

while not os.path.exists(os.path.join(self.root_path, record.path, self.image_tmpl.format(1))):
print(os.path.join(self.root_path, record.path, self.image_tmpl.format(1)))
index = np.random.randint(len(self.video_list))
record = self.video_list[index]

Training doesn't seem to work

I trained on my own dataset for over 12 hours on a 1080 Ti, but the result is still Prec@1 13.333, Loss 1.98. My dataset has 7 classes and 100 video clips per class. Could anyone give me some suggestions?

size mismatch error

Hi, I've found this error in another issue, but my PyTorch version is already higher than theirs: it is 1.0.

I use this script for my jester dataset.

CUDA_VISIBLE_DEVICES=0,1 python main.py jester RGB --arch BNInception --num_segments 8 --consensus_type TRNmultiscale --batch-size 6

And I got this error (screenshot not reproduced here).

Error running example

Hello,

I cloned the repo as specified and tried running the example, but I got the following message:

While copying the parameter named "conv_batchnorm.weight", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_batchnorm.bias", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_batchnorm.running_mean", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_batchnorm.running_var", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_1_batchnorm.weight", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_1_batchnorm.bias", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_1_batchnorm.running_mean", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_1_batchnorm.running_var", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
...
While copying the parameter named "mixed_10_tower_1_mixed_conv_1_batchnorm.running_var", whose dimensions in the model are torch.Size([384]) and whose dimensions in the checkpoint are torch.Size([1, 384]).
While copying the parameter named "mixed_10_tower_2_conv_batchnorm.weight", whose dimensions in the model are torch.Size([192]) and whose dimensions in the checkpoint are torch.Size([1, 192]).
While copying the parameter named "mixed_10_tower_2_conv_batchnorm.bias", whose dimensions in the model are torch.Size([192]) and whose dimensions in the checkpoint are torch.Size([1, 192]).
While copying the parameter named "mixed_10_tower_2_conv_batchnorm.running_mean", whose dimensions in the model are torch.Size([192]) and whose dimensions in the checkpoint are torch.Size([1, 192]).
While copying the parameter named "mixed_10_tower_2_conv_batchnorm.running_var", whose dimensions in the model are torch.Size([192]) and whose dimensions in the checkpoint are torch.Size([1, 192]).

The command I used was the following:

python test_video.py --frame_folder ~/35345/ --weight TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar --arch InceptionV3

I tried running the example in the repo and with a frames folder of my own and got the same message.

Thanks in advance for any help you can provide

A problem with the "Test pretrained model on mp4 video file" step

@metalbubble
Firstly, thank you so much for your excellent code and paper.
When I try to repeat your experiment in the "Test pretrained model on mp4 video file" step, the displayed result is good, but I cannot find predicted_video.mp4 in the sample_data folder. Could you tell me the reason?
Also, does this step support input videos in .h264/.avi format? I tried them both, but I could not get a result.
Thank you so much again. (The result is shown below.)

Another question: where can I find the pretrained model "TRN_something_RGB_BNInception_TRN_segment3_best.pth.tar"?

....../something_something/TRN-pytorch$ python test_video.py --arch InceptionV3 --dataset moments --weight pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar --video_file sample_data/juggling.mp4 --rendered_output sample_data/predicted_video.mp4
('Multi-Scale Temporal Relation Network Module in use', ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation'])
Freezing BatchNorm2D except the first one.
Extracting frames using ffmpeg...
RESULT ON sample_data/juggling.mp4
1.000 -> juggling
0.000 -> catching
0.000 -> balancing
0.000 -> spinning
0.000 -> throwing
[MoviePy] >>>> Building video sample_data/predicted_video.mp4
[MoviePy] Writing video sample_data/predicted_video.mp4
89%|████████████████████▉ | 8/9 [00:00<00:00, 60.99it/s]
[MoviePy] Done.
[MoviePy] >>>> Video ready: sample_data/predicted_video.mp4

RuntimeError: invalid argument 2: out of range at /opt/conda/conda-bld/pytorch_1525796793591/work/torch/lib/THC/generic/THCTensor.c:23 when run 'test_rgb_something.sh'

Traceback (most recent call last):
File "test_models.py", line 78, in
img_feature_dim=args.img_feature_dim,
File "/media/data/kmy/TRN-ATT/models.py", line 60, in init
self.consensus = TRNmodule.return_TRN(consensus_type, self.img_feature_dim, self.num_segments, num_class)
File "/media/data/kmy/TRN-ATT/TRNmodule.py", line 283, in return_TRN
TRNmodel = RelationModuleMultiScaleWithAtt(img_feature_dim, num_frames, num_class)
File "/media/data/kmy/TRN-ATT/TRNmodule.py", line 91, in init
nn.Linear(len(self.scales)*self.num_class, len(self.scales)),
File "/home/kongmy/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 46, in init
self.reset_parameters()
File "/home/kongmy/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 49, in reset_parameters
stdv = 1. / math.sqrt(self.weight.size(1))
RuntimeError: invalid argument 2: out of range at /opt/conda/conda-bld/pytorch_1525796793591/work/torch/lib/THC/generic/THCTensor.c:23

I can't figure out what happened. Is the module code not the same as the one used to train the model?
Why is this problem only encountered during testing?

Can anyone help me ? Thanks

Dimensions in model doesn't match the dimensions in the checkpoint

I was trying to run test_video.py and got the following error messages.

Traceback (most recent call last):
File "test_video.py", line 104, in
img_feature_dim=args.img_feature_dim, print_spec=False)
File "/home/vbalab/projects/SimpleMovementDetection/TRN-pytorch/models.py", line 43, in init
self._prepare_base_model(base_model)
File "/home/vbalab/projects/SimpleMovementDetection/TRN-pytorch/models.py", line 120, in _prepare_base_model
self.base_model = getattr(model_zoo, base_model)()
File "/home/vbalab/projects/SimpleMovementDetection/TRN-pytorch/model_zoo/bninception/pytorch_load.py", line 67, in init
super(InceptionV3, self).init(model_path=model_path, weight_url=weight_url, num_classes=num_classes)
File "/home/vbalab/projects/SimpleMovementDetection/TRN-pytorch/model_zoo/bninception/pytorch_load.py", line 35, in init
self.load_state_dict(torch.utils.model_zoo.load_url(weight_url))
File "/home/vbalab/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
self.class.name, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for InceptionV3:
While copying the parameter named "conv_batchnorm.weight", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_batchnorm.bias", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_batchnorm.running_mean", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_batchnorm.running_var", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_1_batchnorm.weight", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_1_batchnorm.bias", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_1_batchnorm.running_mean", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_1_batchnorm.running_var", whose dimensions in the model are torch.Size([32]) and whose dimensions in the checkpoint are torch.Size([1, 32]).
While copying the parameter named "conv_2_batchnorm.weight", whose dimensions in the model are torch.Size([64]) and whose dimensions in the checkpoint are torch.Size([1, 64]).
While copying the parameter named "conv_2_batchnorm.bias", whose dimensions in the model are torch.Size([64]) and whose dimensions in the checkpoint are torch.Size([1, 64]).
While copying the parameter named "conv_2_batchnorm.running_mean", whose dimensions in the model are torch.Size([64]) and whose dimensions in the checkpoint are torch.Size([1, 64]).
.......

Can I know the version of PyTorch used in this project?

I have recently tried to fine-tune the TRN model, and the size mismatch issue occurred. I guess I am facing the same problem as in the issue "Dimensions in model doesn't match the dimensions in the checkpoint" and want to install the same PyTorch version in Anaconda to solve it.

sh test_video.sh with AttributeError: 'NoneType' object has no attribute 'groups'

sh test_video.sh
Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
Freezing BatchNorm2D except the first one.
Extracting frames using ffmpeg...
Traceback (most recent call last):
File "test_video.py", line 133, in
frames = extract_frames(args.video_file, args.test_segments)
File "test_video.py", line 35, in extract_frames
duration = re_duration.search(str(output[1])).groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'

Instability in Inference Output

When I load a fixed model and run inference on some fixed input, the output probabilities of the top N classes are inconsistent across runs. Is there an explanation for where randomness might be introduced in the testing process?

Running TRN without CUDA

Is there a workaround for running TRN on just the CPU?

Valeries-MacBook-Pro:TRN-pytorch valeriechen$ python test_video.py --arch BNInception --dataset something --weight pretrain/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar --frame_folder sample_data/bolei_juggling
Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
Traceback (most recent call last):
File "test_video.py", line 112, in
net.cuda().eval()
File "/anaconda2/envs/mypython3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 216, in cuda
return self._apply(lambda t: t.cuda(device))
File "/anaconda2/envs/mypython3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/anaconda2/envs/mypython3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 146, in _apply
module._apply(fn)
File "/anaconda2/envs/mypython3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 152, in _apply
param.data = fn(param.data)
File "/anaconda2/envs/mypython3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 216, in
return self._apply(lambda t: t.cuda(device))
File "/anaconda2/envs/mypython3/lib/python3.6/site-packages/torch/_utils.py", line 69, in cuda
return new_type(self.size()).copy_(self, async)
File "/anaconda2/envs/mypython3/lib/python3.6/site-packages/torch/cuda/init.py", line 384, in _lazy_new
_lazy_init()
File "/anaconda2/envs/mypython3/lib/python3.6/site-packages/torch/cuda/init.py", line 141, in _lazy_init
_check_driver()
File "/anaconda2/envs/mypython3/lib/python3.6/site-packages/torch/cuda/init.py", line 55, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
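(A hedged sketch of a possible workaround, not something the repo documents.) Loading the checkpoint with map_location='cpu' and skipping the .cuda() calls keeps everything on the CPU; whether the rest of test_video.py runs unchanged on CPU is not guaranteed.

import torch

checkpoint = torch.load(
    'pretrain/TRN_something_RGB_BNInception_TRNmultiscale_segment8_best.pth.tar',
    map_location='cpu')
# ...load the state dict into the model as test_video.py does, then:
# net.eval()                   # instead of net.cuda().eval()
# input_var = input_var.cpu()  # keep the input tensors on the CPU as well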

Input shape of the model

Hello, and thanks a lot for publishing the code for your work. I'm using a different loading pipeline for my custom dataset, where I load a batch of videos and get a tensor of shape:

batch_size x channels x video_length x height x width. (Except for the batch_size, which I define according to my GPU memory, the rest of the values are the same across my dataset.)

I would like to know the shape of the tensors created by train_loader in main.py without downloading the benchmark datasets. Is there a way to figure this out? What should I change in transforms.py to get this right for my dataset?

Thanks in advance.
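(An assumption based on the input.view(...) call visible in the tracebacks above, not an official spec.) TSN-style loaders flatten the temporal dimension into the channel dimension, so a clip of shape (B, C, T, H, W) would need to be rearranged into (B, T*C, H, W) before the forward pass:

import torch

def to_tsn_layout(clip):
    # clip: (B, C, T, H, W) -> (B, T*C, H, W), frames stacked along the channel axis
    b, c, t, h, w = clip.shape
    return clip.permute(0, 2, 1, 3, 4).reshape(b, t * c, h, w)

# flat = to_tsn_layout(torch.randn(2, 3, 8, 224, 224))  # -> torch.Size([2, 24, 224, 224])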
