swathikirans / gsm
Gate-Shift Networks for Video Action Recognition - CVPR 2020
License: Other
Hi!
How does GSM selectively integrate spatial and temporal information through gating?
When the gating values are different, in which .py file in the project can I find the differencing and averaging operations for the temporal features?
Hoping for your reply, thanks!
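Not the repo's actual implementation (which, going by the tracebacks in this thread, lives in gsm.py), but a minimal pure-Python sketch of the gate-shift idea on a single channel over time, with hypothetical names: a tanh gate decides how much of each feature is shifted to the neighbouring time step, while the ungated residual stays in place.

```python
import math

def gate_shift_1d(x, gate_logits):
    """Toy gate-shift over time for one channel (illustrative only).

    x           : list of feature values, one per time step
    gate_logits : same length; tanh(gate_logits) in [-1, 1] gates each value
    Gated content is shifted one step forward in time; the residual
    (ungated part) stays at its original position.
    """
    gates = [math.tanh(g) for g in gate_logits]
    gated = [g * v for g, v in zip(gates, x)]
    residual = [v - gv for v, gv in zip(x, gated)]   # what stays in place
    shifted = [0.0] + gated[:-1]                     # shift forward by one step
    return [r + s for r, s in zip(residual, shifted)]

# gate ~ [0, 1, 1]: x[0] stays put, x[1] and x[2] are (almost fully) shifted later
y = gate_shift_1d([1.0, 2.0, 3.0], [0.0, 10.0, 10.0])
```

The differencing (`x - gated`) and the fusion of shifted and residual parts are the operations the question above refers to; in the real network the gates are produced by learned convolutions rather than fixed logits.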
In your paper, you show your GSM design in Figure 2 with a "1×3×3 convolutions" component. However, in Figure 3, you show your GSM implementation with a "3×3×3 convolutions" component. Why? Does that mean I need to do 3D convolutions in GSM?
Line 47 in 43e8eba
Hello, I just want to train your method on my machine with the Something-v1 dataset.
My environment:
(1) PyTorch 1.2
(2) Python 3.7
(3) TensorboardX (a compatible version)
(4) 4 GPUs
However, when I run the training script, the loss is always 'nan' and it outputs the warning:
Nan or Inf found in input tensor.
I have tried to handle this issue by:
(1) lowering the learning rate, but this did not work;
(2) checking 'loss.backward()': I printed the loss before and after backpropagation, and found that the loss is normal before backpropagation but becomes nan afterwards.
What's more,
(3) I also checked 'datasets_video.py'. It appears that your function 'return_something()' does not need the file 'filename_categories.txt', unlike the other methods; I am also confused about this.
So could you please shed some light on these two questions?
Thanks very much!
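A loss that turns to nan right after backward usually points at exploding gradients; the training commands quoted in this thread pass --gd 20, which looks like a gradient-clipping threshold. A minimal pure-Python sketch of clipping by global norm (a hypothetical helper, not the repo's code) also shows why clipping cannot rescue a gradient that has already overflowed to inf:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Finite gradients are rescaled as expected:
print(clip_by_global_norm([3.0, 4.0], 1.0))          # ~[0.6, 0.8]

# But once any gradient is inf, the global norm is inf, the scale is 0,
# and inf * 0 gives nan -- clipping after the overflow is too late:
print(clip_by_global_norm([1e3, -2e3, float('inf')], 20.0))
```

This is why lowering the learning rate alone may not help once an overflow happens inside the forward pass; the nan has to be caught before it reaches the gradients.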
While testing the model after training, I'm getting the following error:
RuntimeError: invalid argument 2: size '[0 x 16 x 64 x 27 x 27]' is invalid for input with 186624 elements at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/TH/THStorage.cpp:84
Do you have any idea how to fix this?
Take args.iter_size == 2 as an example: I think your code clips and accumulates gradients as clip(clip(grads1) + grads2), not clip(grads1 + grads2), which would make more sense to me.
I haven't run the code yet; I just wonder whether this is a problem.
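The difference between the two orders can be seen with a toy element-wise clip (a simplified stand-in, not the repo's actual clipping):

```python
def clip(grads, max_abs):
    """Element-wise clip of gradient values to [-max_abs, max_abs]."""
    return [max(-max_abs, min(max_abs, g)) for g in grads]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

g1, g2 = [30.0], [-25.0]

# Clip applied at each accumulation step (what the issue says the code does):
per_step = clip(add(clip(g1, 20.0), g2), 20.0)   # clip(30) = 20; 20 - 25 = -5

# Clip applied once on the accumulated sum (what the issue argues for):
accumulated = clip(add(g1, g2), 20.0)            # 30 - 25 = 5
```

With the same inputs the two orders give gradients of opposite sign, so the distinction is not merely cosmetic.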
Hi, I successfully ran your network on Something-v1 with num_segments=8.
However, when I use num_segments=12, after the 1st epoch I receive the following error:
RuntimeError: shape '[-1, 8, 4, 27, 27]' is invalid for input of size 34992
Any ideas?
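A hedged guess at the arithmetic behind this error: some view still assumes the default 8 segments while the tensor actually holds 12. The numbers in the message are consistent with that reading:

```python
# Why the view fails: the inferred -1 dimension must divide evenly.
segments_in_view = 8                      # segment count the view expects
c, h, w = 4, 27, 27                       # remaining dims from the error message
per_unit = segments_in_view * c * h * w   # 23328 elements per inferred unit
total = 34992                             # actual element count from the error

print(total % per_unit)                   # non-zero, so view([-1, 8, 4, 27, 27]) fails
print(total // (c * h * w))               # 12: the tensor really holds 12 segments
```

If this reading is right, the fix would be to make sure every reshape uses the num_segments value passed on the command line rather than a hard-coded 8.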
Hi, I would like to know how many GPUs you used when training models on the Something-Something datasets, and the batch size on each GPU.
In your paper, you show the results on the Something-v1 dataset in Table 3. How do I get them? Do I need to train 4 models with different segment parameters (8, 12, 16, 24)?
How can I download these pretrained models?
Hi, I tried to train the model on my computer on the Diving48 dataset. But if I set batch_size to 16, the training runs out of GPU memory. So I trained the model with batch_size 14 and got 27.02 class accuracy, 34.49 Prec@1 and 62.69 Prec@5, which are lower than the results in your paper. Is there something wrong with my training settings, or is the batch_size parameter really that important?
Training setting:
python main.py diving48 RGB --arch InceptionV3 --num_segments 16 --consensus_type avg --batch-size 14 --iter_size 2 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 60 --eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
Testing setting:
python test_models.py diving48 RGB model/diving48_InceptionV3_avg_segment16_batch14_epochs60_best.pth.tar --arch InceptionV3 --crop_fusion_type avg --test_segments 16 --test_crops 1 --num_clips 2 --gsm --save_scores
It seems that with my 2 GPUs, training on the Something-v1 dataset takes ~3.5 days for 60 epochs.
Do we need to train for 60 epochs to get desirable results, or can they be obtained with fewer epochs?
Could you tell me, please: can we get your results with fewer epochs (e.g. 30-40)? Did you try that? The training time is too long.
Could you share your .log file, please, if possible?
P.S.: I am training the num_segments=8 case.
I think you did great work. But there is something wrong with the Diving48 dataset's official website and I can't download the dataset from it. So, would you mind sharing the dataset in some other way?
Thanks for the nice work. I have trained a model on my own dataset, which has three classes with 20 videos each. I formatted the dataset in the Something-Something-v1 format and started training. During training I got testing accuracy as follows.
After completing the training, I tested the model on the same data, and it gives me the following:
Class Accuracy 36.67%
Overall Prec@1 36.67% Prec@5 100.00%
Can you explain the result? Is it OK, or is something wrong?
Does this repository have a way to apply Grad-CAM? I tried to use https://github.com/ramprs/grad-cam to apply Grad-CAM, but it does not work for me.
Please help me use Grad-CAM on my model.
Hi, Swathikirans
Thank you for sharing your nice work.
I trained your algorithm on the Diving48 dataset; however, my result is lower than 40.27%.
Below is my configuration:
python3 main.py diving48 RGB --split 1 --arch InceptionV3 --num_segments 16 --consensus_type avg --batch-size 8 --iter_size 1 --dropout 0.7 --lr 0.01 --warmup 10 --epochs 20 --eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
I cannot understand why I am getting such poor performance.
Any ideas? Please help if you can.
Thank you!
Hi,
I would like to achieve the performance you mentioned in the paper (~40%). I am training the model with the following configuration, which after 15 epochs gave me 18.65% accuracy:
python3 main.py diving48 RGB --split 1 --arch InceptionV3 --num_segments 16 --consensus_type avg \
--batch-size 8 --iter_size 2 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 20 \
--eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
Can you please provide the exact configuration used?
Thanks for the nice work. Can you provide an inference script for running the model on an input video? Thanks.
How did you construct the bn_inception_gsm.yaml file? Can you elaborate? Thanks.
In your paper, you show a great result on the EPIC-Kitchens dataset. How can I reproduce that? Is there any code for it? Thanks.
Thanks for your beautiful work.
I trained with the following .sh file: python main.py something-v1 RGB --arch BNInception --num_segments 8 --consensus_type avg --batch-size 16 --iter_size 2 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 60 --eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
I got a model "something-v1_BNInception_avg_segment8_checkpoint.pth.tar"
the test_rgb.sh: python test_models.py somethong-v1 RGB models/something-v1_BNInception_avg_segment8_checkpoint.pth.tar --arch BNInception --crop_fusion_type avg --test_segments 8 --test_crops 1 --num_clips 1 --gsm
When I run test_rgb.sh, this error arises:
File "/data/users/xuyang/xuyang/Downloads/GSM-master/gsm.py", line 31, in forward
x = x.view(batchSize, self.num_segments, shape).permute(0, 2, 1, 3, 4).contiguous()
RuntimeError: shape '[0, 8, 64, 28, 28]' is invalid for input of size 50176
I have tried some ways, but the error still arises.
Please guide me. I look forward to your reply!
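One hedged way a leading dimension of 0 can appear here, assuming (as the traceback suggests) that gsm.py derives the batch size by integer division of the input's leading dimension by num_segments:

```python
# Hypothetical reconstruction of the failing arithmetic, not the repo's code.
num_segments = 8
leading_dim = 1                          # e.g. the test loader delivered 1 frame-batch
batch_size = leading_dim // num_segments # integer division: 1 // 8 == 0
print(batch_size)                        # 0 -> view([0, 8, 64, 28, 28]) asks for 0 elements

elements_available = 64 * 28 * 28        # 50176, matching the error message exactly
print(elements_available)
```

If that reconstruction is right, the input reaching gsm.py holds a single 64×28×28 feature map instead of 8 of them, so the mismatch is upstream of the view (e.g. how test clips are batched), not in the view itself.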
Hello! I find that batch_size=32 in the paper while batch_size=16 on GitHub. Why are they not equal? Thanks!
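One hedged possibility, given the --iter_size 2 flag in the training commands quoted in this thread: gradients are accumulated over iter_size iterations before each update, so the effective batch size is the product of the two settings.

```python
# Assumption: --iter_size accumulates gradients, as the flag name suggests.
batch_size_per_step = 16   # --batch-size from the GitHub command
iter_size = 2              # --iter_size from the same command
effective_batch = batch_size_per_step * iter_size
print(effective_batch)     # 32 -- would match the batch size reported in the paper
```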
I want to implement this GSM model on a custom dataset. Could you please let me know which files I have to change in order to adapt this model to a custom dataset? As far as I know, I have to change "main.py", "dataset_video" and "opts". Do I need to change any other file? Could you please help me with this?
As a novice, I saw that the authors used different numbers of frames to train the model in the paper. Could you tell me how to set the parameters to change the frame number? Thank you very much for your help.
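Going by the training commands quoted elsewhere in this thread, the number of frames per clip appears to be controlled by --num_segments. A hedged example, with every other flag copied from the earlier command:

```shell
# Train with 16 frames per clip instead of 8 (only --num_segments changed;
# all other flags are from the training command quoted earlier in this thread)
python main.py something-v1 RGB --arch BNInception --num_segments 16 \
  --consensus_type avg --batch-size 16 --iter_size 2 --dropout 0.5 \
  --lr 0.01 --warmup 10 --epochs 60 --eval-freq 5 --gd 20 \
  --run_iter 1 -j 16 --npb --gsm
```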
Thanks for your work. I want to know whether there is a theoretical basis for why the double input can improve the accuracy.
In the paper, which layer's features are used in Figure 5 for the t-SNE plot?
Could you share the visualization code with me? Thanks.