swathikirans / gsm
Gate-Shift Networks for Video Action Recognition - CVPR 2020
License: Other
Hi!
How does GSM selectively integrate spatial and temporal information through gating?
When the gating values are different, in which .py file in the project can I find the differencing and averaging operations for the temporal features?
Hoping for your reply, thanks!
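Not the repo's actual implementation (which, going by the tracebacks in this thread, lives in gsm.py), but a minimal pure-Python sketch of the gate-shift idea on a single channel over time, with hypothetical names: a tanh gate decides how much of each feature is shifted to the neighbouring time step, while the ungated residual stays in place.

```python
import math

def gate_shift_1d(x, gate_logits):
    """Toy gate-shift over time for one channel (illustrative only).

    x           : list of feature values, one per time step
    gate_logits : same length; tanh(gate_logits) in [-1, 1] gates each value
    Gated content is shifted one step forward in time; the residual
    (ungated part) stays at its original position.
    """
    gates = [math.tanh(g) for g in gate_logits]
    gated = [g * v for g, v in zip(gates, x)]
    residual = [v - gv for v, gv in zip(x, gated)]   # what stays in place
    shifted = [0.0] + gated[:-1]                     # shift forward by one step
    return [r + s for r, s in zip(residual, shifted)]

# gate ~ [0, 1, 1]: x[0] stays put, x[1] and x[2] are (almost fully) shifted later
y = gate_shift_1d([1.0, 2.0, 3.0], [0.0, 10.0, 10.0])
```

The differencing (`x - gated`) and the fusion of shifted and residual parts are the operations the question above refers to; in the real network the gates are produced by learned convolutions rather than fixed logits.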
In your paper, you show your GSM design in Figure 2 with a "1×3×3 convolutions" component. However, in Figure 3, you show your GSM implementation with a "3×3×3 convolutions" component. Why? Does that mean I need to do 3D convolutions in GSM?
Line 47 in 43e8eba
Hello, I just want to train your method on my machine with the Something-v1 dataset.
My environment:
(1) PyTorch 1.2
(2) Python 3.7
(3) TensorboardX (a compatible version)
(4) 4 GPUs
However, when I run the training script, the loss is always 'nan' and it outputs the warning:
Nan or Inf found in input tensor.
I have tried to handle this issue by:
(1) lowering the learning rate, but this did not work;
(2) checking 'loss.backward()': I printed the loss before and after backpropagation, and found that the loss is normal before backpropagation but becomes nan afterwards.
What's more,
(3) I also checked 'datasets_video.py'. It appears that your function 'return_something()' does not need the file 'filename_categories.txt', unlike the other methods; I am also confused about this.
So could you please shed some light on these two questions?
Thanks very much!
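A loss that turns to nan right after backward usually points at exploding gradients; the training commands quoted in this thread pass --gd 20, which looks like a gradient-clipping threshold. A minimal pure-Python sketch of clipping by global norm (a hypothetical helper, not the repo's code) also shows why clipping cannot rescue a gradient that has already overflowed to inf:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# Finite gradients are rescaled as expected:
print(clip_by_global_norm([3.0, 4.0], 1.0))          # ~[0.6, 0.8]

# But once any gradient is inf, the global norm is inf, the scale is 0,
# and inf * 0 gives nan -- clipping after the overflow is too late:
print(clip_by_global_norm([1e3, -2e3, float('inf')], 20.0))
```

This is why lowering the learning rate alone may not help once an overflow happens inside the forward pass; the nan has to be caught before it reaches the gradients.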
While testing the model after training, I'm getting the following error:
RuntimeError: invalid argument 2: size '[0 x 16 x 64 x 27 x 27]' is invalid for input with 186624 elements at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/TH/THStorage.cpp:84
Do you have any idea how to fix this?
Take args.iter_size == 2 as an example: I think your code clips and accumulates gradients as clip(clip(grads1) + grads2), not clip(grads1 + grads2), which would make more sense to me.
I haven't run the code yet; I just wonder whether this is a problem.
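The difference between the two orders can be seen with a toy element-wise clip (a simplified stand-in, not the repo's actual clipping):

```python
def clip(grads, max_abs):
    """Element-wise clip of gradient values to [-max_abs, max_abs]."""
    return [max(-max_abs, min(max_abs, g)) for g in grads]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

g1, g2 = [30.0], [-25.0]

# Clip applied at each accumulation step (what the issue says the code does):
per_step = clip(add(clip(g1, 20.0), g2), 20.0)   # clip(30) = 20; 20 - 25 = -5

# Clip applied once on the accumulated sum (what the issue argues for):
accumulated = clip(add(g1, g2), 20.0)            # 30 - 25 = 5
```

With the same inputs the two orders give gradients of opposite sign, so the distinction is not merely cosmetic.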
Hi, I successfully ran your network on Something-v1 with num_segments=8.
However, when I use num_segments=12, after the 1st epoch I receive the following error:
RuntimeError: shape '[-1, 8, 4, 27, 27]' is invalid for input of size 34992
Any ideas?
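A hedged guess at the arithmetic behind this error: some view still assumes the default 8 segments while the tensor actually holds 12. The numbers in the message are consistent with that reading:

```python
# Why the view fails: the inferred -1 dimension must divide evenly.
segments_in_view = 8                      # segment count the view expects
c, h, w = 4, 27, 27                       # remaining dims from the error message
per_unit = segments_in_view * c * h * w   # 23328 elements per inferred unit
total = 34992                             # actual element count from the error

print(total % per_unit)                   # non-zero, so view([-1, 8, 4, 27, 27]) fails
print(total // (c * h * w))               # 12: the tensor really holds 12 segments
```

If this reading is right, the fix would be to make sure every reshape uses the num_segments value passed on the command line rather than a hard-coded 8.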
Hi, I would like to know how many GPUs you used when training models on the Something-Something datasets, and the batch size on each GPU.
In your paper, you show the results on the Something-v1 dataset in Table 3. How do I get them? Do I need to train 4 models with different segment parameters (8, 12, 16, 24)?
How can I download these pretrained models?
Hi, I tried to train the model on my computer on the Diving48 dataset. But if I set batch_size to 16, the training runs out of GPU memory. So I trained the model with batch_size 14 and got 27.02 class accuracy, 34.49 Prec@1 and 62.69 Prec@5, which are lower than the results in your paper. Is there something wrong with my training settings, or is the batch_size parameter really that important?
Training setting:
python main.py diving48 RGB --arch InceptionV3 --num_segments 16 --consensus_type avg --batch-size 14 --iter_size 2 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 60 --eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
Testing setting:
python test_models.py diving48 RGB model/diving48_InceptionV3_avg_segment16_batch14_epochs60_best.pth.tar --arch InceptionV3 --crop_fusion_type avg --test_segments 16 --test_crops 1 --num_clips 2 --gsm --save_scores
It seems that with my 2 GPUs, training on the Something-v1 dataset takes ~3.5 days for 60 epochs.
Do we need to train for 60 epochs to get desirable results, or can they be obtained with fewer epochs?
Could you tell me, please: can we get your results with fewer epochs (e.g. 30-40)? Did you try that? The training time is too long.
Could you share your .log file, please, if possible?
P.S.: I am training the num_segments=8 case.
I think you did great work. But there is something wrong with the Diving48 dataset's official website and I can't download the dataset from it. So, would you mind sharing the dataset in some other way?
Thanks for the nice work. I have trained a model on my own dataset, which has three classes with 20 videos each. I formatted the dataset in the Something-Something-v1 format and started training. During training I got testing accuracy as follows.
After completing the training, I tested the model on the same data, and it gives me the following:
Class Accuracy 36.67%
Overall Prec@1 36.67% Prec@5 100.00%
Can you explain the result? Is it OK, or is something wrong?
Does this repository have a way to apply Grad-CAM? I tried to use https://github.com/ramprs/grad-cam to apply Grad-CAM, but it does not work for me.
Please help me use Grad-CAM on my model.
Hi, Swathikirans
Thank you for sharing your nice work.
I trained your algorithm on the Diving48 dataset; however, my result is lower than 40.27%.
Below is my configuration:
python3 main.py diving48 RGB --split 1 --arch InceptionV3 --num_segments 16 --consensus_type avg --batch-size 8 --iter_size 1 --dropout 0.7 --lr 0.01 --warmup 10 --epochs 20 --eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
I cannot understand why I am getting such poor performance.
Any ideas? Please help if you can.
Thank you!
Hi,
I would like to achieve the performance you mentioned in the paper (~40%). I am training the model with the following configuration, which after 15 epochs gave me 18.65% accuracy:
python3 main.py diving48 RGB --split 1 --arch InceptionV3 --num_segments 16 --consensus_type avg \
--batch-size 8 --iter_size 2 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 20 \
--eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
Can you please provide the exact configuration used?
Thanks for the nice work. Can you provide an inference script for running the model on an input video? Thanks.
How did you construct the bn_inception_gsm.yaml file? Can you elaborate? Thanks.
In your paper, you show a great result on the EPIC-Kitchens dataset. How can I reproduce that? Is there any code for it? Thanks.
Thanks for your beautiful work.
I trained with the following .sh file: python main.py something-v1 RGB --arch BNInception --num_segments 8 --consensus_type avg --batch-size 16 --iter_size 2 --dropout 0.5 --lr 0.01 --warmup 10 --epochs 60 --eval-freq 5 --gd 20 --run_iter 1 -j 16 --npb --gsm
I got a model "something-v1_BNInception_avg_segment8_checkpoint.pth.tar"
the test_rgb.sh: python test_models.py somethong-v1 RGB models/something-v1_BNInception_avg_segment8_checkpoint.pth.tar --arch BNInception --crop_fusion_type avg --test_segments 8 --test_crops 1 --num_clips 1 --gsm
When I run test_rgb.sh, this error arises:
File "/data/users/xuyang/xuyang/Downloads/GSM-master/gsm.py", line 31, in forward
x = x.view(batchSize, self.num_segments, shape).permute(0, 2, 1, 3, 4).contiguous()
RuntimeError: shape '[0, 8, 64, 28, 28]' is invalid for input of size 50176
I have tried some ways, but the error still arises.
Please guide me. I look forward to your reply!
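One hedged way a leading dimension of 0 can appear here, assuming (as the traceback suggests) that gsm.py derives the batch size by integer division of the input's leading dimension by num_segments:

```python
# Hypothetical reconstruction of the failing arithmetic, not the repo's code.
num_segments = 8
leading_dim = 1                          # e.g. the test loader delivered 1 frame-batch
batch_size = leading_dim // num_segments # integer division: 1 // 8 == 0
print(batch_size)                        # 0 -> view([0, 8, 64, 28, 28]) asks for 0 elements

elements_available = 64 * 28 * 28        # 50176, matching the error message exactly
print(elements_available)
```

If that reconstruction is right, the input reaching gsm.py holds a single 64×28×28 feature map instead of 8 of them, so the mismatch is upstream of the view (e.g. how test clips are batched), not in the view itself.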
Hello! I find that batch_size=32 in the paper while batch_size=16 on GitHub. Why are they not equal? Thanks!
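One hedged possibility, given the --iter_size 2 flag in the training commands quoted in this thread: gradients are accumulated over iter_size iterations before each update, so the effective batch size is the product of the two settings.

```python
# Assumption: --iter_size accumulates gradients, as the flag name suggests.
batch_size_per_step = 16   # --batch-size from the GitHub command
iter_size = 2              # --iter_size from the same command
effective_batch = batch_size_per_step * iter_size
print(effective_batch)     # 32 -- would match the batch size reported in the paper
```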
I want to implement this GSM model on a custom dataset. Could you please let me know which files I have to change in order to adapt this model to a custom dataset? As far as I know, I have to change "main.py", "dataset_video" and "opts". Do I need to change any other file? Could you please help me with this?
As a novice, I saw that the authors used different numbers of frames to train the model in the paper. Could you tell me how to set the parameters to change the frame number? Thank you very much for your help.
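Going by the training commands quoted elsewhere in this thread, the number of frames per clip appears to be controlled by --num_segments. A hedged example, with every other flag copied from the earlier command:

```shell
# Train with 16 frames per clip instead of 8 (only --num_segments changed;
# all other flags are from the training command quoted earlier in this thread)
python main.py something-v1 RGB --arch BNInception --num_segments 16 \
  --consensus_type avg --batch-size 16 --iter_size 2 --dropout 0.5 \
  --lr 0.01 --warmup 10 --epochs 60 --eval-freq 5 --gd 20 \
  --run_iter 1 -j 16 --npb --gsm
```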
Thanks for your work. I want to know whether there is a theoretical basis for why the double input can improve the accuracy.
In the paper, which layer's features are used in Figure 5 for the t-SNE plot?
Could you share the visualization code with me? Thanks.