
wtalc-pytorch's Introduction

W-TALC: Weakly-supervised Temporal Activity Localization and Classification

Overview

This package is a PyTorch implementation of the paper W-TALC: Weakly-supervised Temporal Activity Localization and Classification by Sujoy Paul, Sourya Roy, and Amit K. Roy-Chowdhury, published at ECCV 2018. The TensorFlow implementation can be found here.

Dependencies

This package depends on the following packages:

  1. PyTorch 0.4.1, Tensorboard Logger 0.1.0
  2. Python 3.6
  3. numpy and scipy, among others

Data

The features for the Thumos14 and ActivityNet1.2 datasets can be downloaded here. The annotations are included with this package.

Running

This code can be run on two different datasets: Thumos14 and Thumos14reduced. The latter contains only the data points of Thumos14 that have temporal boundary annotations. Two feature options are available, but only for Thumos14reduced. The dataset name (along with other parameters) can be changed in options.py. The file to execute is main.py. Results can be viewed with the TensorBoard logger or in the .log text file generated during execution. For I3D features, use the options as set in options.py. For UNT features, use the following options:

python main.py --max-seqlen 1200 --lr 0.00001 --feature-type UNT
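For reference, here is a minimal sketch of how the flags used above might be declared in options.py; only the flags shown in this README are included, and the defaults are assumptions, not the repository's actual values:

import argparse

# Hedged sketch of the option parsing implied by the commands in this README.
# Flag names come from the UNT command above; defaults are assumptions.
parser = argparse.ArgumentParser(description='W-TALC')
parser.add_argument('--max-seqlen', type=int, default=750,
                    help='maximum sequence length during training')
parser.add_argument('--lr', type=float, default=0.0001, help='learning rate')
parser.add_argument('--feature-type', type=str, default='I3D',
                    help='feature type: I3D or UNT')
args = parser.parse_args()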

Citation

Please cite the following work if you use this package.

@inproceedings{paul2018w,
  title={W-TALC: Weakly-supervised Temporal Activity Localization and Classification},
  author={Paul, Sujoy and Roy, Sourya and Roy-Chowdhury, Amit K},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={563--579},
  year={2018}
}

Contact

Please contact the first author of the associated paper, Sujoy Paul ([email protected]), for any further queries.


wtalc-pytorch's Issues

FPS of ActivityNet?

Hi, could you please tell me whether 25 fps is still used when extracting features for ActivityNet?

extracted_fps of Thumos14reduced-I3D-JOINTFeatures

Hi! Congratulations on your great work!

I've got some trouble with the extracted_fps.

In the file extracted_fps.npy, it is recorded that the frames of each video are extracted at 10 fps.

However, I find that this does not match the temporal dimension of the features.

For every video, duration * extracted_fps / 16 = feature.shape[0].
Using this equation, I find that every video is extracted at about 25 fps.

Is there something wrong? Did I make a mistake? I cannot find the reason...
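A quick way to reproduce the check described in this issue (the file name comes from the issue itself; the 16-frames-per-snippet assumption comes from the equation above):

import numpy as np

# Hedged sketch: each feature vector is assumed to summarize a 16-frame
# snippet, so the fps implied by the feature length is
#   feature.shape[0] * 16 / duration.
def implied_fps(feature, duration_sec, frames_per_snippet=16):
    return feature.shape[0] * frames_per_snippet / duration_sec

# Compare against the recorded value from extracted_fps.npy:
recorded_fps = np.load('extracted_fps.npy', allow_pickle=True)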

regarding result

I tried to run this code and it gives an mAP of 0.43 and a classification score of 7.5. Why is that?

Visualization

Can you provide any script for visualizing the data from a video using the annotations for THUMOS14?
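No such script ships with the repository, but as a starting point, here is a minimal matplotlib sketch that draws ground-truth segments as shaded spans on a timeline; the (start, end, label) tuple format is an assumption, not the repository's actual annotation layout:

import matplotlib.pyplot as plt

# Hedged sketch: plot temporal annotations for one video as shaded spans.
def plot_segments(segments, duration_sec, title='THUMOS14 video'):
    fig, ax = plt.subplots(figsize=(10, 1.5))
    for start, end, label in segments:
        ax.axvspan(start, end, alpha=0.4)  # shade the annotated interval
        ax.text((start + end) / 2, 0.5, label, ha='center', va='center')
    ax.set_xlim(0, duration_sec)
    ax.set_yticks([])
    ax.set_xlabel('time (s)')
    ax.set_title(title)
    plt.show()

# Example with a made-up segment:
plot_segments([(12.1, 15.8, 'HammerThrow')], duration_sec=60)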

Specific composition of features

The length of the feature for each time instant is 2048. I want to know which part of the feature comes from the RGB stream: the first half or the second half?
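Whichever half turns out to be which, splitting the 2048-d vectors is a one-liner; the sketch below assumes (without confirming) that the RGB stream comes first:

import numpy as np

features = np.load('features.npy', allow_pickle=True)  # hypothetical file name
rgb_half = features[:, :1024]    # assumed RGB stream -- exactly what this issue asks
flow_half = features[:, 1024:]   # assumed optical-flow stream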

Could you provide the pre-trained encoder?

Dear Sir/Madam,

Thanks for the code. It is really helpful for understanding temporal action localization. However, I want to start from the feature-extraction step but cannot find a PyTorch I3D checkpoint pre-trained on Kinetics-400 in https://github.com/deepmind/kinetics-i3d. Could you provide the RGB and flow .pth checkpoints for your encoders?

Thank you very much.

Overfitting problem when using MIL Loss only and some other details

Hi there:

I ran the code and now I have several questions:

  1. When using the MIL loss only (I set Lambda=1.0), there is an obvious overfitting problem and the highest mAP@tIoU=0.5 is 10.31. However, the paper reports about 17.0 when Lambda=1.0. I am wondering why. Can you share some instructions to reproduce the result? (See the loss-weighting sketch after this list.)

  2. When adding the CAS loss and training together with the MIL loss, the result seems quite promising. The mAP@tIoU=0.5 can reach around 25.0, which is much better than in your paper. I am wondering why. Can the result reported on TensorBoard be regarded as the final result?
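For context, the paper combines the two losses with a weight Lambda, so Lambda=1.0 trains with the MIL loss alone. A sketch of that weighting (the function and variable names here are hypothetical, not the repository's identifiers):

# Hedged sketch of the Lambda-weighted objective discussed in item 1 above.
def total_loss(milloss, casloss, lambda_=0.5):
    # lambda_ = 1.0 keeps only the MIL loss; 0.5 as the default is an assumption.
    return lambda_ * milloss + (1.0 - lambda_) * casloss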

Both classification and localization performance are higher in my experiment than in your ECCV 2018 paper on Thumos14

Hi @sujoyp,

Thanks for sharing this code! It is quite easy to get started with. I have a minor question:

I have set up this code on my machine for Thumos14reduced. While running, the best performance echoed on the screen is 96.56% in terms of classification mAP and 23.67% for localization mAP@0.5. However, in your paper, these are 85.6% and 22.8% respectively. I got confused about the difference; could you please help? Thanks in advance.

About Qualitative Results

Hi, could you tell me how to achieve the visualizations shown in the Qualitative Results? How did you draw them?
Thank you very much.

features download

I can't download the features from the provided link; something is broken. Could you share another link?

Specific parameter setting when training on activityNet v1.2

Hi, I trained your PyTorch code on the ActivityNet v1.2 dataset, but I can only get the following results: the mAP values at increasing tIoU thresholds are 47, 44, 40, 37, 33, ... This is much lower than what you posted. I think there's something wrong with the parameters. Can you share the specific parameters for training on ActivityNet v1.2? I would appreciate it very much if I could receive your reply.

difference between labels.npy and labels_all.npy

Hi. labels.npy is loaded only in detectionMAP.py (as gtlabels), and labels_all.npy only in video_dataset.py (as labels).
At first I thought labels_all.npy was a superset of labels.npy, but when I compare the two files at the same index, they show different results.
So what is the difference between labels.npy and labels_all.npy?

I believe labels_all.npy is wrong

For example, labels[226] has multiple segments of 'HammerThrow', but labels_all[226] is empty, even though they refer to the same video.
There are some other examples (238, 242, ...).

How did you get labels_all.npy?
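A quick way to reproduce the comparison described in this issue (file names and indices are taken from the issue itself):

import numpy as np

labels = np.load('labels.npy', allow_pickle=True)
labels_all = np.load('labels_all.npy', allow_pickle=True)

for idx in (226, 238, 242):  # indices reported above
    print(idx, labels[idx], labels_all[idx])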

overfitting

I used my own small dataset to extract I3D features, then used main.py to train. Overfitting happened on the training set. Do you have any suggestions?

The model results vary greatly

Hello, when I ran the PyTorch source code (learning rate = 0.0001, dataset = Thumos14reduced), I ran into a problem.
I ran it ten times, but the results of several runs were quite different. At IoU=0.5, some runs can reach about 25%, but some cannot, reaching only about 24%. What caused this? Is it caused by randomness?
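Run-to-run spread of this size is often explained by random initialization and sampling order. One standard way to test that is to fix the seeds before training; the calls below are standard PyTorch/numpy APIs, though main.py itself may not expose a seed option:

import random

import numpy as np
import torch

# Standard seeding to check whether the variance across runs is due to randomness.
def set_seed(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(0)  # call before building the model and data loaders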
