
wtalc-pytorch's Introduction

W-TALC: Weakly-supervised Temporal Activity Localization and Classification

Overview

This package is a PyTorch implementation of the paper W-TALC: Weakly-supervised Temporal Activity Localization and Classification by Sujoy Paul, Sourya Roy, and Amit K. Roy-Chowdhury, published at ECCV 2018. The TensorFlow implementation can be found here.

Dependencies

This package depends on the following packages:

  1. PyTorch 0.4.1, Tensorboard Logger 0.1.0
  2. Python 3.6
  3. numpy and scipy, among others

Data

The features for the Thumos14 and ActivityNet1.2 datasets can be downloaded here. The annotations are included with this package.

Running

This code can be run on two different datasets: Thumos14 and Thumos14reduced. The latter contains only the data points of Thumos14 that have temporal boundary annotations. Two feature options are available, but only for Thumos14reduced. The dataset name (along with other parameters) can be changed in options.py. The file to execute is main.py. Results can be viewed with the TensorBoard logger or in the .log text file generated during execution. For I3D features, use the options as set in options.py. For UNT features, use the following options:

python main.py --max-seqlen 1200 --lr 0.00001 --feature-type UNT
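For reference, here is a minimal sketch of how the flags used above might be declared in options.py; only the flags shown in this README are included, and the defaults are assumptions, not the repository's actual values:

import argparse

# Hedged sketch of the option parsing implied by the commands in this README.
# Flag names come from the UNT command above; defaults are assumptions.
parser = argparse.ArgumentParser(description='W-TALC')
parser.add_argument('--max-seqlen', type=int, default=750,
                    help='maximum sequence length during training')
parser.add_argument('--lr', type=float, default=0.0001, help='learning rate')
parser.add_argument('--feature-type', type=str, default='I3D',
                    help='feature type: I3D or UNT')
args = parser.parse_args()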

Citation

Please cite the following work if you use this package.

@inproceedings{paul2018w,
  title={W-TALC: Weakly-supervised Temporal Activity Localization and Classification},
  author={Paul, Sujoy and Roy, Sourya and Roy-Chowdhury, Amit K},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={563--579},
  year={2018}
}

Contact

Please contact the first author of the associated paper, Sujoy Paul ([email protected]), for any further queries.


wtalc-pytorch's Issues

FPS of ActivityNet?

Hi, could you please tell me whether 25 fps is still used when extracting features for ActivityNet?

extracted_fps of Thumos14reduced-I3D-JOINTFeatures

Hi! Congratulations on your great work!

I've got some trouble with the extracted_fps.

In the file extracted_fps.npy, it is recorded that the frames of each video are extracted at 10 fps.

However, I find that this does not match the temporal dimension of the features.

For every video, duration * extracted_fps / 16 = feature.shape[0].
Using this equation, I find that every video is extracted at about 25 fps.

Is there something wrong? Did I make a mistake? I cannot find the reason...
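A quick way to reproduce the check described in this issue (the file name comes from the issue itself; the 16-frames-per-snippet assumption comes from the equation above):

import numpy as np

# Hedged sketch: each feature vector is assumed to summarize a 16-frame
# snippet, so the fps implied by the feature length is
#   feature.shape[0] * 16 / duration.
def implied_fps(feature, duration_sec, frames_per_snippet=16):
    return feature.shape[0] * frames_per_snippet / duration_sec

# Compare against the recorded value from extracted_fps.npy:
recorded_fps = np.load('extracted_fps.npy', allow_pickle=True)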

regarding result

I tried to run this code and it gives an mAP of 0.43 and a classification score of 7.5. Why is that?

Visualization

Can you provide any script for visualizing the data from a video using the annotations for THUMOS14?
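No such script ships with the repository, but as a starting point, here is a minimal matplotlib sketch that draws ground-truth segments as shaded spans on a timeline; the (start, end, label) tuple format is an assumption, not the repository's actual annotation layout:

import matplotlib.pyplot as plt

# Hedged sketch: plot temporal annotations for one video as shaded spans.
def plot_segments(segments, duration_sec, title='THUMOS14 video'):
    fig, ax = plt.subplots(figsize=(10, 1.5))
    for start, end, label in segments:
        ax.axvspan(start, end, alpha=0.4)  # shade the annotated interval
        ax.text((start + end) / 2, 0.5, label, ha='center', va='center')
    ax.set_xlim(0, duration_sec)
    ax.set_yticks([])
    ax.set_xlabel('time (s)')
    ax.set_title(title)
    plt.show()

# Example with a made-up segment:
plot_segments([(12.1, 15.8, 'HammerThrow')], duration_sec=60)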

Specific composition of features

The length of the feature for each time instant is 2048. I want to know which part of the feature comes from the RGB stream: the first half or the second half?
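Whichever half turns out to be which, splitting the 2048-d vectors is a one-liner; the sketch below assumes (without confirming) that the RGB stream comes first:

import numpy as np

features = np.load('features.npy', allow_pickle=True)  # hypothetical file name
rgb_half = features[:, :1024]    # assumed RGB stream -- exactly what this issue asks
flow_half = features[:, 1024:]   # assumed optical-flow stream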

Could you provide the pre-trained encoder?

Dear Sir/Madam,

Thanks for the code. It is really helpful for understanding temporal action localization. However, I want to start from the feature-extraction step but cannot find a PyTorch I3D checkpoint pre-trained on Kinetics-400 in https://github.com/deepmind/kinetics-i3d. Could you provide the RGB and flow .pth checkpoints for your encoders?

Thank you very much.

Overfitting problem when using MIL Loss only and some other details

Hi there:

I ran the code and now I have several questions:

  1. When using the MIL loss only (I set Lambda=1.0), there is an obvious overfitting problem and the highest mAP@tIoU=0.5 is 10.31. However, the paper reports about 17.0 when Lambda=1.0. I am wondering why. Can you share some instructions to reproduce the result? (See the loss-weighting sketch after this list.)

  2. When adding the CAS loss and training together with the MIL loss, the result seems quite promising. The mAP@tIoU=0.5 can reach around 25.0, which is much better than in your paper. I am wondering why. Can the result reported on TensorBoard be regarded as the final result?
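For context, the paper combines the two losses with a weight Lambda, so Lambda=1.0 trains with the MIL loss alone. A sketch of that weighting (the function and variable names here are hypothetical, not the repository's identifiers):

# Hedged sketch of the Lambda-weighted objective discussed in item 1 above.
def total_loss(milloss, casloss, lambda_=0.5):
    # lambda_ = 1.0 keeps only the MIL loss; 0.5 as the default is an assumption.
    return lambda_ * milloss + (1.0 - lambda_) * casloss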

Both classification and localization performance are higher in my experiment than in your ECCV 2018 paper on Thumos14

Hi @sujoyp,

Thanks for sharing this code! It is quite easy to get started with. I have a minor question:

I have set up this code on my machine for Thumos14reduced. While running, the best performance echoed on the screen is 96.56% in terms of classification mAP and 23.67% for localization mAP@0.5. However, in your paper, these are 85.6% and 22.8% respectively. I got confused about the difference; could you please help? Thanks in advance.

About Qualitative Results

Hi, could you tell me how to achieve the visualizations shown in the Qualitative Results? How did you draw them?
Thank you very much.

features download

I can't download the features from the provided link; something is broken. Could you share another link?

Specific parameter setting when training on activityNet v1.2

Hi, I trained your PyTorch code on the ActivityNet v1.2 dataset, but I can only get the following results: the mAP values at increasing tIoU thresholds are 47, 44, 40, 37, 33, ... This is much lower than what you posted. I think there's something wrong with the parameters. Can you share the specific parameters for training on ActivityNet v1.2? I would appreciate it very much if I could receive your reply.

difference between labels.npy and labels_all.npy

Hi. labels.npy is loaded only in detectionMAP.py (as gtlabels), and labels_all.npy only in video_dataset.py (as labels).
At first I thought labels_all.npy was a superset of labels.npy, but when I compare the two files at the same index, they show different results.
So what is the difference between labels.npy and labels_all.npy?

I believe labels_all.npy is wrong

For example, labels[226] has multiple segments of 'HammerThrow', but labels_all[226] is empty, even though they refer to the same video.
There are some other examples (238, 242, ...).

How did you get labels_all.npy?
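A quick way to reproduce the comparison described in this issue (file names and indices are taken from the issue itself):

import numpy as np

labels = np.load('labels.npy', allow_pickle=True)
labels_all = np.load('labels_all.npy', allow_pickle=True)

for idx in (226, 238, 242):  # indices reported above
    print(idx, labels[idx], labels_all[idx])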

overfitting

I used my own small dataset to extract I3D features, then used main.py to train. Overfitting happened on the training set. Do you have any suggestions?

The model results vary greatly

Hello, when I ran the PyTorch source code (learning rate = 0.0001, dataset = Thumos14reduced), I ran into a problem.
I ran it ten times, but the results of several runs were quite different. At IoU=0.5, some runs can reach about 25%, but some cannot, reaching only about 24%. What caused this? Is it caused by randomness?
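Run-to-run spread of this size is often explained by random initialization and sampling order. One standard way to test that is to fix the seeds before training; the calls below are standard PyTorch/numpy APIs, though main.py itself may not expose a seed option:

import random

import numpy as np
import torch

# Standard seeding to check whether the variance across runs is due to randomness.
def set_seed(seed=0):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(0)  # call before building the model and data loaders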
