sming256 / opentad

OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.

License: Apache License 2.0

Python 97.67% C++ 0.82% Cuda 1.32% Shell 0.18%

temporal-action-detection temporal-action-localization video-understanding

opentad's Issues

Training loss drops sharply to NaN on my own dataset

May I ask why my loss suddenly dropped to NaN while training on my own dataset? Looking forward to your answer, thanks.
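
One way to narrow down a report like this is to find out which loss term becomes non-finite first. The following is a minimal sketch, not part of OpenTAD: the helper name `non_finite_terms` is hypothetical, and the term names `cls_loss` and `reg_loss` are borrowed from the training logs quoted further down this page. It assumes each loss value can be converted with `float()`.

```python
import math


def non_finite_terms(losses):
    """Return the names of loss terms whose value is NaN or infinite.

    `losses` maps a term name to a number (or anything float() accepts).
    Hypothetical helper for debugging, not an OpenTAD function.
    """
    return [name for name, value in losses.items()
            if not math.isfinite(float(value))]


# A healthy step: every term is finite, so nothing is reported.
healthy = non_finite_terms({"cls_loss": 0.1847, "reg_loss": 0.1587})

# A broken step: only the regression term has become NaN.
broken = non_finite_terms({"cls_loss": 0.1753, "reg_loss": float("nan")})
```

Calling such a check on every iteration and stopping at the first non-empty result points to the batch and the loss term where the problem starts, which is usually more informative than the final NaN.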

How to get the classifier files for ActivityNet

For example, cuhk_val_simp_7.json and new_3ensemble_uniformerv2_large_only_global_anet_16x10x3.json.

Cannot reproduce the reported results

Official results:

Backbone	GPUs	Setting	Frames	Img Size	mAP@0.3	mAP@0.4	mAP@0.5	mAP@0.6	mAP@0.7	ave. mAP
VideoMAE-S	2	AdaTAD	768	160	83.90	79.01	72.38	61.57	48.27	69.03
VideoMAE-B	2	AdaTAD	768	160	85.95	81.86	75.02	63.29	49.56	71.14
VideoMAE-L	2	AdaTAD	768	160	87.17	83.58	76.88	66.81	53.13	73.51
VideoMAE-H	2	AdaTAD	768	160	88.42	84.63	78.72	69.04	53.95	74.95
VideoMAEV2-g	2	AdaTAD	768	160	88.63	85.39	79.17	68.34	53.79	75.06
VideoMAEV2-g	2	AdaTAD	1536	224	89.93	86.83	81.24	69.97	57.36	77.07

e2e_thumos_videomaev2_g_768x1_160_adapter.py

2024-04-22 20:36:07 Train INFO: [Train]: Epoch 41 started
2024-04-22 20:39:06 Train INFO: [Train]: [041][00050/00099]  Loss=0.3434  cls_loss=0.1847  reg_loss=0.1587  lr_backbone=6.8e-05  lr_det=6.8e-05  mem=30703MB
2024-04-22 20:41:54 Train INFO: [Train]: [041][00099/00099]  Loss=0.3310  cls_loss=0.1753  reg_loss=0.1557  lr_backbone=6.7e-05  lr_det=6.7e-05  mem=30703MB
2024-04-22 20:50:28 Train INFO: Evaluation starts...
2024-04-22 20:50:48 Train INFO: Loaded annotations from validation subset.
2024-04-22 20:50:48 Train INFO: Number of ground truth instances: 3325
2024-04-22 20:50:48 Train INFO: Number of predictions: 422000
2024-04-22 20:50:48 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-04-22 20:50:48 Train INFO: **Average-mAP: 74.85 (%)**
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.30 is 88.80%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.40 is 85.10%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.50 is 78.95%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.60 is 68.09%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.70 is 53.30%

e2e_thumos_videomaev2_g_768x2_224_adapter.py

2024-04-23 08:09:01 Train INFO: [Train]: Epoch 39 started
2024-04-23 08:18:31 Train INFO: [Train]: [039][00050/00099]  Loss=0.2967  cls_loss=0.1572  reg_loss=0.1395  lr_backbone=1.4e-04  lr_det=7.1e-05  mem=51851MB
2024-04-23 08:27:33 Train INFO: [Train]: [039][00099/00099]  Loss=0.3542  cls_loss=0.1892  reg_loss=0.1650  lr_backbone=1.4e-04  lr_det=7.0e-05  mem=51851MB
2024-04-23 09:00:06 Train INFO: Evaluation starts...
2024-04-23 09:00:26 Train INFO: Loaded annotations from validation subset.
2024-04-23 09:00:26 Train INFO: Number of ground truth instances: 3325
2024-04-23 09:00:26 Train INFO: Number of predictions: 422000
2024-04-23 09:00:26 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-04-23 09:00:26 Train INFO: **Average-mAP: 75.73 (%)**
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.30 is 88.47%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.40 is 85.66%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.50 is 79.79%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.60 is 69.55%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.70 is 55.19%

Trained again without changing anything:

e2e_thumos_videomaev2_g_768x2_224_adapter.py

2024-04-20 07:39:00 Train INFO: [Train]: Epoch 39 started
2024-04-20 07:48:33 Train INFO: [Train]: [039][00050/00099]  Loss=0.3000  cls_loss=0.1612  reg_loss=0.1388  lr_backbone=1.4e-04  lr_det=7.1e-05  mem=51859MB
2024-04-20 07:57:37 Train INFO: [Train]: [039][00099/00099]  Loss=0.3349  cls_loss=0.1770  reg_loss=0.1578  lr_backbone=1.4e-04  lr_det=7.0e-05  mem=51859MB
2024-04-20 08:30:23 Train INFO: Evaluation starts...
2024-04-20 08:30:42 Train INFO: Loaded annotations from validation subset.
2024-04-20 08:30:42 Train INFO: Number of ground truth instances: 3325
2024-04-20 08:30:42 Train INFO: Number of predictions: 422000
2024-04-20 08:30:42 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-04-20 08:30:42 Train INFO: **Average-mAP: 76.32 (%)**
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.30 is 89.55%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.40 is 86.40%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.50 is 79.45%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.60 is 70.78%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.70 is 55.43%
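
The Average-mAP values in these logs match the plain mean of the five per-tIoU mAP values, so the gap between two runs of the same config can be checked directly. The sketch below uses only numbers copied from the two `e2e_thumos_videomaev2_g_768x2_224_adapter.py` logs above; the helper name `average_map` is illustrative and not part of OpenTAD.

```python
from statistics import mean


def average_map(per_tiou_map):
    """Mean of the per-tIoU mAP values, rounded to two decimals as in the logs."""
    return round(mean(per_tiou_map.values()), 2)


# Per-tIoU mAP (%) from the run logged on 2024-04-23.
run_a = {0.3: 88.47, 0.4: 85.66, 0.5: 79.79, 0.6: 69.55, 0.7: 55.19}
# Per-tIoU mAP (%) from the rerun logged on 2024-04-20.
run_b = {0.3: 89.55, 0.4: 86.40, 0.5: 79.45, 0.6: 70.78, 0.7: 55.43}

avg_a = average_map(run_a)              # 75.73, as logged
avg_b = average_map(run_b)              # 76.32, as logged
spread = round(abs(avg_a - avg_b), 2)   # 0.59 points between identical configs
```

Two runs with an unchanged config differ by 0.59 points of average mAP here, which gives a rough sense of run-to-run variation to keep in mind when comparing against the official table.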

Cannot use gdown to download Anet raw video data

When I use this command to download _Anet_videos_15fps_short256.zip from Google Drive:

gdown [download link]

I got this error:

Failed to retrieve file url:

        Cannot retrieve the public link of the file. You may need to change
        the permission to 'Anyone with the link', or have had many accesses.
        Check FAQ in https://github.com/wkentaro/gdown?tab=readme-ov-file#faq.

You may still be able to access the file from the browser:

        [download link]

but Gdown can't. Please check connections and permissions.

Could you please change the permission to 'Anyone with the link'?

Does AdaTAD support multi-label temporal action detection?

Congrats on the great work! I was wondering if AdaTAD supports multi-label temporal action detection (e.g., for MultiTHUMOS)?
Thanks!

Roadmap and Feedback

We keep this issue open to collect feature requests and feedback from users, and thus keep improving this codebase.

If you didn't find the features you need in the Road Map, please leave a message here.

Thank you!

Question about `scale_factor` in AdaTAD

I noticed that for ActivityNet you use `scale_factor = 4` to account for the ViT backbone downsampling, but `scale_factor = 1` for THUMOS, although it uses the same backbone. Can you please explain the logic?
