sming256 / opentad
OpenTAD is an open-source temporal action detection (TAD) toolbox based on PyTorch.
License: Apache License 2.0
such as cuhk_val_simp_7.json new_3ensemble_uniformerv2_large_only_global_anet_16x10x3.json
official
Backbone | GPUs | Setting | Frames | Img Size | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | ave. mAP |
---|---|---|---|---|---|---|---|---|---|---|
VideoMAE-S | 2 | AdaTAD | 768 | 160 | 83.90 | 79.01 | 72.38 | 61.57 | 48.27 | 69.03 |
VideoMAE-B | 2 | AdaTAD | 768 | 160 | 85.95 | 81.86 | 75.02 | 63.29 | 49.56 | 71.14 |
VideoMAE-L | 2 | AdaTAD | 768 | 160 | 87.17 | 83.58 | 76.88 | 66.81 | 53.13 | 73.51 |
VideoMAE-H | 2 | AdaTAD | 768 | 160 | 88.42 | 84.63 | 78.72 | 69.04 | 53.95 | 74.95 |
VideoMAEV2-g | 2 | AdaTAD | 768 | 160 | 88.63 | 85.39 | 79.17 | 68.34 | 53.79 | 75.06 |
VideoMAEV2-g | 2 | AdaTAD | 1536 | 224 | 89.93 | 86.83 | 81.24 | 69.97 | 57.36 | 77.07 |
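The "ave. mAP" column appears to be the arithmetic mean of the five per-threshold mAP columns. The following is a minimal sketch that checks this against the rows above; the values are copied from the table, and `average_map` and `check_rows` are hypothetical helper names, not part of OpenTAD.

```python
# Per-threshold mAP values (tIoU 0.3 to 0.7) copied from the table above,
# paired with the reported "ave. mAP" of each row.
ROWS = {
    "VideoMAE-S (768, 160)": ([83.90, 79.01, 72.38, 61.57, 48.27], 69.03),
    "VideoMAE-B (768, 160)": ([85.95, 81.86, 75.02, 63.29, 49.56], 71.14),
    "VideoMAE-L (768, 160)": ([87.17, 83.58, 76.88, 66.81, 53.13], 73.51),
    "VideoMAE-H (768, 160)": ([88.42, 84.63, 78.72, 69.04, 53.95], 74.95),
    "VideoMAEV2-g (768, 160)": ([88.63, 85.39, 79.17, 68.34, 53.79], 75.06),
    "VideoMAEV2-g (1536, 224)": ([89.93, 86.83, 81.24, 69.97, 57.36], 77.07),
}


def average_map(per_threshold):
    """Arithmetic mean of the per-threshold mAP values."""
    return sum(per_threshold) / len(per_threshold)


def check_rows(rows, tol=0.01):
    """Return {name: (computed_mean, reported, within_tolerance)}."""
    result = {}
    for name, (values, reported) in rows.items():
        computed = average_map(values)
        result[name] = (computed, reported, abs(computed - reported) <= tol)
    return result


checked = check_rows(ROWS)
for name, (computed, reported, ok) in checked.items():
    print(f"{name}: computed={computed:.3f} reported={reported:.2f} match={ok}")
```

Under this assumption every row agrees with its reported average to within rounding.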
e2e_thumos_videomaev2_g_768x1_160_adapter.py
2024-04-22 20:36:07 Train INFO: [Train]: Epoch 41 started
2024-04-22 20:39:06 Train INFO: [Train]: [041][00050/00099] Loss=0.3434 cls_loss=0.1847 reg_loss=0.1587 lr_backbone=6.8e-05 lr_det=6.8e-05 mem=30703MB
2024-04-22 20:41:54 Train INFO: [Train]: [041][00099/00099] Loss=0.3310 cls_loss=0.1753 reg_loss=0.1557 lr_backbone=6.7e-05 lr_det=6.7e-05 mem=30703MB
2024-04-22 20:50:28 Train INFO: Evaluation starts...
2024-04-22 20:50:48 Train INFO: Loaded annotations from validation subset.
2024-04-22 20:50:48 Train INFO: Number of ground truth instances: 3325
2024-04-22 20:50:48 Train INFO: Number of predictions: 422000
2024-04-22 20:50:48 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-04-22 20:50:48 Train INFO: **Average-mAP: 74.85 (%)**
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.30 is 88.80%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.40 is 85.10%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.50 is 78.95%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.60 is 68.09%
2024-04-22 20:50:48 Train INFO: mAP at tIoU 0.70 is 53.30%
e2e_thumos_videomaev2_g_768x2_224_adapter.py
2024-04-23 08:09:01 Train INFO: [Train]: Epoch 39 started
2024-04-23 08:18:31 Train INFO: [Train]: [039][00050/00099] Loss=0.2967 cls_loss=0.1572 reg_loss=0.1395 lr_backbone=1.4e-04 lr_det=7.1e-05 mem=51851MB
2024-04-23 08:27:33 Train INFO: [Train]: [039][00099/00099] Loss=0.3542 cls_loss=0.1892 reg_loss=0.1650 lr_backbone=1.4e-04 lr_det=7.0e-05 mem=51851MB
2024-04-23 09:00:06 Train INFO: Evaluation starts...
2024-04-23 09:00:26 Train INFO: Loaded annotations from validation subset.
2024-04-23 09:00:26 Train INFO: Number of ground truth instances: 3325
2024-04-23 09:00:26 Train INFO: Number of predictions: 422000
2024-04-23 09:00:26 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-04-23 09:00:26 Train INFO: **Average-mAP: 75.73 (%)**
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.30 is 88.47%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.40 is 85.66%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.50 is 79.79%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.60 is 69.55%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.70 is 55.19%
Train again without changing anything
e2e_thumos_videomaev2_g_768x2_224_adapter.py
2024-04-20 07:39:00 Train INFO: [Train]: Epoch 39 started
2024-04-20 07:48:33 Train INFO: [Train]: [039][00050/00099] Loss=0.3000 cls_loss=0.1612 reg_loss=0.1388 lr_backbone=1.4e-04 lr_det=7.1e-05 mem=51859MB
2024-04-20 07:57:37 Train INFO: [Train]: [039][00099/00099] Loss=0.3349 cls_loss=0.1770 reg_loss=0.1578 lr_backbone=1.4e-04 lr_det=7.0e-05 mem=51859MB
2024-04-20 08:30:23 Train INFO: Evaluation starts...
2024-04-20 08:30:42 Train INFO: Loaded annotations from validation subset.
2024-04-20 08:30:42 Train INFO: Number of ground truth instances: 3325
2024-04-20 08:30:42 Train INFO: Number of predictions: 422000
2024-04-20 08:30:42 Train INFO: Fixed threshold for tiou score: [0.3, 0.4, 0.5, 0.6, 0.7]
2024-04-20 08:30:42 Train INFO: **Average-mAP: 76.32 (%)**
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.30 is 89.55%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.40 is 86.40%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.50 is 79.45%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.60 is 70.78%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.70 is 55.43%
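To compare the two runs of e2e_thumos_videomaev2_g_768x2_224_adapter.py, the evaluation metrics can be pulled out of the logs and diffed per threshold. The sketch below assumes the exact log wording shown above; `parse_eval_log` is a hypothetical helper, not an OpenTAD function, and the sample strings are the evaluation lines copied from the two logs.

```python
import re

# Patterns assume the log wording shown in the logs above.
AVG_RE = re.compile(r"Average-mAP:\s*([\d.]+)")
TIOU_RE = re.compile(r"mAP at tIoU ([\d.]+) is ([\d.]+)%")


def parse_eval_log(text):
    """Return (average_map, {tiou: map}) parsed from evaluation log text."""
    avg = None
    per_tiou = {}
    for line in text.splitlines():
        m = AVG_RE.search(line)
        if m:
            avg = float(m.group(1))
            continue
        m = TIOU_RE.search(line)
        if m:
            per_tiou[float(m.group(1))] = float(m.group(2))
    return avg, per_tiou


# Evaluation lines copied from the 2024-04-23 run.
RUN_A = """\
2024-04-23 09:00:26 Train INFO: **Average-mAP: 75.73 (%)**
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.30 is 88.47%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.40 is 85.66%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.50 is 79.79%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.60 is 69.55%
2024-04-23 09:00:26 Train INFO: mAP at tIoU 0.70 is 55.19%
"""

# Evaluation lines copied from the 2024-04-20 run (same config).
RUN_B = """\
2024-04-20 08:30:42 Train INFO: **Average-mAP: 76.32 (%)**
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.30 is 89.55%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.40 is 86.40%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.50 is 79.45%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.60 is 70.78%
2024-04-20 08:30:42 Train INFO: mAP at tIoU 0.70 is 55.43%
"""

avg_a, tiou_a = parse_eval_log(RUN_A)
avg_b, tiou_b = parse_eval_log(RUN_B)

# Per-threshold and average differences between the two runs.
for t in sorted(tiou_a):
    print(f"tIoU {t:.2f}: {tiou_a[t]:.2f} vs {tiou_b[t]:.2f} "
          f"(diff {tiou_b[t] - tiou_a[t]:+.2f})")
print(f"Average-mAP: {avg_a:.2f} vs {avg_b:.2f} (diff {avg_b - avg_a:+.2f})")
```

On these two logs the averages differ by roughly 0.6 points, which gives a rough sense of the run-to-run variation to expect when retraining with an unchanged config.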
When I use this command to download _Anet_videos_15fps_short256.zip from Google Drive:
gdown [download link]
I got this error:
Failed to retrieve file url:
Cannot retrieve the public link of the file. You may need to change
the permission to 'Anyone with the link', or have had many accesses.
Check FAQ in https://github.com/wkentaro/gdown?tab=readme-ov-file#faq.
You may still be able to access the file from the browser:
[download link]
but Gdown can't. Please check connections and permissions.
Could you please change the permission to 'Anyone with the link'?
Congrats on the great work! I was wondering whether AdaTAD supports multi-label temporal action detection (e.g. for MultiTHUMOS)?
Thanks!
We keep this issue open to collect feature requests and feedback from users, and thus keep improving this codebase.
If you didn't find the features you need in the Road Map, please leave a message here.
Thank you!
I noticed that for ANet you use scale_factor = 4 to account for the ViT backbone downsampling, but use scale_factor = 1 for THUMOS, although it uses the same backbone. Can you please explain the logic?