
actiondetection-afsd's People

Contributors

linchuming


actiondetection-afsd's Issues

ActivityNet files?

Hi @linchuming ,

Thanks for sharing the code. I wonder whether you could also upload the .npy files of the ActivityNet dataset?
Thank you.

Could you provide the baseline code which only uses the "Basic Prediction Module" of your work?

I modified your code to reproduce the baseline results. I deleted all of the network structure and loss functions after the "Basic Prediction Module" and got very poor results, but your paper reports baseline results of 43.1 / 31.0 / 19.0 in Table (a). I only modified the following three .py files: BDNet.py, multisegment_loss.py, and train.py.

I kept only the two loss functions called "loss_loc_val" and "loss_conf_val", deleted the others, and got 0.05981531185934582, 0.029753291292292032, and 0.008829597885094938.

I don't know how you achieved the baseline results in your paper.
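
For reference, a minimal toy sketch of what "baseline only" training reduces to, assuming the baseline keeps just a localization loss and a classification loss; all tensor shapes and names here are illustrative, not the repo's exact API:

    import torch
    import torch.nn.functional as F

    # Toy baseline step: keep only the localization and classification losses
    # and drop every boundary/refinement term. Shapes are illustrative.
    pred_loc = torch.randn(8, 2, requires_grad=True)   # predicted (start, end) offsets
    pred_cls = torch.randn(8, 21, requires_grad=True)  # 20 classes + background
    gt_loc = torch.rand(8, 2)
    gt_cls = torch.randint(0, 21, (8,))

    loss_loc = F.smooth_l1_loss(pred_loc, gt_loc)      # analogue of loss_loc_val
    loss_conf = F.cross_entropy(pred_cls, gt_cls)      # analogue of loss_conf_val
    (loss_loc + loss_conf).backward()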

thumos14_gt.json

I noticed that the number of videos in the thumos14_gt.json file is 410; it seems that 3 videos are missing from the test part. I checked that 'video_test_0000270' is not in thumos14_gt.json. Does this affect the evaluation results?
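
A quick way to list which videos are absent, as a hedged sketch; the paths and the 'database' key are assumptions based on the ActivityNet-style annotation layout:

    import json, os

    GT_PATH = 'thumos_annotations/thumos14_gt.json'  # hypothetical path
    DATA_DIR = 'datasets/thumos14/test_npy'          # hypothetical path

    with open(GT_PATH) as f:
        gt = json.load(f)
    gt_videos = set(gt.get('database', gt).keys())

    disk_videos = {os.path.splitext(n)[0] for n in os.listdir(DATA_DIR)}
    print('on disk but not in the GT file:', sorted(disk_videos - gt_videos))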

about the training on ActivityNet 1.3

Based on the paper, should I use the command “python3 AFSD/anet/train_init.py configs/anet.yaml --lw=1 --cw=1 --piou=0.5” to train the network? Is lw=1 right? Why does my loss increase when I train?

Data download links

Hi,

Could you please provide download links for the THUMOS14 RGB data numpy files instead of the Weiyun link provided here? I am not able to access the link https://share.weiyun.com/bP62lmHj.

Something on Google Drive, or a link with wget access, would work.

Thank you for your help!

about thumos14

Hello, do you remove the background data provided by the THUMOS14 dataset during training and testing?

about activitynet1.3

@linchuming Hello, when I run python3 AFSD/anet_data/video2npy.py THREAD_NUM to generate the RGB npy input data, I hit a problem: when a sampled video's total duration exceeds one minute, ret, frame = cap.read() returns ret = False, while count = cap.get(cv2.CAP_PROP_FRAME_COUNT) is 770. With the same count of 770 but a total duration under one minute, ret is True. I don't know what the problem is; could you help me? Another strange phenomenon: when I download the videos that fail to read onto my local laptop, they can all be read there.
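
For what it's worth, cv2.CAP_PROP_FRAME_COUNT is only a container-metadata estimate, so it can disagree with what the decoder actually delivers; a defensive loop that counts successfully decoded frames makes the mismatch visible:

    import cv2

    cap = cv2.VideoCapture('sample.mp4')  # placeholder path
    reported = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    decoded = 0
    while True:
        ret, frame = cap.read()
        if not ret:          # decoder gave up; may be earlier than `reported`
            break
        decoded += 1
    cap.release()
    print(f'reported={reported}, actually decoded={decoded}')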

long video training

In ActivityNet, each video contains only one action. If I want to use a long video containing multiple sets of actions, how should I train and test?

What version of opencv-python are you using?

Thanks for your sharing! When I attempt to convert an mp4 file to a .npy file, some mp4 files cannot be read. I guess it's a cv2 version problem, so could you tell us which version of opencv-python you are using?
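
To check your own installation, the cv2 version is one line, and the build information reveals whether the FFmpeg backend (the usual culprit for unreadable mp4 files) is enabled:

    import cv2

    print(cv2.__version__)
    # Look for the "FFMPEG: YES/NO" line in the Video I/O section.
    print(cv2.getBuildInformation())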

Missing default.yaml for video2npy.py

Hello! I'm new here, and I found that a file named default.yaml is missing when I try to convert videos to npy by myself. Looking forward to your reply, thanks!

Error during training

Thank you so much for your great work. I receive this error when I train on my custom dataset based on THUMOS. I followed all of your templates for the data annotations. Would you please help me?

0% 0/18218 [00:00<?, ?it/s]/home/nomad/anaconda3/envs/AFSD/lib/python3.8/site-packages/torch/nn/functional.py:3103: UserWarning: The default behavior for interpolate/upsample with float scale_factor changed in 1.6.0 to align with other frameworks/libraries, and now uses scale_factor directly, instead of relying on the computed output size. If you wish to restore the old behavior, please set recompute_scale_factor=True. See the documentation of nn.Upsample for details.
warnings.warn("The default behavior for interpolate/upsample with float scale_factor changed "
0% 17/18218 [00:11<2:03:32, 2.46it/s, loss=58.30155]/opt/conda/conda-bld/pytorch_1603729096996/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:115: operator(): block: [0,0,0], thread: [31,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
[the same assertion repeats for threads [32,0,0] through [56,0,0]]
0% 17/18218 [00:12<3:36:36, 1.40it/s, loss=58.30155]
Traceback (most recent call last):
File "AFSD/thumos14/train.py", line 281, in
run_one_epoch(i, net, optimizer, train_data_loader, len(train_dataset) // batch_size)
File "AFSD/thumos14/train.py", line 174, in run_one_epoch
loss_ct, loss_start, loss_end = forward_one_epoch(
File "AFSD/thumos14/train.py", line 137, in forward_one_epoch
loss_l, loss_c, loss_prop_l, loss_prop_c, loss_ct = CPD_Loss(
File "/home/matthew/anaconda3/envs/AFSD/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/matthew/ActionDetection-AFSD/AFSD/thumos14/multisegment_loss.py", line 254, in forward
N = max(pos.sum(), 1)
RuntimeError: CUDA error: device-side assert triggered
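
This device-side assert in a scatter/gather kernel typically means some index (often a class label) is outside the valid range. As a hedged first check on a custom dataset, verify that every annotation label id is below num_classes; the file name and keys below are illustrative, following the ActivityNet-style schema. Rerunning with CUDA_LAUNCH_BLOCKING=1 also gives a more precise stack trace.

    import json

    NUM_CLASSES = 21  # foreground classes + background, per the config
    with open('my_annotations.json') as f:  # hypothetical annotation file
        database = json.load(f)['database']

    for name, video in database.items():
        for ann in video.get('annotations', []):
            label = ann['label']  # may be an int id or a class-name string
            if isinstance(label, int) and not (0 <= label < NUM_CLASSES):
                print(f'{name}: label {label} outside [0, {NUM_CLASSES})')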

custom data

Hello, when applying this framework to custom data, how should the data format be constructed? For example, a video in which a single action runs from start to end.
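
For orientation, a hedged sketch of an ActivityNet-style annotation record with a single action spanning the whole clip; the key names follow the public ActivityNet schema, but the repo's loader should be checked for the exact fields it expects:

    import json

    annotation = {
        "database": {
            "my_video_0001": {                 # illustrative video id
                "subset": "training",
                "duration": 12.5,              # seconds
                "annotations": [
                    {"segment": [0.0, 12.5], "label": "my_action"}
                ],
            }
        }
    }
    print(json.dumps(annotation, indent=2))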

about feature extraction

Hi, have you tried using I3D pre-extracted features? Since this method involves fine-tuning the I3D model, it may result in an unfair comparison with other methods.

about AFSD in activitynet1.3

@linchuming Hello, AFSD uses the cuhk_val_simp_share.json file when predicting the class of proposals on the ActivityNet 1.3 dataset. Did the model that produced this json file use the temporal boundary annotations of the ActivityNet 1.3 training set when training the video classifier? Is the video classification score file predicted by the UntrimmedNet network?

An error and a warning when running setup.py

UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
error: command 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' failed with exit status 2
@linchuming

rescale flow to [-1, 1]

In gen_denseflow_npy.py:
“Following the I3D data preprocessing, for the flow stream we convert the videos to grayscale, truncate pixel values to the range [-20, 20], then rescale them to between -1 and 1. We only use the first two output dimensions, and apply the same cropping as for RGB.”
But I don't see the operation that rescales flow from [-20, 20] to [-1, 1].
Thanks for your work.
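
The step being asked about is a one-liner; a sketch of the truncate-and-rescale described above, on dummy data rather than the repo's code:

    import numpy as np

    flow = np.random.uniform(-40, 40, size=(112, 112, 2)).astype(np.float32)  # dummy flow field
    flow = np.clip(flow, -20.0, 20.0) / 20.0  # truncate to [-20, 20], rescale to [-1, 1]
    assert -1.0 <= flow.min() and flow.max() <= 1.0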

A question about fusion

I have trained the flow model and the RGB model myself, and individually the results are better than the original results. But when I use the fusion method to test the models, the final results are even worse. How can this be explained?
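
For context, score-level late fusion is usually a weighted average of the two streams' confidences for the same candidate segment; a minimal sketch, not the repo's exact implementation, is below. If fusion hurts, one common cause is mismatched score scales between the two models, which the weight alpha can partly compensate for:

    import numpy as np

    rgb_scores = np.array([0.70, 0.10, 0.20])   # per-class scores, RGB stream
    flow_scores = np.array([0.50, 0.30, 0.20])  # per-class scores, flow stream
    alpha = 0.5                                 # stream weight, worth tuning
    fused = alpha * rgb_scores + (1 - alpha) * flow_scores
    print(fused)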

ActivityNet v1.3 data preprocessing & inference

I downloaded all the sampled video data (32.4 GB); the total number of videos is 14950. But the total number of npy files I get after running step 3 is only 11171. When I run the RGB model inference I also get FileNotFoundError exceptions like "No such file or directory: 'datasets/activitynet/train_val_npy_112/v_JDg--pjY5gg.npy'". I could use some help.
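
A hedged helper to enumerate which annotated videos lack a .npy on disk; the 'v_' file-name prefix and the annotation path are assumptions mirroring the error message:

    import json, os

    NPY_DIR = 'datasets/activitynet/train_val_npy_112'
    with open('anet_annotations.json') as f:  # hypothetical annotation file
        names = list(json.load(f)['database'].keys())

    missing = [n for n in names
               if not os.path.exists(os.path.join(NPY_DIR, f'v_{n}.npy'))]
    print(len(missing), 'missing, e.g.', missing[:5])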

Architecture of Pyramid Feature Network

Hello,

Firstly, I'd like to thank you very much for publishing the code, and congratulations on the CVPR'21 paper.

I would like an overview of the architecture of the pyramid feature module in your pipeline. It is noted to be shared in the supplementary material, but unfortunately I cannot get access to it.
Could you please share the PDF of the supplementary material?

support for multi-GPU

While reproducing the code I found that this repo does not support multi-GPU training, so I'm writing up my personal workaround here; I hope the author can release a multi-GPU version.
I trained with 4 V100 GPUs and modified the following:
In train.py, def forward_one_epoch(net, clips, targets, scores=None, training=True, ssl=True):

    if training:
        if ssl:
            tar = targets[0]
            pro = torch.stack([tar, tar, tar, tar], dim=0)  # replicate the targets, one copy per GPU
            output_dict = net(clips, proposals=pro, ssl=ssl)
        else:
            output_dict = net(clips, ssl=False)
            output_dict['priors'] = output_dict['priors'][0:126]  # keep a single copy of the priors gathered from the 4 GPUs

Why do we use the function ScaleExp() in BDNet.py?

I checked the code carefully against formulas (1) and (3) in the paper, and I cannot understand why ScaleExp() is used. In the code, l_segment = new_priors - segments[:, :, :1]. Do we divide both sides of formula (3) by 2^l? Thank you!
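
For readers hitting the same question: ScaleExp in anchor-free detectors commonly follows the FCOS pattern, a learnable per-level scale followed by exp, which keeps the regressed boundary distances positive and lets each pyramid level adapt its own output range. A hedged sketch (the repo's definition may differ in detail):

    import torch
    import torch.nn as nn

    class ScaleExp(nn.Module):
        """Learnable scale followed by exp: the output is always positive."""
        def __init__(self, init_value=1.0):
            super().__init__()
            self.scale = nn.Parameter(torch.tensor(init_value))

        def forward(self, x):
            return torch.exp(x * self.scale)

    print(ScaleExp()(torch.tensor([-1.0, 0.0, 1.0])))  # strictly > 0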

video2npy.py for activitynet cannot read video frames.

I am trying to extract RGB frames by following ActivityNet Readme.

However, when I run video2npy.py, it cannot read frames for some videos.
In detail, VideoCapture.read() returns False, while get(cv2.CAP_PROP_FRAME_COUNT) returns 770 frames.

The videos are not scaled to 112x112 (they were also generated by transform_videos.py): one of width and height is 112, but the other is different. It seems that the original aspect ratio is kept during resizing.

Is that a problem? If so, how could I fix it?
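
If the model does expect square 112x112 inputs, a direct resize (which deliberately ignores the aspect ratio) is one hedged workaround; whether the repo expects this or a center crop should be confirmed against its loader:

    import cv2

    frame = cv2.imread('frame.jpg')          # placeholder input image
    square = cv2.resize(frame, (112, 112))   # stretches, ignoring aspect ratio
    print(square.shape)                      # (112, 112, 3)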

Data Pre-processing for untrimmed videos on non-standard data

Hi,
Congratulations on such nice work! Also, thank you for open-sourcing the code!
We are trying to use this code on our raw untrimmed videos and want to use this framework for temporal action localization.

We have our own non-standard data with videos of 15 minutes on average at 30 fps and a higher resolution (~500x900). We also have multiple actions in the videos.

For ActivityNet, I see that the max frames are specified to be 768.

Could you please suggest whether we need to split the videos into clips, and what the length of each clip should be? Do we need to sample 256/768 frames uniformly, or should we split clips based on the actions? Could you please point us to any starter code that we could refer to?

Thanks.
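
For long untrimmed videos, a common pattern (a hedged sketch, not this repo's pipeline) is fixed-length windows with overlap, followed by merging detections across windows:

    def make_windows(num_frames, clip_len=768, stride=384):
        """Fixed-length sliding windows with 50% overlap, plus a tail window."""
        starts = list(range(0, max(num_frames - clip_len, 0) + 1, stride))
        if not starts or starts[-1] + clip_len < num_frames:
            starts.append(max(num_frames - clip_len, 0))
        return [(s, min(s + clip_len, num_frames)) for s in starts]

    print(make_windows(15 * 60 * 30))  # e.g. 15 minutes at 30 fps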

The issue of Multi-GPU Training

Thanks for adding the code for multi-GPU training. However, changing the value of ngpu in config.py does not seem to work. For example, if ngpu=4, the program still trains only on GPU 0 instead of the 4 GPUs 0, 1, 2, 3.
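
For reference, the usual way an ngpu setting takes effect is a DataParallel wrap like the hedged sketch below; if the model is never wrapped, changing the config alone does nothing. Whether DataParallel interacts correctly with the custom boundary-pooling op and the SSL branch still needs verifying (see the workaround in the earlier multi-GPU issue).

    import torch
    import torch.nn as nn

    ngpu = 4                                   # stands in for config['ngpu']
    net = nn.Sequential(nn.Conv1d(3, 8, 3))    # placeholder model
    if torch.cuda.is_available() and ngpu > 1:
        net = nn.DataParallel(net.cuda(), device_ids=list(range(ngpu)))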

setup.py on a 3090

Hello, on my own machine (CUDA 11.2) I can run setup.py and the subsequent programs, but when training on a 3090 server (CUDA 11.1 / CUDA 11.4), boundary_max_pooling_cuda always raises: cuda runtime error (209): no kernel image is available for execution on the device.
I have tried many torch and CUDA versions, but it does not seem to be a version mismatch.
Could you help me? Thank you.
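
In case it helps: error 209 ("no kernel image is available") typically means the compiled extension lacks a kernel for the GPU's architecture. An RTX 3090 reports compute capability (8, 6), so a plausible fix is rebuilding the extension with the environment variable TORCH_CUDA_ARCH_LIST="8.6" set (and CUDA >= 11.1). A quick check of what your build supports:

    import torch

    print(torch.cuda.get_device_capability(0))  # (8, 6) on an RTX 3090
    print(torch.cuda.get_arch_list())           # architectures this torch build targets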

CUDA_runtime error (98)

Many thanks for open-sourcing this work! When I use the code I get the error CUDA_runtime error (98).
The error location is AFSD/prop_pooling/boundary_max_pooling_kernel.cu:110.
I suspect something is wrong with the CUDA extension.
My environment:
pytorch 1.4.0
torchvision 0.5.0
cuda: 10.0
Also: is there a CPU version of boundary_max_pooling_kernel? Many thanks!

Actual number of classes and class indices

Hello,

Thank you for your great work,

I found that the number of classes in the config file for the THUMOS14 dataset is the actual number of classes + 1: THUMOS14 has 20 classes while the config is set to 21. I also tried this on my custom dataset, and found that the number of classes in the config file must be set to the actual number of classes + 1; otherwise it gives an error. So, what is that extra class? And how can I recover the original class indices after the action detection is complete?
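
In SSD-style detectors the extra class is conventionally background at index 0, with the foreground classes shifted by one; a hedged mapping sketch, assuming this repo follows that convention and using an illustrative class list:

    class_names = ['BaseballPitch', 'BasketballDunk', 'Billiards']  # your own ordered list

    def index_to_name(pred_idx):
        """Map a predicted class index back to a name; 0 is assumed background."""
        return 'background' if pred_idx == 0 else class_names[pred_idx - 1]

    print(index_to_name(2))  # -> 'BasketballDunk'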

Query regarding the Input Video processing

Hi, I observed that videos in the ANet dataset are trimmed to 768 frames, most likely to fit GPU memory. My question: when feeding the data to the I3D backbone, is it sent as a (batch, channel=3, temporal=768, height, width) tensor, or do you break it up into windows of 16 frames and feed the data in repeatedly?

confidence score for long videos

Hi,
I tested your code on very long videos and got a lot of false positives. What would be the best values for conf_thresh and top_k?
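
Both knobs act as a simple post-filter, so good values are data-dependent and best swept on a validation split; an illustrative filter (the tuple layout is hypothetical):

    detections = [(0.0, 5.0, 0.92), (7.0, 9.0, 0.15), (10.0, 12.0, 0.55)]  # (start, end, score)
    conf_thresh, top_k = 0.3, 200
    kept = sorted((d for d in detections if d[2] >= conf_thresh),
                  key=lambda d: d[2], reverse=True)[:top_k]
    print(kept)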

Has anyone reproduced the results of the RGB model on the THUMOS14 dataset?

I have trained the AFSD RGB model on the THUMOS14 dataset as described in the Implementation Details, and the experimental results are as follows:

0.3 | 0.4 | 0.5 | 0.6 | 0.7 | Avg.
57.7 | 52.5 | 44.6 | 35.1 | 23.4 | 42.6

However, the results are still about 1.0 lower than the values in the paper.
Could you help figure out this problem?
Thanks a lot.

Usage of the UntrimmedNet result during post-processing on ActivityNet

Hi,

Congrats on your awesome work.

I just want to know why the UntrimmedNet result is used during post-processing. After reading your paper, it is evident that this work is a localization network (classification + proposals), so why does UntrimmedNet come in here? Isn't this network supposed to give you action classification as well?

Thanks in advance

Error during training

Thank you very much for your great work. I am getting this error while training on the THUMOS dataset. Can you help me?
100% 200/200 [01:02<00:00, 3.20it/s]
0% 0/7842 [01:36<?, ?it/s]
Traceback (most recent call last):
File "AFSD/thumos14/train.py", line 279, in
run_one_epoch(i, net, optimizer, train_data_loader, len(train_dataset) // batch_size)
File "AFSD/thumos14/train.py", line 170, in run_one_epoch
for n_iter, (clips, targets, scores, ssl_clips, ssl_targets, flags) in enumerate(pbar):
File "D:\anaconda\envs\yyf\lib\site-packages\tqdm\std.py", line 1195, in iter
for obj in iterable:
File "D:\anaconda\envs\yyf\lib\site-packages\torch\utils\data\dataloader.py", line 355, in iter
return self._get_iterator()
File "D:\anaconda\envs\yyf\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "D:\anaconda\envs\yyf\lib\site-packages\torch\utils\data\dataloader.py", line 914, in init
w.start()
File "D:\anaconda\envs\yyf\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "D:\anaconda\envs\yyf\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "D:\anaconda\envs\yyf\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "D:\anaconda\envs\yyf\lib\multiprocessing\popen_spawn_win32.py", line 93, in init
reduction.dump(process_obj, to_child)
File "D:\anaconda\envs\yyf\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
MemoryError
Traceback (most recent call last):
File "", line 1, in
File "D:\anaconda\envs\yyf\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "D:\anaconda\envs\yyf\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
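
This MemoryError occurs while Windows pickles the dataset into each spawned DataLoader worker, so a dataset holding large arrays in memory can fail right at worker start-up. Two common mitigations (hedged, not repo-specific): set num_workers=0, and keep the training entry point under a __main__ guard, as in this toy sketch:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.zeros(16, 3))  # placeholder dataset

    if __name__ == '__main__':                   # required for spawn on Windows
        loader = DataLoader(dataset, batch_size=4, num_workers=0)
        for batch in loader:
            pass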

what is the flow model?

Hi, I didn't find any mention of "flow" in your paper. I want to know whether you use an optical flow model or not?
