
moc-detector's Introduction

Actions as Moving Points

Pytorch implementation of Actions as Moving Points (ECCV 2020).

View each action instance as a trajectory of moving points.

Visualization results on the validation set. (The GIFs may take a few minutes to load.)

(Note that the relatively low scores are due to the nature of the focal loss.)



News & Updates

Jul. 08, 2020 - First release of the code.

Jul. 24, 2020 - Updated the UCF-pretrained JHMDB model and added speed-test code.

Aug. 02, 2020 - Updated the visualization code: extract frames from a video and get detection results (like the GIFs above).

Aug. 17, 2020 - Our visualization now supports instance-level detection results (reflecting video mAP).

Aug. 23, 2020 - Uploaded MOC with a ResNet-18 backbone.


MOC Detector Overview

  We present a new action tubelet detection framework, termed MovingCenter Detector (MOC-detector), which treats an action instance as a trajectory of moving points. MOC-detector is decomposed into three crucial head branches:

  • (1) Center Branch for instance center detection and action recognition.
  • (2) Movement Branch for movement estimation at adjacent frames to form moving point trajectories.
  • (3) Box Branch for spatial extent detection by directly regressing bounding box size at the estimated center point of each frame.
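
  The minimal PyTorch sketch below illustrates this three-head layout. It is only an illustration of the idea, not the repository's moc_net.py: the channel sizes, the per-head layer counts and the default K = 7 / 24 classes are assumptions.

import torch
import torch.nn as nn

class TinyMOCHeads(nn.Module):
    """Illustrative three-head layout: center heatmap, movement, box size.

    in_ch is the per-frame backbone feature width, K the tubelet length and
    num_classes the number of action classes (all assumed values here).
    """
    def __init__(self, in_ch=64, K=7, num_classes=24, head_ch=256):
        super().__init__()
        def head(out_ch, first_ch):
            return nn.Sequential(
                nn.Conv2d(first_ch, head_ch, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(head_ch, out_ch, 1))
        # Center branch: stacks K frame features, predicts per-class center heatmaps.
        self.center = head(num_classes, in_ch * K)
        # Movement branch: 2K offsets (dx, dy per frame) relative to the key-frame center.
        self.movement = head(2 * K, in_ch * K)
        # Box branch: per-frame width/height regressed at each frame's estimated center.
        self.box = head(2, in_ch)

    def forward(self, feats):          # feats: list of K tensors [B, in_ch, H, W]
        stacked = torch.cat(feats, dim=1)
        hm = torch.sigmoid(self.center(stacked))
        mov = self.movement(stacked)
        wh = torch.stack([self.box(f) for f in feats], dim=2)  # [B, 2, K, H, W]
        return hm, mov, wh

# quick shape check
heads = TinyMOCHeads()
feats = [torch.randn(1, 64, 72, 72) for _ in range(7)]
hm, mov, wh = heads(feats)
print(hm.shape, mov.shape, wh.shape)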


MOC-Detector Usage

1. Installation

Please refer to Installation.md for installation instructions.


2. Dataset

Please refer to Dataset.md for dataset setup instructions.


3. Evaluation

You can follow the instructions in Evaluation.md to evaluate our model and reproduce the results in the original paper.


4. Train

You can follow the instructions in Train.md to train our models.


5. Visualization

You can follow the instructions in Visualization.md to get visualization results.



References

Citation

If you find this code useful in your research, please cite:

@InProceedings{li2020actions,
    title={Actions as Moving Points},
    author={Yixuan Li and Zixu Wang and Limin Wang and Gangshan Wu},
    booktitle={arXiv preprint arXiv:2001.04608},
    year={2020}
}

moc-detector's People

Contributors

archizx, dreamerlin, happyjin, vladostan


moc-detector's Issues

how to infer with only rgb model?

Optical flow is hard for me to generate. I understand that your network streams can run independently for inference; is there a command-line shortcut I can use, or do I need to change the code myself?
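
A hedged guess based on the det.py commands quoted further down this page: dropping the --flow_model and --ninput 5 flags and keeping only --rgb_model may be enough for RGB-only inference, e.g.

python3 det.py --task normal --K 7 --gpus 0 --batch_size 1 --master_batch 1 --num_workers 2 --rgb_model ../experiment/result_model/dla34_K7_rgb_coco.pth --inference_dir $INFERENCE_DIR --flip_test

Whether det.py accepts a missing --flow_model without code changes is an assumption, not something verified here.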

Confused about the `sample_cuboids` constraints

Hi, thanks for your work! Helps a lot!

I have a question about the constraints in sample_cuboids.

constraints = batch_sampler['sample_constraint']
ious = np.array([np.mean(iou2d(t, sampled_cuboid)) for t in sum(tubes.values(), [])])
if ious.size == 0:  # empty gt
    isample += 1
    continue
if 'min_jaccard_overlap' in constraints and ious.max() >= constraints['min_jaccard_overlap']:
    sampled_cuboids.append(sampled_cuboid)
    isample += 1
    continue
if 'max_jaccard_overlap' in constraints and ious.min() >= constraints['max_jaccard_overlap']:
    sampled_cuboids.append(sampled_cuboid)
    isample += 1
    continue

In the code above, ious contains the IoUs between the sampled cuboid and the GT tubes, and the constraints are:

  1. min_jaccard_overlap: I thought this means all IoUs should be larger than min_jaccard_overlap, i.e. ious.min() >= constraints['min_jaccard_overlap']. However, the code checks ious.max() >= constraints['min_jaccard_overlap'], which means at least one GT tube has IoU >= min_jaccard_overlap.

  2. max_jaccard_overlap: I thought this means all IoUs should be smaller than max_jaccard_overlap, i.e. ious.max() <= constraints['max_jaccard_overlap']. However, the code checks ious.min() >= constraints['max_jaccard_overlap'], which means all IoUs must be larger than max_jaccard_overlap.

Could you please explain these constraints?
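
For comparison, a minimal sketch of the semantics described in points 1 and 2 above (the all-tube reading of the thresholds). This is illustrative only and is not the repository's code; the repository's actual max/min checks are the ones quoted verbatim above.

import numpy as np

def satisfies_constraints(ious, constraints):
    # ious: IoU of the sampled cuboid against every GT tube.
    if ious.size == 0:                         # no GT tubes: nothing to violate
        return True
    if 'min_jaccard_overlap' in constraints and \
            ious.min() < constraints['min_jaccard_overlap']:
        return False                           # some tube overlaps too little
    if 'max_jaccard_overlap' in constraints and \
            ious.max() > constraints['max_jaccard_overlap']:
        return False                           # some tube overlaps too much
    return True

print(satisfies_constraints(np.array([0.3, 0.6]), {'min_jaccard_overlap': 0.5}))  # False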

Error when training on a single GPU

packages/torch/_utils.py", line 459, in
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/usr/local/lib/python3.8/dist-packages/torch/cuda/__init__.py", line 309, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id

How should I modify the code if I want to train on a single GPU?
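
For context, the assertion is raised by torch's DataParallel when it is handed a device id the machine does not have. A small, hypothetical sketch of the idea (the actual fix probably amounts to passing --gpus 0 with a matching batch_size/master_batch, but that depends on how the repository's opts.py parses those flags):

import torch
import torch.nn as nn

requested_gpus = [0, 1]                        # e.g. what a --gpus 0,1 flag would request
available = list(range(torch.cuda.device_count()))
device_ids = [g for g in requested_gpus if g in available]
print('available GPUs:', available, '-> using:', device_ids)

model = nn.Linear(8, 2)
if device_ids:
    model = model.cuda(device_ids[0])
if len(device_ids) > 1:
    # DataParallel raises "Invalid device id" if any id here does not exist.
    model = nn.DataParallel(model, device_ids=device_ids)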

Decode output file from normal_inference.py to image

I am trying your normal inference.
I want to see the output images for visualization, but your script writes a pkl file and I don't know how to decode it. Please guide me.
I have read normal_inference.py and only partly understand that you feed K images in for prediction, but I cannot convert the result back into images, which would help me better understand how you predict.
Hope you can help me, thanks!
Sorry about my English.
Have a great day!
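
A generic way to peek at a pickle file's structure; the exact layout of the file written by normal_inference.py is not documented on this page, so the snippet only prints whatever keys and shapes the file actually contains (the path is a placeholder):

import pickle

with open('output.pkl', 'rb') as f:            # placeholder path to the generated pkl
    data = pickle.load(f)

print(type(data))
if isinstance(data, dict):
    for key in list(data)[:5]:                 # show a few keys and the type/shape of their values
        value = data[key]
        shape = getattr(value, 'shape', None)
        print(key, type(value), shape if shape is not None else '')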

Disable cudnn batch normalization

You provided commands for disabling cudnn BN:

# PYTORCH=/path/to/pytorch # usually ~/anaconda3/envs/MOC/lib/python3.5.2/site-packages/
# for pytorch v0.4.1
sed -i "1254s/torch\.backends\.cudnn\.enabled/False/g" ${PYTORCH}/torch/nn/functional.py

I would like to be 100% sure that this only affects cuDNN for this environment. Since the path is the one you point out in the comment above, if I run this I won't modify the cuDNN settings of other envs in my anaconda, right?

I had nightmares when I did something similar before and wasted a couple of days resetting everything. I hope you get my point. Thanks.
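
For what it's worth, the sed only rewrites the functional.py under the ${PYTORCH} path it is given, so other conda environments keep their own copies. A less invasive, per-process alternative is sketched below; note this is an assumption about what you need, and it disables cuDNN for every op in the current process, not just batch normalization:

import torch

# Disable cuDNN for this process only; no library files are edited on disk.
torch.backends.cudnn.enabled = False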

python3 vis_det.py --vname **.mp4

Traceback (most recent call last):
  File "vis_det.py", line 174, in <module>
    det()
  File "vis_det.py", line 148, in det
    opt = opts().parse()
  File "/MOC-Detector/src/vis/tiny_opt.py", line 77, in parse
    opt.vname = opt.vname.split('_')[1] + '/' + opt.vname
IndexError: list index out of range

Some questions about spatial-temporal action detection

Thank you for such awesome work!
I'm new to spatio-temporal action detection and have some questions about it. In my opinion, this task not only detects each person/object in the video (spatial), but also detects the temporal boundary (start and end) of an action instance. In the paper, I found that the focus is on detecting the spatial boundary, so I'd like to know how to detect both the spatial and the temporal boundary of an action instance. Please forgive me if my understanding is wrong. Looking forward to your reply.

undefined symbol: __cudaRegisterFatBinaryEnd.

When I run det.py, it raises an error:
ImportError: /mnt/data/code/MOC-Detector/src/network/DCNv2/_ext/dcn_v2/_dcn_v2.so: undefined symbol: __cudaRegisterFatBinaryEnd.
It seems that my PyTorch version is not compatible with the CUDA version, but my devices cannot support CUDA 9.0. How can I solve this problem?

Question about `down_ratio`

Hi, thanks for your work! I'm trying to reproduce this model now.

I have a question about down_ratio in Sampler

Related codes are listed as follows

        input_h = self._resize_height
        input_w = self._resize_width
        output_h = input_h // self.opt.down_ratio
        output_w = input_w // self.opt.down_ratio

        ....

        # resize the original img and it's GT bbox
        for ilabel in gt_bbox:
            for itube in range(len(gt_bbox[ilabel])):
                gt_bbox[ilabel][itube][:, 0] = gt_bbox[ilabel][itube][:, 0] / original_w * output_w
                gt_bbox[ilabel][itube][:, 1] = gt_bbox[ilabel][itube][:, 1] / original_h * output_h
                gt_bbox[ilabel][itube][:, 2] = gt_bbox[ilabel][itube][:, 2] / original_w * output_w
                gt_bbox[ilabel][itube][:, 3] = gt_bbox[ilabel][itube][:, 3] / original_h * output_h
        images = [cv2.resize(im, (input_w, input_h), interpolation=cv2.INTER_LINEAR) for im in images]

Why are the resize ratios different between the GT tubes and the raw images? I thought different ratios would lead to wrong bboxes.
If down_ratio == 4, then after resizing, the bboxes are much smaller relative to the resized image...
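
For context, CenterNet-style pipelines keep the training targets in the output (heatmap) resolution, i.e. the input resolution divided by down_ratio, and scale predictions back up at decode time; under that reading the different ratios are intentional rather than a bug. A toy round trip of one coordinate, assuming that convention (this is not the repository's decode code):

down_ratio = 4
input_w, input_h = 288, 288
output_w, output_h = input_w // down_ratio, input_h // down_ratio   # 72 x 72

original_w, original_h = 320, 240
x1_orig = 160.0                                   # a GT x-coordinate in the raw frame

# The training target lives on the 72x72 heatmap grid ...
x1_out = x1_orig / original_w * output_w          # 36.0
# ... and a coordinate predicted on that grid is scaled back when decoding.
x1_back = x1_out / output_w * original_w          # 160.0
print(x1_out, x1_back)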

some problem

When I run the DCNv2 build.py, it hits a problem: TypeError: dist must be a Distribution instance.
How can I resolve it?

Custom dataset

I'm now trying to train your model with my custom dataset.
I tried using brox optical flow but it didn't work, so I used dense optical flow instead.
However, when I started training, I got an error like this:

Train_K7_flow_coco
Traceback (most recent call last):
  File "/content/drive/My Drive/action_recognition/Moc-Detection/src/train.py", line 167, in <module>
    main(opt)
  File "/content/drive/My Drive/action_recognition/Moc-Detection/src/train.py", line 95, in main
    log_dict_train = trainer.train(epoch, train_loader, train_writer)
  File "/content/drive/My Drive/action_recognition/Moc-Detection/src/trainer/moc_trainer.py", line 56, in train
    return self.run_epoch('train', epoch, data_loader, writer)
  File "/content/drive/My Drive/action_recognition/Moc-Detection/src/trainer/moc_trainer.py", line 74, in run_epoch
    for iter, batch in enumerate(data_loader):
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 106, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/content/drive/My Drive/action_recognition/Moc-Detection/src/datasets/sample/sampler.py", line 120, in __getitem__
    draw_umich_gaussian(hm[ilabel], center_int, radius)
IndexError: index 20 is out of bounds for axis 0 with size 4

When I checked, the Brox flow images are (240, 320, 3) and the dense-flow images are the same shape, so I don't understand why your algorithm treats them differently.
This is an image after dense optical flow: [image 00004]
This is an image after dense optical flow with pixel value 0 raised to 128: [image 00007]
I tried both and failed with the same error :(
Can you help me? Thank you!
Can you help me? Thank you!
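
Reading the traceback, hm[ilabel] has only 4 rows (the number of classes the heatmap was built with) while an annotation label index of 20 was encountered, so the labels in the custom pkl probably exceed the configured number of classes. A quick hedged check; the 'labels' and 'gttubes' key names are assumptions based on the ACT-style annotation layout and the file path is a placeholder:

import pickle

with open('my_custom_annotations.pkl', 'rb') as f:     # placeholder path
    ann = pickle.load(f)

num_classes = len(ann['labels'])
max_label = max(label for tubes in ann['gttubes'].values() for label in tubes)
# max_label must stay below the num_classes used to allocate the heatmap.
print(num_classes, max_label)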

Decoding tubes

Hi, there's an inconsistency in the way you convert the model output into tubes in the moc_decode function of src/detector/decode.py.

  • lines 58,59: indexes of tube centers at key frame (K//2) are not modified by "mov" data
  • lines 80-83: indexes of tube centers at key frame (K//2) are adjusted by adding "mov" data.

Can you please explain this difference? Furthermore, according to lines 56-57, we have:
xs + mov[..., 2 * i:2 * i + 1] = xs_all[:, :, i:i+1] and
ys + mov[..., 2 * i + 1:2 * i + 2] = ys_all[:, :, i:i+1] for every frame i except i = K//2,
where xs_all and ys_all are the tensors before being converted to long type (lines 61-62).

Why do you have to do this calculation again in line 80?
Thank you in advance for the information.

ModuleNotFoundError

Hi,
I got an error when running train.py: there's no module "network/DCNv2/_ext"; the compiled extension seems to be missing.

Traceback (most recent call last):
  File "train.py", line 10, in <module>
    from MOC_utils.model import create_model, load_model, save_model, load_coco_pretrained_model, load_imagenet_pretrained_model
  File "D:/TOULON/MOC-Detector-master/src/MOC_utils/model.py", line 10, in <module>
    from network.moc_net import MOC_Net
  File "D:/TOULON/MOC-Detector-master/src/network/moc_net.py", line 7, in <module>
    from .dla import MOC_DLA
  File "D:/TOULON/MOC-Detector-master/src/network/dla.py", line 12, in <module>
    from .DCNv2.dcn_v2 import DCN
  File "D:/TOULON/MOC-Detector-master/src/network/DCNv2/dcn_v2.py", line 11, in <module>
    from .dcn_v2_func import DCNv2Function
  File "D:/TOULON/MOC-Detector-master/src/network/DCNv2/dcn_v2_func.py", line 9, in <module>
    from ._ext import dcn_v2 as _backend
ModuleNotFoundError: No module named 'network.DCNv2._ext'

Source code

Hello guys,
When will you release the source code?
Thanks

bug

Thanks for your awesome project. I found an unimplemented import in det.py:
from inference.speed_test import speed_test_stream_inference at line 36. Hopefully you can add this missing file. Thanks in advance!

How to interpret the 3-channel in jpg image of optical flow?

The optical flow that JHMDB provides comes as JPG images.
From this line, I see that you just read them as ordinary 3-channel RGB images:

images = [cv2.imread(self.flowfile(v, min(frame + i, self._nframes[v]))).astype(np.float32) for i in range(K + self._ninput - 1)]

But in this line, when you flip the image, why is images[i][:, :, 2] the x-component of the optical flow? (ACT says so as well.) I thought it was just one of the RGB channels.

images[i][:, :, 2] = 255 - images[i][:, :, 2]
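
For intuition: in the ACT-style flow JPEGs, two of the three channels carry the x- and y-displacements, shifted so that a value around 128 means zero motion. Flipping a frame horizontally negates the x-displacement, which in the 0-255 encoding becomes 255 - value, hence the line above. A toy sketch of that convention (the channel assignment here is an assumption carried over from the question itself):

import numpy as np

flow = np.zeros((240, 320, 3), dtype=np.float32)
flow[:, :, 2] = 128 + 10          # x-component: +10 pixels of rightward motion
flow[:, :, 1] = 128               # y-component: no vertical motion

flipped = flow[:, ::-1, :].copy() # mirror the frame left-right
flipped[:, :, 2] = 255 - flipped[:, :, 2]   # rightward motion becomes (roughly) leftward
print(flow[0, 0, 2], flipped[0, 0, 2])      # 138.0 117.0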

the standard of pkl

Hello, I want to ask you about the dataset's pkl format standard, for example for UCF101..., because I want to train this code on Charades.
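
For reference, a hedged sketch of the ACT-style annotation pkl layout that the UCF101-24 / JHMDB files in this codebase appear to follow (the key names come from the ACT-detector convention; verify them against the released UCF101-GT_v2.pkl before building a Charades version):

import pickle
import numpy as np

# Minimal example of the assumed dictionary layout (ACT-detector convention).
annotation = {
    'labels': ['Basketball', 'Biking'],                      # class names, index = label id
    'train_videos': [['Basketball/v_Basketball_g01_c01']],   # one list per split
    'test_videos': [['Biking/v_Biking_g01_c01']],
    'nframes': {'Basketball/v_Basketball_g01_c01': 120},
    'resolution': {'Basketball/v_Basketball_g01_c01': (240, 320)},   # (height, width)
    # gttubes: video -> {label id: [tubes]}, each tube an (N, 5) array of
    # [frame index, x1, y1, x2, y2] rows over consecutive frames.
    'gttubes': {
        'Basketball/v_Basketball_g01_c01': {
            0: [np.array([[1, 10.0, 20.0, 60.0, 120.0],
                          [2, 12.0, 21.0, 62.0, 121.0]])],
        },
    },
}

with open('my_charades_like.pkl', 'wb') as f:                # placeholder output path
    pickle.dump(annotation, f)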

GIF demo

Thanks for sharing your work!
I want to create a video with bounding boxes like your GIFs, but I don't know how: input a video and output frames or a video with bounding boxes. Could you help me a little?
I have read your code, but it seems my level is not enough to understand all of it. :(
Once again, thank you for your attention!

Pretrained models

Hi, I have some questions about the pretrained models and their state_dict

  1. In MOC_utils/model.py,
  • load_imagenet_pretrained_model() only loads weights into the backbone subnet of MOC_Net, while
  • load_coco_pretrained_model() loads weights into both the backbone and branch subnets.

The difference comes from the fact that coco_dla.pth, coco_resdcn18.pth and coco_resdcn101.pth are detector models, while the resnet checkpoints and dla34-ba72cf86.pth are classification models. Is this true?

  2. There is a difference in state dict keys between the source and target models:
  • model.py/create_model() creates models with the following keys:
    branch.hm
    branch.mov
    branch.wh
  • model.py/load_coco_pretrained_model() modifies the keys as follows:
    backbone.reg
    backbone.hm
    branch.wh
    Here, in load_coco_pretrained_model(), you renamed the state dict keys of "hm" and "mov" from branch to backbone in order to ignore them, while keeping the wh key unchanged in order to reuse its weights. Do I understand correctly? And why did you do that?
  3. Similarly, in load_model(), you ignore branch.hm and branch.mov. Why?

  4. Don't you support loading both the RGB and flow models at the same time?

  5. I'm confused by the load_model option versus the rgb_model and flow_model options.
    Can we remove load_model and replace it with rgb_model and flow_model?
    Whenever the rgb_model or flow_model option is not empty, model weights would be loaded from the given path.

I'm sorry about these trivial questions and thanks for your reply.

Import Error

I followed the installation instructions and ran
bash make.sh
with no errors; however, when I try to run the train.py script I get the following error:
from ._dcn_v2 import lib as _lib, ffi as _ffi
undefined symbol __cudaPopCallConfiguration
Any ideas as to why this is happening?

FLOPs analysis

Thank you for sharing your amazing work!

I was wondering, if possible could you share more details on how you computed GFLOPs (as reported in Table 6 of your paper)?

Specifically, I tried to reproduce the reported FLOP value (thanks to https://github.com/sovrasov/flops-counter.pytorch) using only RGB images (288x288), DLA-34 as the backbone and K = 7, but the GFLOPs I obtain are significantly higher than 29.4. (To be clear, I did multiply the backbone's GFLOPs by K, since MOC first has to extract a 2D feature map for each frame.)

Many thanks again!
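
For what it's worth, a hedged sketch of the per-frame accounting described above, using the same flops-counter.pytorch (ptflops) package with a torchvision ResNet-18 as a stand-in backbone. The real MOC backbone, the three branch heads and the deformable convolutions are not counted here, so the printed number is only illustrative, and ptflops reports multiply-accumulate operations (MACs) rather than FLOPs:

import torchvision
from ptflops import get_model_complexity_info

K = 7
backbone = torchvision.models.resnet18()

# MACs for a single 288x288 RGB frame through the stand-in backbone.
macs, params = get_model_complexity_info(
    backbone, (3, 288, 288), as_strings=False, print_per_layer_stat=False)

# The backbone runs once per frame, so multiply by K; the branch heads,
# which run once per tubelet, would be added on top of this.
print('per-frame GMac: %.2f, x K = %.2f' % (macs / 1e9, K * macs / 1e9))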

Reproduction of paper results

Thanks for your code. I downloaded your K7_rgb_coco model and tested frame mAP (using flip_test and N=100 by default) on UCF101-24; it gets 71.39, which is noticeably lower than the reported 73.14.
Do you have any idea about the discrepancy?

wh branch does not converge when not loading it with COCO pretrained weight

Hi,

Firstly, thank you very much for your amazing work (and active responses in my earlier questions)!

I have a question regarding the "wh" branch that I'd like to consult with you. In my case, I noticed that this branch converged easily on JHMDB-21 when using COCO pretrained models (e.g., ResNet18). However, the total loss failed to converge if I did not load pretrained weights for the wh branch (while still loading COCO pretrained weights for the ResNet backbone).

This phenomenon is not very intuitive to me, mainly because the wh branch has a relatively small number of parameters (~3x3x64x64 + 1x1x256x2K) compared to the entire model.

I wonder if you have experienced a similar phenomenon. Is it normal that MOC's performance depends so heavily on pretrained weights for the wh branch?

Thank you again.

performance variance on JHMDB

Hi,

Thanks very much for your awesome work.

May I ask whether you observe performance variance on the JHMDB dataset?
In my case, I got remarkable variance (about 2-4 mAP), especially on split 3.

UCF101-GT_v2

Hi, Thanks for the great repo.
You pointed to the repo for downloading the corrected annotations here: https://github.com/gurkirt/corrected-UCF101-Annots. But when I checked the repo, I cannot find the file UCF101-GT_v2.pkl; there is a pkl file called pyannot.pkl. Is this the one you call UCF101-GT_v2.pkl? Do I need to convert the format of pyannot.pkl myself, or is there a specific link to download UCF101-GT_v2.pkl directly? Could you kindly provide it if there is? Thanks a lot.

model for K7 RGB + FLOW COCO on UCF101-24

Hi, could you send me a link to download your trained model for K7 RGB + FLOW COCO on UCF101-24, please? I noticed you only released ucf_dla34_K7_rgb_coco.pth and ucf_dla34_K7_flow_coco.pth. Thanks a lot.

How to get your dataset in China?

After reading your excellent paper, I'd like to know how to get your dataset through other channels, because Google is not accessible in China. Thank you very much.

Reproduction of paper results

Hi,
First of all, thanks for open source code!

I have an issue with reproduction of your results on UCF dataset. I got the following results after training and inference:
RGB only (scores are listed as follows: frame mAP@0.5 | video mAP@0.2 | @0.5 | @0.75 | 0.5:0.95):
(paper) ucf_dla34_K7_rgb_coco.pth: 73.14 | 78.81 | 51.02 | 27.05 | 26.51

ucf_dla34_K7_rgb_coco with N10, w/o flip_test: 69.84 | 75.77 | 47.55 | 24.57 | 24.76

ucf_dla34_K7_rgb_coco with N100 and flip_test: 72.13 | 78.25 | 50.48 | 26.18 | 26.32

RGB + FLOW:
(paper) K7 RGB + FLOW COCO : 78.01 | 82.81 | 53.83 | 29.59 | 28.33

K7 RGB + FLOW COCO with N10, w/o flip_test: 74.98 | 80.88 | 52.82 | 27.39 | 27.30

K7 RGB + FLOW COCO with N100 and flip_test: 76.52 | 81.62 | 53.52 | 29.05 | 28.05

So, as you can see, my results are quite far from yours. You wrote that N=10 without flip_test is faster at inference (no doubt) and that the scores are pretty much the same as N=100 with flip_test, but I observe the opposite.

It would be nice if you could help me understand what I am doing wrong.

Some info that might be useful:
For training and evaluation I've used the master branch code.
All parameters are the same as in the script MOC-Detector/scripts/train_ucf_k7_dla.sh.
I use PyTorch 1.3 and Python 3.6.
I've also tried training with the cuDNN library enabled for batch norm - the same problem.

Training with ResNet

Thanks for your excellent project, but I hit a RuntimeError when trying to train the model with 'resnet_18' as the backbone.
RuntimeError: Error(s) in loading state_dict for MOC_Net:
size mismatch for branch.wh.0.bias: copying a param of torch.Size([256]) from checkpoint, where the shape is torch.Size([64]) in current model.
size mismatch for branch.wh.0.weight: copying a param of torch.Size([256, 64, 3, 3]) from checkpoint, where the shape is torch.Size([64, 64, 3, 3]) in current model.
size mismatch for branch.wh.2.weight: copying a param of torch.Size([2, 256, 1, 1]) from checkpoint, where the shape is torch.Size([2, 64, 1, 1]) in current model.
What should I do if I want to try the resnet backbone?
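
A generic, hedged workaround sketch (not a statement about which head width the authors intended): drop the shape-mismatched checkpoint tensors before loading, so only the layers that agree with the current model are initialised from the pretrained file and the rest train from scratch.

import torch

def load_matching_weights(model, checkpoint_path):
    """Load only the checkpoint tensors whose names and shapes match the model."""
    checkpoint = torch.load(checkpoint_path, map_location='cpu')
    state_dict = checkpoint.get('state_dict', checkpoint)
    model_state = model.state_dict()
    filtered = {k: v for k, v in state_dict.items()
                if k in model_state and v.shape == model_state[k].shape}
    skipped = [k for k in state_dict if k not in filtered]
    model.load_state_dict(filtered, strict=False)
    print('loaded %d tensors, skipped %d (trained from scratch)' % (len(filtered), len(skipped)))
    return model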

about branch construction

When you construct the branch module, self.hm[-1].bias.data.fill_(-2.19) is used. What does this magic number mean?
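
For context, -2.19 matches the focal-loss prior initialisation used by CenterNet-style heatmap heads: the bias is set so that the initial sigmoid output is roughly pi = 0.1 everywhere, which keeps the focal loss from being swamped by easy negatives early in training. Whether that is the authors' exact reasoning is an assumption, but the arithmetic checks out:

import math

pi = 0.1                           # assumed prior probability of a positive center
bias = math.log(pi / (1 - pi))     # sigmoid(bias) == pi
print(round(bias, 2))              # -2.2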

num_workers problem

When I run this code with num_workers=4, it does not load the data, but with num_workers=0 it can load data. Why?

DCNv2

When running on Colab, I get the error No module named 'network.DCNv2._ext.dcn_v2._dcn_v2'. How can I solve this?

Inference with flow model

Hi, thanks for sharing the great work.

I've a question regarding the flow model during training and inference.

I could not find an explanation or architecture diagram illustrating the use of the flow model. Is the flow model related to the term 'two-stream' referenced in the paper?

Would you please explain a bit how the flow model is used for training and inference? I only found something related in the Study on Movement Branch Design section. Thanks a lot.

Did not find movement strategies implemented in the code

Thank you for sharing such amazing work, and truly some of the clearest instructions for using the code!

I read in your paper that the Full Movement strategy gave the best results, where there is a "sequential" relationship between the movement and box branches (bounding boxes on the current frames are regressed at centers predicted by the movement branch).

However, I could not find the relevant implementation in your code (e.g., the box branch depending on the movement branch). Could you point me to where you implemented the movement strategy?
Thank you again!

how to inference?

stupid question

Something went wrong when I tried to evaluate on UCF101. It says RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 135, but expected 40) at the end of inference.

So should we use

python3 det.py --task normal --K 7 --gpus 0,1,2,3,4,5,6,7 --batch_size 94 --master_batch 10 --num_workers 8 --rgb_model ../experiment/result_model/$PATH_TO_RGB_MODEL --flow_model ../experiment/result_model/$PATH_TO_FLOW_MODEL --inference_dir $INFERENCE_DIR --flip_test --ninput 5

and

python3 det.py --task normal --K 7 --gpus 0 --batch_size 1 --master_batch 1 --num_workers 2 --rgb_model ../experiment/result_model/dla34_K7_rgb_coco.pth --flow_model ../experiment/result_model/dla34_K7_flow_coco.pth --inference_dir /data0/liyixuan/speed_test/test --flip_test --ninput 5

sequentially, or is just the latter needed?
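
As a hedged aside on the chunk-size error: CenterNet-style option parsing usually gives GPU 0 master_batch samples and splits the rest of batch_size over the remaining GPUs, so the sum of the chunks must match the batch that actually reaches the machine. A sketch of that arithmetic (an assumption about MOC's opts.py, not verified here):

def chunk_sizes(batch_size, master_batch, num_gpus):
    # GPU 0 gets master_batch; the remaining GPUs share the rest evenly.
    if num_gpus == 1:
        return [batch_size]
    rest = batch_size - master_batch
    per_gpu = rest // (num_gpus - 1)
    sizes = [master_batch] + [per_gpu] * (num_gpus - 1)
    sizes[-1] += rest - per_gpu * (num_gpus - 1)   # absorb any rounding remainder
    return sizes

print(chunk_sizes(94, 10, 8))   # [10, 12, 12, 12, 12, 12, 12, 12] -> sums to 94
print(chunk_sizes(1, 1, 1))     # [1]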
