zhaoweicai / cascade-rcnn

Caffe implementation of multiple popular object detection frameworks

cascade-rcnn's Introduction

Cascade R-CNN: Delving into High Quality Object Detection

by Zhaowei Cai and Nuno Vasconcelos

This repository is written by Zhaowei Cai at UC San Diego.

Introduction

This repository implements multiple popular object detection algorithms, including Faster R-CNN, R-FCN, FPN, and our recently proposed Cascade R-CNN, on the MS-COCO and PASCAL VOC datasets. Several backbone networks are supported, including AlexNet, VGG-Net and ResNet. It is written in C++ and built on the Caffe deep learning toolbox.

Cascade R-CNN is a multi-stage extension of the popular two-stage R-CNN object detection framework. The goal is high-quality object detection that can effectively reject close false positives. It consists of a sequence of detectors trained end-to-end with increasing IoU thresholds, so that each stage is more selective against close false positives than the one before it. The output of each stage is forwarded to the next, and the detection results improve stage by stage. The idea applies to any detector based on the two-stage R-CNN framework, including Faster R-CNN, R-FCN, FPN, Mask R-CNN, etc., and reliable gains are available independently of baseline strength. A vanilla Cascade R-CNN on an FPN detector with a ResNet-101 backbone, without any training or inference bells and whistles, achieved state-of-the-art results on the challenging MS-COCO dataset.
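
As an illustration of the increasing-IoU idea, the sketch below labels one proposal as positive or negative for each stage. It is not the repository's implementation: the three thresholds 0.5, 0.6 and 0.7 are the values reported in the paper rather than anything stated in this README, the function names are hypothetical, and boxes use continuous (xmin, ymin, xmax, ymax) coordinates without the +1 pixel convention.

```python
# Assumed per-stage IoU thresholds (taken from the paper, not from this README).
STAGE_IOU_THRESHOLDS = (0.5, 0.6, 0.7)


def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positive_stages(proposal, gt_box, thresholds=STAGE_IOU_THRESHOLDS):
    """For each stage, report whether the proposal counts as a positive."""
    overlap = iou(proposal, gt_box)
    return [overlap >= t for t in thresholds]


gt = (0, 0, 100, 100)
proposal = (0, 0, 100, 65)            # IoU with gt is 0.65
print(positive_stages(proposal, gt))  # [True, True, False]
```

A proposal with IoU 0.65 counts as a positive for the first two stages but as a negative for the third, which is what makes later stages stricter about close false positives.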

Update

The re-implementation of Cascade R-CNN in Detectron has been released. See Detectron-Cascade-RCNN. Very consistent improvements are available for all tested models, independent of baseline strength.

The third-party implementations are also recommended: mmdetection (based on PyTorch) and tensorpack (based on TensorFlow).

Citation

If you use our code/model/data, please cite our paper:

@inproceedings{cai18cascadercnn,
  author = {Zhaowei Cai and Nuno Vasconcelos},
  title = {Cascade R-CNN: Delving into High Quality Object Detection},
  booktitle = {CVPR},
  year = {2018}
}

or its extension:

@article{cai2019cascadercnn,
  author = {Zhaowei Cai and Nuno Vasconcelos},
  title = {Cascade R-CNN: High Quality Object Detection and Instance Segmentation},
  journal = {arXiv preprint arXiv:1906.09756},
  year = {2019}
}

Benchmarking

We benchmark multiple detector models on the MS-COCO and PASCAL VOC datasets in the tables below.

MS-COCO (Train/Test: train2017/val2017, shorter size: 800 for FPN and 600 for the others)

| model | #GPUs | bs | lr | iter | train time | test time | AP | AP50 | AP75 |
|---|---|---|---|---|---|---|---|---|---|
| VGG-RPN-baseline | 2 | 4 | 3e-3 | 100k | 12.5 hr | 0.075s | 23.6 | 43.9 | 23.0 |
| VGG-RPN-Cascade | 2 | 4 | 3e-3 | 100k | 15.5 hr | 0.115s | 27.0 | 44.2 | 27.7 |
| Res50-RFCN-baseline | 4 | 1 | 3e-3 | 280k | 19 hr | 0.07s | 27.0 | 48.7 | 26.9 |
| Res50-RFCN-Cascade | 4 | 1 | 3e-3 | 280k | 22.5 hr | 0.075s | 31.1 | 49.8 | 32.8 |
| Res101-RFCN-baseline | 4 | 1 | 3e-3 | 280k | 29 hr | 0.075s | 30.3 | 52.2 | 30.8 |
| Res101-RFCN-Cascade | 4 | 1 | 3e-3 | 280k | 30.5 hr | 0.085s | 33.3 | 52.0 | 35.2 |
| Res50-FPN-baseline | 8 | 1 | 5e-3 | 280k | 32 hr | 0.095s | 36.5 | 58.6 | 39.2 |
| Res50-FPN-Cascade | 8 | 1 | 5e-3 | 280k | 36 hr | 0.115s | 40.3 | 59.4 | 43.7 |
| Res101-FPN-baseline | 8 | 1 | 5e-3 | 280k | 37 hr | 0.115s | 38.5 | 60.6 | 41.7 |
| Res101-FPN-Cascade | 8 | 1 | 5e-3 | 280k | 46 hr | 0.14s | 42.7 | 61.6 | 46.6 |
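
To make the baseline-versus-Cascade comparison in the MS-COCO table easier to read, the following short script computes the AP gain for each detector. The AP values are copied from the table; the dictionary and function names are only illustrative.

```python
# (baseline AP, Cascade AP) pairs copied from the MS-COCO table.
COCO_AP = {
    "VGG-RPN": (23.6, 27.0),
    "Res50-RFCN": (27.0, 31.1),
    "Res101-RFCN": (30.3, 33.3),
    "Res50-FPN": (36.5, 40.3),
    "Res101-FPN": (38.5, 42.7),
}


def ap_gains(table):
    """Return the Cascade-minus-baseline AP difference, rounded to one decimal."""
    return {name: round(cascade - base, 1) for name, (base, cascade) in table.items()}


gains = ap_gains(COCO_AP)
for name, gain in gains.items():
    print(f"{name}: +{gain} AP")
```

On these numbers the gain lies between 3.0 and 4.2 AP for every detector listed.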

PASCAL VOC 2007 (Train/Test: 2007+2012trainval/2007test, shorter size: 600)

| model | #GPUs | bs | lr | iter | train time | AP | AP50 | AP75 |
|---|---|---|---|---|---|---|---|---|
| Alex-RPN-baseline | 2 | 4 | 1e-3 | 45k | 2.5 hr | 29.4 | 63.2 | 23.7 |
| Alex-RPN-Cascade | 2 | 4 | 1e-3 | 45k | 3 hr | 38.9 | 66.5 | 40.5 |
| VGG-RPN-baseline | 2 | 4 | 1e-3 | 45k | 6 hr | 42.9 | 76.4 | 44.1 |
| VGG-RPN-Cascade | 2 | 4 | 1e-3 | 45k | 7.5 hr | 51.2 | 79.1 | 56.3 |
| Res50-RFCN-baseline | 2 | 2 | 2e-3 | 90k | 8 hr | 44.8 | 77.5 | 46.8 |
| Res50-RFCN-Cascade | 2 | 2 | 2e-3 | 90k | 9 hr | 51.8 | 78.5 | 57.1 |
| Res101-RFCN-baseline | 2 | 2 | 2e-3 | 90k | 10.5 hr | 49.4 | 79.8 | 53.2 |
| Res101-RFCN-Cascade | 2 | 2 | 2e-3 | 90k | 12 hr | 54.2 | 79.6 | 59.2 |

NOTE. In the tables above, every model was run at least twice with close results, so the training is relatively stable. RPN means Faster R-CNN. The PASCAL VOC annotations were converted to COCO format and the COCO API was used for evaluation, so the results differ from the official VOC evaluation. If you want to compare VOC results in a publication, please use the official VOC evaluation code.

Requirements

An NVIDIA GPU and cuDNN are required for fast speeds. So far, CUDA 8.0 with cuDNN 6.0.20 has been tested. Other versions should also work.
The Caffe MATLAB wrapper is required to run the detection/evaluation demo.

Installation

Clone the Cascade-RCNN repository. We'll call the directory that you cloned Cascade-RCNN into CASCADE_ROOT.
```
git clone https://github.com/zhaoweicai/cascade-rcnn.git
```

Build Cascade-RCNN

cd $CASCADE_ROOT/
# Follow the Caffe installation instructions here:
#   http://caffe.berkeleyvision.org/installation.html

# If you're experienced with Caffe and have all of the requirements installed
# and your Makefile.config in place, then simply do:
make all -j 16

# If you want to run Cascade-RCNN detection/evaluation demo, build MATLAB wrapper as well
make matcaffe

Datasets

If you already have a copy of COCO/VOC that is not organized as below, you can simply create symlinks to reproduce the same directory structure.

MS-COCO

In all MS-COCO experiments, we use train2017 for training and val2017 (a.k.a. minival) for validation. Follow the MS-COCO website to download the images/annotations and set up the COCO API.

Assuming that your local COCO dataset copy is at /your/path/to/coco, make sure it has the following directory structure:

coco
|_ images
  |_ train2017
  |  |_ <im-1-name>.jpg
  |  |_ ...
  |  |_ <im-N-name>.jpg
  |_ val2017
  |_ ...
|_ annotations
   |_ instances_train2017.json
   |_ instances_val2017.json
   |_ ...
|_ MatlabAPI
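
If an existing COCO download is organized differently, the layout above can be reproduced with symlinks. The helper below is a minimal sketch: the function name and the example source paths are hypothetical, and it assumes a platform where `os.symlink` is permitted.

```python
import os


def link_coco_layout(coco_root, sources):
    """Create symlinks under coco_root that point at existing folders.

    `sources` maps a path relative to coco_root (e.g. "images/train2017")
    to the existing directory that should appear at that location.
    """
    for rel_path, target in sources.items():
        link = os.path.join(coco_root, rel_path)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        if not os.path.lexists(link):  # leave existing entries untouched
            os.symlink(os.path.abspath(target), link)


# Example call with placeholder source paths:
# link_coco_layout("/your/path/to/coco", {
#     "images/train2017": "/data/downloads/train2017",
#     "images/val2017": "/data/downloads/val2017",
#     "annotations": "/data/downloads/annotations",
# })
```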

PASCAL VOC

In all PASCAL VOC experiments, we use VOC2007+VOC2012 trainval for training and VOC2007 test for validation. Follow the PASCAL VOC website to download the images/annotations and set up the VOCdevkit.

Assuming that your local VOCdevkit copy is at /your/path/to/VOCdevkit, make sure it has the following directory structure:

VOCdevkit
|_ VOC2007
  |_ JPEGImages
  |  |_ <000001>.jpg
  |  |_ ...
  |  |_ <009963>.jpg
  |_ Annotations
  |  |_ <000001>.xml
  |  |_ ...
  |  |_ <009963>.xml
  |_ ...
|_ VOC2012
  |_ JPEGImages
  |  |_ <2007_000027>.jpg
  |  |_ ...
  |  |_ <2012_004331>.jpg
  |_ Annotations
  |  |_ <2007_000027>.xml
  |  |_ ...
  |  |_ <2012_004331>.xml
  |_ ...
|_ VOCcode
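
A quick sanity check on a VOC year folder is to confirm that the image and annotation file names pair up as the tree above suggests. This is only a sketch with a hypothetical function name; it checks file stems and does not read the files, and some devkit copies may legitimately contain files that have no counterpart.

```python
import os


def unmatched_voc_files(voc_year_dir):
    """Return (images_without_xml, xml_without_image) as sorted lists of stems."""
    def stems(subdir, ext):
        folder = os.path.join(voc_year_dir, subdir)
        return {os.path.splitext(f)[0] for f in os.listdir(folder) if f.endswith(ext)}

    images = stems("JPEGImages", ".jpg")
    annotations = stems("Annotations", ".xml")
    return sorted(images - annotations), sorted(annotations - images)


# Example call with the placeholder path used above:
# missing_xml, missing_jpg = unmatched_voc_files("/your/path/to/VOCdevkit/VOC2007")
```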

Training Cascade-RCNN

Get the training data
```
cd $CASCADE_ROOT/data/
sh get_coco_data.sh
```
This will download the window files required for the experiments. You can also use the provided MATLAB script coco_window_file.m under $CASCADE_ROOT/data/coco/ to generate your own window files.
Download the models pretrained on ImageNet. For AlexNet and VGG-Net, the FC layers are pruned so that 2048 units remain per FC layer. In addition, the two FC layers are copied three times for Cascade R-CNN training. For ResNet, the BatchNorm layers are merged into Scale layers and frozen during training, as is common practice.
```
cd $CASCADE_ROOT/models/
sh fetch_vggnet.sh
```
Multiple shell scripts are provided to train Cascade-RCNN on the different baseline detectors described in our paper. Under each model folder, you need to change the root_folder of the data layer in train.prototxt and test.prototxt to your COCO path. After that, you can start training your own Cascade-RCNN models. Take vgg-12s-600-rpn-cascade as an example.
```
cd $CASCADE_ROOT/examples/coco/vgg-12s-600-rpn-cascade/
sh train_detection.sh
```
A log file is generated during training. The total training time depends on the complexity of the model and dataset. To quickly check that training works, try the lightweight AlexNet model on the VOC dataset.
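
The root_folder edit in train.prototxt and test.prototxt can also be scripted. The sketch below assumes the field is written in the usual protobuf text form `root_folder: "<path>"`; the function name is hypothetical and the replacement path is a placeholder for your own dataset location.

```python
import re


def set_root_folder(prototxt_text, new_root):
    """Replace the value of every `root_folder: "..."` entry.

    Returns (updated_text, number_of_replacements).
    """
    pattern = re.compile(r'(root_folder\s*:\s*)"[^"]*"')
    # A function replacement avoids backslash handling in the new path.
    return pattern.subn(lambda m: f'{m.group(1)}"{new_root}"', prototxt_text)


# Example usage on one file (the path value is a placeholder):
# with open("train.prototxt") as f:
#     text, count = set_root_folder(f.read(), "/your/path/to/coco/")
# with open("train.prototxt", "w") as f:
#     f.write(text)
```

Checking that the returned count is non-zero is a cheap way to confirm that the file really contained a root_folder entry.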

NOTE. Occasionally, training of Res101-FPN-Cascade runs out of memory. Just resume the training from the latest solverstate.
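
To locate the latest solverstate, a small helper can pick the snapshot with the highest iteration number. This sketch assumes Caffe's default binary snapshot naming, `<prefix>_iter_<N>.solverstate`; the function name and the directory in the example are hypothetical.

```python
import os
import re


def latest_solverstate(log_dir):
    """Return the *_iter_<N>.solverstate path with the largest N, or None."""
    best_iter, best_path = -1, None
    for name in os.listdir(log_dir):
        match = re.search(r"_iter_(\d+)\.solverstate$", name)
        if match and int(match.group(1)) > best_iter:
            best_iter = int(match.group(1))
            best_path = os.path.join(log_dir, name)
    return best_path


# Example call (placeholder directory):
# print(latest_solverstate("log"))
```

The resulting path could then be passed to the `--snapshot` option of `caffe train`; how train_detection.sh forwards that option is not shown here and should be checked in the script itself.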

Pretrained Models

We only provide the Res50-FPN-baseline, Res50-FPN-Cascade and Res101-FPN-Cascade models for the COCO dataset, and Res101-RFCN-Cascade for the VOC dataset.

Download pre-trained models

cd $CASCADE_ROOT/examples/coco/
sh fetch_cascadercnn_models.sh

The pretrained models produce exactly the same results as described in our paper.

Testing/Evaluation Demo

Once models are available, either pretrained or trained by yourself, you can use the MATLAB script run_cascadercnn_coco.m to obtain the detection and evaluation results. Set the correct dataset path in the demo script and choose the model you want to test. The default setting is for the pretrained model. The final detection results are saved under $CASCADE_ROOT/examples/coco/detections/ and the evaluation results are saved under the model folder.

You can also run the shell script test_coco_detection.sh under each model folder for evaluation, but it is not identical to the official evaluation. For publication, use the MATLAB script.

Disclaimer

When re-implementing the FPN framework and the roi_align layer, we referred only to the published papers. Thus, our implementation details may differ from the official Detectron.

If you encounter any issue when using our code or model, please let me know.

cascade-rcnn's Issues

A problem about batch size

When I train the fpn-cascade model on the COCO dataset, it seems that the batch size can only be 1. How can I train the model with a larger batch size?

Test on a single image

Can you provide a method for testing on a single image?

How to predict the class and position of the target in a new image through the trained model

I trained a model with cascade-rcnn/examples/voc/res50-9s-600-rfcn-cascade/train_detection.sh. How can I predict the class and position of the target in a new image with the trained model?

Cannot compile the project with the CPU-ONLY mode

It seems the project can only be compiled with GPU mode. Is there a way to compile in the CPU mode?

don't remove redundant high IOU boxes in DecodeBBox operator

I tried to port Cascade R-CNN to Detectron (using COCO2017train+val as the training dataset). I did not remove redundant high-IoU boxes in the DecodeBBox operator, and the results were not as expected. Details are as below:

I wonder whether this operation hurts detection performance. Can you give me any advice?
@zhaoweicai

What do 'field_whr' and 'field_xyr' mean in the prototxt?

As the title says: in the prototxt, 'field_whr' is set to 8 and 'field_xyr' is set to 1. What do they mean?

Couldn't find any detection

When I train and test the model on VOC data, I always get this problem.

Why do different stages of FrcnnOutput have different cls_prob, with the 1st stage having the highest cls_score?

I trained Cascade R-CNN on my own dataset. When testing on the validation data, I find that the 1st stage gives the highest cls_score for an object and the 2nd stage gives the lowest cls_score for the same object. For example, when testing picture a.jpg, the 1st stage's FrcnnOutput gives a person in a.jpg a cls_score of 0.9763, while the 2nd stage's FrcnnOutput gives the same person a cls_score of 0.5624. Is that normal?

Questions about multi-gpu and multiple batch size

I haven't run Caffe with multiple GPUs before, so I'm a little confused about why the batch size can be set larger than 1 in Cascade R-CNN, given that the input images are not resized to exactly the same size. As far as I know, all images in a single batch should have the same size.

Confusion about parameter settings in ProposalTarget layer

In train.prototxt, what are the two parameters "img_width" and "img_height" used for? Do I need to change them according to my own dataset (for example the KITTI object detection task, where the images are ~375x1240)?

proposal_target_param {
    cls_num: 81
    batch_size: 512
    fg_fraction: 0.25
    num_img_per_batch: 4
    fg_thr: 0.5
    bg_thr_hg: 0.5
    bg_thr_lw: 0.0
    img_width: 600
    img_height: 600
  }

Thanks in advance!

Train error: many params are -1, can't save the trained model

I0806 23:44:24.048591 20123 solver.cpp:219] Iteration 9900 (2.14913 iter/s, 46.5305s/100 iters), loss = 0.440841
I0806 23:44:24.048627 20123 solver.cpp:238] Train net output #0: bbox_iou = -1
I0806 23:44:24.048635 20123 solver.cpp:238] Train net output #1: bbox_iou_2nd = -1
I0806 23:44:24.048638 20123 solver.cpp:238] Train net output #2: bbox_iou_3rd = -1
I0806 23:44:24.048641 20123 solver.cpp:238] Train net output #3: bbox_iou_pre = -1
I0806 23:44:24.048645 20123 solver.cpp:238] Train net output #4: bbox_iou_pre_2nd = -1
I0806 23:44:24.048648 20123 solver.cpp:238] Train net output #5: bbox_iou_pre_3rd = -1
I0806 23:44:24.048651 20123 solver.cpp:238] Train net output #6: cls_accuracy = 0.984375
I0806 23:44:24.048655 20123 solver.cpp:238] Train net output #7: cls_accuracy_2nd = 0.972656
I0806 23:44:24.048658 20123 solver.cpp:238] Train net output #8: cls_accuracy_3rd = 0.964844
I0806 23:44:24.048666 20123 solver.cpp:238] Train net output #9: loss_bbox = 0.0117847 (* 1 = 0.0117847 loss)
I0806 23:44:24.048671 20123 solver.cpp:238] Train net output #10: loss_bbox_2nd = 0.0129223 (* 0.5 = 0.00646114 loss)
I0806 23:44:24.048676 20123 solver.cpp:238] Train net output #11: loss_bbox_3rd = 0.00699362 (* 0.25 = 0.0017484 loss)
I0806 23:44:24.048681 20123 solver.cpp:238] Train net output #12: loss_cls = 0.0294972 (* 1 = 0.0294972 loss)
I0806 23:44:24.048686 20123 solver.cpp:238] Train net output #13: loss_cls_2nd = 0.0663875 (* 0.5 = 0.0331937 loss)
I0806 23:44:24.048689 20123 solver.cpp:238] Train net output #14: loss_cls_3rd = 0.0622066 (* 0.25 = 0.0155517 loss)
I0806 23:44:24.048696 20123 solver.cpp:238] Train net output #15: rpn_accuracy = 0.999953
I0806 23:44:24.048701 20123 solver.cpp:238] Train net output #16: rpn_accuracy = -1
I0806 23:44:24.048703 20123 solver.cpp:238] Train net output #17: rpn_bboxiou = -1
I0806 23:44:24.048708 20123 solver.cpp:238] Train net output #18: rpn_loss = 0.000343773 (* 1 = 0.000343773 loss)
I0806 23:44:24.048713 20123 solver.cpp:238] Train net output #19: rpn_loss = 0 (* 1 = 0 loss)
I0806 23:44:24.048717 20123 sgd_solver.cpp:105] Iteration 9900, lr = 0.0002
I0806 23:45:10.848093 20123 solver.cpp:587] Snapshotting to binary proto file /disk1/g201708021059/cascade-rcnn/examples/voc/res101-9s-600-rfcn-cascade/log/cascadercnn_voc_iter_10000.caffemodel
*** Aborted at 1533570310 (unix time) try "date -d @1533570310" if you are using GNU date ***
PC: @ 0x7f55674532e7 caffe::Layer<>::ToProto()
*** SIGSEGV (@0x0) received by PID 20123 (TID 0x7f55682b49c0) from PID 0; stack trace: ***
@ 0x7f5565dedcb0 (unknown)
@ 0x7f55674532e7 caffe::Layer<>::ToProto()
@ 0x7f55675d7533 caffe::Net<>::ToProto()
@ 0x7f55675f415f caffe::Solver<>::SnapshotToBinaryProto()
@ 0x7f55675f42f2 caffe::Solver<>::Snapshot()
@ 0x7f55675f7f7a caffe::Solver<>::Step()
@ 0x7f55675f8994 caffe::Solver<>::Solve()
@ 0x40d4c0 train()
@ 0x408d32 main
@ 0x7f5565dd8f45 (unknown)
@ 0x409442 (unknown)
@ 0x0 (unknown)

thanks a lot

Have you written the python inference code on one image?

I have a trained model and want to use Python code to do detection.
Can someone tell me how to do inference on one image using the trained model?

accuracy

Hi, I trained the network on coco2017 and tested on coco2017val, but the 3rd stage accuracy is only AP 36.9.
Is there something wrong?
I can get AP 42.7 with the released model coco_iter_280000.caffemodel. Should I use solver_step.prototxt instead of solver.prototxt to reach that accuracy?

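Training keeps looping at iteration 0

I don't know why, but when I run the default train_detection.sh under the default alex-9s-600-rpn-base, it keeps looping at iteration 0 and never gets past it. Any advice would be appreciated, thanks!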

when i try to train the res101-9s-600-rfcn-cascade detector using my gpus 4,5 , it said

(nil) (unknown) Aborted (core dumped)

When I try to run your training script to train a ResNet-101 Cascade Faster R-CNN, I encounter an error like this:

F0530 22:30:27.787274 1077 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7ff7618205cd google::LogMessage::Fail()
.............................
Aborted (core dumped)

I know this is because there is not enough memory. But when I try to run your training script to train a ResNet-50 Cascade Faster R-CNN, I encounter a different error like this:

@     0x7f999f9855d5  (unknown)

I0530 22:31:45.417707 1167 layer_factory.hpp:77] Creating layer scale2a_branch2b
I0530 22:31:45.418323 1167 net.cpp:122] Setting up scale2a_branch2b
I0530 22:31:45.418344 1167 net.cpp:129] Top shape: 1 64 250 150 (2400000)
I0530 22:31:45.418352 1167 net.cpp:137] Memory required for data: 276001080
I0530 22:31:45.418380 1167 layer_factory.hpp:77] Creating layer res2a_branch2b_relu
I0530 22:31:45.418406 1167 net.cpp:84] Creating Layer res2a_branch2b_relu
I0530 22:31:45.418423 1167 net.cpp:406] res2a_branch2b_relu <- res2a_branch2b
I0530 22:31:45.418457 1167 net.cpp:367] res2a_branch2b_relu -> res2a_branch2b (in-place)
@ 0x7f999a5966ba start_thread
I0530 22:31:45.418839 1167 net.cpp:122] Setting up res2a_branch2b_relu
I0530 22:31:45.418860 1167 net.cpp:129] Top shape: 1 64 250 150 (2400000)
I0530 22:31:45.418870 1167 net.cpp:137] Memory required for data: 285601080
I0530 22:31:45.418885 1167 layer_factory.hpp:77] Creating layer res2a_branch2c
I0530 22:31:45.418934 1167 net.cpp:84] Creating Layer res2a_branch2c
I0530 22:31:45.418954 1167 net.cpp:406] res2a_branch2c <- res2a_branch2b
I0530 22:31:45.418989 1167 net.cpp:380] res2a_branch2c -> res2a_branch2c
@ 0x7f99aa19641d clone
@ (nil) (unknown)
Aborted (core dumped)

Why are these two errors different? Can you give me some advice about the latter?
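
One way to read the "Memory required for data" lines in logs like the one above is as a running total of top blob sizes. The sketch below checks that reading against two values from this log, assuming 4-byte float elements; the helper name is hypothetical.

```python
def blob_bytes(shape, bytes_per_element=4):
    """Size in bytes of a blob with the given shape (4 bytes assumes float)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


top_shape = (1, 64, 250, 150)          # "Top shape" logged for res2a_branch2b_relu
before, after = 276001080, 285601080   # totals logged before and after that layer
assert blob_bytes(top_shape) == after - before == 9_600_000
print(blob_bytes(top_shape))           # 9600000
```

The total covers only the blobs reported during network setup, so it is better treated as a lower bound on actual GPU memory use than as an exact figure.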

release code

Hi, when will you release the code?

Why all total positive in my training process is equal to 0

cannot run train_detection.sh

When I try to run your training script to train a VGG Cascade Faster R-CNN, I encounter an error like this:

I0426 15:14:21.523888 14111 layer_factory.hpp:77] Creating layer data
F0426 15:14:21.523939 14111 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: DetectionData (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, BboxAccuracy, Bias, BoxGroupOutput, Concat, ContrastiveLoss, Convolution, Crop, Data, DecodeBBox, Deconvolution, DetectionEvaluate, DetectionGroupAccuracy, DetectionGroupLoss, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, FrcnnOutput, HDF5Data, HDF5Output, HardMining, HingeLoss, Im2col, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, PSROIPooling, Parameter, Pooling, Power, ProposalTarget, Python, RNN, ROIAlign, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, ScaleRoute, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile)
*** Check failure stack trace: ***
@ 0x7f87d5321daa (unknown)
@ 0x7f87d5321ce4 (unknown)
@ 0x7f87d53216e6 (unknown)
@ 0x7f87d5324687 (unknown)
@ 0x7f87d59a7f89 caffe::LayerRegistry<>::CreateLayer()
@ 0x7f87d59aa488 caffe::Net<>::Init()
@ 0x7f87d59ac262 caffe::Net<>::Net()
@ 0x7f87d5b3eab0 caffe::Solver<>::InitTrainNet()
@ 0x7f87d5b3efc3 caffe::Solver<>::Init()
@ 0x7f87d5b3f27f caffe::Solver<>::Solver()
@ 0x7f87d5b31241 caffe::Creator_SGDSolver<>()
@ 0x414437 caffe::SolverRegistry<>::CreateSolver()
@ 0x40dbd5 train()
@ 0x40a02c main
@ 0x7f87d3b81f45 (unknown)
@ 0x40aa6b (unknown)
@ (nil) (unknown)
Aborted (core dumped)

I have checked caffe.proto and the parameters of detection_data_layer are already registered. Can you give me some advice?

Why is total positive equal to 0 in many iterations during the training process?

During my training the batch size was 1, and every one of my training pictures contains at least one sample. Why is total positive still equal to 0 in many iterations?

questions on the fg/bg resampling mechanism.

In proposal_target_layer.cpp, from line 169 to line 256, the fg and bg box indexes are sampled differently from Faster R-CNN.

It seems that fg/bg samples in Cascade R-CNN are not allowed to be sampled repeatedly, and if there are not enough sampled bg boxes, some boxes are picked from the discarded pool.

This is very different from Faster R-CNN. I was wondering whether this mechanism is important for Cascade R-CNN.

By the way, I'm also trying to reimplement cascade rcnn with tensorflow api, and still use the resampling method in faster rcnn, but the bbox regression loss curves in the three stages look strange (loss_box_stage1 > loss_box_stage2 > loss_box_stage3). Does this result relate to the sampling method, or simply the normalization statistics (stds and means) are not set well?

ResNet-101-v1 + Faster RCNN + cascade detection, trained on coco2017 trainset.
Visualize the training log of res50-15s-800-fpn-cascade model in cascade rcnn coco examples.

test networks have more branches than training networks?

I have visualized the prototxt files you provided for both the training and test networks. I found that the test network has additional fully-connected layers which do not exist in the training network, i.e.:

stage2: [roi_pool_2nd] --> FC6/FC7 --> [cls_prob_1st_2nd] -->...
stage3: [roi_pool_3rd] --> FC6/FC7/FC/Softmax --> [cls_prob_1st_3rd] -->...
[roi_pool_3rd] --> FC6/FC7/FC/Softmax --> [cls_prob_2nd_3rd] -->...

I am confused: if these additional branches are not trained, how can they be used in the test phase? Or are the parameters of these branches simply copied from the cls prediction branch in RCNN? In that case, will the branches in the same stage predict the same cls_prob?

cannot run train_detection.sh

When I try to run your training script to train a VGG Cascade Faster R-CNN, I encounter an error like this:

I0530 22:20:18.926313 31741 cudnn_conv_layer.cpp:194] Reallocating workspace storage: 786432
I0530 22:20:18.926367 31741 net.cpp:122] Setting up conv1_2
I0530 22:20:18.926391 31741 net.cpp:129] Top shape: 1 64 1000 600 (38400000)
I0530 22:20:18.926401 31741 net.cpp:137] Memory required for data: 468001080
I0530 22:20:18.926450 31741 layer_factory.hpp:77] Creating layer relu1_2
I0530 22:20:18.926482 31741 net.cpp:84] Creating Layer relu1_2
I0530 22:20:18.926501 31741 net.cpp:406] relu1_2 <- conv1_2
I0530 22:20:18.926537 31741 net.cpp:367] relu1_2 -> conv1_2 (in-place)
I0530 22:20:18.926864 31741 net.cpp:122] Setting up relu1_2
I0530 22:20:18.926888 31741 net.cpp:129] Top shape: 1 64 1000 600 (38400000)
I0530 22:20:18.926898 31741 net.cpp:137] Memory required for data: 621601080
I0530 22:20:18.926913 31741 layer_factory.hpp:77] Creating layer pool1
I0530 22:20:18.926946 31741 net.cpp:84] Creating Layer pool1
I0530 22:20:18.926964 31741 net.cpp:406] pool1 <- conv1_2
I0530 22:20:18.927002 31741 net.cpp:380] pool1 -> pool1
@ 0x7f0aea498368 boost::_bi::list7<>::operator()<>()
I0530 22:20:18.927142 31741 net.cpp:122] Setting up pool1
I0530 22:20:18.927165 31741 net.cpp:129] Top shape: 1 64 500 300 (9600000)
I0530 22:20:18.927175 31741 net.cpp:137] Memory required for data: 660001080
I0530 22:20:18.927188 31741 layer_factory.hpp:77] Creating layer conv2_1
I0530 22:20:18.927238 31741 net.cpp:84] Creating Layer conv2_1
I0530 22:20:18.927258 31741 net.cpp:406] conv2_1 <- pool1
I0530 22:20:18.927299 31741 net.cpp:380] conv2_1 -> conv2_1
I0530 22:20:18.930271 31741 cudnn_conv_layer.cpp:194] Reallocating workspace storage: 1572864
@ 0x7f0aea498242 boost::_bi::bind_t<>::operator()()
I0530 22:20:18.931128 31741 net.cpp:122] Setting up conv2_1
I0530 22:20:18.931154 31741 net.cpp:129] Top shape: 1 128 500 300 (19200000)
I0530 22:20:18.931165 31741 net.cpp:137] Memory required for data: 736801080
I0530 22:20:18.931210 31741 layer_factory.hpp:77] Creating layer relu2_1
I0530 22:20:18.931241 31741 net.cpp:84] Creating Layer relu2_1
I0530 22:20:18.931260 31741 net.cpp:406] relu2_1 <- conv2_1
I0530 22:20:18.931291 31741 net.cpp:367] relu2_1 -> conv2_1 (in-place)
I0530 22:20:18.931601 31741 net.cpp:122] Setting up relu2_1
I0530 22:20:18.931622 31741 net.cpp:129] Top shape: 1 128 500 300 (19200000)
I0530 22:20:18.931632 31741 net.cpp:137] Memory required for data: 813601080
I0530 22:20:18.931646 31741 layer_factory.hpp:77] Creating layer conv2_2
I0530 22:20:18.931684 31741 net.cpp:84] Creating Layer conv2_2
I0530 22:20:18.931702 31741 net.cpp:406] conv2_2 <- conv2_1
I0530 22:20:18.931744 31741 net.cpp:380] conv2_2 -> conv2_2
@ 0x7f0aea4981f4 boost::detail::thread_data<>::run()
@ 0x7f0add9e45d5 (unknown)
@ 0x7f0ad85f56ba start_thread
@ 0x7f0ae81f541d clone
@ (nil) (unknown)

Could you give me some advice?

C++ inference code

Could you offer C++ inference code?

bbox_util

// BBox element type is assumed; the template argument was lost in the original post.
void DecodeBBoxesWithPrior(const Dtype* bbox_data, const vector<BBox> prior_bboxes,
    const int bbox_dim, const Dtype* means, const Dtype* stds,
    Dtype* pred_data) {
  const int num = prior_bboxes.size();
  const int cls_num = bbox_dim/4;
  for (int i = 0; i < num; i++) {
    Dtype pw, ph, cx, cy;
    pw = prior_bboxes[i].xmax-prior_bboxes[i].xmin+1;
    ph = prior_bboxes[i].ymax-prior_bboxes[i].ymin+1;
    cx = 0.5*(prior_bboxes[i].xmax+prior_bboxes[i].xmin);
    cy = 0.5*(prior_bboxes[i].ymax+prior_bboxes[i].ymin);
    for (int c = 0; c < cls_num; c++) {
      Dtype bx, by, bw, bh;
      // bbox de-normalization
      bx = bbox_data[i*bbox_dim+4*c]*stds[0]+means[0];
      by = bbox_data[i*bbox_dim+4*c+1]*stds[1]+means[1];
      bw = bbox_data[i*bbox_dim+4*c+2]*stds[2]+means[2];
      bh = bbox_data[i*bbox_dim+4*c+3]*stds[3]+means[3];

      Dtype tx, ty, tw, th;
      tx = bx*pw+cx; ty = by*ph+cy;
      tw = pw*exp(bw); th = ph*exp(bh);
      tx -= (tw-1)/2; ty -= (th-1)/2;
      pred_data[i*bbox_dim+4*c] = tx;
      pred_data[i*bbox_dim+4*c+1] = ty;
      pred_data[i*bbox_dim+4*c+2] = tx+tw-1;
      pred_data[i*bbox_dim+4*c+3] = ty+th-1;
    }
  }
}
What function does this part of the code implement?
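
Read line by line, the function above turns normalized regression outputs (dx, dy, dw, dh) into absolute box corners relative to a prior box. The Python port below handles a single box and a single class to make the arithmetic easier to follow. It is a sketch rather than repository code: the function name is hypothetical, and the default means and stds are placeholders, since the real values are passed in by the caller.

```python
import math


def decode_bbox(delta, prior, means=(0.0, 0.0, 0.0, 0.0), stds=(1.0, 1.0, 1.0, 1.0)):
    """Decode one (dx, dy, dw, dh) output against a prior box.

    Boxes are (xmin, ymin, xmax, ymax) with inclusive pixel coordinates,
    matching the +1 / -1 arithmetic of the C++ code.
    """
    pw = prior[2] - prior[0] + 1        # prior width
    ph = prior[3] - prior[1] + 1        # prior height
    cx = 0.5 * (prior[0] + prior[2])    # prior centre x
    cy = 0.5 * (prior[1] + prior[3])    # prior centre y
    # Undo the target normalization: value * std + mean.
    bx, by, bw, bh = (d * s + m for d, s, m in zip(delta, stds, means))
    tw = pw * math.exp(bw)              # predicted width
    th = ph * math.exp(bh)              # predicted height
    xmin = bx * pw + cx - (tw - 1) / 2  # shift the centre, then move to the left edge
    ymin = by * ph + cy - (th - 1) / 2
    return (xmin, ymin, xmin + tw - 1, ymin + th - 1)


print(decode_bbox((0, 0, 0, 0), (10, 20, 49, 59)))    # (10.0, 20.0, 49.0, 59.0)
print(decode_bbox((0.1, 0, 0, 0), (10, 20, 49, 59)))  # (14.0, 20.0, 53.0, 59.0)
```

All-zero outputs return the prior box unchanged, and dx = 0.1 shifts a 40-pixel-wide prior by 4 pixels, so dx and dy are offsets in units of the prior size while dw and dh are log-scale factors.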

Can it do multi-scale training? How?

very fresh idea, project release highly anticipated ...

how to train it on my own dataset

hi! I want to train cascade-rcnn on my own dataset (three classes). I don't know how to modify the files(eg. examples/voc/). Can you give me some instructions? Thank you!

Do you have pytorch version?

Hi, author, do you have a PyTorch version?

Python inference code about test

Hi! I wrote Python code to test a single image, but there is always a problem with the test result. Can you provide Python code for testing a single image? Thanks!

Did you try to combine the cascade with OHEM, or just use OHEM in the third stage?

OHEM is useful in my experiments. Did you try to combine OHEM with Cascade R-CNN? Should I apply OHEM in every stage?

when i try to test and evaluate the models, i got problem. Loading and preparing annotations... DONE (t=0.62s). Error using containers.Map/subsref The specified key is not present in this container. Error in caffe.Net/blobs (line 82) blob = self.blob_vec(self.name2blob_index(blob_name));

About the Source Code

Thanks a lot for your work. Do you have any plan to release the implementation source code? I only see the trained model and running scripts, and I want to apply your method to my own datasets.

When I try to train the res101-9s-600-rfcn-cascade detector using my GPUs 4,5, it says "Multi-GPU execution not available - rebuild with USE_NCCL". Can't I use two GPUs?

Why no BatchNorm layer is inserted before Scale layer?

It seems that you didn't do normalization before scale the feature map, like here, so what's happening here?

Do you have a program to detect targets in unknown pictures?

Does your program have a detection file like py-faster-rcnn/tools/demo.py?

how to build my own VOC dataset?

Hi, I tried your voc_window_file.m script in data/voc/, but it did not work. I am not familiar with MATLAB. Can anybody help? Thank you.

training on custom dataset

I tried to train your code on my own dataset, but I found that your annotation format is .txt, which is not the standard VOC format. I only have annotations for my dataset in VOC format (.xml files) or COCO format (.json files). How can I use them for training on my dataset?
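
As a possible first step for this kind of conversion, the sketch below reads the object boxes out of one standard VOC-style XML annotation. It stops there on purpose: the layout of the repository's .txt window files is defined by its MATLAB window-file scripts and is not reproduced here, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return [(class_name, xmin, ymin, xmax, ymax), ...] from one VOC annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")  # direct child only, so nested part boxes are skipped
        coords = [int(float(bnd.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return boxes


sample = """<annotation>
  <object>
    <name>dog</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
</annotation>"""
print(read_voc_boxes(sample))  # [('dog', 48, 240, 195, 371)]
```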

What's the shape of input data in test.prototxt

In detection_data_layer.cpp, with train.prototxt and test.prototxt, the shape of top data blob is 1,3,1312,800, but in deploy.prototxt I see input dim is 1,3,800,1312.

About the GPU memory when training

Hi,
thank you for sharing your work!
I have a problem when trying to train the model. Since I have only one GTX 1080 GPU, it reports an "out of memory" error when I run the training script. I checked train.prototxt and found that the batch_size is 1.
So how can I get it to run, or what is the minimum GPU memory needed to run the code?

I look forward to your reply, thank you.

multi-gpu training error

I am trying examples/voc/res101-9s-600-rfcn-cascade. Single-GPU training works with some GPU ids (e.g. GPU id 1 is OK, but not 2). However, when training with 2 GPUs, I quickly got the following error:

F0602 18:15:13.552311 13690 decode_bbox_layer.cpp:110] Check failed: keep_num > 0 (0 vs. 0)
*** Check failure stack trace: ***
F0602 18:15:13.553015 13740 decode_bbox_layer.cpp:110] Check failed: keep_num > 0 (0 vs. 0)
*** Check failure stack trace: ***
    @     0x7f1b5745b5cd  google::LogMessage::Fail()
    @     0x7f1b5745b5cd  google::LogMessage::Fail()
    @     0x7f1b5745d433  google::LogMessage::SendToLog()
    @     0x7f1b5745d433  google::LogMessage::SendToLog()
    @     0x7f1b5745b15b  google::LogMessage::Flush()
    @     0x7f1b5745b15b  google::LogMessage::Flush()
    @     0x7f1b5745de1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f1b5745de1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f1b57b9de37  caffe::DecodeBBoxLayer<>::Forward_cpu()
    @     0x7f1b57b9de37  caffe::DecodeBBoxLayer<>::Forward_cpu()
    @     0x7f1b57d245e7  caffe::Net<>::ForwardFromTo()
    @     0x7f1b57d245e7  caffe::Net<>::ForwardFromTo()
    @     0x7f1b57d24977  caffe::Net<>::Forward()
    @     0x7f1b57d1d878  caffe::Solver<>::Step()
    @     0x7f1b57d24977  caffe::Net<>::Forward()
    @     0x7f1b57d09d5e  caffe::Worker<>::InternalThreadEntry()
    @     0x7f1b57d1d878  caffe::Solver<>::Step()
    @     0x7f1b57b2c535  caffe::InternalThread::entry()
    @     0x7f1b57d1e39a  caffe::Solver<>::Solve()
    @     0x7f1b57b2d3fe  boost::detail::thread_data<>::run()
    @     0x7f1b493865d5  (unknown)
    @     0x7f1b57d0891c  caffe::NCCL<>::Run()
    @           0x411522  train()
    @           0x40c2eb  main
    @     0x7f1b560a36ba  start_thread
    @     0x7f1b55cf2830  __libc_start_main
    @           0x40d089  _start
    @     0x7f1b55dd941d  clone
    @              (nil)  (unknown)

coco models seem to be fine.

Error when compile the project about "detection_group_accuracy_layer.cu"

@zhaoweicai Thanks for your great idea and code. It's nice work. However, when I compile, I get the following errors:

src/caffe/layers/detection_group_accuracy_layer.cu(147): error: identifier "DetectionGroupAccuracyParameter" is undefined

src/caffe/layers/detection_group_accuracy_layer.cu(148): error: class "caffe::LayerParameter" has no member "detection_group_accuracy_param"
          detected during instantiation of "void caffe::DetectionGroupAccuracyLayer<Dtype>::Forward_gpu(const std::vector<caffe::Blob<Dtype> *, std::allocator<caffe::Blob<Dtype> *>> &, const std::vector<caffe::Blob<Dtype> *, std::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]"
(233): here

src/caffe/layers/detection_group_accuracy_layer.cu(148): error: class "caffe::LayerParameter" has no member "detection_group_accuracy_param"
          detected during instantiation of "void caffe::DetectionGroupAccuracyLayer<Dtype>::Forward_gpu(const std::vector<caffe::Blob<Dtype> *, std::allocator<caffe::Blob<Dtype> *>> &, const std::vector<caffe::Blob<Dtype> *, std::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=double]"
(233): here

3 errors detected in the compilation of "/tmp/tmpxft_00009594_00000000-11_detection_group_accuracy_layer.compute_61.cpp1.ii".
Makefile:594: recipe for target '.build_release/cuda/src/caffe/layers/detection_group_accuracy_layer.o' failed
make: *** [.build_release/cuda/src/caffe/layers/detection_group_accuracy_layer.o] Error 1
make: *** Waiting for unfinished jobs....

Other files did not have this problem. It seems that 'detection_group_accuracy_layer.cu' cannot be built successfully. Does this cause any problems? How can I fix it? Could you offer me some help? Thank you very much.

Anchor assignment parameter & augmentation

In the detection data layer, I couldn't find the background threshold parameter used for assigning the anchor label (0). Is it the ignore_fg_threshold parameter? To be honest, I can't figure out what ignore_fg_threshold does. Could you help me with this?
I also wonder whether augmentation is used when training the VOC baseline and the VOC Cascade R-CNN, because I see that you apply distortion to the image, but your paper says "No data augmentation was used except standard horizontal image flipping".

About class agnostic

I think your work is wonderful. I have some questions about the implementation. The paper says that class-agnostic regression was chosen for simplicity. Is this choice important for the final result? With class-aware bounding box regression, how should the bbox be refined?
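For readers with the same question, here is a minimal sketch of one common way to pick the box deltas in a class-aware setup: take the four deltas that belong to the highest-scoring foreground class. This follows the Fast R-CNN style output layout (four values per class, background at index 0) and is an assumption for illustration, not something confirmed by this repository. The function name `select_class_deltas` is hypothetical.

```python
def select_class_deltas(bbox_pred, scores):
    """Pick the 4 deltas of the top-scoring foreground class.

    bbox_pred: flat sequence of length 4 * num_classes (assumed layout:
               [bg deltas, class 1 deltas, class 2 deltas, ...]).
    scores:    sequence of length num_classes, index 0 is background.
    Returns (class_index, (dx, dy, dw, dh)).
    """
    if len(bbox_pred) != 4 * len(scores):
        raise ValueError("expected 4 deltas per class")
    # Ignore background (index 0) when choosing the class to refine with.
    best_cls = max(range(1, len(scores)), key=lambda c: scores[c])
    start = 4 * best_cls
    return best_cls, tuple(bbox_pred[start:start + 4])


# Example: 3 classes (background + 2 foreground classes).
cls, deltas = select_class_deltas(list(range(12)), [0.1, 0.2, 0.7])
```

In a cascade, the box refined with these deltas would then be passed to the next stage. With class-agnostic regression this selection step disappears, which may be part of the simplicity the paper refers to.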

I wonder whether cascade stage 1 gives the same result as the base model

Is the result of res50-15s-800-fpn-base the same as that of stage 1 of res50-15s-800-fpn-cascade? I trained and tested both on my own dataset and found that the results are not the same. In my opinion, the result of the base model should be the same as stage 1 of the cascade.

Error when running make in CPU mode

.build_release/lib/libcaffe.so: undefined reference to ‘caffe::BoxGroupOutputLayer::Forward_gpu(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&)’
.build_release/lib/libcaffe.so: undefined reference to ‘caffe::BoxGroupOutputLayer::Forward_gpu(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&)’
.build_release/lib/libcaffe.so: undefined reference to ‘caffe::DetectionGroupAccuracyLayer::Forward_gpu(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&)’
.build_release/lib/libcaffe.so: undefined reference to ‘caffe::ProposalTargetLayer::Forward_gpu(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&)’
collect2: error: ld returned 1 exit status
Makefile:626: recipe for target '.build_release/tools/upgrade_solver_proto_text.bin' failed
make: *** [.build_release/tools/upgrade_solver_proto_text.bin] Error 1

abnormal log

I trained your code on my own dataset and noticed abnormal values in the log: the outputs of bbox and rpn are -1. Could you please give me some advice?

ln: failed to create symbolic link 'build': Function not implemented

I get an error when running make all -j32:

ln: failed to create symbolic link 'build': Function not implemented
Makefile:563: recipe for target '.build_release/.linked' failed
make: *** [.build_release/.linked] Error 1
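
A "Function not implemented" error from ln often indicates that the filesystem holding the source tree does not support symbolic links (some shared or mounted volumes behave this way). That is an assumption about this particular report, not a confirmed diagnosis. A minimal sketch for checking whether a directory accepts symlinks, using a hypothetical helper named `supports_symlinks`:

```python
import os
import tempfile


def supports_symlinks(directory):
    """Return True if a symbolic link can be created inside `directory`."""
    # Work inside a throwaway subdirectory so nothing is left behind.
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        target = os.path.join(tmp, "target")
        link = os.path.join(tmp, "link")
        os.mkdir(target)
        try:
            os.symlink(target, link)
        except (OSError, NotImplementedError):
            # The filesystem or platform refused to create the link.
            return False
        return os.path.islink(link)


if __name__ == "__main__":
    # Check the current directory, e.g. the root of the checkout.
    print(supports_symlinks(os.getcwd()))
```

If this prints False for the checkout directory, moving the repository to a local filesystem that supports symlinks may let the build proceed.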

I wonder why the bbox_pred layer's num_output is always 8

The train.prototxt contains the following layer:
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "data"
  top: "bbox_pred"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 8
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
I don't understand why num_output is 8.
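
One possible reading, stated as an assumption and not confirmed by the repository: in class-agnostic box regression, a common convention in Caffe-based detectors is to predict 4 coordinates for 2 groups (background and foreground), which gives 4 × 2 = 8 outputs regardless of the number of object classes. The helper `bbox_pred_num_output` below is hypothetical and only illustrates that arithmetic.

```python
def bbox_pred_num_output(num_classes, class_agnostic):
    """Number of regression outputs under the assumed convention.

    num_classes:    number of classes including background.
    class_agnostic: if True, regress one box for "foreground" plus one
                    placeholder slot for background (2 groups in total).
    """
    coords_per_box = 4  # (dx, dy, dw, dh)
    groups = 2 if class_agnostic else num_classes
    return coords_per_box * groups


# PASCAL VOC has 20 object classes + background = 21.
agnostic = bbox_pred_num_output(21, class_agnostic=True)
aware = bbox_pred_num_output(21, class_agnostic=False)
```

Under this convention the value stays at 8 for any dataset when regression is class-agnostic, which would match num_output being "always 8".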

About regression targets

In your code, the regression targets of the R-CNN are (dx, dy, dw, dh), but your evaluation script seems to simply take the outputs of the bbox_pred_1st/2nd/3rd layers and use them as the four coordinates (x1, y1, x2, y2) of the RoIs. Could you explain this?
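
For reference, the standard R-CNN parameterization converts (dx, dy, dw, dh) deltas into corner coordinates relative to an input RoI. The sketch below shows that conversion. It assumes continuous coordinates (width = x2 - x1) and omits any mean/std normalization of the deltas. Whether this repository performs such decoding inside the network before the evaluation script reads the layer outputs is not confirmed here. The name `decode_box` is hypothetical.

```python
import math


def decode_box(roi, deltas):
    """Apply (dx, dy, dw, dh) to an (x1, y1, x2, y2) RoI.

    Returns the refined box as (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = roi
    dx, dy, dw, dh = deltas
    # RoI size and center.
    w = x2 - x1
    h = y2 - y1
    cx = x1 + 0.5 * w
    cy = y1 + 0.5 * h
    # Shift the center proportionally to the RoI size, scale size in log space.
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)


# Zero deltas leave the RoI unchanged.
same = decode_box((10.0, 20.0, 50.0, 80.0), (0.0, 0.0, 0.0, 0.0))
```

If a layer in the network already applies this transform, its outputs are corner coordinates and can be used directly as RoIs, which would be consistent with what the evaluation script appears to do.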