
cascade-rcnn's Introduction

Cascade R-CNN: Delving into High Quality Object Detection

by Zhaowei Cai and Nuno Vasconcelos

This repository is written by Zhaowei Cai at UC San Diego.

Introduction

This repository implements multiple popular object detection algorithms, including Faster R-CNN, R-FCN, FPN, and our recently proposed Cascade R-CNN, on the MS-COCO and PASCAL VOC datasets. Multiple backbone networks are available, including AlexNet, VGG-Net and ResNet. It is written in C++ and powered by the Caffe deep learning toolbox.

Cascade R-CNN is a multi-stage extension of the popular two-stage R-CNN object detection framework. The goal is to obtain high quality object detection, which can effectively reject close false positives. It consists of a sequence of detectors trained end-to-end with increasing IoU thresholds, to be sequentially more selective against close false positives. The output of a previous stage detector is forwarded to a later stage detector, and the detection results will be improved stage by stage. This idea can be applied to any detector based on the two-stage R-CNN framework, including Faster R-CNN, R-FCN, FPN, Mask R-CNN, etc, and reliable gains are available independently of baseline strength. A vanilla Cascade R-CNN on FPN detector of ResNet-101 backbone network, without any training or inference bells and whistles, achieved state-of-the-art results on the challenging MS-COCO dataset.
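To make the idea of increasing IoU thresholds concrete, the sketch below labels a fixed set of proposals as foreground or background at each stage. The thresholds 0.5, 0.6 and 0.7 and the name `assign_stage_labels` are assumptions chosen for illustration; they are not read from this repository's prototxt files. The sketch also simplifies the cascade: in the method described above, each stage receives the boxes refined by the previous stage, so the IoU values would change from stage to stage.

```python
from typing import Dict, List, Sequence

# Hypothetical per-stage foreground IoU thresholds (illustrative assumption).
STAGE_IOU_THRESHOLDS = (0.5, 0.6, 0.7)


def assign_stage_labels(
    max_ious: Sequence[float],
    thresholds: Sequence[float] = STAGE_IOU_THRESHOLDS,
) -> Dict[int, List[int]]:
    """Return, for each stage, a 1 (foreground) / 0 (background) label per proposal.

    max_ious[i] is the highest IoU between proposal i and any ground-truth box.
    """
    labels = {}
    for stage, thr in enumerate(thresholds, start=1):
        labels[stage] = [1 if iou >= thr else 0 for iou in max_ious]
    return labels


ious = [0.45, 0.55, 0.65, 0.75]
for stage, stage_labels in assign_stage_labels(ious).items():
    print(stage, stage_labels)
```

Later stages accept fewer proposals as positives, which is what makes them more selective against close false positives.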

Update

The re-implementation of Cascade R-CNN in Detectron has been released. See Detectron-Cascade-RCNN. Very consistent improvements are available for all tested models, independent of baseline strength.

It is also recommended to use the third-party implementations: mmdetection (based on PyTorch) and tensorpack (based on TensorFlow).

Citation

If you use our code/model/data, please cite our paper:

@inproceedings{cai18cascadercnn,
  author = {Zhaowei Cai and Nuno Vasconcelos},
  title = {Cascade R-CNN: Delving into High Quality Object Detection},
  booktitle = {CVPR},
  year = {2018}
}

or its extension:

@article{cai2019cascadercnn,
  author = {Zhaowei Cai and Nuno Vasconcelos},
  title = {Cascade R-CNN: High Quality Object Detection and Instance Segmentation},
  journal = {arXiv preprint arXiv:1906.09756},
  year = {2019}
}

Benchmarking

We benchmark multiple detector models on the MS-COCO and PASCAL VOC datasets in the tables below.

  1. MS-COCO (Train/Test: train2017/val2017, shorter size: 800 for FPN and 600 for the others)

     model                 #GPUs  bs  lr    iter  train time  test time  AP    AP50  AP75
     VGG-RPN-baseline      2      4   3e-3  100k  12.5 hr     0.075s     23.6  43.9  23.0
     VGG-RPN-Cascade       2      4   3e-3  100k  15.5 hr     0.115s     27.0  44.2  27.7
     Res50-RFCN-baseline   4      1   3e-3  280k  19 hr       0.07s      27.0  48.7  26.9
     Res50-RFCN-Cascade    4      1   3e-3  280k  22.5 hr     0.075s     31.1  49.8  32.8
     Res101-RFCN-baseline  4      1   3e-3  280k  29 hr       0.075s     30.3  52.2  30.8
     Res101-RFCN-Cascade   4      1   3e-3  280k  30.5 hr     0.085s     33.3  52.0  35.2
     Res50-FPN-baseline    8      1   5e-3  280k  32 hr       0.095s     36.5  58.6  39.2
     Res50-FPN-Cascade     8      1   5e-3  280k  36 hr       0.115s     40.3  59.4  43.7
     Res101-FPN-baseline   8      1   5e-3  280k  37 hr       0.115s     38.5  60.6  41.7
     Res101-FPN-Cascade    8      1   5e-3  280k  46 hr       0.14s      42.7  61.6  46.6

  2. PASCAL VOC 2007 (Train/Test: 2007+2012trainval/2007test, shorter size: 600)

     model                 #GPUs  bs  lr    iter  train time  AP    AP50  AP75
     Alex-RPN-baseline     2      4   1e-3  45k   2.5 hr      29.4  63.2  23.7
     Alex-RPN-Cascade      2      4   1e-3  45k   3 hr        38.9  66.5  40.5
     VGG-RPN-baseline      2      4   1e-3  45k   6 hr        42.9  76.4  44.1
     VGG-RPN-Cascade       2      4   1e-3  45k   7.5 hr      51.2  79.1  56.3
     Res50-RFCN-baseline   2      2   2e-3  90k   8 hr        44.8  77.5  46.8
     Res50-RFCN-Cascade    2      2   2e-3  90k   9 hr        51.8  78.5  57.1
     Res101-RFCN-baseline  2      2   2e-3  90k   10.5 hr     49.4  79.8  53.2
     Res101-RFCN-Cascade   2      2   2e-3  90k   12 hr       54.2  79.6  59.2

NOTE. In the above tables, all models have been run at least two times with close results. The training is relatively stable. RPN means Faster R-CNN. The annotations of PASCAL VOC are transformed to COCO format, and COCO API was used for evaluation. The results are different from the official VOC evaluation. If you want to compare the VOC results in publication, please use the official VOC code for evaluation.
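For readers unfamiliar with the column names: in COCO-style evaluation, AP50 and AP75 are the average precision at IoU thresholds of 0.50 and 0.75, and AP averages over thresholds from 0.50 to 0.95. The sketch below is not the COCO API; it only illustrates, with hypothetical helper names `box_iou` and `is_true_positive`, how one detection can count as correct at the looser threshold and incorrect at the stricter one.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), continuous coordinates


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(det: Box, gt: Box, iou_threshold: float) -> bool:
    """A detection matches a ground-truth box if their IoU reaches the threshold."""
    return box_iou(det, gt) >= iou_threshold


detection = (0.0, 0.0, 10.0, 10.0)
ground_truth = (2.0, 0.0, 12.0, 10.0)
print(box_iou(detection, ground_truth))
print(is_true_positive(detection, ground_truth, 0.5))
print(is_true_positive(detection, ground_truth, 0.75))
```

A detector that localizes more tightly gains mostly at the stricter thresholds, which is where the Cascade rows in the tables differ most from their baselines.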

Requirements

  1. An NVIDIA GPU and cuDNN are required for fast training and inference. So far, CUDA 8.0 with cuDNN 6.0.20 has been tested. Other versions should also work.

  2. The Caffe MATLAB wrapper is required to run the detection/evaluation demo.

Installation

  1. Clone the Cascade-RCNN repository, and we'll call the directory that you cloned Cascade-RCNN into CASCADE_ROOT

    git clone https://github.com/zhaoweicai/cascade-rcnn.git
  2. Build Cascade-RCNN

    cd $CASCADE_ROOT/
    # Follow the Caffe installation instructions here:
    #   http://caffe.berkeleyvision.org/installation.html
    
    # If you're experienced with Caffe and have all of the requirements installed
    # and your Makefile.config in place, then simply do:
    make all -j 16
    
    # If you want to run Cascade-RCNN detection/evaluation demo, build MATLAB wrapper as well
    make matcaffe

Datasets

If you already have a copy of COCO/VOC that is not organized as below, you can simply create symlinks to reproduce the same directory structure.

MS-COCO

In all MS-COCO experiments, we use train2017 for training and val2017 (a.k.a. minival) for validation. Follow the MS-COCO website to download the images/annotations and set up the COCO API.

Assuming that your local copy of the COCO dataset is at /your/path/to/coco, make sure it has the following directory structure:

coco
|_ images
  |_ train2017
  |  |_ <im-1-name>.jpg
  |  |_ ...
  |  |_ <im-N-name>.jpg
  |_ val2017
  |_ ...
|_ annotations
   |_ instances_train2017.json
   |_ instances_val2017.json
   |_ ...
|_ MatlabAPI

PASCAL VOC

In all PASCAL VOC experiments, we use VOC2007+VOC2012 trainval for training and VOC2007 test for validation. Follow the PASCAL VOC website to download the images/annotations and set up the VOCdevkit.

Assuming that your local copy of VOCdevkit is at /your/path/to/VOCdevkit, make sure it has the following directory structure:

VOCdevkit
|_ VOC2007
  |_ JPEGImages
  |  |_ <000001>.jpg
  |  |_ ...
  |  |_ <009963>.jpg
  |_ Annotations
  |  |_ <000001>.xml
  |  |_ ...
  |  |_ <009963>.xml
  |_ ...
|_ VOC2012
  |_ JPEGImages
  |  |_ <2007_000027>.jpg
  |  |_ ...
  |  |_ <2012_004331>.jpg
  |_ Annotations
  |  |_ <2007_000027>.xml
  |  |_ ...
  |  |_ <2012_004331>.xml
  |_ ...
|_ VOCcode
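
Before training, it can help to verify that a dataset root matches the directory trees above. The following is a minimal sketch, not part of this repository: the helper `missing_entries` and the `EXPECTED_LAYOUT` table are hypothetical, and only the top-level entries named in the trees are checked. The demo runs on a temporary directory so that it is self-contained.

```python
import tempfile
from pathlib import Path
from typing import Dict, List

# Entries named in the directory trees above, relative to each dataset root.
EXPECTED_LAYOUT: Dict[str, List[str]] = {
    "coco": [
        "images/train2017",
        "images/val2017",
        "annotations/instances_train2017.json",
        "annotations/instances_val2017.json",
        "MatlabAPI",
    ],
    "voc": [
        "VOC2007/JPEGImages",
        "VOC2007/Annotations",
        "VOC2012/JPEGImages",
        "VOC2012/Annotations",
        "VOCcode",
    ],
}


def missing_entries(root: Path, dataset: str) -> List[str]:
    """List the expected entries that do not exist under root."""
    return [rel for rel in EXPECTED_LAYOUT[dataset] if not (root / rel).exists()]


# Demo on a throwaway directory; use /your/path/to/coco in practice.
with tempfile.TemporaryDirectory() as tmp:
    coco_root = Path(tmp)
    missing_before = missing_entries(coco_root, "coco")
    for rel in EXPECTED_LAYOUT["coco"]:
        target = coco_root / rel
        if target.suffix == ".json":
            target.parent.mkdir(parents=True, exist_ok=True)
            target.touch()
        else:
            target.mkdir(parents=True, exist_ok=True)
    missing_after = missing_entries(coco_root, "coco")

print(missing_before)
print(missing_after)
```

If an entry is reported missing because your copy is organized differently, a symlink with the expected name (for example one created with `os.symlink`) is enough to satisfy the check.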

Training Cascade-RCNN

  1. Get the training data

    cd $CASCADE_ROOT/data/
    sh get_coco_data.sh

    This will download the window files required for the experiments. You can also use the provided MATLAB script coco_window_file.m under $CASCADE_ROOT/data/coco/ to generate your own window files.

  2. Download the models pretrained on ImageNet. For AlexNet and VGG-Net, the FC layers are pruned so that 2048 units remain per FC layer. In addition, the two FC layers are copied three times for Cascade R-CNN training. For ResNet, the BatchNorm layers are merged into Scale layers and frozen during training, as is common practice.

    cd $CASCADE_ROOT/models/
    sh fetch_vggnet.sh
  3. Multiple shell scripts are provided to train Cascade-RCNN on different baseline detectors as described in our paper. Under each model folder, you need to change the root_folder of the data layer in train.prototxt and test.prototxt to your COCO path. After that, you can start to train your own Cascade-RCNN models. Take vgg-12s-600-rpn-cascade for example.

    cd $CASCADE_ROOT/examples/coco/vgg-12s-600-rpn-cascade/
    sh train_detection.sh

    A log file is generated during training. The total training time depends on the complexity of the model and the dataset. To quickly check whether training works, try the light AlexNet model on the VOC dataset.
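
Step 3 asks you to change root_folder in train.prototxt and test.prototxt. Editing by hand works fine; the sketch below shows one way to script it. It assumes the field is written in the usual protobuf text form `root_folder: "..."`, which has not been checked against the shipped files, and the helper name `set_root_folder` and the example fragment are hypothetical.

```python
import re


def set_root_folder(prototxt_text: str, new_root: str) -> str:
    """Replace the value of every root_folder field in a prototxt string."""
    pattern = re.compile(r'(root_folder\s*:\s*)"[^"]*"')
    # A function replacement avoids backslash-escape handling in new_root.
    return pattern.sub(lambda m: f'{m.group(1)}"{new_root}"', prototxt_text)


# Hypothetical fragment; the real data layer contains more fields.
example = 'layer {\n  name: "data"\n  some_param {\n    root_folder: "/old/path/"\n  }\n}\n'
updated = set_root_folder(example, "/your/path/to/coco/")
print(updated)
```

To apply it to a file, read the file's text, pass it through the function, and write the result back; keep a backup of the original prototxt.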

NOTE. Occasionally, training of Res101-FPN-Cascade runs out of memory. If this happens, just resume training from the latest solverstate.

Pretrained Models

We only provide the Res50-FPN-baseline, Res50-FPN-Cascade and Res101-FPN-Cascade models for COCO dataset, and Res101-RFCN-Cascade for VOC dataset.

Download pre-trained models

cd $CASCADE_ROOT/examples/coco/
sh fetch_cascadercnn_models.sh

The pretrained models produce exactly the same results as described in our paper.

Testing/Evaluation Demo

Once the models pretrained or trained by yourself are available, you can use the MATLAB script run_cascadercnn_coco.m to obtain the detection and evaluation results. Set the right dataset path and choose the model of your interest to test in the demo script. The default setting is for the pretrained model. The final detection results will be saved under $CASCADE_ROOT/examples/coco/detections/ and the evaluation results will be saved under the model folder.

You can also run the shell script test_coco_detection.sh under each model folder for evaluation, but it is not identical to the official evaluation. For publication, use the MATLAB script.

Disclaimer

  1. When we were re-implementing the FPN framework and roi_align layer, we only referred to their published papers. Thus, our implementation details could be different from the official Detectron.

If you encounter any issue when using our code or model, please let me know.


cascade-rcnn's Issues

A problem about batch size

When I train the model with coco dataset and fpn-cascade model, it seems that the batch size can only be 1. How can I train the model with larger batch size?

don't remove redundant high IOU boxes in DecodeBBox operator

I tried to port Cascade R-CNN to Detectron (using COCO2017 train+val as the training dataset). I did not remove redundant high-IoU boxes in the DecodeBBox operator, and the results were not as expected. Details are below:


I wonder whether this operation would bring bad effects to the detection performance, can you give me any advice?
@zhaoweicai

Questions about multi-gpu and multiple batch size

I haven't run Caffe with multiple GPUs before, so I'm a little confused about why the batch size can be set larger than 1 in Cascade R-CNN, given that the input images are not resized to exactly the same size. As far as I know, all images in a single batch should have the same size.

Confusion about parameter settings in ProposalTarget layer

In train.prototxt, what are the two parameters "img_width" and "img_height" used for? Do I need to change them according to my own dataset (e.g. the KITTI object detection task, where the images are ~375x1240)?

proposal_target_param {
    cls_num: 81
    batch_size: 512
    fg_fraction: 0.25
    num_img_per_batch: 4
    fg_thr: 0.5
    bg_thr_hg: 0.5
    bg_thr_lw: 0.0
    img_width: 600
    img_height: 600
  }

Thanks in advance!

Train error: many params are -1, can't save the trained model

I0806 23:44:24.048591 20123 solver.cpp:219] Iteration 9900 (2.14913 iter/s, 46.5305s/100 iters), loss = 0.440841
I0806 23:44:24.048627 20123 solver.cpp:238] Train net output #0: bbox_iou = -1
I0806 23:44:24.048635 20123 solver.cpp:238] Train net output #1: bbox_iou_2nd = -1
I0806 23:44:24.048638 20123 solver.cpp:238] Train net output #2: bbox_iou_3rd = -1
I0806 23:44:24.048641 20123 solver.cpp:238] Train net output #3: bbox_iou_pre = -1
I0806 23:44:24.048645 20123 solver.cpp:238] Train net output #4: bbox_iou_pre_2nd = -1
I0806 23:44:24.048648 20123 solver.cpp:238] Train net output #5: bbox_iou_pre_3rd = -1
I0806 23:44:24.048651 20123 solver.cpp:238] Train net output #6: cls_accuracy = 0.984375
I0806 23:44:24.048655 20123 solver.cpp:238] Train net output #7: cls_accuracy_2nd = 0.972656
I0806 23:44:24.048658 20123 solver.cpp:238] Train net output #8: cls_accuracy_3rd = 0.964844
I0806 23:44:24.048666 20123 solver.cpp:238] Train net output #9: loss_bbox = 0.0117847 (* 1 = 0.0117847 loss)
I0806 23:44:24.048671 20123 solver.cpp:238] Train net output #10: loss_bbox_2nd = 0.0129223 (* 0.5 = 0.00646114 loss)
I0806 23:44:24.048676 20123 solver.cpp:238] Train net output #11: loss_bbox_3rd = 0.00699362 (* 0.25 = 0.0017484 loss)
I0806 23:44:24.048681 20123 solver.cpp:238] Train net output #12: loss_cls = 0.0294972 (* 1 = 0.0294972 loss)
I0806 23:44:24.048686 20123 solver.cpp:238] Train net output #13: loss_cls_2nd = 0.0663875 (* 0.5 = 0.0331937 loss)
I0806 23:44:24.048689 20123 solver.cpp:238] Train net output #14: loss_cls_3rd = 0.0622066 (* 0.25 = 0.0155517 loss)
I0806 23:44:24.048696 20123 solver.cpp:238] Train net output #15: rpn_accuracy = 0.999953
I0806 23:44:24.048701 20123 solver.cpp:238] Train net output #16: rpn_accuracy = -1
I0806 23:44:24.048703 20123 solver.cpp:238] Train net output #17: rpn_bboxiou = -1
I0806 23:44:24.048708 20123 solver.cpp:238] Train net output #18: rpn_loss = 0.000343773 (* 1 = 0.000343773 loss)
I0806 23:44:24.048713 20123 solver.cpp:238] Train net output #19: rpn_loss = 0 (* 1 = 0 loss)
I0806 23:44:24.048717 20123 sgd_solver.cpp:105] Iteration 9900, lr = 0.0002
I0806 23:45:10.848093 20123 solver.cpp:587] Snapshotting to binary proto file /disk1/g201708021059/cascade-rcnn/examples/voc/res101-9s-600-rfcn-cascade/log/cascadercnn_voc_iter_10000.caffemodel
*** Aborted at 1533570310 (unix time) try "date -d @1533570310" if you are using GNU date ***
PC: @ 0x7f55674532e7 caffe::Layer<>::ToProto()
*** SIGSEGV (@0x0) received by PID 20123 (TID 0x7f55682b49c0) from PID 0; stack trace: ***
@ 0x7f5565dedcb0 (unknown)
@ 0x7f55674532e7 caffe::Layer<>::ToProto()
@ 0x7f55675d7533 caffe::Net<>::ToProto()
@ 0x7f55675f415f caffe::Solver<>::SnapshotToBinaryProto()
@ 0x7f55675f42f2 caffe::Solver<>::Snapshot()
@ 0x7f55675f7f7a caffe::Solver<>::Step()
@ 0x7f55675f8994 caffe::Solver<>::Solve()
@ 0x40d4c0 train()
@ 0x408d32 main
@ 0x7f5565dd8f45 (unknown)
@ 0x409442 (unknown)
@ 0x0 (unknown)

thanks a lot
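
When comparing logs like the one above, it can be convenient to pull out which outputs are reported as -1 and what the weighted loss terms add up to. The sketch below is a hypothetical helper (`summarize` is not part of Caffe or of this repository) that parses the "Train net output" lines in the format shown. It does not explain why the outputs are -1, and it does not diagnose the crash during snapshotting.

```python
import re
from typing import List, Tuple

OUTPUT_RE = re.compile(
    r"Train net output #\d+: (\w+) = (-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
    r"(?: \(\* (\d+(?:\.\d+)?) = (\d+(?:\.\d+)?(?:[eE][-+]?\d+)?) loss\))?"
)


def summarize(log_text: str) -> Tuple[List[str], float]:
    """Return (names of outputs equal to -1, sum of weighted loss terms)."""
    minus_one, weighted_total = [], 0.0
    for name, value, _weight, weighted in OUTPUT_RE.findall(log_text):
        if float(value) == -1.0:
            minus_one.append(name)
        if weighted:
            weighted_total += float(weighted)
    return minus_one, weighted_total


# Three lines copied from the log above.
sample = """\
I0806 23:44:24.048627 20123 solver.cpp:238] Train net output #0: bbox_iou = -1
I0806 23:44:24.048666 20123 solver.cpp:238] Train net output #9: loss_bbox = 0.0117847 (* 1 = 0.0117847 loss)
I0806 23:44:24.048671 20123 solver.cpp:238] Train net output #10: loss_bbox_2nd = 0.0129223 (* 0.5 = 0.00646114 loss)
"""
names, total = summarize(sample)
print(names, total)
```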

accuracy

Hi, I trained the network on COCO 2017 train and tested on COCO 2017 val, but the 3rd-stage AP is only 36.9.
Is there something wrong?
I can get AP 42.7 with the released model coco_iter_280000.caffemodel. Should I use solver_step.prototxt instead of solver.prototxt to reach that accuracy?

(nil) (unknown) Aborted (core dumped)

When I try to run your training script to train a ResNet-101 cascade Faster R-CNN, I encounter an error like this:

F0530 22:30:27.787274 1077 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
@ 0x7ff7618205cd google::LogMessage::Fail()
.............................
Aborted (core dumped)

I know this is because there is not enough memory. But when I try to run your training script to train a ResNet-50 cascade Faster R-CNN, I encounter a different error:

@     0x7f999f9855d5  (unknown)

I0530 22:31:45.417707 1167 layer_factory.hpp:77] Creating layer scale2a_branch2b
I0530 22:31:45.418323 1167 net.cpp:122] Setting up scale2a_branch2b
I0530 22:31:45.418344 1167 net.cpp:129] Top shape: 1 64 250 150 (2400000)
I0530 22:31:45.418352 1167 net.cpp:137] Memory required for data: 276001080
I0530 22:31:45.418380 1167 layer_factory.hpp:77] Creating layer res2a_branch2b_relu
I0530 22:31:45.418406 1167 net.cpp:84] Creating Layer res2a_branch2b_relu
I0530 22:31:45.418423 1167 net.cpp:406] res2a_branch2b_relu <- res2a_branch2b
I0530 22:31:45.418457 1167 net.cpp:367] res2a_branch2b_relu -> res2a_branch2b (in-place)
@ 0x7f999a5966ba start_thread
I0530 22:31:45.418839 1167 net.cpp:122] Setting up res2a_branch2b_relu
I0530 22:31:45.418860 1167 net.cpp:129] Top shape: 1 64 250 150 (2400000)
I0530 22:31:45.418870 1167 net.cpp:137] Memory required for data: 285601080
I0530 22:31:45.418885 1167 layer_factory.hpp:77] Creating layer res2a_branch2c
I0530 22:31:45.418934 1167 net.cpp:84] Creating Layer res2a_branch2c
I0530 22:31:45.418954 1167 net.cpp:406] res2a_branch2c <- res2a_branch2b
I0530 22:31:45.418989 1167 net.cpp:380] res2a_branch2c -> res2a_branch2c
@ 0x7f99aa19641d clone
@ (nil) (unknown)
Aborted (core dumped)

Why are these two errors different? Can you give me some advice about the latter?

cannot run train_detection.sh

When I try to run your training script to train a VGG cascade Faster R-CNN, I encounter an error like this:

I0426 15:14:21.523888 14111 layer_factory.hpp:77] Creating layer data
F0426 15:14:21.523939 14111 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: DetectionData (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, BboxAccuracy, Bias, BoxGroupOutput, Concat, ContrastiveLoss, Convolution, Crop, Data, DecodeBBox, Deconvolution, DetectionEvaluate, DetectionGroupAccuracy, DetectionGroupLoss, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, FrcnnOutput, HDF5Data, HDF5Output, HardMining, HingeLoss, Im2col, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, PSROIPooling, Parameter, Pooling, Power, ProposalTarget, Python, RNN, ROIAlign, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, ScaleRoute, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile)
*** Check failure stack trace: ***
@ 0x7f87d5321daa (unknown)
@ 0x7f87d5321ce4 (unknown)
@ 0x7f87d53216e6 (unknown)
@ 0x7f87d5324687 (unknown)
@ 0x7f87d59a7f89 caffe::LayerRegistry<>::CreateLayer()
@ 0x7f87d59aa488 caffe::Net<>::Init()
@ 0x7f87d59ac262 caffe::Net<>::Net()
@ 0x7f87d5b3eab0 caffe::Solver<>::InitTrainNet()
@ 0x7f87d5b3efc3 caffe::Solver<>::Init()
@ 0x7f87d5b3f27f caffe::Solver<>::Solver()
@ 0x7f87d5b31241 caffe::Creator_SGDSolver<>()
@ 0x414437 caffe::SolverRegistry<>::CreateSolver()
@ 0x40dbd5 train()
@ 0x40a02c main
@ 0x7f87d3b81f45 (unknown)
@ 0x40aa6b (unknown)
@ (nil) (unknown)
Aborted (core dumped)

I have checked caffe.proto, and the parameters of detection_data_layer are already registered. Can you give me some advice?

questions on the fg/bg resampling mechanism.

In proposal_target_layer.cpp, lines 169 to 256, the fg and bg box indexes are sampled differently from Faster R-CNN.

It seems that fg/bg samples in Cascade R-CNN are drawn without replacement, and if there are not enough sampled bg boxes, some boxes are picked from the discarded pool.

This is very different from Faster R-CNN. I was wondering whether this mechanism is important for Cascade R-CNN.

By the way, I'm also trying to reimplement Cascade R-CNN with the TensorFlow API, still using the resampling method of Faster R-CNN, but the bbox regression loss curves of the three stages look strange (loss_box_stage1 > loss_box_stage2 > loss_box_stage3). Is this related to the sampling method, or are the normalization statistics (stds and means) simply not set well?

  • ResNet-101-v1 + Faster RCNN + cascade detection, trained on coco2017 trainset.

  • Visualize the training log of res50-15s-800-fpn-cascade model in cascade rcnn coco examples.

test networks have more branches than training networks?

I have visualized the prototxt files you provide for both the training and test networks. I found that the test network has additional fully-connected layers which do not exist in the training network, i.e.:

  • stage2: [roi_pool_2nd] --> FC6/FC7 --> [cls_prob_1st_2nd] -->...

  • stage3: [roi_pool_3rd] --> FC6/FC7/FC/Softmax --> [cls_prob_1st_3rd] -->...
                [roi_pool_3rd] --> FC6/FC7/FC/Softmax --> [cls_prob_2nd_3rd] -->...

I am confused: if these additional branches are not trained, how can they be used in the test phase? Or are the parameters of these branches simply copied from the cls prediction branch in RCNN? In that case, will the branches in the same stage predict the same cls_prob?

cannot run train_detection.sh

When I try to run your training script to train a VGG cascade Faster R-CNN, I encounter an error like this:

I0530 22:20:18.926313 31741 cudnn_conv_layer.cpp:194] Reallocating workspace storage: 786432
I0530 22:20:18.926367 31741 net.cpp:122] Setting up conv1_2
I0530 22:20:18.926391 31741 net.cpp:129] Top shape: 1 64 1000 600 (38400000)
I0530 22:20:18.926401 31741 net.cpp:137] Memory required for data: 468001080
I0530 22:20:18.926450 31741 layer_factory.hpp:77] Creating layer relu1_2
I0530 22:20:18.926482 31741 net.cpp:84] Creating Layer relu1_2
I0530 22:20:18.926501 31741 net.cpp:406] relu1_2 <- conv1_2
I0530 22:20:18.926537 31741 net.cpp:367] relu1_2 -> conv1_2 (in-place)
I0530 22:20:18.926864 31741 net.cpp:122] Setting up relu1_2
I0530 22:20:18.926888 31741 net.cpp:129] Top shape: 1 64 1000 600 (38400000)
I0530 22:20:18.926898 31741 net.cpp:137] Memory required for data: 621601080
I0530 22:20:18.926913 31741 layer_factory.hpp:77] Creating layer pool1
I0530 22:20:18.926946 31741 net.cpp:84] Creating Layer pool1
I0530 22:20:18.926964 31741 net.cpp:406] pool1 <- conv1_2
I0530 22:20:18.927002 31741 net.cpp:380] pool1 -> pool1
@ 0x7f0aea498368 boost::_bi::list7<>::operator()<>()
I0530 22:20:18.927142 31741 net.cpp:122] Setting up pool1
I0530 22:20:18.927165 31741 net.cpp:129] Top shape: 1 64 500 300 (9600000)
I0530 22:20:18.927175 31741 net.cpp:137] Memory required for data: 660001080
I0530 22:20:18.927188 31741 layer_factory.hpp:77] Creating layer conv2_1
I0530 22:20:18.927238 31741 net.cpp:84] Creating Layer conv2_1
I0530 22:20:18.927258 31741 net.cpp:406] conv2_1 <- pool1
I0530 22:20:18.927299 31741 net.cpp:380] conv2_1 -> conv2_1
I0530 22:20:18.930271 31741 cudnn_conv_layer.cpp:194] Reallocating workspace storage: 1572864
@ 0x7f0aea498242 boost::_bi::bind_t<>::operator()()
I0530 22:20:18.931128 31741 net.cpp:122] Setting up conv2_1
I0530 22:20:18.931154 31741 net.cpp:129] Top shape: 1 128 500 300 (19200000)
I0530 22:20:18.931165 31741 net.cpp:137] Memory required for data: 736801080
I0530 22:20:18.931210 31741 layer_factory.hpp:77] Creating layer relu2_1
I0530 22:20:18.931241 31741 net.cpp:84] Creating Layer relu2_1
I0530 22:20:18.931260 31741 net.cpp:406] relu2_1 <- conv2_1
I0530 22:20:18.931291 31741 net.cpp:367] relu2_1 -> conv2_1 (in-place)
I0530 22:20:18.931601 31741 net.cpp:122] Setting up relu2_1
I0530 22:20:18.931622 31741 net.cpp:129] Top shape: 1 128 500 300 (19200000)
I0530 22:20:18.931632 31741 net.cpp:137] Memory required for data: 813601080
I0530 22:20:18.931646 31741 layer_factory.hpp:77] Creating layer conv2_2
I0530 22:20:18.931684 31741 net.cpp:84] Creating Layer conv2_2
I0530 22:20:18.931702 31741 net.cpp:406] conv2_2 <- conv2_1
I0530 22:20:18.931744 31741 net.cpp:380] conv2_2 -> conv2_2
@ 0x7f0aea4981f4 boost::detail::thread_data<>::run()
@ 0x7f0add9e45d5 (unknown)
@ 0x7f0ad85f56ba start_thread
@ 0x7f0ae81f541d clone
@ (nil) (unknown)

Could you give me some advice?

bbox_util

void DecodeBBoxesWithPrior(const Dtype* bbox_data, const vector prior_bboxes,
    const int bbox_dim, const Dtype* means, const Dtype* stds,
    Dtype* pred_data) {
  const int num = prior_bboxes.size();
  const int cls_num = bbox_dim/4;
  for (int i = 0; i < num; i++) {
    Dtype pw, ph, cx, cy;
    pw = prior_bboxes[i].xmax-prior_bboxes[i].xmin+1;
    ph = prior_bboxes[i].ymax-prior_bboxes[i].ymin+1;
    cx = 0.5*(prior_bboxes[i].xmax+prior_bboxes[i].xmin);
    cy = 0.5*(prior_bboxes[i].ymax+prior_bboxes[i].ymin);
    for (int c = 0; c < cls_num; c++) {
      Dtype bx, by, bw, bh;
      // bbox de-normalization
      bx = bbox_data[i*bbox_dim+4*c]*stds[0]+means[0];
      by = bbox_data[i*bbox_dim+4*c+1]*stds[1]+means[1];
      bw = bbox_data[i*bbox_dim+4*c+2]*stds[2]+means[2];
      bh = bbox_data[i*bbox_dim+4*c+3]*stds[3]+means[3];

      Dtype tx, ty, tw, th;
      tx = bx*pw+cx; ty = by*ph+cy;
      tw = pw*exp(bw); th = ph*exp(bh);
      tx -= (tw-1)/2; ty -= (th-1)/2;
      pred_data[i*bbox_dim+4*c] = tx;
      pred_data[i*bbox_dim+4*c+1] = ty;
      pred_data[i*bbox_dim+4*c+2] = tx+tw-1;
      pred_data[i*bbox_dim+4*c+3] = ty+th-1;
    }
  }
}
What does this part of the code do?
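
Reading the function as quoted, it appears to turn normalized regression outputs (dx, dy, dw, dh) back into corner coordinates: each offset is de-normalized with stds and means, the prior box centre is shifted by a fraction of the prior size, the width and height are rescaled exponentially, and the result is written as (xmin, ymin, xmax, ymax). The sketch below ports the inner loop body to Python for a single box and a single class. The name `decode_box` and the default means and stds are assumptions for illustration, not values taken from the repository.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def decode_box(
    deltas: Sequence[float],
    prior: Box,
    means: Sequence[float] = (0.0, 0.0, 0.0, 0.0),
    stds: Sequence[float] = (1.0, 1.0, 1.0, 1.0),
) -> Box:
    """Apply one set of normalized (dx, dy, dw, dh) offsets to a prior box."""
    xmin, ymin, xmax, ymax = prior
    # Prior width/height (inclusive pixel convention, hence the +1) and centre.
    pw, ph = xmax - xmin + 1, ymax - ymin + 1
    cx, cy = 0.5 * (xmax + xmin), 0.5 * (ymax + ymin)
    # De-normalize the network outputs.
    bx, by, bw, bh = (d * s + m for d, s, m in zip(deltas, stds, means))
    # Shift the centre, rescale the size, then convert back to corners.
    tx, ty = bx * pw + cx, by * ph + cy
    tw, th = pw * math.exp(bw), ph * math.exp(bh)
    tx -= (tw - 1) / 2
    ty -= (th - 1) / 2
    return (tx, ty, tx + tw - 1, ty + th - 1)


prior_box = (10.0, 20.0, 49.0, 99.0)
print(decode_box((0.0, 0.0, 0.0, 0.0), prior_box))  # zero offsets keep the prior
print(decode_box((0.1, 0.0, 0.0, 0.0), prior_box))  # shifts right by 0.1 * width
```

In the C++ version the same computation is repeated for every prior box i and every class slot c, with bbox_dim/4 class slots per box.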

how to train it on my own dataset

Hi! I want to train cascade-rcnn on my own dataset (three classes), but I don't know how to modify the files (e.g. under examples/voc/). Can you give me some instructions? Thank you!

Python inference code about test

Hi! I wrote Python code to test a single image, but there is always a problem with the results. Can you provide Python code for testing a single image? Thanks!

About the Source Code

Thanks a lot for your work. Do you have any plans to release the implementation source code? I only see the trained models and running scripts, and I want to apply your method to my own datasets.

how to build my own VOC dataset?

Hi, I tried your voc_window_file.m script in data/voc/, but it did not work. I am not familiar with MATLAB. Can anybody help? Thank you.

training on custom dataset

I tried to train your code on my own dataset, but I found that your annotation format is .txt, which is not the standard VOC format. I only have annotations for my dataset in VOC format (.xml files) or COCO format (.json files). How can I use them for training?

About the GPU memory when training

Hi,
thank you for sharing your work!
I have a problem when trying to train the model. Since I have only one GTX 1080 GPU, it reports an "out of memory" error when I run the training script. I checked train.prototxt and found that batch_size is already 1.
So how can I get it to run, or what is the minimum GPU memory required to run the code?

Looking forward to your reply, thank you.

multi-gpu training error

Trying examples/voc/res101-9s-600-rfcn-cascade. Single-GPU training works with some GPU ids (e.g. GPU id 1 is OK, but not 2). However, when training with 2 GPUs, I quickly get the following error:

F0602 18:15:13.552311 13690 decode_bbox_layer.cpp:110] Check failed: keep_num > 0 (0 vs. 0)
*** Check failure stack trace: ***
F0602 18:15:13.553015 13740 decode_bbox_layer.cpp:110] Check failed: keep_num > 0 (0 vs. 0)
*** Check failure stack trace: ***
    @     0x7f1b5745b5cd  google::LogMessage::Fail()
    @     0x7f1b5745b5cd  google::LogMessage::Fail()
    @     0x7f1b5745d433  google::LogMessage::SendToLog()
    @     0x7f1b5745d433  google::LogMessage::SendToLog()
    @     0x7f1b5745b15b  google::LogMessage::Flush()
    @     0x7f1b5745b15b  google::LogMessage::Flush()
    @     0x7f1b5745de1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f1b5745de1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f1b57b9de37  caffe::DecodeBBoxLayer<>::Forward_cpu()
    @     0x7f1b57b9de37  caffe::DecodeBBoxLayer<>::Forward_cpu()
    @     0x7f1b57d245e7  caffe::Net<>::ForwardFromTo()
    @     0x7f1b57d245e7  caffe::Net<>::ForwardFromTo()
    @     0x7f1b57d24977  caffe::Net<>::Forward()
    @     0x7f1b57d1d878  caffe::Solver<>::Step()
    @     0x7f1b57d24977  caffe::Net<>::Forward()
    @     0x7f1b57d09d5e  caffe::Worker<>::InternalThreadEntry()
    @     0x7f1b57d1d878  caffe::Solver<>::Step()
    @     0x7f1b57b2c535  caffe::InternalThread::entry()
    @     0x7f1b57d1e39a  caffe::Solver<>::Solve()
    @     0x7f1b57b2d3fe  boost::detail::thread_data<>::run()
    @     0x7f1b493865d5  (unknown)
    @     0x7f1b57d0891c  caffe::NCCL<>::Run()
    @           0x411522  train()
    @           0x40c2eb  main
    @     0x7f1b560a36ba  start_thread
    @     0x7f1b55cf2830  __libc_start_main
    @           0x40d089  _start
    @     0x7f1b55dd941d  clone
    @              (nil)  (unknown)

coco models seem to be fine.

Error when compile the project about "detection_group_accuracy_layer.cu"

@zhaoweicai Thanks for your great idea and code. It's nice work. However, when I compile, I get the following errors:

src/caffe/layers/detection_group_accuracy_layer.cu(147): error: identifier "DetectionGroupAccuracyParameter" is undefined

src/caffe/layers/detection_group_accuracy_layer.cu(148): error: class "caffe::LayerParameter" has no member "detection_group_accuracy_param"
          detected during instantiation of "void caffe::DetectionGroupAccuracyLayer<Dtype>::Forward_gpu(const std::vector<caffe::Blob<Dtype> *, std::allocator<caffe::Blob<Dtype> *>> &, const std::vector<caffe::Blob<Dtype> *, std::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=float]"
(233): here

src/caffe/layers/detection_group_accuracy_layer.cu(148): error: class "caffe::LayerParameter" has no member "detection_group_accuracy_param"
          detected during instantiation of "void caffe::DetectionGroupAccuracyLayer<Dtype>::Forward_gpu(const std::vector<caffe::Blob<Dtype> *, std::allocator<caffe::Blob<Dtype> *>> &, const std::vector<caffe::Blob<Dtype> *, std::allocator<caffe::Blob<Dtype> *>> &) [with Dtype=double]"
(233): here

3 errors detected in the compilation of "/tmp/tmpxft_00009594_00000000-11_detection_group_accuracy_layer.compute_61.cpp1.ii".
Makefile:594: recipe for target '.build_release/cuda/src/caffe/layers/detection_group_accuracy_layer.o' failed
make: *** [.build_release/cuda/src/caffe/layers/detection_group_accuracy_layer.o] Error 1
make: *** Waiting for unfinished jobs....

Other files did not have this problem. It seems that 'detection_group_accuracy_layer.cu' cannot be built successfully. Does this cause any problems? How can I fix it? Could you offer me some help? Thank you very much.

Assign anchor parameter & augmentation

In the detection data layer, I didn't find the background threshold parameter for assigning the anchor label (0). Is it the ignore_fg_threshold parameter? To be honest, I can't figure out what the ignore_fg_threshold parameter does. Can you help me with this?
I also wonder whether augmentation is used when training the VOC baseline and VOC Cascade R-CNN, because I find that you apply distortion to the image, but your paper says "No data augmentation was used except standard horizontal image flipping".

About class agnostic

I think your work is wonderful. I have some questions about the implementation. The paper says that class-agnostic regression is chosen for simplicity. I want to know whether it is important for the final result. If class-aware bounding box regression is used instead, how should the bbox be refined?

Error when running make in CPU-only mode

.build_release/lib/libcaffe.so: undefined reference to ‘caffe::BoxGroupOutputLayer::Forward_gpu(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&)’
.build_release/lib/libcaffe.so: undefined reference to ‘caffe::BoxGroupOutputLayer::Forward_gpu(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&)’
.build_release/lib/libcaffe.so: undefined reference to ‘caffe::DetectionGroupAccuracyLayer::Forward_gpu(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&)’
.build_release/lib/libcaffe.so: undefined reference to ‘caffe::ProposalTargetLayer::Forward_gpu(std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&, std::vector<caffe::Blob, std::allocator<caffe::Blob> > const&)’
collect2: error: ld returned 1 exit status
Makefile:626: recipe for target '.build_release/tools/upgrade_solver_proto_text.bin' failed
make: *** [.build_release/tools/upgrade_solver_proto_text.bin] Error 1

abnormal log

I trained your code on my own dataset and noticed an abnormal log: the bbox and rpn outputs are -1. Could you please give me some advice?

I wonder why the bbox_pred layer's num_output is always 8

In train.prototxt, the layer is defined as follows:
layer {
  name: "bbox_pred"
  type: "InnerProduct"
  bottom: "data"
  top: "bbox_pred"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 8
    weight_filler {
      type: "gaussian"
      std: 0.001
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
I don't understand why the num_output is 8
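
One reading that is consistent with code quoted elsewhere on this page: DecodeBBoxesWithPrior computes cls_num = bbox_dim/4, so a layer with num_output: 8 corresponds to two sets of four offsets per box. That would match class-agnostic regression with one background and one foreground box set, which another issue says the paper adopts for simplicity. This is an inference from the quoted snippets, not something confirmed by the author. The arithmetic, with a hypothetical helper name, is:

```python
COORDS_PER_BOX = 4  # (dx, dy, dw, dh)


def num_bbox_outputs(num_regression_classes: int) -> int:
    """Width of a bbox regression layer that predicts one box per class slot."""
    return num_regression_classes * COORDS_PER_BOX


print(num_bbox_outputs(2))   # two class slots, e.g. background and foreground
print(num_bbox_outputs(81))  # one slot per class if regression were class-specific
```

Under this reading the value would stay at 8 regardless of how many object classes the dataset has, whereas a class-specific layer for cls_num: 81 would need 324 outputs.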

about regression targets

In your code, the regression targets of the RCNN are (dx, dy, dw, dh), but in your evaluation script it seems that you just take the outputs of the bbox_pred_1st/2nd/3rd layers and use them as the four coordinates (x1, y1, x2, y2) of the RoIs. Can you explain this?
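
For reference, the sketch below shows the standard R-CNN style encoding of a ground-truth box relative to a RoI as (dx, dy, dw, dh), written with the same +1 pixel convention as the DecodeBBoxesWithPrior function quoted in another issue on this page. The name `encode_box` is hypothetical, and normalization by means and stds is omitted. The sketch only illustrates the relationship between offsets and corner coordinates; it does not settle where in the test network or evaluation script the conversion back to (x1, y1, x2, y2) takes place.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def encode_box(gt: Box, roi: Box) -> Tuple[float, float, float, float]:
    """Offsets (dx, dy, dw, dh) that map roi onto gt."""
    rw, rh = roi[2] - roi[0] + 1, roi[3] - roi[1] + 1
    rcx, rcy = 0.5 * (roi[0] + roi[2]), 0.5 * (roi[1] + roi[3])
    gw, gh = gt[2] - gt[0] + 1, gt[3] - gt[1] + 1
    gcx, gcy = 0.5 * (gt[0] + gt[2]), 0.5 * (gt[1] + gt[3])
    return ((gcx - rcx) / rw, (gcy - rcy) / rh, math.log(gw / rw), math.log(gh / rh))


roi_box = (10.0, 20.0, 49.0, 99.0)
print(encode_box(roi_box, roi_box))                   # identical boxes give zero offsets
print(encode_box((14.0, 20.0, 53.0, 99.0), roi_box))  # a shift of 4 px on a 40 px wide RoI
```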
