matterport / mask_rcnn Goto Github PK

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

License: Other

Python 100.00%

mask-rcnn tensorflow object-detection instance-segmentation keras

mask_rcnn's Introduction

Mask R-CNN for Object Detection and Segmentation

This is an implementation of Mask R-CNN on Python 3, Keras, and TensorFlow. The model generates bounding boxes and segmentation masks for each instance of an object in the image. It's based on Feature Pyramid Network (FPN) and a ResNet101 backbone.

The repository includes:

Source code of Mask R-CNN built on FPN and ResNet101.
Training code for MS COCO
Pre-trained weights for MS COCO
Jupyter notebooks to visualize the detection pipeline at every step
ParallelModel class for multi-GPU training
Evaluation on MS COCO metrics (AP)
Example of training on your own dataset

The code is documented and designed to be easy to extend. If you use it in your research, please consider citing this repository (bibtex below). If you work on 3D vision, you might find our recently released Matterport3D dataset useful as well. This dataset was created from 3D-reconstructed spaces captured by our customers who agreed to make them publicly available for academic use. You can see more examples here.

Getting Started

demo.ipynb Is the easiest way to start. It shows an example of using a model pre-trained on MS COCO to segment objects in your own images. It includes code to run object detection and instance segmentation on arbitrary images.
train_shapes.ipynb shows how to train Mask R-CNN on your own dataset. This notebook introduces a toy dataset (Shapes) to demonstrate training on a new dataset.
(model.py, utils.py, config.py): These files contain the main Mask RCNN implementation.
inspect_data.ipynb. This notebook visualizes the different pre-processing steps to prepare the training data.
inspect_model.ipynb This notebook goes in depth into the steps performed to detect and segment objects. It provides visualizations of every step of the pipeline.
inspect_weights.ipynb This notebooks inspects the weights of a trained model and looks for anomalies and odd patterns.

Step by Step Detection

To help with debugging and understanding the model, there are 3 notebooks (inspect_data.ipynb, inspect_model.ipynb, inspect_weights.ipynb) that provide a lot of visualizations and allow running the model step by step to inspect the output at each point. Here are a few examples:

1. Anchor sorting and filtering

Visualizes every step of the first stage Region Proposal Network and displays positive and negative anchors along with anchor box refinement.

2. Bounding Box Refinement

This is an example of final detection boxes (dotted lines) and the refinement applied to them (solid lines) in the second stage.

3. Mask Generation

Examples of generated masks. These then get scaled and placed on the image in the right location.

4.Layer activations

Often it's useful to inspect the activations at different layers to look for signs of trouble (all zeros or random noise).

5. Weight Histograms

Another useful debugging tool is to inspect the weight histograms. These are included in the inspect_weights.ipynb notebook.

6. Logging to TensorBoard

TensorBoard is another great debugging and visualization tool. The model is configured to log losses and save weights at the end of every epoch.

6. Composing the different pieces into a final result

Training on MS COCO

We're providing pre-trained weights for MS COCO to make it easier to start. You can use those weights as a starting point to train your own variation on the network. Training and evaluation code is in samples/coco/coco.py. You can import this module in Jupyter notebook (see the provided notebooks for examples) or you can run it directly from the command line as such:

# Train a new model starting from pre-trained COCO weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=coco

# Train a new model starting from ImageNet weights
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=imagenet

# Continue training a model that you had trained earlier
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=/path/to/weights.h5

# Continue training the last model you trained. This will find
# the last trained weights in the model directory.
python3 samples/coco/coco.py train --dataset=/path/to/coco/ --model=last

You can also run the COCO evaluation code with:

# Run COCO evaluation on the last trained model
python3 samples/coco/coco.py evaluate --dataset=/path/to/coco/ --model=last

The training schedule, learning rate, and other parameters should be set in samples/coco/coco.py.

Training on Your Own Dataset

Start by reading this blog post about the balloon color splash sample. It covers the process starting from annotating images to training to using the results in a sample application.

In summary, to train the model on your own dataset you'll need to extend two classes:

Config This class contains the default configuration. Subclass it and modify the attributes you need to change.

Dataset This class provides a consistent way to work with any dataset. It allows you to use new datasets for training without having to change the code of the model. It also supports loading multiple datasets at the same time, which is useful if the objects you want to detect are not all available in one dataset.

See examples in samples/shapes/train_shapes.ipynb, samples/coco/coco.py, samples/balloon/balloon.py, and samples/nucleus/nucleus.py.

Differences from the Official Paper

This implementation follows the Mask RCNN paper for the most part, but there are a few cases where we deviated in favor of code simplicity and generalization. These are some of the differences we're aware of. If you encounter other differences, please do let us know.

Image Resizing: To support training multiple images per batch we resize all images to the same size. For example, 1024x1024px on MS COCO. We preserve the aspect ratio, so if an image is not square we pad it with zeros. In the paper the resizing is done such that the smallest side is 800px and the largest is trimmed at 1000px.
Bounding Boxes: Some datasets provide bounding boxes and some provide masks only. To support training on multiple datasets we opted to ignore the bounding boxes that come with the dataset and generate them on the fly instead. We pick the smallest box that encapsulates all the pixels of the mask as the bounding box. This simplifies the implementation and also makes it easy to apply image augmentations that would otherwise be harder to apply to bounding boxes, such as image rotation.

To validate this approach, we compared our computed bounding boxes to those provided by the COCO dataset. We found that ~2% of bounding boxes differed by 1px or more, ~0.05% differed by 5px or more, and only 0.01% differed by 10px or more.
Learning Rate: The paper uses a learning rate of 0.02, but we found that to be too high, and often causes the weights to explode, especially when using a small batch size. It might be related to differences between how Caffe and TensorFlow compute gradients (sum vs mean across batches and GPUs). Or, maybe the official model uses gradient clipping to avoid this issue. We do use gradient clipping, but don't set it too aggressively. We found that smaller learning rates converge faster anyway so we go with that.

Citation

Use this bibtex to cite this repository:

@misc{matterport_maskrcnn_2017,
  title={Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow},
  author={Waleed Abdulla},
  year={2017},
  publisher={Github},
  journal={GitHub repository},
  howpublished={\url{https://github.com/matterport/Mask_RCNN}},
}

Contributing

Contributions to this repository are welcome. Examples of things you can contribute:

Speed Improvements. Like re-writing some Python code in TensorFlow or Cython.
Training on other datasets.
Accuracy Improvements.
Visualizations and examples.

You can also join our team and help us build even more projects like this one.

Requirements

Python 3.4, TensorFlow 1.3, Keras 2.0.8 and other common packages listed in requirements.txt.

MS COCO Requirements:

To train or test on MS COCO, you'll also need:

pycocotools (installation instructions below)
MS COCO Dataset
Download the 5K minival and the 35K validation-minus-minival subsets. More details in the original Faster R-CNN implementation.

If you use Docker, the code has been verified to work on this Docker container.

Installation

Clone this repository
Install dependencies
```
pip3 install -r requirements.txt
```
Run setup from the repository root directory
```
python3 setup.py install
```
Download pre-trained COCO weights (mask_rcnn_coco.h5) from the releases page.
(Optional) To train or test on MS COCO install pycocotools from one of these repos. They are forks of the original pycocotools with fixes for Python3 and Windows (the official repo doesn't seem to be active anymore).
- Linux: https://github.com/waleedka/coco
- Windows: https://github.com/philferriere/cocoapi. You must have the Visual C++ 2015 build tools on your path (see the repo for additional details)

Projects Using this Model

If you extend this model to other datasets or build projects that use it, we'd love to hear from you.

4K Video Demo by Karol Majek.

Images to OSM: Improve OpenStreetMap by adding baseball, soccer, tennis, football, and basketball fields.

Splash of Color. A blog post explaining how to train this model from scratch and use it to implement a color splash effect.

Segmenting Nuclei in Microscopy Images. Built for the 2018 Data Science Bowl

Code is in the samples/nucleus directory.

Detection and Segmentation for Surgery Robots by the NUS Control & Mechatronics Lab.

Reconstructing 3D buildings from aerial LiDAR

A proof of concept project by Esri, in collaboration with Nvidia and Miami-Dade County. Along with a great write up and code by Dmitry Kudinov, Daniel Hedges, and Omar Maher.

Usiigaci: Label-free Cell Tracking in Phase Contrast Microscopy

A project from Japan to automatically track cells in a microfluidics platform. Paper is pending, but the source code is released.

Characterization of Arctic Ice-Wedge Polygons in Very High Spatial Resolution Aerial Imagery

Research project to understand the complex processes between degradations in the Arctic and climate change. By Weixing Zhang, Chandi Witharana, Anna Liljedahl, and Mikhail Kanevskiy.

Mask-RCNN Shiny

A computer vision class project by HU Shiyu to apply the color pop effect on people with beautiful results.

Mapping Challenge: Convert satellite imagery to maps for use by humanitarian organisations.

GRASS GIS Addon to generate vector masks from geospatial imagery. Based on a Master's thesis by Ondřej Pešek.

mask_rcnn's People

Contributors

Stargazers

Watchers

Forkers

richmondwang chipmonkey propellingbits caomw johndpope ghosthamlet zhaoyang1708 dantodor madongyu codeaudit avidoggy starstylesky sudhir512kj hbcbh1999 kastnerkyle shafiahmed bityangke researchase manashty lukekulik zuowang tvkpz ikishorek niudong wangsujeon jeffreyyihuang rotblauer phecy sumitbinnani abunaser71 jjhartmann ricefryegg zijundeng deshpandeshrinath 19ai bygreencn johnmwashuma timesofbadri jimmyyfeng talumbau amos-zq baiyancheng20 liu-hai-yang davijo robin-- wcastil toast707 zengjianping locosoft1986 ominux irwingd likeucode redeipirati ml-kraft karolmajek prasadseemakurthi shravankumar147 security-geeks jannick-v gzpan mmourafiq pannous ideaplexus jgraving rosssong prashantpathakdb ilkarman xshhhm philippslang hoojaoh barbecacov himanshu141 ak3ra mehrdad-shokri salehe-e marclachapelle kokorzyc wanjinchang x-itec gakarak deepb1t bdholt1 adarbinyan cheneason morpheus3000 saadullahakram wsz912 morganwang010 dnvyadav owenlnut vladpaunescu raghavendrajain rtao danzschulman sixerwang jarlene lesaffrea tailhq hhy5277 pyleonard

mask_rcnn's Issues

Have anyone trained on Pascal Voc dataset ?

Detection result is error when input batch size > 1

I use 'model.detect(image_batch, verbose=1)' to detect images, but I find the result is error.
The results of '1.jpg' and '10.jpg' are same.
I check the code and find that the value of input is ok but the value of detections[0] and detections[1] are duplicate.

code in model.py

detections, mrcnn_class, mrcnn_bbox, mrcnn_mask, \
        rois, rpn_class, rpn_bbox =\
    self.keras_model.predict([molded_images, image_metas], verbose=0)

my test code

image_batch = []
image_name_batch = []

image_path_1 = './1.jpg'
image_batch.append(skimage.io.imread(image_path_1))
image_path_10 = './10.jpg'
image_batch.append(skimage.io.imread(image_path_10))

results = model.detect(image_batch, verbose=1)
for j in range(len(image_batch)):
    r = results[j]
    visualize.display_instances(image_batch[j], r['rois'], r['masks'], r['class_ids'],
                      coco_class_names, r['scores'],title=image_name_batch[j])

Could you share the trained model for MS COCO?

Could you share the trained model for MS COCO?
Thank you very much

NO file mask_rcnn_coco.h5

When I want to run the demo.ipynb, I can't find the model file mask_rcnn_coco.h5. Wherer can I get the file or I must train the model first?

When running train_shapes.ipynb AttributeError occured as well

Results reproduction

Hi, thanks a lot for publishing this code as open source project.

I trained the net without doing any changes to the code/configuration, and by initializing it with ImageNet weights. After training the weights output is different and gives worse results(but somewhat similar) compared to the one which was provided in the repository.
I have some thoughts on why the train could go wrong, although I’m not sure in my correctness. From the code, it seems to me that the mean is applied at the end of loss functions calculations and results from different GPUs are concatenated during the model parallelization, hence no need learning rate change. I have trained the net with 1 GPU, and as far I understand in such case in each step the net is trained with 2 images, but if the net in trained with 8 GPUs, then in each step the net will be trained with 16 images, and my thought is that in that case the train will be more stable, since the impact of “noise” will be smaller because the gradient direction will be determined by using 16 images instead of 2.
Please correct me if you think I made a wrong conclusions, and do you have any ideas on why the train could go wrong?

Thanks in advance!

Here is an inference result with my trained weights.

COCO minival split

Just a quick question: where can I find the minival set you used in to train with the coco dataset?
Currently I used a minival (5k images) set from here:
https://dl.dropboxusercontent.com/s/o43o90bna78omob/instances_minival2014.json.zip?dl=0
and the remaining 35K are in the following file:
https://dl.dropboxusercontent.com/s/s3tw5zcg7395368/instances_valminusminival2014.json.zip?dl=0

awesome work by the way!

Training the model on Custom Images

Please guide me on how to train the model and weights on custom images.I have set of images .How do I import those images,mask them and train this model.

Suggestion: Add `pycocotools` and `cython` to requirements + upade installation instructions

Hi Waleed,
This is wonderful work and I'm glad you made it available to the rest of us.
The two small edits to README.md might make it easier to experiment with your package:

## Requirements
* Python 3.4+
* TensorFlow 1.3+
* Keras 2.0.8+
* Jupyter Notebook
* Numpy, skimage, scipy
* pycocotools, cython

If you use Docker, the model has been verified to work on
[this Docker container](https://hub.docker.com/r/waleedka/modern-deep-learning/).

The package `pycocotools` requires `cython` and a C compiler to install correctly. See below for further instructions.

## Installation
1. Clone this repository
2. Download pre-trained COCO weights from the releases section of this repository.
3. Installing `pycocotools` as follows:
    - On Linux, run `pip install git+https://github.com/waleedka/cocoapi.git#egg=pycocotools&subdirectory=PythonAPI`
    - On Windows, run `pip install git+https://github.com/philferriere/cocoapi.git#egg=pycocotools^&subdirectory=PythonAPI`

Note that on Windows, for the avove to work, you must have the Visual C++ 2015 build tools on your path (see [this coco clone](https://github.com/philferriere/cocoapi) for additional details).

Let me know it you'd like me to PR this.
Thanks again!

The detection speed reduce when batch size increasing

I test the detection speed with different batch_size, and it's surprising that the detection speed reduce when batch size increasing.
Then I check my device , and find it's OK.
Finally, I find batch_slice's annotation in utils.py.

Batch Slicing
Some custom layers support a batch size of 1 only, and require a lot of work
to support batches greater than 1. This function slices an input tensor
across the batch dimension and feeds batches of size 1. Effectively,
an easy way to support batches > 1 quickly with little code modification.
In the long run, it's more efficient to modify the code to support large
batches and getting rid of this function. Consider this a temporary solution

Although I have changed the code in DetectionLayer, it's the fact that the model support batch size > 1 superficially.

And I also find a bug, if we feed images of which the length is smaller than config.BATCH_SIZE to the function detect(self, images, verbose=0), program will raise exception.
I add "assert" to avoid it.

assert len(images) == self.config.BATCH_SIZE, "len(images) must be equal to BATCH_SIZE"

Suggest

1.2159 exclude_ix = np.where((boxes[:, 2] - boxes[:, 0]) * (boxes[:, 2] - boxes[:, 0]) <= 0)[0] --line 2159 in function unmold_detections(self, detections, mrcnn_mask, image_shape, window) the bboxes area calculations was misspelt.
2. Also in this function the filtering out zero area process is better placed after the translation coordinates backing to image domain process for the reason that the latter process may still cause zero area.

test issue when run "build_rpn_model"

Data Generator - reduction operation minimum which has no identity

I'm playing around with the shapes data-set; I have increased the settings by 4:

IMAGE_MIN_DIM = 4*128
IMAGE_MAX_DIM = 4*128
RPN_ANCHOR_SCALES = (4*8, 4*16, 4*32, 4*64, 4*128)  # anchor side in pixels
TRAIN_ROIS_PER_IMAGE = 4*32

And also generate 100x more samples. However I get errors like this that stop execution:

ValueError Traceback (most recent call last)
in ()
6 learning_rate=config.LEARNING_RATE,
7 epochs=1,
----> 8 layers='heads')
/home/iliauk/Mask_RCNN/model.py in train(self, train_dataset, val_dataset, learning_rate, epochs, layers)
2072 "steps_per_epoch": self.config.STEPS_PER_EPOCH,
2073 "callbacks": callbacks,
-> 2074 "validation_data": next(val_generator),
2075 "validation_steps": self.config.VALIDATION_STPES,
2076 "max_queue_size": 100,
/home/iliauk/Mask_RCNN/model.py in data_generator(dataset, config, shuffle, augment, random_rois, batch_size, detection_targets)
1525 image_id = image_ids[image_index]
1526 image, image_meta, gt_boxes, gt_masks =
-> 1527 load_image_gt(dataset, config, image_id, augment=augment, use_mini_mask=config.USE_MINI_MASK)
1528
1529 # Skip images that have no instances. This can happen in cases
/home/iliauk/Mask_RCNN/model.py in load_image_gt(dataset, config, image_id, augment, use_mini_mask)
1150 # Resize masks to smaller size to reduce memory usage
1151 if use_mini_mask:
-> 1152 mask = utils.minimize_mask(bbox, mask, config.MINI_MASK_SHAPE)
1153
1154 # Image meta data
/home/iliauk/Mask_RCNN/utils.py in minimize_mask(bbox, mask, mini_shape)
433 y1, x1, y2, x2 = bbox[i][:4]
434 m = m[y1:y2, x1:x2]
--> 435 m = scipy.misc.imresize(m.astype(float), mini_shape, interp='bilinear')
436 mini_mask[:, :, i] = np.where(m >= 128, 1, 0)
437 return mini_mask
/anaconda/envs/py35/lib/python3.5/site-packages/numpy/lib/utils.py in newfunc(*args, **kwds)
99 """arrayrange is deprecated, use arange instead!"""
100 warnings.warn(depdoc, DeprecationWarning, stacklevel=2)
--> 101 return func(*args, **kwds)
102
103 newfunc = _set_function_name(newfunc, old_name)
/anaconda/envs/py35/lib/python3.5/site-packages/scipy/misc/pilutil.py in imresize(arr, size, interp, mode)
552
553 """
--> 554 im = toimage(arr, mode=mode)
555 ts = type(size)
556 if issubdtype(ts, numpy.signedinteger):
/anaconda/envs/py35/lib/python3.5/site-packages/numpy/lib/utils.py in newfunc(*args, **kwds)
99 """arrayrange is deprecated, use arange instead!"""
100 warnings.warn(depdoc, DeprecationWarning, stacklevel=2)
--> 101 return func(*args, **kwds)
102
103 newfunc = _set_function_name(newfunc, old_name)
/anaconda/envs/py35/lib/python3.5/site-packages/scipy/misc/pilutil.py in toimage(arr, high, low, cmin, cmax, pal, mode, channel_axis)
334 if mode in [None, 'L', 'P']:
335 bytedata = bytescale(data, high=high, low=low,
--> 336 cmin=cmin, cmax=cmax)
337 image = Image.frombytes('L', shape, bytedata.tostring())
338 if pal is not None:
/anaconda/envs/py35/lib/python3.5/site-packages/numpy/lib/utils.py in newfunc(*args, **kwds)
99 """arrayrange is deprecated, use arange instead!"""
100 warnings.warn(depdoc, DeprecationWarning, stacklevel=2)
--> 101 return func(*args, **kwds)
102
103 newfunc = _set_function_name(newfunc, old_name)
/anaconda/envs/py35/lib/python3.5/site-packages/scipy/misc/pilutil.py in bytescale(data, cmin, cmax, high, low)
91
92 if cmin is None:
---> 93 cmin = data.min()
94 if cmax is None:
95 cmax = data.max()
/anaconda/envs/py35/lib/python3.5/site-packages/numpy/core/_methods.py in _amin(a, axis, out, keepdims)
27
28 def _amin(a, axis=None, out=None, keepdims=False):
---> 29 return umr_minimum(a, axis, None, out, keepdims)
30
31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
ValueError: zero-size array to reduction operation minimum which has no identity

Training Policy Advice?

Hi, I want to train my own dataset consisting of 10 classes and 30000 images with segmentation annotation. Can you give me some advice on how many epoches for each training stage and how much time it may take to train on two TITAN X GPU? Thanks a lot!

The implement and effect of 'display_instances' and 'display_detections' are almost same.

Why are there two functions 'display_instances' and 'display_detections' ?
They are almost the same, and I don't find the code in where 'display_detections' is used.

Pre trained weights

The README mentions pre trained weights trained on the COCO dataset. However it doesn't seem to be there in the repo! Could you please commit/link it, it would be really useful.

Train on my own numpy dataset

Hi,
Thanks for the great work!

I want to train Mask RCNN on my own dataset (numpy array of images and masks, or two folder for images and masks). Could you help me how can I do that?
I've checked COCO dataset and Shapes dataset codes, but I couldn't understand how Dataset class actually works.

ImportError: No module named 'pycocotools'

Thanks a lot
I am having the following error
"ImportError: No module named 'pycocotools'"

Can you please advice me how to install it as I could not find it on pip or conda?

I tried the installation from https://github.com/cocodataset/cocoapi but did not work too

Where to download the "mask_rcnn_coco.h5" file?

Thanks for the great work!

In "demo.ipynb", It reminded can be found 'mask_rcnn_coco.h5' in the README file. BUT, I can't find.

How much gpu memory required for inference?

I have a gtx 780 card which have 3Gb of memory, but when runing the demo example, an error ocurred :"W tensorflow/core/common_runtime/bfc_allocator.cc:217] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.07GiB."

so how much gpu memory required to run an inference example ?

How to load resnet50?

Hi @waleedka ,

I have a less powerful GPU and want to train my own data which contain only 3 classes. Therefore I would use Resnet50 to speed up the model. How can I switch to Resnet50? Thanks for your help.

Wings, limbs and holes

The model seems to have problems with wings, limbs and holes. Is there theoretical awareness of this and proposed countermeasures?

Do you report MSCOCO performance?

Do I need to resize mask after resizing image?

Hi,I made some masks for some 2048*2048 images.Then I found it's too big for my memory.So I change the IMAGE_MAX_DIM in config .But I couldn't find anywhere to change the size of the masks.Will the size of the masks change automatically?

not work

i want to train on my own dataset. the image size is 1024*1024 and with a Titan xp GPU (12G). when i run the script . it wait .
Configurations:

BACKBONE_SHAPES                [[256 256]
 [128 128]
 [ 64  64]
 [ 32  32]
 [ 16  16]]
BACKBONE_STRIDES               [4, 8, 16, 32, 64]
BATCH_SIZE                     1
BBOX_STD_DEV                   [ 0.1  0.1  0.2  0.2]
DETECTION_MAX_INSTANCES        100
DETECTION_MIN_CONFIDENCE       0.7
DETECTION_NMS_THRESHOLD        0.3
GPU_COUNT                      1
IMAGES_PER_GPU                 1
IMAGE_MAX_DIM                  1024
IMAGE_MIN_DIM                  1024
IMAGE_PADDING                  True
IMAGE_SHAPE                    [1024 1024    3]
LEARNING_MOMENTUM              0.9
LEARNING_RATE                  0.002
MASK_POOL_SIZE                 14
MASK_SHAPE                     [28, 28]
MAX_GT_INSTANCES               100
MEAN_PIXEL                     [ 96.  96.  96.]
MINI_MASK_SHAPE                (56, 56)
NAME                           guidewire
NUM_CLASSES                    2
POOL_SIZE                      7
POST_NMS_ROIS_INFERENCE        1000
POST_NMS_ROIS_TRAINING         2000
ROI_POSITIVE_RATIO             0.33
RPN_ANCHOR_RATIOS              [0.5, 1, 2]
RPN_ANCHOR_SCALES              (32, 64, 128, 256)
RPN_ANCHOR_STRIDE              2
RPN_BBOX_STD_DEV               [ 0.1  0.1  0.2  0.2]
RPN_TRAIN_ANCHORS_PER_IMAGE    256
STEPS_PER_EPOCH                100
TRAIN_ROIS_PER_IMAGE           32
USE_MINI_MASK                  False
USE_RPN_ROIS                   True
VALIDATION_STEPS               1
WEIGHT_DECAY                   0.0001


2017-11-12 12:59:57.096834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: TITAN Xp, pci bus id: 0000:0a:00.0, compute capability: 6.1)

mrcnn_mask_conv1       (TimeDistributed)
mrcnn_mask_bn1         (TimeDistributed)
mrcnn_mask_conv2       (TimeDistributed)
mrcnn_class_conv1      (TimeDistributed)
mrcnn_mask_bn2         (TimeDistributed)
mrcnn_class_bn1        (TimeDistributed)
mrcnn_mask_conv3       (TimeDistributed)
mrcnn_mask_bn3         (TimeDistributed)
mrcnn_class_conv2      (TimeDistributed)
mrcnn_class_bn2        (TimeDistributed)
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)
/home/wuyudong/anaconda3/envs/python34/lib/python3.4/site-packages/tensorflow/python/ops/gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/home/wuyudong/anaconda3/envs/python34/lib/python3.4/site-packages/keras/engine/training.py:2022: UserWarning: Using a generator with use_multiprocessing=True and multiple workers may duplicate your data. Please consider using the keras.utils.Sequence class.
  UserWarning(Using a generator with use_multiprocessing=True
Epoch 1/100`

when i use Ctrl+c it out_put:
Traceback (most recent call last): File "train_own_data.py", line 252, in <module> train_own_data() File "train_own_data.py", line 189, in train_own_data layers="all") File "/home/wuyudong/Project/scripts/Mask_RCNN/model.py", line 2088, in train **fit_kwargs File "/home/wuyudong/anaconda3/envs/python34/lib/python3.4/site-packages/keras/legacy/interfaces.py", line 87, in wrapper return func(*args, **kwargs) File "/home/wuyudong/anaconda3/envs/python34/lib/python3.4/site-packages/keras/engine/training.py", line 2046, in fit_generator generator_output = next(output_generator) File "/home/wuyudong/anaconda3/envs/python34/lib/python3.4/site-packages/keras/utils/data_utils.py", line 661, in get time.sleep(self.wait_time) KeyboardInterrupt
when i set the image size to 256*256 it works
what should i do

The loss of Step 2

During my training process, the loss of step 1 suddenly jumps from a low value (like 2.1) to a high value (like 13) in step 2. is that a normal situation?

KL.UpSampling2D()

You used KL.UpSampling2D() to build the FPN network. Did you mean un-pooling or deconv?

Because when I looked into keras' document, it says this function is to repeats the rows and columns. (more like un-pooling). See from this: https://keras.io/layers/convolutional/#upsampling2d

But it seems like this process should be more like deconv. Am I wrong about understanding FPN ?

Shape mismatch error

Hi @waleedka ,

Thanks a lot for an awesome repository. You've done good work.
I'm trying to trying to trigger a training on own dataset which is of Coco dataset type, but less number of classes(only 5).

When I changed the the number classes to 5+1 in the coco.py and trigger a training using Coco pretrained model, I'm getting the shape mismatch error.

Does this because the pretrained model is of 81 classes Or have I done any mistake? Please correct me if I'm wrong.

So does this mean, I need to start a new training from scratch for 5 classes?

Regards,
Sharath

model.py's datagenerator() may not be threadsafe

I haven't been able to run train_shapes.ipynb to completion perhaps because of threading issues. Training the head branches fails with the following output:

Starting at epoch 0. LR=0.002

Checkpoint Path: E:\repos\Mask_RCNN.wip\logs\shapes20171102T1726\mask_rcnn_shapes_{epoch:04d}.h5
Selecting layers to train
fpn_c5p5               (Conv2D)
fpn_c4p4               (Conv2D)
fpn_c3p3               (Conv2D)
fpn_c2p2               (Conv2D)
fpn_p5                 (Conv2D)
fpn_p2                 (Conv2D)
fpn_p3                 (Conv2D)
fpn_p4                 (Conv2D)
In model:  rpn_model
    rpn_conv_shared        (Conv2D)
    rpn_class_raw          (Conv2D)
    rpn_bbox_pred          (Conv2D)
mrcnn_mask_conv1       (TimeDistributed)
mrcnn_mask_bn1         (TimeDistributed)
mrcnn_mask_conv2       (TimeDistributed)
mrcnn_class_conv1      (TimeDistributed)
mrcnn_mask_bn2         (TimeDistributed)
mrcnn_class_bn1        (TimeDistributed)
mrcnn_mask_conv3       (TimeDistributed)
mrcnn_mask_bn3         (TimeDistributed)
mrcnn_class_conv2      (TimeDistributed)
mrcnn_class_bn2        (TimeDistributed)
mrcnn_mask_conv4       (TimeDistributed)
mrcnn_mask_bn4         (TimeDistributed)
mrcnn_bbox_fc          (TimeDistributed)
mrcnn_mask_deconv      (TimeDistributed)
mrcnn_class_logits     (TimeDistributed)
mrcnn_mask             (TimeDistributed)
e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\site-packages\tensorflow\python\ops\gradients_impl.py:95: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\site-packages\keras\engine\training.py:1987: UserWarning: Using a generator with `use_multiprocessing=True` and multiple workers may duplicate your data. Please consider using the`keras.utils.Sequence class.
  UserWarning('Using a generator with `use_multiprocessing=True`'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-2101606c7d8e> in <module>()
      6             learning_rate=config.LEARNING_RATE,
      7             epochs=1,
----> 8             layers='heads')

E:\repos\Mask_RCNN.wip\model.py in train(self, train_dataset, val_dataset, learning_rate, epochs, layers)
   2089             initial_epoch=self.epoch,
   2090             epochs=epochs,
-> 2091             **fit_kwargs
   2092             )
   2093         self.epoch = max(self.epoch, epochs)

e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
     85                 warnings.warn('Update your `' + object_name +
     86                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 87             return func(*args, **kwargs)
     88         wrapper._original_function = func
     89         return wrapper

e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   2000                                              use_multiprocessing=use_multiprocessing,
   2001                                              wait_time=wait_time)
-> 2002             enqueuer.start(workers=workers, max_queue_size=max_queue_size)
   2003             output_generator = enqueuer.get()
   2004 

e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\site-packages\keras\utils\data_utils.py in start(self, workers, max_queue_size)
    594                     thread = threading.Thread(target=data_generator_task)
    595                 self._threads.append(thread)
--> 596                 thread.start()
    597         except:
    598             self.stop()

e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\multiprocessing\process.py in start(self)
    103                'daemonic processes are not allowed to have children'
    104         _cleanup()
--> 105         self._popen = self._Popen(self)
    106         self._sentinel = self._popen.sentinel
    107         _children.add(self)

e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\multiprocessing\context.py in _Popen(process_obj)
    221     @staticmethod
    222     def _Popen(process_obj):
--> 223         return _default_context.get_context().Process._Popen(process_obj)
    224 
    225 class DefaultContext(BaseContext):

e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\multiprocessing\context.py in _Popen(process_obj)
    320         def _Popen(process_obj):
    321             from .popen_spawn_win32 import Popen
--> 322             return Popen(process_obj)
    323 
    324     class SpawnContext(BaseContext):

e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
     63             try:
     64                 reduction.dump(prep_data, to_child)
---> 65                 reduction.dump(process_obj, to_child)
     66             finally:
     67                 set_spawning_popen(None)

e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
     58 def dump(obj, file, protocol=None):
     59     '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60     ForkingPickler(file, protocol).dump(obj)
     61 
     62 #

AttributeError: Can't pickle local object 'GeneratorEnqueuer.start.<locals>.data_generator_task'

I have also tried to modify fit_generator()'s param as follows:

        # Common parameters to pass to fit_generator()
        fit_kwargs = {
            "steps_per_epoch": self.config.STEPS_PER_EPOCH,
            "callbacks": callbacks,
            "validation_data": next(val_generator),
            "validation_steps": self.config.VALIDATION_STPES,
            "max_queue_size": 100,
            "workers": 1,  # Phil: was max(self.config.BATCH_SIZE // 2, 2),
            "use_multiprocessing": False # Phil: was "use_multiprocessing": True,
        }

Unfortunatly, that also results in a crash with the jupyter notebook crashing without any output...

It could well be that the generator is threadsafe. After a quick perusal, however, I haven't found any serializing code anywhere. Threadsafe data generators usually implement some kind of locking mechanism. Here are examples that are threadsafe: keras-team/keras#1638 (comment) and http://anandology.com/blog/using-iterators-and-generators/

Here's a bit of information about my own GPU config:

(from notebook):

os: nt
sys: 3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:25:24) [MSC v.1900 64 bit (AMD64)]
numpy: 1.13.3, e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\site-packages\numpy\__init__.py
matplotlib: 2.0.2, e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\site-packages\matplotlib\__init__.py
cv2: 3.3.0, e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\site-packages\cv2.cp36-win_amd64.pyd
tensorflow: 1.3.0, e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\site-packages\tensorflow\__init__.py
keras: 2.0.8, e:\toolkits.win\anaconda3-4.4.0\envs\dlwin36coco\lib\site-packages\keras\__init__.py

(from jupyter notebook log):

2017-11-02 16:59:50.162182: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX TITAN X
major: 5 minor: 2 memoryClockRate (GHz) 1.076
pciBusID 0000:03:00.0
Total memory: 12.00GiB
Free memory: 10.06GiB
2

Has anyone else observed similar issues?

I used the trained model to test an image. There are some errors.

The code is here followed the demo.ipynb. But there are some errros.

import os
import sys
import random
import math
import numpy as np
import scipy.misc
import matplotlib
import matplotlib.pyplot as plt

import coco
import utils
import model as modellib
import visualize

Create model object in inference mode.

model = modellib.MaskRCNN(mode="inference", model_dir='mask_rcnn_coco.h5', config=0)

Load weights trained on MS-COCO

model.load_weights('mask_rcnn_coco.h5', by_name=True)

COCO Class names

Index of the class in the list is its ID. For example, to get ID of

the teddy bear class, use: class_names.index('teddy bear')

class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard',
'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']

Load a random image from the images folder

file_names = next(os.walk('/media/wxl/00063DAF000FECC3/Mask_RCNN-master/images'))
image = scipy.misc.imread(os.path.join('/media/wxl/00063DAF000FECC3/Mask_RCNN-master/images', random.choice(file_names)))

Run detection

results = model.detect([image], verbose=1)

Visualize results

r = results[0]
visualize.display_instances(image, r['rois'], r['masks'], r['class_ids'],
class_names, r['scores'])

The errors locate in:
File "/media/wxl/00063DAF000FECC3/Mask_RCNN-master/test.py", line 20, in
model = modellib.MaskRCNN(mode="inference", model_dir='mask_rcnn_coco.h5', config=0)
I think the model is load incorrect. But I have no idea about the mistakes. Thanks for your helps

ValueError: The channel dimension of the inputs should be defined. Found `None`. in demo.ipynb

Hey @waleedka

Great work. Now I have man-crush on you.

A little PSA. If anyone is getting this error:

ValueError: The channel dimension of the inputs should be defined. Found None.

you're most likely having backend issue ( in ~/.keras/keras.json), as it might be set to Theano instead of TF. And even if it it,s for some reason jupyter notebook is not reading it correctly.

Solution. You can do either

Before this cell is executed:

model = modellib.MaskRCNN(mode="inference", model_dir=MODEL_DIR, config=config)
# Load weights trained on MS-COCO
model.load_weights(COCO_MODEL_PATH, by_name=True)

you can add:

import keras.backend

K = keras.backend.backend()
if K=='tensorflow':
    keras.backend.set_image_dim_ordering('tf')

Stick it in ipython notebook startup file (~/.ipython/profile_default/startup/ ) like this guy

where is the link I can donwload the weights model

Looking for a copy of pretrained ImageNet weights

Hi could anyone kindly tell me where to download the pretrained imagenet model weights...thanks!

Run out of memory

Well, I have a voc-like dataset with 7000 classes. So I use the following config:

    GPU_COUNT = 2
    IMAGES_PER_GPU = 1
    STEPS_PER_EPOCH = 150
    BASE_EPOCH = 10
    NUM_CLASSES = 1 + 7000  # WARNING: This dataset has  7000 classes
    MAX_GT_INSTANCES = 50
    POST_NMS_ROIS_TRAINING = 1000
    POST_NMS_ROIS_INFERENCE = 500
    DETECTION_MAX_INSTANCES = 50

Other config is just as the default config.

And~ I only take less than 15 objects in one image.

I run it on 2 Titan X, each of which has 12 GB memory. But still run out of memory during training:

41/150 [=======>......................] - ETA: 1:12 - loss: 3.6104 - rpn_class_loss: 0.0231 - rpn_bbox_loss: 0.7184 - mrcnn_class_loss: 1.5067 - mrcnn_bbox_loss: 0.6695 - mrcnn_mask_loss: 0.6922
42/150 [=======>......................] - ETA: 1:11 - loss: 3.6236 - rpn_class_loss: 0.0239 - rpn_bbox_loss: 0.7198 - mrcnn_class_loss: 1.5192 - mrcnn_bbox_loss: 0.6680 - mrcnn_mask_loss: 0.6921


2017-11-13 16:05:25.086033: W tensorflow/core/common_runtime/bfc_allocator.cc:273] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.62GiB.  Current allocation summary follows.
2017-11-13 16:05:25.086152: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (256): 	Total Chunks: 276, Chunks in use: 275. 69.0KiB allocated for chunks. 68.8KiB in use in bin. 11.9KiB client-requested in use in bin.
2017-11-13 16:05:25.086173: I tensorflow/core/common_runtime/bfc_allocator.cc:627] Bin (512): 	Total Chunks: 45, Chunks in use: 44. 23.2KiB allocated for chunks. 22.5KiB in use in bin. 22.1KiB client-requested in use in bin.
.....
.....

So, anyone can tell me how to prevent it?

OK~ set TRAIN_ROIS_PER_IMAGE = 32, goes well.

error ：model.load_weights(COCO_MODEL_PATH, by_name=True)

I get an error when running the demon code : model.load_weights(COCO_MODEL_PATH, by_name=True)

IOError: Unable to open file (truncated file: eof = 255370586, sblock->base_addr = 0, stored_eoa = 257558264)

When running train_shapes.ipynb AttributeError occured.

model.train(dataset_train, dataset_val, 
            learning_rate=config.LEARNING_RATE, 
            epochs=1, 
            layers='heads')

AttributeError: Can't pickle local object 'GeneratorEnqueuer.start..data_generator_task'

Can you train for keypoint detection?

Hi @waleedka : Thanks for the great work! Is it possible to train for keypoint detection? Sorry for the wrong title of the issue, I can't correct it.

Performance Improvement

Firstly I want to congratulate you for this amazing work.

In your opinion, where modifications will impact most the performance (frame rate) of the method.

I'm thinking going with a smaller backbone network (for instance, squeeze net). Do you think it's there the performance bottleneck?

How to used it？Any one can help me？

Typo in config name VALIDATION_STPES

Presumably VALIDATION_STPES should be VALIDATION_STEPS instead?

AssertionError in run_graph

I am running testing MASK_RCNN on my local computer and on a remote machine.
In inspect_model everything runs fine locally but on the remote machine I get an assertion error at ### 1.b RPN Predictions:

# Run RPN sub-graph
pillar = model.keras_model.get_layer("ROI").output  # node to start searching from
rpn = model.run_graph([image], [
    ("rpn_class", model.keras_model.get_layer("rpn_class").output),
    ("pre_nms_anchors", model.ancestor(pillar, "ROI/pre_nms_anchors:0")),
    ("refined_anchors", model.ancestor(pillar, "ROI/refined_anchors:0")),
    ("refined_anchors_clipped", model.ancestor(pillar, "ROI/refined_anchors_clipped:0")),
    ("post_nms_anchor_ix", model.ancestor(pillar, "ROI/rpn_non_max_suppression:0")),
    ("proposals", model.keras_model.get_layer("ROI").output),
])

AssertionError            Traceback (most recent call last)
<ipython-input-14-799ca4676404> in <module>()
      7     ("refined_anchors_clipped", model.ancestor(pillar, "ROI/refined_anchors_clipped:0")),
      8     ("post_nms_anchor_ix", model.ancestor(pillar, "ROI/rpn_non_max_suppression:0")),
----> 9     ("proposals", model.keras_model.get_layer("ROI").output),
     10 ])

/home/orestisz/repositories/Mask_RCNN/model.py in run_graph(self, images, outputs)
   2296         for o in outputs.values():
   2297             print(o)
-> 2298             assert o is not None
   2299 
   2300         # Build a Keras function to run parts of the computation graph

AssertionError:

when printing the outputs:

for o in outputs.values():
            print(o)
            assert o is not None

I get the following output locally:

Tensor("rpn_class/concat:0", shape=(?, ?, 2), dtype=float32, device=/device:CPU:0)
Tensor("ROI/pre_nms_anchors:0", shape=(1, 10000, 4), dtype=float32, device=/device:CPU:0)
Tensor("ROI/refined_anchors:0", shape=(1, 10000, 4), dtype=float32, device=/device:CPU:0)
Tensor("ROI/refined_anchors_clipped:0", shape=(1, 10000, 4), dtype=float32, device=/device:CPU:0)
Tensor("ROI/rpn_non_max_suppression:0", shape=(?,), dtype=int32, device=/device:CPU:0)
Tensor("ROI/packed_2:0", shape=(1, ?, 4), dtype=float32, device=/device:CPU:0)

and the following remotely:

Tensor("rpn_class/concat:0", shape=(?, ?, 2), dtype=float32, device=/device:CPU:0)
Tensor("ROI/pre_nms_anchors:0", shape=(1, 10000, 4), dtype=float32, device=/device:CPU:0)
Tensor("ROI/refined_anchors:0", shape=(1, 10000, 4), dtype=float32, device=/device:CPU:0)
Tensor("ROI/refined_anchors_clipped:0", shape=(1, 10000, 4), dtype=float32, device=/device:CPU:0)
None

So looks like ("post_nms_anchor_ix", model.ancestor(pillar, "ROI/rpn_non_max_suppression:0")) is causing the issue.

Any suggestions? Thanks in advance

scipy.misc has no attribute 'imread' when scipy version is 1.0.0

When I run demo ,there is an error:
scipy.misc has no attribute 'imread'.

I check the code and find that 'imread' is deprecated in SciPy 1.0.0, and will be removed in 1.2.0.
But the fact is that when I use the scipy with version 1.0.0, error occurs.
I chage the scipy version to 1.0.0rc2 and then the demo run ok.

Could you provide the dependence list for the project to guide user preparing the environment.

GPU and Mask_shape

some question about GPU_COUNT and mask_shape:

In the config.py, it says GPU_COUNT : NUMBER of gpus to use, for training on cpu, use 1.so if I want to use a single gpu from multi-gpu machine, what should i set the GPU_COUNT.1 or 2?. I found in the class ParallelModel, the GPU_COUNT represent the real number of gpu without minus 1(which i think 1 for cpu, and 2 for a single gpu....)
Zero volatile GPU-Util but high GPU Memory Usage: most time the volatile GPU-Util is zero but GPU Memory Usage is very high, and the training is slow after i set the USE_MINI_MASK = False, Is this a bug? what should I do? the input size of image is 1024*1024
in config.py have two paras: mask_pool_size and 'mask_shape', however in FCN only have one deconv layer which means the mask_shape = 2*mask_pool_size. so what i should do , if I want a more dense segmentation without resize from 28 * 28 to the Roi size

Train own Dataset by taking actual images

Hi,

Thanks a lot for the awesome repository.

I went to train_shapes file which describes about how to train for our own dataset.

But all the things which you guys are doing over there is by generating randomly. Could explain the same by taking actual images which has ground truth of mask, class and bounding related information.

Regards,
Pirag

Unable to train on multiple GPUs

I am trying to run coco.py on a machine with 8 Tesla P100 GPUs... However, it seems like there is something going wrong when I try to use more than one GPU.
I was able to run the parallel_model.py file on all GPUs without a problem.
The error that is dumped in my terminal is the following one:

Epoch 1/40
2017-11-09 14:49:19.639818: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [15,4] vs. [30,4]
	 [[Node: rpn_bbox_loss/sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rpn_bbox_loss/concat, rpn_bbox_loss/GatherNd)]]
2017-11-09 14:49:19.639903: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [15,4] vs. [30,4]
	 [[Node: rpn_bbox_loss/sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rpn_bbox_loss/concat, rpn_bbox_loss/GatherNd)]]
2017-11-09 14:49:19.640016: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [15,4] vs. [30,4]
	 [[Node: rpn_bbox_loss/sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rpn_bbox_loss/concat, rpn_bbox_loss/GatherNd)]]
2017-11-09 14:49:19.640329: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [15,4] vs. [30,4]
	 [[Node: rpn_bbox_loss/sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rpn_bbox_loss/concat, rpn_bbox_loss/GatherNd)]]
2017-11-09 14:49:19.640406: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: Incompatible shapes: [15,4] vs. [30,4]
	 [[Node: rpn_bbox_loss/sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rpn_bbox_loss/concat, rpn_bbox_loss/GatherNd)]]
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [15,4] vs. [30,4]
	 [[Node: rpn_bbox_loss/sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rpn_bbox_loss/concat, rpn_bbox_loss/GatherNd)]]
	 [[Node: proposal_targets/roi_assertion_1/AssertGuard/Assert/Switch/_1627 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5177_proposal_targets/roi_assertion_1/AssertGuard/Assert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "coco.py", line 417, in <module>
    layers='heads')
  File "/projects/mask_rcnn/model.py", line 2110, in train
    **fit_kwargs
  File "/usr/local/lib/python3.5/dist-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 2077, in fit_generator
    class_weight=class_weight)
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/training.py", line 1797, in train_on_batch
    outputs = self.train_function(ins)
  File "/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py", line 2332, in __call__
    **self.session_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [15,4] vs. [30,4]
	 [[Node: rpn_bbox_loss/sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rpn_bbox_loss/concat, rpn_bbox_loss/GatherNd)]]
	 [[Node: proposal_targets/roi_assertion_1/AssertGuard/Assert/Switch/_1627 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5177_proposal_targets/roi_assertion_1/AssertGuard/Assert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'rpn_bbox_loss/sub', defined at:
  File "coco.py", line 365, in <module>
    model_dir=MODEL_DIR)
  File "/projects/mask_rcnn/model.py", line 1646, in __init__
    self.keras_model = self.build(mode=mode, config=config)
  File "/projects/mask_rcnn/model.py", line 1794, in build
    [input_rpn_bbox, input_rpn_match, rpn_bbox])
  File "/usr/local/lib/python3.5/dist-packages/keras/engine/topology.py", line 603, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/keras/layers/core.py", line 651, in call
    return self.function(inputs, **arguments)
  File "/projects/mask_rcnn/model.py", line 1793, in <lambda>
    rpn_bbox_loss = KL.Lambda(lambda x: rpn_bbox_loss_graph(config, *x), name="rpn_bbox_loss")(
  File "/projects/mask_rcnn/model.py", line 987, in rpn_bbox_loss_graph
    diff = K.abs(target_bbox - rpn_bbox)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/math_ops.py", line 894, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 4636, in _sub
    "Sub", x=x, y=y, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Incompatible shapes: [15,4] vs. [30,4]
	 [[Node: rpn_bbox_loss/sub = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](rpn_bbox_loss/concat, rpn_bbox_loss/GatherNd)]]
	 [[Node: proposal_targets/roi_assertion_1/AssertGuard/Assert/Switch/_1627 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_5177_proposal_targets/roi_assertion_1/AssertGuard/Assert/Switch", tensor_type=DT_BOOL, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Is there anyone that has faced a similar error here?
My system is running python 3.5 and the latest keras (2.0.9) and tensorflow (1.4.0) versions.
Thanks!

How to evaluate on a new image?

Hi
first thanks for sharing this great repo.

Now I tried the jupiter notebook but got an error on the first cell (not sure why):

File "coco.py", line 339
    config.print()
               ^
SyntaxError: invalid syntax

then I did comment the line given it is just a print but then I got a ton of more errors;

So back to the command line, but I get the same error...

can you please help to fix or give the steps to properly evaluate on a new image (not COCO data)?

thanks for your help.
Tets

OOM when allocating tensor

First of all I would like to thank the authors for such a great work.
When running the train_shapes notebook, I get the following error message 3 times in the first epoch.

ResourceExhaustedError: OOM when allocating tensor with shape[256,256,28,28]
	 [[Node: mrcnn_mask_deconv/conv2d_transpose = Conv2DBackpropInput[T=DT_FLOAT, data_format="NHWC", padding="VALID", strides=[1, 2, 2, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](mrcnn_mask_deconv/stack, mrcnn_mask_deconv/kernel/read, mrcnn_mask_deconv/Reshape)]]
	 [[Node: Mean_9/_5017 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_17513_Mean_9", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

I guess my humble 2GB GPU is not enough... how could I solve this? I do not see any batch size option implemented, is there any at all? Do you think that would that solve my issue? I would appreciate all kind of suggestions. Thanks.

Class imbalance for RPN

Hi,

For my master thesis I am working on the topic of text detection in images and video frames. I implemented a modified version of Faster R-CNN/Mask R-CNN, which is very similar to your implementation, but tailored for text detection.

On average there are ~54 positive text anchors in the images I use, which result in a mini batch of ~54 positive and ~200 negative examples per image ( I use mini batches of size 254 per image). The problem I encounter is that my network overfits on only negative prediction (because on average, there are many more negative than positive examples). Now a few simple solutions would be to 1) take a smaller mini batch size(for example 112), 2) to remove all the images in which there are not enough positive examples from the data set or 3) use a weighted loss function.

I inspected your code very carefully, but (as far as I can see) in your implementation this doesn't seem to be an issue. Was this positive/negative class imbalance also a problem in your implementation, and if so, how did you solve this problem?

Thanks!

Maurits

ValueError: zero-size array to reduction operation minimum which has no identity

Hi~ I am training Mask RCNN on my dataset. During training, an error occured:

63/150 [===========>..................] - ETA: 1:05 - loss: 5.5540 - rpn_class_loss: 0.1711 - rpn_bbox_loss: 1.3187 - mrcnn_class_loss: 1.6656 - mrcnn_bbox_loss: 0.8899 - mrcnn_mask_loss: 0.6008
ERROR:root:Error processing image {'source': 'suncg', 'path': '/path/to/mlt/8a33bca7ed13c8d2698303625feba21a/000005.png', 'obj_mask_path': '/path/to/node/8a33bca7ed13c8d2698303625feba21a/000005.png', 'cls_mask_path': '/path/to/category/8a33bca7ed13c8d2698303625feba21a/000005.png', 'id': 8067}
Traceback (most recent call last):
  File "/home/haoyu/Workspace/Mask_RCNN/model.py", line 1523, in data_generator
    load_image_gt(dataset, config, image_id, augment=augment, use_mini_mask=config.USE_MINI_MASK)
  File "/home/haoyu/Workspace/Mask_RCNN/model.py", line 1148, in load_image_gt
    mask = utils.minimize_mask(bbox, mask, config.MINI_MASK_SHAPE)
  File "/home/haoyu/Workspace/Mask_RCNN/utils.py", line 436, in minimize_mask
    m = scipy.misc.imresize(m.astype(float), mini_shape, interp='bilinear')
  File "/home/haoyu/venv/lib/python3.4/site-packages/scipy/misc/pilutil.py", line 480, in imresize
    im = toimage(arr, mode=mode)
  File "/home/haoyu/venv/lib/python3.4/site-packages/scipy/misc/pilutil.py", line 299, in toimage
    cmin=cmin, cmax=cmax)
  File "/home/haoyu/venv/lib/python3.4/site-packages/scipy/misc/pilutil.py", line 90, in bytescale
    cmin = data.min()
  File "/home/haoyu/venv/lib/python3.4/site-packages/numpy/core/_methods.py", line 29, in _amin
    return umr_minimum(a, axis, None, out, keepdims)
ValueError: zero-size array to reduction operation minimum which has no identity

Anyone can tell me how to fix it?

Can we use the demo to test in CPU

Can we use the CPU instead of GPU with your demo