3D object detection and segmentation are crucial for many domains and applications. However, transferring 2D image techniques to 3D data is still challenging because of the massive amount of data contained in 3D voxel grids. We present an architecture that combines the principle of object detection and segmentation used by Mask R-CNN for 2D images with the computational efficiency of Sparse Submanifold Convolutions on sparse 3D voxel grids. The network consists of a Region Proposal Network, which predicts bounding boxes, and a Class Network and a Mask Network, both of which rely on the region proposals. We show how parts of the feature extractor, the Class Network and the Mask Network can be made sparse. A sparse feature extractor reduces the amount of required computation while maintaining similar detection performance. A sparse Mask Network makes it possible to process masks of different shapes batch-wise without resizing and without losing spatial correspondence information. Furthermore, we propose a way to find the best anchor density by using anchor-wise anisotropic densities that depend on each anchor’s shape. Our model shows that a Mask R-CNN based 3D model can achieve both state-of-the-art object detection and instance segmentation performance.
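To illustrate why sparsity matters for 3D voxel grids, here is a minimal sketch that is not part of this repository. It maps a point cloud to the set of occupied voxels and compares that count to the size of the enclosing dense grid. The function names `voxelize` and `dense_grid_size`, the voxel size and the toy scene are assumptions chosen for illustration only.

```python
import math
import random


def voxelize(points, voxel_size):
    """Return the set of integer voxel coordinates occupied by the points."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}


def dense_grid_size(voxels):
    """Number of cells in the axis-aligned dense grid enclosing the voxels."""
    size = 1
    for axis in range(3):
        values = [v[axis] for v in voxels]
        size *= max(values) - min(values) + 1
    return size


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical scene: points scattered on the floor of a 4 m x 4 m x 2 m room,
    # plus two corner points that span the room's full extent.
    points = [(random.uniform(0, 4), random.uniform(0, 4), 0.0) for _ in range(5000)]
    points += [(0.0, 0.0, 0.0), (3.99, 3.99, 1.99)]
    voxels = voxelize(points, voxel_size=0.05)
    dense = dense_grid_size(voxels)
    print(f"{len(voxels)} of {dense} voxels occupied ({len(voxels) / dense:.2%})")
```

Because surfaces occupy only a small fraction of the enclosing volume, a sparse representation stores and processes far fewer cells than the dense grid.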
The method has been evaluated on the ScanNet benchmark.
This setup has been tested on Ubuntu 18.04 with CUDA 10.1. It also requires Anaconda to be installed.
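Before starting, it may help to confirm that the command-line tools used in the steps below are available. The following is a minimal sketch, not part of this repository. The tool list (`git`, `conda`, and `nvcc` as a stand-in for an installed CUDA toolkit) and the function name `missing_tools` are assumptions.

```python
import shutil


def missing_tools(tools=("git", "conda", "nvcc")):
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required tools were found.")
```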
- Download this git repository
git clone git@github.com:LeonhardFeiner/sparse_rcnn.git
- Create an Anaconda environment using the environment file of this repo
cd sparse_rcnn/
conda env create -f environment.yml
conda activate py38_pt14_scn
- Download the SparseConvNet repository
cd ..
git clone git@github.com:facebookresearch/SparseConvNet.git
- Install SparseConvNet
cd SparseConvNet/
bash develop.sh
- Install Meshlab for data preparation
- Download the ScanNet dataset
Required Files:
_vh_clean.aggregation.json
_vh_clean_2.ply
_vh_clean_2.0.010000.segs.json
Optional high-resolution data (not recommended):
_vh_clean.ply
_vh_clean.segs.json
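To verify that a download contains the required files listed above, a check along the following lines could be used. This is a minimal sketch, not part of this repository. It assumes that each scene sits in its own directory named after the scene ID and that every file name is the scene ID followed by one of the suffixes above, as in the standard ScanNet layout. The function name `missing_files` is hypothetical.

```python
import sys
from pathlib import Path

# Suffixes of the required files, taken from the list above.
REQUIRED_SUFFIXES = (
    "_vh_clean.aggregation.json",
    "_vh_clean_2.ply",
    "_vh_clean_2.0.010000.segs.json",
)


def missing_files(scans_dir, suffixes=REQUIRED_SUFFIXES):
    """Map each scene ID to the list of required files that are absent."""
    missing = {}
    for scene_dir in sorted(Path(scans_dir).iterdir()):
        if not scene_dir.is_dir():
            continue
        absent = [
            scene_dir.name + suffix
            for suffix in suffixes
            if not (scene_dir / (scene_dir.name + suffix)).is_file()
        ]
        if absent:
            missing[scene_dir.name] = absent
    return missing


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for scene, files in missing_files(sys.argv[1]).items():
            print(scene, "is missing", ", ".join(files))
    else:
        print("usage: pass the path of the ScanNet scans directory")
```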
- Adapt the paths in
scannet_config/pathes.py
- Correct the error in "scene0217_00" by running
python preparation/0_raw_data_error_correction.py
- Create the label mapper by running
python preparation/1_sparse_label_map.py
- Create the instance association tensors by running
python preparation/2_sparse_instance_association.py
- Calculate vertex normals using Meshlab by running
python preparation/3_sparse_normals.py
- Create the dataset by running
python preparation/4_sparse_create_data.py
- Precalculate augmented data (not required and not recommended) by running
python preparation/5_precalculate_dataset.py
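The required preparation steps above can also be chained so that they run in order and stop at the first failure. The following is a minimal sketch, not part of this repository. The script names come from the steps above, while the helper names `build_commands` and `run_all` are hypothetical. It assumes it is called from the repository root and leaves out the optional precalculation step.

```python
import subprocess
import sys
from pathlib import Path

# Required preparation scripts in the order given above (optional step 5 is omitted).
PREPARATION_SCRIPTS = (
    "0_raw_data_error_correction.py",
    "1_sparse_label_map.py",
    "2_sparse_instance_association.py",
    "3_sparse_normals.py",
    "4_sparse_create_data.py",
)


def build_commands(scripts=PREPARATION_SCRIPTS):
    """Return one command (argument list) per script, relative to the repo root."""
    return [[sys.executable, str(Path("preparation") / script)] for script in scripts]


def run_all(repo_root="."):
    """Run the scripts sequentially; check=True raises on the first failure."""
    for command in build_commands():
        print("Running", " ".join(command))
        subprocess.run(command, cwd=repo_root, check=True)
```

Calling `run_all()` from the repository root with the Anaconda environment activated runs the five steps one after another.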
Parameters of the network can be adapted here:
scannet_config/run.py
The training can be started with
python main.py
I'd like to thank Prof. Dr. Matthias Nießner for sharing his ideas and supervising my work.