
MVPNet: Multi-view PointNet for 3D Scene Understanding

If you find our work useful, please cite our paper:

@inproceedings{jaritz2019multi,
	title={Multi-view PointNet for 3D Scene Understanding},
	author={Jaritz, Maximilian and Gu, Jiayuan and Su, Hao},
	booktitle={ICCV Workshop 2019},
	year={2019}
}

Pre-requisites

  • Python 3.6
  • PyTorch 1.2.0
  • CUDA 10.0 & cuDNN 7.6.4

All experiments can be run on a single GPU with 11 GB of memory.

Preparation

Installation

It is recommended to create a (mini)conda environment:

cd <root of this repo>
conda env create --file environment.yml
conda activate mvpnet
export PYTHONPATH=$(pwd):$PYTHONPATH

Compile the module for PointNet++:

bash compile.sh

ScanNet Dataset

In order to download the ScanNet data, you need to fill out an agreement. You will then be provided with a download script. Please refer to the GitHub page for details:

https://github.com/ScanNet/ScanNet

Note that you need to make minor changes to the script to run it with Python 3 (use import urllib.request instead of import urllib, and input instead of raw_input).
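
For reference, the relevant replacements look roughly like this (a hypothetical excerpt, not the actual download script; the URL and file name are placeholders):

import urllib.request                        # Python 3; was "import urllib" in Python 2

url = "https://example.com/scannet_file"     # placeholder URL
out_file = "scannet_file"                    # placeholder output path
urllib.request.urlretrieve(url, out_file)    # was urllib.urlretrieve(url, out_file)
key = input("Enter your download key: ")     # was raw_input(...) in Python 2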

3D data

The 3D data contains the point clouds and 3D segmentation labels, which is sufficient if you only want to run the PointNet2 baseline. However, for MVPNet with image fusion, you also need to download the 2D data. Download the 3D data using the script as follows:

python download-scannet.py -o <ScanNet root> --type _vh_clean_2.ply
python download-scannet.py -o <ScanNet root> --type _vh_clean_2.labels.ply

After the download, your folder should look like this:

<ScanNet root>/scans/<scan_id>                                     % scan id, e.g.: scene0000_00
<ScanNet root>/scans/<scan_id>/<scan_id>_vh_clean_2.ply            % 3D point cloud (whole scene)
<ScanNet root>/scans/<scan_id>/<scan_id>_vh_clean_2.labels.ply     % 3D semantic segmentation labels

There is also a scans_test directory, which is only needed to submit results to the benchmark server.

2D data

Note that downloading and extracting the whole 2D data takes a long time, as it comprises all frames of the video streams. Download the 2D data using the script as follows:

python download-scannet.py -o <ScanNet root> --type .sens
python download-scannet.py -o <ScanNet root> --type _2d-label.zip

After the download, the .sens files (one per scene) need to be extracted to obtain the 2D frame information (color, depth, intrinsics, pose). Extraction relies on the official SensReader code from the ScanNet repo, which is written in Python 2. We suggest creating an environment (e.g. conda create --name py27 python=2.7) and using it to run our convenience script, which uses multiprocessing for extraction (adapt the data paths in the script accordingly):

# only for this script, use Python 2.7
python mvpnet/data/preprocess/extract_raw_data_scannet.py
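
For reference, the extraction essentially runs the SensReader over all scenes in parallel. Below is a minimal sketch of such a driver; the reader.py path and its flags are assumptions based on the ScanNet SensReader and may need adjusting to your checkout. Run it inside the py27 environment so that "python" resolves to Python 2.7.

import glob
import os
import subprocess
from multiprocessing import Pool

SCANS_DIR = "/path/to/scannet/scans"                      # adjust
READER = "/path/to/ScanNet/SensReader/python/reader.py"   # adjust

def extract(sens_path):
    # SensReader writes color/, depth/, pose/ and intrinsic/ into output_path
    subprocess.check_call([
        "python", READER,
        "--filename", sens_path,
        "--output_path", os.path.dirname(sens_path),
        "--export_depth_images", "--export_color_images",
        "--export_poses", "--export_intrinsics",
    ])

if __name__ == "__main__":
    sens_files = glob.glob(os.path.join(SCANS_DIR, "*/*.sens"))
    Pool(processes=8).map(extract, sens_files)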

After that, you need to unzip the 2D labels. You can use our convenience script (adapt paths):

python mvpnet/data/preprocess/unzip_2d_labels.py

After extraction, you should have the following 2D data in your data directory:

<ScanNet root>/scans/<scan_id>                                     % scan id, e.g.: scene0000_00 
<ScanNet root>/scans/<scan_id>/color/                              % RGB images
<ScanNet root>/scans/<scan_id>/depth/                              % depth images
<ScanNet root>/scans/<scan_id>/intrinsic/                          % camera intrinsics
<ScanNet root>/scans/<scan_id>/label/                              % 2D labels
<ScanNet root>/scans/<scan_id>/pose/                               % pose for each 2D frame

In order to save disk space and reduce data loading times, we resize all images to the target resolution of 160x120. Use the following script (adapt paths in the script):

python mvpnet/data/preprocess/resize_scannet_images.py
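
As a design note: color images can be downscaled with bilinear interpolation, but depth and label images should use nearest-neighbor so that depth values and class IDs are not blended across object boundaries. A minimal sketch for a single scan (hypothetical paths; the repo script handles all scans and the directory layout):

import glob
import os
from PIL import Image

scan_dir = "/path/to/scannet/scans/scene0000_00"                 # adjust
out_dir = "/path/to/scannet/scans_resize_160x120/scene0000_00"   # adjust

# bilinear for RGB, nearest for depth and labels
for sub, resample in [("color", Image.BILINEAR), ("depth", Image.NEAREST), ("label", Image.NEAREST)]:
    os.makedirs(os.path.join(out_dir, sub), exist_ok=True)
    for path in glob.glob(os.path.join(scan_dir, sub, "*")):
        img = Image.open(path).resize((160, 120), resample)      # (width, height)
        img.save(os.path.join(out_dir, sub, os.path.basename(path)))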

Build data cache

If you only want to run the PointNet2 baseline use the following script (adapt paths in the script):

python mvpnet/data/preprocess/preprocess.py -o <output dir> -s <split>

This dumps all the ply files (3D point clouds) into one pickle file for faster loading.
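
Conceptually, the cache step does something like the following (an illustrative sketch, not the actual preprocess.py; the real script stores additional fields such as remapped labels):

import os
import pickle
import numpy as np
from plyfile import PlyData

def load_points(path):
    v = PlyData.read(path)["vertex"]
    return np.stack([v["x"], v["y"], v["z"]], axis=1)

scannet_root = "/path/to/scannet"   # adjust
scan_ids = ["scene0000_00"]         # e.g. all scan ids of the split
cache = []
for scan_id in scan_ids:
    ply = os.path.join(scannet_root, "scans", scan_id, scan_id + "_vh_clean_2.ply")
    cache.append({"scan_id": scan_id, "points": load_points(ply)})

with open("scannet_train_cache.pkl", "wb") as f:
    pickle.dump(cache, f)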

For MVPNet, we also need to compute the overlap of each RGB-D frame with the whole scene point cloud. You might want to copy the comparatively lightweight 3D data, poses, and intrinsics over to the directory with the resized scans (using cp with the --parents option) so that the downscaled data is used for preprocessing:

cd <ScanNet root>/scans
cp --parents scene0*/*.ply <ScanNet root>/scans_resize_160x120/
cp -r --parents scene0*/intrinsic <ScanNet root>/scans_resize_160x120/
cp -r --parents scene0*/pose <ScanNet root>/scans_resize_160x120/

Run the preprocessing with the --rgbd option to also compute the RGB-D overlap needed for MVPNet:

python mvpnet/data/preprocess/preprocess.py -o <output dir> -s <split> --rgbd
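
Conceptually, the overlap of one frame with the scene can be computed by unprojecting the depth map with the intrinsics and pose and checking how many of the resulting points lie close to a scene point. A rough sketch (illustrative only; the function names, the 5 cm radius, and the exact criterion are assumptions, not the repo's code):

import numpy as np
from scipy.spatial import cKDTree

def unproject(depth, K, pose):
    # depth: (H, W) in meters (ScanNet depth PNGs store millimeters; divide by 1000)
    # K: (3, 3) camera intrinsics, pose: (4, 4) camera-to-world transform
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) / K[0, 0] * z
    y = (v.reshape(-1) - K[1, 2]) / K[1, 1] * z
    pts_cam = np.stack([x, y, z], axis=1)[z > 0]
    return pts_cam @ pose[:3, :3].T + pose[:3, 3]   # to world coordinates

def frame_overlap(depth, K, pose, scene_tree, radius=0.05):
    # scene_tree: cKDTree built once over the scene point cloud
    pts = unproject(depth, K, pose)
    dist, _ = scene_tree.query(pts, distance_upper_bound=radius)
    return np.mean(np.isfinite(dist))   # fraction of pixels with a nearby scene point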

Training

All scripts should be run from the root of the repo.

MVPNet

First, pre-train the 2D network on the 2D semantic segmentation task:

python mvpnet/train_2d.py --cfg configs/scannet/unet_resnet34.yaml

Then train MVPNet:

python mvpnet/train_mvpnet_3d.py --cfg configs/scannet/mvpnet_3d_unet_resnet34_pn2ssg.yaml
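
Conceptually, MVPNet lifts 2D features into 3D: features from the 2D network are unprojected with the depth maps, and each point of the scene cloud aggregates the features of its nearest unprojected pixels before PointNet++ processes the fused cloud. A toy sketch of this aggregation (illustrative only; the repo uses its own optimized implementation and a learned aggregation, and all names here are hypothetical):

import torch

def lift_features(scene_points, pixel_points, pixel_feats, k=3):
    # scene_points: (N, 3) scene cloud; pixel_points: (M, 3) unprojected pixels;
    # pixel_feats: (M, C) 2D network features -> returns (N, C) fused features
    dist = torch.cdist(scene_points, pixel_points)        # (N, M) pairwise distances
    knn = dist.topk(k, dim=1, largest=False).indices      # k nearest pixels per point
    return pixel_feats[knn].mean(dim=1)                   # simple average over neighbors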

3D baselines (optional)

We include code for PointNet++ baselines with both chunk-based and whole-scene processing (a rough sketch of chunk-based sampling follows the commands below). You can train them with:

# chunk-based processing
python mvpnet/train_3d.py --cfg configs/scannet/3d_baselines/pn2ssg_chunk.yaml
# whole scene processing
python mvpnet/train_3d.py --cfg configs/scannet/3d_baselines/pn2ssg_scene.yaml
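
For reference, chunk-based processing crops fixed-size blocks from the scene point cloud and segments them independently, while whole-scene processing feeds the entire (subsampled) cloud at once. A toy chunk sampler (the chunk size and point count here are hypothetical; the actual values come from the config files):

import numpy as np

def sample_chunk(points, labels, chunk_size=1.5, num_points=8192):
    # crop a chunk_size x chunk_size column (in x/y) around a random point
    center = points[np.random.randint(len(points)), :2]
    mask = np.all(np.abs(points[:, :2] - center) <= chunk_size / 2.0, axis=1)
    idx = np.flatnonzero(mask)
    # subsample (or pad by sampling with replacement) to a fixed point count
    idx = np.random.choice(idx, num_points, replace=len(idx) < num_points)
    return points[idx], labels[idx]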

Testing

After you have trained the networks, you can test as follows:

MVPNet

Test the 2D network on the 2D semantic segmentation task:

python mvpnet/test_2d.py --cfg configs/scannet/unet_resnet34.yaml

Test the 2D network on the 3D semantic segmentation task. This uses the depth maps to unproject the 2D predictions into 3D (a toy version of this transfer follows the command below):

python mvpnet/test_2d_chunks.py --cfg configs/scannet/unet_resnet34.yaml
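
Conceptually, each pixel prediction is unprojected into 3D with its depth value (as in the unprojection sketch in the preprocessing section), and every scene point then takes the label of nearby unprojected pixels. A toy version of the transfer step (illustrative; the actual evaluation aggregates predictions over multiple views):

import numpy as np
from scipy.spatial import cKDTree

def transfer_labels(scene_points, pixel_points, pixel_labels):
    # assign each scene point the class of its nearest unprojected pixel
    _, nn = cKDTree(pixel_points).query(scene_points)
    return pixel_labels[nn]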

Test MVPNet on the 3D semantic segmentation task:

python mvpnet/test_mvpnet_3d.py --cfg configs/scannet/mvpnet_3d_unet_resnet34_pn2ssg.yaml  --num-views 5

3D baselines (optional)

# chunk-based processing
python mvpnet/test_3d_chunks.py --cfg configs/scannet/3d_baselines/pn2ssg_chunk.yaml
# whole scene processing
python mvpnet/test_3d_scene.py --cfg configs/scannet/3d_baselines/pn2ssg_scene.yaml

Results

On the validation set, we obtain the following results:

Method (config)                              | mIoU on val set
3D baseline (pn2ssg_chunk)                   | 58.14
2D-to-3D projection baseline (unet_resnet34) | 61.22
MVPNet (mvpnet_3d_unet_resnet34_pn2ssg)      | 67.92

Note that results may differ slightly between training runs.

License

The code is released under the MIT license. Copyright (c) 2019
