
rdn4depth's Introduction

rdn4depth

A learning-based method to estimate depth from unconstrained monocular videos without ground truth supervision. The core contribution lies in Region Deformer Networks (RDN), which model various forms of object motion with a bicubic function. More details can be found in our paper:

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

Haofei Xu, Jianmin Zheng, Jianfei Cai and Juyong Zhang

IJCAI 2019

Questions and discussions are welcome!

RDN

The parameters of the bicubic function are learned by our proposed Region Deformer Network (RDN).
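
For reference, the generic form of a bicubic function of the image coordinates (x, y) is shown below; this is a sketch of the standard bicubic form, see the paper for the exact parameterization of the deformation.

    f(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} a_{ij} \, x^{i} y^{j}

With one set of 16 coefficients a_{ij} per output coordinate, a 2D deformation of an object region is described by 32 parameters; these are the values the RDN predicts for each object.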

Installation

The code is developed with Python 3.6 and TensorFlow 1.2.0. For conda users (Anaconda or Miniconda), an environment.yml file is provided; you can create the environment with the following command:

conda env create -f environment.yml

Data Preparation

KITTI

First download the KITTI raw dataset, then process the raw data with the following three steps:

  1. Generate training data

    python prepare.py \
    --dataset_dir /path/to/kitti/raw/data \
    --dump_root /path/to/save/processed/data \
    --gen_data
  2. Instance segmentation

    When training with our motion model, instance segmentation masks are needed. We use an open source Mask R-CNN implementation to generate them. The raw Mask R-CNN output is saved in lossless .png format (shape [H, W, 3], with the same value in all three channels: 0 for background and 1-255 for different instances). We name the raw output X-raw-fseg.png, e.g. for an image file test.png, its segmentation should be saved as test-raw-fseg.png (see the sketch after this list).

  3. Align segments across frames

    As the raw segments are not temporally consistent, we need to align them so that the same object keeps the same instance id across frames.

    python prepare.py \
    --dump_root /path/to/processed/data \
    --align_seg
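
For reference, below is a minimal sketch of converting per-instance boolean masks into the X-raw-fseg.png format described in step 2. The [H, W, N] masks array (one boolean channel per detected instance, as produced e.g. by the Matterport Mask R-CNN implementation) is an assumption; adapt it to whichever Mask R-CNN code you use.

    import numpy as np
    from PIL import Image

    def save_raw_fseg(masks, image_path):
        """masks: boolean array of shape [H, W, N], one channel per detected instance."""
        h, w = masks.shape[:2]
        seg = np.zeros((h, w), dtype=np.uint8)
        # Assign instance ids 1..N (0 is background); later instances win on overlapping pixels.
        for idx in range(masks.shape[2]):
            seg[masks[:, :, idx]] = idx + 1
        out = np.stack([seg, seg, seg], axis=-1)  # same value replicated over 3 channels
        # For test.png this writes test-raw-fseg.png; PNG is lossless, so the ids survive.
        Image.fromarray(out).save(image_path.replace('.png', '-raw-fseg.png'))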

Cityscapes

Download the image sequences leftImg8bit_sequence_trainvaltest.zip and the calibration files camera_trainvaltest.zip from the Cityscapes website (registration is required to download the data), then process the data with the following three steps:

  1. Generate training data

    python prepare.py \
    --dataset cityscapes \
    --dataset_dir /path/to/cityscapes/data \
    --dump_root /path/to/save/processed/data \
    --gen_data
  2. Instance segmentation

    Same as KITTI.

  3. Align segments across frames

    python prepare.py \
    --dataset cityscapes \
    --dump_root /path/to/processed/data \
    --align_seg

Training

Detailed training commands for reproducing our results are provided below. Every time you run training, the command and flags are saved to checkpoint_dir/command.txt and checkpoint_dir/flags.json to track experiment history.

KITTI

  • Baseline

    python train.py \
    --logtostderr \
    --checkpoint_dir checkpoints/kitti-baseline \
    --data_dir /path/to/processed/kitti/data \
    --imagenet_ckpt /path/to/pretrained/resnet18/model \
    --seg_align_type null
  • Motion

    python train.py \
    --logtostderr \
    --checkpoint_dir checkpoints/kitti-motion \
    --data_dir /path/to/processed/kitti/data \
    --handle_motion \
    --pretrained_ckpt /path/to/pretrained/baseline/model \
    --learning_rate 2e-5 \
    --object_depth_weight 0.5

Cityscapes

  • Baseline

    python train.py \
    --logtostderr \
    --checkpoint_dir checkpoints/cityscapes-baseline \
    --data_dir /path/to/processed/cityscapes/data \
    --imagenet_ckpt /path/to/pretrained/resnet18/model \
    --seg_align_type null \
    --smooth_weight 0.008
  • Motion

    python train.py \
    --logtostderr \
    --checkpoint_dir checkpoints/cityscapes-motion \
    --data_dir /path/to/processed/cityscapes/data \
    --handle_motion \
    --pretrained_ckpt /path/to/pretrained/baseline/model \
    --learning_rate 2e-5 \
    --object_depth_weight 0.5 \
    --object_depth_threshold 0.5 \
    --smooth_weight 0.008

Models

The trained models for the KITTI and Cityscapes datasets are available at Google Drive.

Inference

Inference can be run on an image list file (for evaluation) or on an image directory (for visualization).

KITTI

python inference.py \
--logtostderr \
--depth \
--input_list_file dataset/test_files_eigen.txt \
--output_dir output/ \
--model_ckpt /path/to/trained/model/ckpt
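
The KITTI command above writes one depth prediction per test image into output/. Below is a minimal sketch for loading a saved .npy prediction and writing a color-coded visualization; the filename is a placeholder, check output/ for the actual naming used by inference.py.

    import numpy as np
    import matplotlib.pyplot as plt

    depth = np.load('output/0000000000.npy')    # placeholder filename; H x W depth prediction
    inv_depth = 1.0 / np.maximum(depth, 1e-6)   # inverse depth is usually easier to inspect
    plt.imsave('output/0000000000_vis.png', inv_depth, cmap='plasma')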

Cityscapes

python inference.py \
--logtostderr \
--depth \
--input_dir /path/to/cityscapes/data/directory \
--output_dir output/cityscapes \
--model_ckpt /path/to/trained/model/ckpt \
--not_save_depth_npy \
--inference_crop cityscapes

Evaluation

You can use the pack_pred_depths function in utils.py to generate a single depth prediction file for evaluation. We also make our depth prediction results on the KITTI Eigen test split available at Google Drive.

On the whole image

We follow the standard evaluation protocol on the KITTI Eigen test split to compare with previous methods.

python evaluate.py \
--kitti_dir /path/to/kitti/raw/data \
--pred_file /path/to/depth/prediction/file
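
For reference, the standard protocol reports abs rel, sq rel, RMSE, RMSE log and the threshold accuracies (delta < 1.25, 1.25^2, 1.25^3), computed on valid ground-truth pixels, with per-image median scaling as is standard for unsupervised monocular methods. Below is a minimal sketch of these metrics (use the repo's evaluate.py to reproduce the paper's numbers):

    import numpy as np

    def compute_errors(gt, pred):
        """Standard KITTI depth metrics on 1D arrays of valid, median-scaled depths."""
        thresh = np.maximum(gt / pred, pred / gt)
        a1 = (thresh < 1.25).mean()
        a2 = (thresh < 1.25 ** 2).mean()
        a3 = (thresh < 1.25 ** 3).mean()
        abs_rel = np.mean(np.abs(gt - pred) / gt)
        sq_rel = np.mean(((gt - pred) ** 2) / gt)
        rmse = np.sqrt(np.mean((gt - pred) ** 2))
        rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
        return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3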

On specific objects

We also evaluate on specific objects to highlight the performance gains brought by our proposed RDN; this is done using the segmentation masks from Mask R-CNN. The segmentation masks for people and cars used in our paper are available at Google Drive.

python evaluate.py \
--kitti_dir /path/to/kitti/raw/data \
--pred_file /path/to/depth/prediction/file \
--mask people \
--seg_dir /path/to/eigen/test/split/segments

Per-object evaluation results on people and cars of the KITTI Eigen test split are reported in our paper. If you want to compare with our results, please make sure to use the same object segmentation masks as us.
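
Conceptually, the per-object numbers restrict the valid-pixel mask to the object segmentation before computing the same metrics. Below is a minimal sketch, reusing compute_errors from the sketch above; the mask file format (non-zero pixels mark the object) is an assumption.

    import numpy as np
    from PIL import Image

    def evaluate_on_object(gt_depth, pred_depth, mask_path, min_depth=1e-3, max_depth=80.0):
        """Depth metrics restricted to pixels covered by an object mask."""
        obj_mask = np.array(Image.open(mask_path))
        if obj_mask.ndim == 3:                          # collapse RGB masks to a single channel
            obj_mask = obj_mask[..., 0]
        valid = (gt_depth > min_depth) & (gt_depth < max_depth) & (obj_mask > 0)
        gt, pred = gt_depth[valid], pred_depth[valid]
        pred = pred * np.median(gt) / np.median(pred)   # per-image median scaling
        pred = np.clip(pred, min_depth, max_depth)
        return compute_errors(gt, pred)                 # defined in the sketch above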

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{xu2019rdn4depth,
  title={Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos},
  author={Xu, Haofei and Zheng, Jianmin and Cai, Jianfei and Zhang, Juyong},
  booktitle={IJCAI},
  year={2019}
}

Acknowledgements

The code is inspired by struct2depth. We thank Vincent Casser and Anelia Angelova for clarifying the details of their work.


rdn4depth's Issues

out of memory using Nvidia P40

System information

Have I written custom code : no
OS Platform and Distribution: CentOS release 6.3
TensorFlow version: pip3 install tensorflow_gpu==1.2.0
CUDA/cuDNN version: cuda 8.0 / cudnn 5.1
GPU model and memory: Nvidia P40 (22GB)
Exact command to reproduce:
python3 train.py \
--checkpoint_dir ./checkpoints_n0/ \
--data_dir ../dataset/aicv_2d/data_format/ \
--architecture resnet \
--file_extension jpg \
--img_height 128 \
--img_width 416 \
--learning_rate 0.0002 \
--object_depth_weight 0.5 \
--object_depth_threshold 0.5 \
--smooth_weight 0.008 \
--batch_size 2 \
--handle_motion

Describe the problem

I tried to run the training code on my own dataset with an Nvidia P40, but it failed with an out-of-memory error.
What can I do to solve the problem? Thanks so much!
By the way, I encountered the same problem with struct2depth, even with batch_size = 2 on an Nvidia V100 (32 GB).

What does "depth" mean?

I would like to know more about "depth" in your paper.

  • Is depth the z value of a 3D (x, y, z) coordinate?
  • Is the depth in the .npy files written by inference.py metric (i.e. in meters)?

Thank you.
