
rdn4depth's Introduction

rdn4depth

A learning-based method to estimate depth from unconstrained monocular videos without ground truth supervision. The core contribution lies in Region Deformer Networks (RDN), which model various forms of object motion with a bicubic function. More details can be found in our paper:

Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos

Haofei Xu, Jianmin Zheng, Jianfei Cai and Juyong Zhang

IJCAI 2019

Questions and discussions are welcome!

RDN

The parameters of the bicubic function are learned by our proposed Region Deformer Network (RDN).
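
For reference, the generic form of a bicubic function of the image coordinates (x, y) is shown below; this is a sketch of the standard bicubic form, see the paper for the exact parameterization of the deformation.

    f(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} a_{ij} \, x^{i} y^{j}

With one set of 16 coefficients a_{ij} per output coordinate, a 2D deformation of an object region is described by 32 parameters; these are the values the RDN predicts for each object.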

Installation

The code is developed with Python 3.6 and TensorFlow 1.2.0. For conda users (Anaconda or Miniconda), an environment.yml file is provided; you can create the environment with the following command:

conda env create -f environment.yml

Data Preparation

KITTI

First download the KITTI raw dataset, then process the raw data with the following three steps:

  1. Generate training data

    python prepare.py \
    --dataset_dir /path/to/kitti/raw/data \
    --dump_root /path/to/save/processed/data \
    --gen_data
  2. Instance segmentation

    When training with our motion model, instance segmentation masks are needed. We use an open source Mask R-CNN implementation to generate them. The raw Mask R-CNN output is saved in lossless .png format (shape [H, W, 3], with the same value in all three channels: 0 for background and 1-255 for different instances). We name the raw output X-raw-fseg.png, e.g. for an image file test.png, its segmentation should be saved as test-raw-fseg.png (see the sketch after this list).

  3. Align segments across frames

    As the raw segments are not temporally consistent, we need to align them so that the same object keeps the same instance id across frames.

    python prepare.py \
    --dump_root /path/to/processed/data \
    --align_seg
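
For reference, below is a minimal sketch of converting per-instance boolean masks into the X-raw-fseg.png format described in step 2. The [H, W, N] masks array (one boolean channel per detected instance, as produced e.g. by the Matterport Mask R-CNN implementation) is an assumption; adapt it to whichever Mask R-CNN code you use.

    import numpy as np
    from PIL import Image

    def save_raw_fseg(masks, image_path):
        """masks: boolean array of shape [H, W, N], one channel per detected instance."""
        h, w = masks.shape[:2]
        seg = np.zeros((h, w), dtype=np.uint8)
        # Assign instance ids 1..N (0 is background); later instances win on overlapping pixels.
        for idx in range(masks.shape[2]):
            seg[masks[:, :, idx]] = idx + 1
        out = np.stack([seg, seg, seg], axis=-1)  # same value replicated over 3 channels
        # For test.png this writes test-raw-fseg.png; PNG is lossless, so the ids survive.
        Image.fromarray(out).save(image_path.replace('.png', '-raw-fseg.png'))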

Cityscapes

Download the image sequences leftImg8bit_sequence_trainvaltest.zip and the calibration files camera_trainvaltest.zip from the Cityscapes website (registration is required to download the data), then process the data with the following three steps:

  1. Generate training data

    python prepare.py \
    --dataset cityscapes \
    --dataset_dir /path/to/cityscapes/data \
    --dump_root /path/to/save/processed/data \
    --gen_data
  2. Instance segmentation

    Same as KITTI.

  3. Align segments across frames

    python prepare.py \
    --dataset cityscapes \
    --dump_root /path/to/processed/data \
    --align_seg

Training

Detailed training commands for reproducing our results are provided below. Every time you run training, the command and flags are saved to checkpoint_dir/command.txt and checkpoint_dir/flags.json to track experiment history.

KITTI

  • Baseline

    python train.py \
    --logtostderr \
    --checkpoint_dir checkpoints/kitti-baseline \
    --data_dir /path/to/processed/kitti/data \
    --imagenet_ckpt /path/to/pretrained/resnet18/model \
    --seg_align_type null
  • Motion

    python train.py \
    --logtostderr \
    --checkpoint_dir checkpoints/kitti-motion \
    --data_dir /path/to/processed/kitti/data \
    --handle_motion \
    --pretrained_ckpt /path/to/pretrained/baseline/model \
    --learning_rate 2e-5 \
    --object_depth_weight 0.5

Cityscapes

  • Baseline

    python train.py \
    --logtostderr \
    --checkpoint_dir checkpoints/cityscapes-baseline \
    --data_dir /path/to/processed/cityscapes/data \
    --imagenet_ckpt /path/to/pretrained/resnet18/model \
    --seg_align_type null \
    --smooth_weight 0.008
  • Motion

    python train.py \
    --logtostderr \
    --checkpoint_dir checkpoints/cityscapes-motion \
    --data_dir /path/to/processed/cityscapes/data \
    --handle_motion \
    --pretrained_ckpt /path/to/pretrained/baseline/model \
    --learning_rate 2e-5 \
    --object_depth_weight 0.5 \
    --object_depth_threshold 0.5 \
    --smooth_weight 0.008

Models

The trained models for the KITTI and Cityscapes datasets are available at Google Drive.

Inference

Inference can be run on an image list file (for evaluation) or on an image directory (for visualization).

KITTI

python inference.py \
--logtostderr \
--depth \
--input_list_file dataset/test_files_eigen.txt \
--output_dir output/ \
--model_ckpt /path/to/trained/model/ckpt
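
The KITTI command above writes one depth prediction per test image into output/. Below is a minimal sketch for loading a saved .npy prediction and writing a color-coded visualization; the filename is a placeholder, check output/ for the actual naming used by inference.py.

    import numpy as np
    import matplotlib.pyplot as plt

    depth = np.load('output/0000000000.npy')    # placeholder filename; H x W depth prediction
    inv_depth = 1.0 / np.maximum(depth, 1e-6)   # inverse depth is usually easier to inspect
    plt.imsave('output/0000000000_vis.png', inv_depth, cmap='plasma')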

Cityscapes

python inference.py \
--logtostderr \
--depth \
--input_dir /path/to/cityscapes/data/directory \
--output_dir output/cityscapes \
--model_ckpt /path/to/trained/model/ckpt \
--not_save_depth_npy \
--inference_crop cityscapes

Evaluation

You can use the pack_pred_depths function in utils.py to generate a single depth prediction file for evaluation. We also make our depth prediction results on the KITTI Eigen test split available at Google Drive.

On the whole image

We follow the standard evaluation protocol on the KITTI Eigen test split to compare with previous methods.

python evaluate.py \
--kitti_dir /path/to/kitti/raw/data \
--pred_file /path/to/depth/prediction/file
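
For reference, the standard protocol reports abs rel, sq rel, RMSE, RMSE log and the threshold accuracies (delta < 1.25, 1.25^2, 1.25^3), computed on valid ground-truth pixels, with per-image median scaling as is standard for unsupervised monocular methods. Below is a minimal sketch of these metrics (use the repo's evaluate.py to reproduce the paper's numbers):

    import numpy as np

    def compute_errors(gt, pred):
        """Standard KITTI depth metrics on 1D arrays of valid, median-scaled depths."""
        thresh = np.maximum(gt / pred, pred / gt)
        a1 = (thresh < 1.25).mean()
        a2 = (thresh < 1.25 ** 2).mean()
        a3 = (thresh < 1.25 ** 3).mean()
        abs_rel = np.mean(np.abs(gt - pred) / gt)
        sq_rel = np.mean(((gt - pred) ** 2) / gt)
        rmse = np.sqrt(np.mean((gt - pred) ** 2))
        rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
        return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3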

On specific objects

We also evaluate on specific objects to highlight the performance gains brought by our proposed RDN; this is done using the segmentation masks from Mask R-CNN. The segmentation masks for people and cars used in our paper are available at Google Drive.

python evaluate.py \
--kitti_dir /path/to/kitti/raw/data \
--pred_file /path/to/depth/prediction/file \
--mask people \
--seg_dir /path/to/eigen/test/split/segments

Per-object evaluation results on people and cars of the KITTI Eigen test split are reported in our paper. If you want to compare with our results, please make sure to use the same object segmentation masks as us.
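
Conceptually, the per-object numbers restrict the valid-pixel mask to the object segmentation before computing the same metrics. Below is a minimal sketch, reusing compute_errors from the sketch above; the mask file format (non-zero pixels mark the object) is an assumption.

    import numpy as np
    from PIL import Image

    def evaluate_on_object(gt_depth, pred_depth, mask_path, min_depth=1e-3, max_depth=80.0):
        """Depth metrics restricted to pixels covered by an object mask."""
        obj_mask = np.array(Image.open(mask_path))
        if obj_mask.ndim == 3:                          # collapse RGB masks to a single channel
            obj_mask = obj_mask[..., 0]
        valid = (gt_depth > min_depth) & (gt_depth < max_depth) & (obj_mask > 0)
        gt, pred = gt_depth[valid], pred_depth[valid]
        pred = pred * np.median(gt) / np.median(pred)   # per-image median scaling
        pred = np.clip(pred, min_depth, max_depth)
        return compute_errors(gt, pred)                 # defined in the sketch above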

Citation

If you find our work useful in your research, please consider citing our paper:

@inproceedings{xu2019rdn4depth,
  title={Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos},
  author={Xu, Haofei and Zheng, Jianmin and Cai, Jianfei and Zhang, Juyong},
  booktitle={IJCAI},
  year={2019}
}

Acknowledgements

The code is inspired by struct2depth. We thank Vincent Casser and Anelia Angelova for clarifying the details of their work.


rdn4depth's Issues

out of memory using Nvidia P40

System information

Have I written custom code : no
OS Platform and Distribution: CentOS release 6.3
TensorFlow version: pip3 install tensorflow_gpu==1.2.0
CUDA/cuDNN version: cuda 8.0 / cudnn 5.1
GPU model and memory: Nvidia P40 (22GB)
Exact command to reproduce:
python3 train.py \
--checkpoint_dir ./checkpoints_n0/ \
--data_dir ../dataset/aicv_2d/data_format/ \
--architecture resnet \
--file_extension jpg \
--img_height 128 \
--img_width 416 \
--learning_rate 0.0002 \
--object_depth_weight 0.5 \
--object_depth_threshold 0.5 \
--smooth_weight 0.008 \
--batch_size 2 \
--handle_motion

Describe the problem

I tried to run the training code on my own dataset with an Nvidia P40, but it failed with an out-of-memory error.
What can I do to solve the problem? Thanks so much!
By the way, I encountered the same problem with struct2depth, even with batch_size = 2 on an Nvidia V100 (32 GB).

What does "depth" mean?

I would like to know more about "depth" in your paper.

  • Is depth the z value of a 3D (x, y, z) coordinate?
  • Is the depth in the .npy files written by inference.py metric (i.e. in meters)?

Thank you.
