GeoNet

This is a TensorFlow implementation of our paper:

GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose (CVPR 2018)

Zhichao Yin and Jianping Shi

arXiv preprint: https://arxiv.org/abs/1803.02276

Requirements

This code has been tested with Python 2.7, TensorFlow 1.2 and CUDA 8.0 on Ubuntu 16.04.

Data preparation

To replicate our results on all three tasks (monocular depth, camera pose and optical flow), you need to download the following datasets and preprocess them into the expected format:

For the depth and flow tasks, the training data is the KITTI raw dataset, which you can download with the official script;

For the pose task, the training data is the KITTI odometry dataset; you should also download the calibration files and the ground truth poses (for evaluation).

After downloading the data, run the following command for preprocessing:

python data/prepare_train_data.py --dataset_dir=/path/to/kitti/dataset/ --dataset_name=kitti_split --dump_root=/path/to/formatted/data/ --seq_length=3 --img_height=128 --img_width=416 --num_threads=16

For the depth task, --dataset_name should be kitti_raw_eigen, and adding the --remove_static flag is suggested;

For the flow task, --dataset_name should be kitti_raw_stereo (--remove_static is left out so that optical flow is learned for generic cases);

For the pose task, --dataset_name should be kitti_odom and --seq_length should be set to 5.
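The per-task settings above can be collected in a small helper. The following is a minimal sketch, not part of this repository: `TASK_SETTINGS` and `build_prepare_command` are hypothetical names, and only flags that appear in this README are used. Passing `--remove_static` as a bare flag without a value is an assumption about how the script parses it.

```python
# Hypothetical helper: assembles the preprocessing command for each task
# using only the flags documented in this README.
TASK_SETTINGS = {
    # task: (dataset_name, seq_length, add --remove_static?)
    "depth": ("kitti_raw_eigen", 3, True),
    "flow": ("kitti_raw_stereo", 3, False),
    "pose": ("kitti_odom", 5, False),
}


def build_prepare_command(task, dataset_dir, dump_root,
                          img_height=128, img_width=416, num_threads=16):
    """Return the argument list for data/prepare_train_data.py."""
    dataset_name, seq_length, remove_static = TASK_SETTINGS[task]
    cmd = [
        "python", "data/prepare_train_data.py",
        "--dataset_dir=%s" % dataset_dir,
        "--dataset_name=%s" % dataset_name,
        "--dump_root=%s" % dump_root,
        "--seq_length=%d" % seq_length,
        "--img_height=%d" % img_height,
        "--img_width=%d" % img_width,
        "--num_threads=%d" % num_threads,
    ]
    if remove_static:
        # Assumption: the flag is a bare boolean switch.
        cmd.append("--remove_static")
    return cmd


if __name__ == "__main__":
    # Print the command for the pose task so it can be copied into a shell.
    print(" ".join(build_prepare_command(
        "pose", "/path/to/kitti/dataset/", "/path/to/formatted/data/")))
```

The returned list can also be passed directly to `subprocess.check_call` if you prefer to launch preprocessing from Python.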

You can optionally pretrain the model on the Cityscapes dataset for any of the three tasks. The required training data is the image sequence archive leftImg8bit_sequence_trainvaltest.zip and the calibration archive camera_trainvaltest.zip. After downloading them, simply run:

python data/prepare_train_data.py --dataset_dir=/path/to/cityscapes/dataset/ --dataset_name='cityscapes' --dump_root=/path/to/formatted/data/ --seq_length=3 --img_height=171 --img_width=416 --num_threads=16

Training

Our code supports two training modes, corresponding to our stage-wise training strategy. The train_rigid mode is mainly for learning depth and pose, while train_flow mode supports direct or residual flow learning.

For train_rigid mode (depth and pose tasks), run the command

python geonet_main.py --mode=train_rigid --dataset_dir=/path/to/formatted/data/ --checkpoint_dir=/path/to/save/ckpts/ --learning_rate=0.0002 --seq_length=3 --batch_size=4 --max_steps=350000 

You can switch the network encoder with the --dispnet_encoder flag, or enable depth scale normalization (see this paper for details) by setting --scale_normalize to True. Note that to replicate the depth and pose results, --seq_length should be 3 and 5, respectively.

For train_flow mode (flow task), run the command

python geonet_main.py --mode=train_flow --dataset_dir=/path/to/formatted/data/ --checkpoint_dir=/path/to/save/ckpts/ --learning_rate=0.0002 --seq_length=3 --flownet_type=direct --max_steps=400000

You can choose to learn direct or residual flow by setting the --flownet_type flag. Note that when --flownet_type is residual, --init_ckpt_file should point to a model pretrained on the same dataset in train_rigid mode. For learning residual flow, a --max_steps value equivalent to more than 200 epochs is also preferred.
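Since --max_steps counts training steps while the recommendation above is given in epochs, the two need to be converted. The sketch below assumes that one epoch is one pass over all training samples and that one step consumes one batch. `steps_for_epochs` and `build_residual_flow_command` are hypothetical helpers, the sample count is a made-up placeholder rather than the size of any real split, the checkpoint path is a placeholder for your own train_rigid model, and the batch size of 4 is borrowed from the train_rigid example because this README does not state one for flow training.

```python
import math


def steps_for_epochs(num_epochs, num_samples, batch_size):
    """Training steps needed for num_epochs passes over num_samples."""
    steps_per_epoch = int(math.ceil(float(num_samples) / batch_size))
    return num_epochs * steps_per_epoch


def build_residual_flow_command(dataset_dir, checkpoint_dir, rigid_ckpt,
                                max_steps, batch_size=4):
    """Argument list for residual flow training (flags as in this README)."""
    return [
        "python", "geonet_main.py",
        "--mode=train_flow",
        "--dataset_dir=%s" % dataset_dir,
        "--checkpoint_dir=%s" % checkpoint_dir,
        "--learning_rate=0.0002",
        "--seq_length=3",
        "--flownet_type=residual",
        "--init_ckpt_file=%s" % rigid_ckpt,  # model trained in train_rigid mode
        "--batch_size=%d" % batch_size,
        "--max_steps=%d" % max_steps,
    ]


if __name__ == "__main__":
    # Placeholder sample count, NOT the size of any real dataset split.
    steps = steps_for_epochs(num_epochs=200, num_samples=10000, batch_size=4)
    print(" ".join(build_residual_flow_command(
        "/path/to/formatted/data/", "/path/to/save/ckpts/",
        "/path/to/rigid/model", steps)))
```

Treat the computed value as a lower bound, since the text above asks for more than 200 epochs.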

Testing

We provide testing and evaluation scripts for all three tasks.

Monocular Depth

Run the following command

python geonet_main.py --mode=test_depth --dataset_dir=/path/to/kitti/raw/dataset/ --init_ckpt_file=/path/to/trained/model/ --batch_size=1 --depth_test_split=eigen --output_dir=/path/to/save/predictions/

Then you can evaluate the prediction by running

python kitti_eval/eval_depth.py --split=eigen --kitti_dir=/path/to/kitti/raw/dataset/ --pred_file=/path/to/predictions/
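This README does not list the metrics that the evaluation script reports. For orientation, the error measures most often quoted for monocular depth on KITTI are absolute relative error, squared relative error, RMSE, log RMSE and threshold accuracy. The following is a minimal sketch of those formulas on flat lists of valid depths. `depth_metrics` is a hypothetical function, not the API of kitti_eval/eval_depth.py, and it leaves out steps such as depth capping, cropping and scale matching that an actual evaluation may apply.

```python
import math


def depth_metrics(gt, pred):
    """Common depth errors for paired, strictly positive depth values."""
    pairs = [(float(g), float(p)) for g, p in zip(gt, pred)]
    n = float(len(pairs))
    # Mean absolute error relative to the ground truth depth.
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n
    # Mean squared error relative to the ground truth depth.
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n
    # Root mean squared error in depth units and in log space.
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n)
    # Fraction of pixels whose ratio max(gt/pred, pred/gt) is below 1.25.
    a1 = sum(1 for g, p in pairs if max(g / p, p / g) < 1.25) / n
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse,
            "rmse_log": rmse_log, "a1": a1}


if __name__ == "__main__":
    # Toy values for illustration only, not real predictions.
    print(depth_metrics([10.0, 20.0, 30.0], [9.0, 22.0, 30.0]))
```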

Camera Pose

Assuming you have downloaded the KITTI odometry dataset (including the ground truth poses), first run

python geonet_main.py --mode=test_pose --dataset_dir=/path/to/kitti/odom/dataset/ --init_ckpt_file=/path/to/trained/model/ --batch_size=1 --seq_length=5 --pose_test_seq=9 --output_dir=/path/to/save/predictions/

You now have the predicted pose snippets. You can generate the ground truth pose snippets by running

python kitti_eval/generate_pose_snippets.py --dataset_dir=/path/to/kitti/odom/dataset/ --output_dir=/path/to/save/gtruth/pose/snippets/ --seq_id=09 --seq_length=5

Then you can evaluate your predictions by running

python kitti_eval/eval_pose.py --gtruth_dir=/path/to/gtruth/pose/snippets/ --pred_dir=/path/to/predicted/pose/snippets/

Optical Flow

First, download the KITTI flow 2015 dataset and its multi-view extension. To replicate the flow results in our paper, a seq_length of 3 is recommended. Format the testing data by running

python kitti_eval/generate_multiview_extension.py --dataset_dir=/path/to/data_scene_flow_multiview/ --calib_dir=/path/to/data_scene_flow_calib/ --dump_root=/path/to/formatted/testdata/ --cam_id=02 --seq_length=3

Then you can test your trained model by running

python geonet_main.py --mode=test_flow --dataset_dir=/path/to/formatted/testdata/ --init_ckpt_file=/path/to/trained/model/ --flownet_type=direct --batch_size=1 --output_dir=/path/to/save/predictions/

We again provide an evaluation script:

python kitti_eval/eval_flow.py --dataset_dir=/path/to/kitti_stereo_2015/ --pred_dir=/path/to/predictions/
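This README does not describe what the flow evaluation script computes. Two measures commonly reported on KITTI flow 2015 are the average endpoint error (EPE) and the outlier ratio, where a pixel counts as an outlier if its endpoint error exceeds both 3 pixels and 5% of the ground truth flow magnitude. The following is a minimal sketch of both on lists of (u, v) flow vectors for valid pixels. `flow_errors` is a hypothetical function, not the API of kitti_eval/eval_flow.py, and masking of invalid or occluded pixels is left out.

```python
import math


def flow_errors(gt_flow, pred_flow):
    """Return (mean endpoint error, outlier ratio) for paired flow vectors."""
    epes = []
    outliers = 0
    for (gu, gv), (pu, pv) in zip(gt_flow, pred_flow):
        # Endpoint error: Euclidean distance between the two flow vectors.
        epe = math.hypot(gu - pu, gv - pv)
        magnitude = math.hypot(gu, gv)
        epes.append(epe)
        # Outlier: error above 3 px AND above 5% of the true magnitude.
        if epe > 3.0 and epe > 0.05 * magnitude:
            outliers += 1
    n = float(len(epes))
    return sum(epes) / n, outliers / n


if __name__ == "__main__":
    # Toy vectors for illustration only, not real predictions.
    gt = [(3.0, 4.0), (1.0, 0.0)]
    pred = [(0.0, 0.0), (1.0, 0.5)]
    print(flow_errors(gt, pred))
```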

Acknowledgements

We thank Tinghui Zhou and Clément Godard for their great work and repo.

Reference

If you find our work useful in your research, please consider citing our paper:

@inproceedings{yin2018geonet,
  title     = {GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose},
  author    = {Yin, Zhichao and Shi, Jianping},
  booktitle = {CVPR},
  year      = {2018}
}
