Giter Club home page Giter Club logo

midas's Introduction

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

This repository contains code to compute depth from a single image. It accompanies our paper:

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
René Ranftl, Katrin Lasinger, David Hafner, Konrad Schindler, Vladlen Koltun

and our preprint:

Vision Transformers for Dense Prediction
René Ranftl, Alexey Bochkovskiy, Vladlen Koltun

MiDaS was trained on 10 datasets (ReDWeb, DIML, Movies, MegaDepth, WSVD, TartanAir, HRWSI, ApolloScape, BlendedMVS, IRS) with multi-objective optimization. The original model that was trained on 5 datasets (MIX 5 in the paper) can be found here.

Changelog

Setup

  1. Pick one or more models and download corresponding weights to the weights folder:
  • For highest quality: dpt_large
  • For moderately less quality, but better speed on CPU and slower GPUs: dpt_hybrid
  • For real-time applications on resource-constrained devices: midas_v21_small
  • Legacy convolutional model: midas_v21
  1. Set up dependencies:

    conda install pytorch torchvision opencv
    pip install timm

    The code was tested with Python 3.7, PyTorch 1.8.0, OpenCV 4.5.1, and timm 0.4.5.

Usage

  1. Place one or more input images in the folder input.

  2. Run the model:

    python run.py --model_type dpt_large
    python run.py --model_type dpt_hybrid 
    python run.py --model_type midas_v21_small
    python run.py --model_type midas_v21
  3. The resulting inverse depth maps are written to the output folder.

via Docker

  1. Make sure you have installed Docker and the NVIDIA Docker runtime.

  2. Build the Docker image:

    docker build -t midas .
  3. Run inference:

    docker run --rm --gpus all -v $PWD/input:/opt/MiDaS/input -v $PWD/output:/opt/MiDaS/output midas

    This command passes through all of your NVIDIA GPUs to the container, mounts the input and output directories and then runs the inference.

via PyTorch Hub

The pretrained model is also available on PyTorch Hub

via TensorFlow or ONNX

See README in the tf subdirectory.

Currently only supports MiDaS v2.1. DPT-based models to be added.

via Mobile (iOS / Android)

See README in the mobile subdirectory.

via ROS1 (Robot Operating System)

See README in the ros subdirectory.

Currently only supports MiDaS v2.1. DPT-based models to be added.

Accuracy

Zero-shot error (the lower - the better) and speed (FPS):

Model DIW, WHDR Eth3d, AbsRel Sintel, AbsRel Kitti, δ>1.25 NyuDepthV2, δ>1.25 TUM, δ>1.25 Speed, FPS
Small models: iPhone 11
MiDaS v2 small 0.1248 0.1550 0.3300 21.81 15.73 17.00 0.6
MiDaS v2.1 small URL 0.1344 0.1344 0.3370 29.27 13.43 14.53 30
Big models: GPU RTX 3090
MiDaS v2 large URL 0.1246 0.1290 0.3270 23.90 9.55 14.29 51
MiDaS v2.1 large URL 0.1295 0.1155 0.3285 16.08 8.71 12.51 51
MiDaS v3.0 DPT-Hybrid URL 0.1106 0.0934 0.2741 11.56 8.69 10.89 46
MiDaS v3.0 DPT-Large URL 0.1082 0.0888 0.2697 8.46 8.32 9.97 47

Citation

Please cite our paper if you use this code or any of the models:

@ARTICLE {Ranftl2022,
    author  = "Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun",
    title   = "Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-Shot Cross-Dataset Transfer",
    journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",
    year    = "2022",
    volume  = "44",
    number  = "3"
}

If you use a DPT-based model, please also cite:

@article{Ranftl2021,
	author    = {Ren\'{e} Ranftl and Alexey Bochkovskiy and Vladlen Koltun},
	title     = {Vision Transformers for Dense Prediction},
	journal   = {ICCV},
	year      = {2021},
}

Acknowledgements

Our work builds on and uses code from timm. We'd like to thank the author for making these libraries available.

License

MIT License

midas's People

Contributors

ranftlr avatar alexeyab avatar timmh avatar dvdhfnr avatar ak391 avatar barnjamin avatar erikreed avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.