dorn's People

Contributors

hufu6371


dorn's Issues

Performance on unseen dataset

Hi, I've read that supervised learning methods tend to overfit the data they are trained on. For example, if you train on Cityscapes but evaluate on KITTI, you get bad results. I wonder how your models perform in this kind of situation (different data at test time)? In the paper, there are results only for models trained and tested on the same dataset.

Thank you very much.

Typo in paper equation

Hi again,

When I was reading the paper a while ago, part of the loss function didn't make sense to me. I think there is a potentially important typo there. Please confirm whether I am right.

In equation (2), the second term should be:
Σ (1 − log(P_i)) instead of Σ log(1 − P_i)

Suppose the true depth falls in bin k (under a uniform discretization); then you want:

P_0 = 1
P_1 = 1
…
P_(k−1) = 1

and

P_k = 0
P_(k+1) = 0
…
P_N = 0

This suggests there is a typo, because the equation as written gives log(0) for P_i when i > k − 1, which makes the loss infinite.
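For reference, a minimal NumPy sketch of the per-pixel loss as printed in equation (2), assuming P_k denotes the network's predicted probability (strictly between 0 and 1) that the true bin index exceeds k. With such predictions the log(1 − P_k) term stays finite; the infinity only appears if hard 0/1 targets are plugged in as P. The function name and toy values are mine, not from the repo:

```python
import numpy as np

def ordinal_loss_pixel(probs, label):
    """Per-pixel ordinal loss as printed in equation (2):
    -sum_{k < label} log(P_k) - sum_{k >= label} log(1 - P_k).
    probs[k] is the predicted probability that the depth bin exceeds k."""
    probs = np.asarray(probs, dtype=np.float64)
    first = np.sum(np.log(probs[:label])) if label > 0 else 0.0
    second = np.sum(np.log(1.0 - probs[label:]))
    return -(first + second)

# Predictions strictly inside (0, 1) give a finite, positive loss:
p = np.array([0.9, 0.8, 0.6, 0.3, 0.1])
loss = ordinal_loss_pixel(p, label=3)
```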

Questions about the paper

Hi,

I was just trying to understand this paper and had some questions. Since I am not from a CS background, the questions might be a bit dumb; sorry for that. It would be great if anyone could help me understand the paper.

  1. Why are there 2K weight vectors? Where does the number 2K come from?
  2. If I understand Y correctly, it is the set of ordinal outputs for each spatial location. What does 'spatial location' mean here? Is it each pixel? If so, why is the size of Y taken as W × H × 2K? As the paper says, K is the number of sub-intervals between alpha and beta, so there can be K ordinal outputs per pixel, and to my understanding Y could be of size W × H × K. I may be misunderstanding this.
  3. The paper says "ˆl(w,h) is the estimated discrete value decoding from y(w,h)". What does 'estimated discrete value decoding' mean? Is it the depth ordinal predicted for each pixel by the proposed architecture? If so, how is ˆl(w,h) different from the Y value for each pixel? Is Y the ground truth and ˆl(w,h) the prediction?

Thanks.
Best Regards
Ajay
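On question 1 (and partly 3), my understanding, which is an assumption and not the authors' statement, is that the 2K channels come from implementing each of the K "is the depth beyond threshold k?" decisions as its own 2-way softmax, and that ˆl(w,h) is the integer decoded from those K probabilities. A hypothetical sketch for a single pixel (names are mine):

```python
import numpy as np

def ordinal_probs(logits_2k):
    """Turn a 2K-channel output into K binary probabilities.
    Channels (2k, 2k+1) form a 2-way softmax for the k-th
    'is depth greater than threshold k?' classifier."""
    K = logits_2k.shape[0] // 2
    pairs = logits_2k.reshape(K, 2)
    e = np.exp(pairs - pairs.max(axis=1, keepdims=True))
    sm = e / e.sum(axis=1, keepdims=True)
    return sm[:, 1]  # probability of the 'greater than' class

def decode_label(probs):
    """Decoded discrete label: count of thresholds with P_k > 0.5."""
    return int(np.sum(probs > 0.5))
```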

Evaluation on the Eigen split

Hi,

I was wondering if you could share your evaluation code, or tell me which code you used for evaluation. The official KITTI evaluation?
Did you use raw LiDAR data or the post-processed ground truth provided by KITTI?

No module named 'ordinal_decode_layer', any ideas?

I0627 14:09:56.936329 3141 layer_factory.hpp:77] Creating layer decode_ord
ImportError: No module named 'ordinal_decode_layer'
Traceback (most recent call last):
File "demo_nyuv2.py", line 15, in
net = caffe.Net('models/NYUV2/deploy.prototxt', 'models/NYUV2/cvpr_nyuv2.caffemodel', caffe.TEST)
SystemError: <Boost.Python.function object at 0x1d53a50> returned NULL without setting an error

Kitti Benchmarking

Hello,

First, congratulations on your results. I'm also working with Monocular Depth Estimation and I have some questions about the metrics used in the Kitti Depth Prediction Benchmark.

  1. Does your Network predict depth in meters (m)?

  2. If yes, did you change anything for applying the following metrics?

SILog: Scale invariant logarithmic error [log(m)*100]
iRMSE: Root mean squared error of the inverse depth [1/km]

I'm asking because they use these different units: [log(m)*100)] and [1/km].
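For what it's worth, here is how I read the two benchmark definitions (my interpretation of the units, not an official implementation): SILog is the square root of a variance-style term over log-depth differences, scaled by 100, and iRMSE converts inverse depth from 1/m to 1/km by multiplying by 1000. If the network already outputs meters, only the metric code needs these units, not the predictions:

```python
import numpy as np

def silog(pred_m, gt_m):
    """Scale-invariant log error, reported as log(m) * 100.
    pred_m, gt_m: depth values in meters (valid pixels only)."""
    d = np.log(pred_m) - np.log(gt_m)
    var = np.mean(d ** 2) - np.mean(d) ** 2
    return np.sqrt(max(var, 0.0)) * 100.0  # clamp tiny negative round-off

def irmse(pred_m, gt_m):
    """RMSE of inverse depth; input in meters, output in 1/km."""
    inv_err = 1.0 / gt_m - 1.0 / pred_m  # 1/m
    return np.sqrt(np.mean(inv_err ** 2)) * 1000.0  # 1/m -> 1/km
```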

split images in training?

Hi, I see that you split a KITTI image into 4 slices by width in your demo_kitti.py. I wonder if you also split the input images during training?
BTW: could you please provide the detailed image-split files of the Eigen split? I am confused about building it myself from section 4.2 of Eigen's paper.
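I don't know the authors' training pipeline, but the slice-and-merge used at inference can be sketched generically like this (function and parameter names are mine, not from the repo): run the network on overlapping horizontal crops and average the predictions wherever crops overlap:

```python
import numpy as np

def split_merge_predict(img, predict, crop_w, n_crops):
    """Run `predict` on n_crops horizontal crops of width crop_w,
    evenly spaced across the image width, and average the overlaps.
    Hypothetical sketch of a slice-and-merge scheme."""
    h, w = img.shape[:2]
    starts = np.linspace(0, w - crop_w, n_crops).astype(int)
    out = np.zeros((h, w))
    counts = np.zeros((h, w))
    for s in starts:
        out[:, s:s + crop_w] += predict(img[:, s:s + crop_w])
        counts[:, s:s + crop_w] += 1
    return out / counts  # every column is covered by >= 1 crop
```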

Message type "caffe.LayerParameter" has no field named "bn_param"

Thanks for making the code open source.
When I run

python3 -m pudb demo_nyuv2.py --filename=./data/NYUV2/demo_01.png --outputroot=./result/NYUV2

I get this error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
W1117 10:42:23.021567 29694 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W1117 10:42:23.021591 29694 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W1117 10:42:23.021595 29694 _caffe.cpp:142] Net('models/NYUV2/deploy.prototxt', 1, weights='models/NYUV2/cvpr_nyuv2.caffemodel')
[libprotobuf ERROR /var/tmp/portage/dev-libs/protobuf-3.8.0/work/protobuf-3.8.0/src/google/protobuf/text_format.cc:317] Error parsing text-format caffe.NetParameter: 52:12: Message type "caffe.LayerParameter" has no field named "bn_param".
F1117 10:42:23.022581 29694 upgrade_proto.cpp:90] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: models/NYUV2/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

I need some hints on this error.

Message type "caffe.LayerParameter" has no field named "bn_param"

Could you please give some hints?

No module named ordinal_decode_layer

Hi, thanks for your awesome work. When I test demo.py, it shows:

ImportError: No module named ordinal_decode_layer

I built pycaffe successfully.

Depth decoding method in kitti_demo.py

I notice that in kitti_demo.py, you use

ord_score = ord_score/counts - 1.0
ord_score = (ord_score + 40.0)/25.0
ord_score = np.exp(ord_score)

to decode the ordinal regression result (index) to depth in meters. What is the relationship between this equation and the equations in the original paper, which are:
[screenshots of the paper's SID threshold and depth-decoding equations]

How do you merge these two equations into one, and what alpha and beta do you use?
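My guess, not confirmed by the authors: composing the paper's two equations (the SID thresholds t_i = exp(log(alpha) + i*log(beta/alpha)/K) and the decoding to a value inside the predicted bin) leaves depth as a single exponential of an affine function of the index, exp(a*i + b), which is exactly the shape of the demo's exp((index + c)/25). The demo constants would then jointly encode log(alpha), the log step log(beta/alpha)/K, and the within-bin offset. A sketch with placeholder alpha, beta, K values, not the authors' actual constants:

```python
import numpy as np

def sid_thresholds(alpha, beta, K):
    """SID thresholds from the paper:
    t_i = exp(log(alpha) + i * log(beta/alpha) / K), i = 0..K."""
    i = np.arange(K + 1)
    return np.exp(np.log(alpha) + i * np.log(beta / alpha) / K)

def decode_sid(index, alpha, beta, K):
    """Map an ordinal index to the log-space bin center: a single exp
    of an affine function of the index, same shape as the demo code."""
    log_step = np.log(beta / alpha) / K
    return np.exp(np.log(alpha) + (index + 0.5) * log_step)
```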

KITTI depth data used for training

Hi,
I just wanted to ask about the KITTI data: when training the model do you use sparse depth data from the LIDAR pointcloud, or do you use interpolated depthmaps as shown in figure 5 as GT?
Thanks!
Daniyar

Unable to reproduce "result/KITTI/demo_01_pred.png" exactly

Thank you for your contribution first of all.

After setting up the repo's pycaffe and downloading the models, I ran "demo_kitti.py" and "demo_nyuv2.py", which save the inference results as PNG files.

For NYUV2, I can recreate "result/NYUV2/demo_01_pred.png" exactly.
For KITTI however, the file "result/KITTI/demo_01_pred.png" is slightly different (see image below).

Can it be that the KITTI checkpoint model is not up to date?

P.S.: On a related note, when performing inference on the 697 images of the Eigen split using the Garg crop and evaluating, I get an abs-rel error of 0.098 instead of the 0.072 reported in the paper. I believe both symptoms may have the same root cause.

[comparison image attachment]

About resnet101 arch

Hey, I noticed that your ResNet model is a dilated version, so I wonder what architecture you use. Is it DRN, or some other architecture?

Thanks

Question about the gradient equation

Hi all,
When I tried to train the net, I ran into a problem with the gradient calculation in Equation (4). I tried to write the LossLayer, but I don't know how to write the backward pass. Which layer's output is the x(w,h) in the equation?

Thanks,

License

Hi,
Could you tell me what the license of this software is, please?
According to GitHub's policy, all repositories without an explicit license are considered copyrighted material. Do the authors intend to make this software free?
Thank you!

read deploy.prototxt failed

When I tried to run the demo_kitti.py, I failed with the error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
W0705 09:34:31.505373 33699 _caffe.cpp:135] DEPRECATION WARNING - deprecated use of Python interface
W0705 09:34:31.505487 33699 _caffe.cpp:136] Use this instead (with the named "weights" parameter):
W0705 09:34:31.505496 33699 _caffe.cpp:138] Net('./models/KITTI/deploy.prototxt', 1, weights='./models/KITTI/cvpr_kitti.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 51:12: Message type "caffe.LayerParameter" has no field named "bn_param".
F0705 09:34:31.509136 33699 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ./models/KITTI/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

Then I replaced the "bn_param" with "batch_norm_param" and faced a new problem:

WARNING: Logging before InitGoogleLogging() is written to STDERR
W0705 12:38:15.151402 33968 _caffe.cpp:135] DEPRECATION WARNING - deprecated use of Python interface
W0705 12:38:15.151561 33968 _caffe.cpp:136] Use this instead (with the named "weights" parameter):
W0705 12:38:15.151607 33968 _caffe.cpp:138] Net('./models/KITTI/deploy.prototxt', 1, weights='./models/KITTI/cvpr_kitti.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 52:18: Message type "caffe.BatchNormParameter" has no field named "slope_filler".
F0705 12:38:15.156330 33968 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ./models/KITTI/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

I have tried many times and I can't solve it. Could you please provide a solution or an idea? Thanks!

What are the details of augmentation?

Hi, I just read your paper and code. I was wondering what steps you took to augment your images. Would it be possible to share how you process images during training?

Why 2K?

Hello! Thanks for your outstanding work! But I have some questions about your paper.
When I read it, I wondered why the size of Y involves 2K, and what the doubled layers in the ordinal regression module stand for.

How to use the SID?

Hello.

Thanks for your awesome paper; I am preparing to reproduce your training code.

But I have a question about how to use SID. It seems we need to calculate the K thresholds via SID in the data-loading stage and pass them to the loss layer. Is that right?

Thanks.
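That matches my reading too, with one simplification: if the labels are quantized in the data loader, the loss layer only needs the integer bin per pixel, not the thresholds themselves. A hypothetical encoder (alpha, beta, K are placeholders you would set per dataset; the function name is mine):

```python
import numpy as np

def depth_to_ordinal_label(depth_m, alpha, beta, K):
    """Quantize continuous depth into one of K SID bins:
    l = floor(K * log(d/alpha) / log(beta/alpha)), clipped to [0, K-1].
    This runs once per pixel at data-loading time; the loss layer
    then consumes only the integer labels."""
    l = np.floor(K * np.log(depth_m / alpha) / np.log(beta / alpha))
    return np.clip(l, 0, K - 1).astype(int)
```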

KITTI model clarification

Hi,

thanks for sharing. Does the KITTI model use the KITTI depth dataset for training, or the Eigen split? Or is it the model you used for the Robust Vision Challenge?
If not, would it be possible to share the Eigen-split model you evaluated in the paper, and also the Robust Vision Challenge pretrained model?

Thanks

Tensorflow or Pytorch code

Hi,
Thank you for sharing the inference code of your work.
Could you release the same code in either TensorFlow or PyTorch?

Thanks.

How to change training label?

Hi, I see that you apply a post-processing step to the direct network output, ord_score = np.exp((ord_score + 40.0)/25.0). I want to know where this comes from.
What is more, do you process the training labels with the inverse mapping?
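I can't speak for the authors, but the quoted post-processing can be inverted algebraically, which is presumably what "the inverse way" would look like for producing training labels. This is my derivation, not code from the repo:

```python
import numpy as np

def decode_depth(score):
    """The demo's post-processing: network output -> depth in meters."""
    return np.exp((score + 40.0) / 25.0)

def encode_label(depth_m):
    """Algebraic inverse of decode_depth: meters -> network-output scale.
    A guess at how training labels would be produced, not confirmed."""
    return 25.0 * np.log(depth_m) - 40.0
```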

Question about the white boundary in NYU dataset

Hi,

I know this question may sound stupid, but I found that for the NYU dataset, the labeled images have a white boundary around them, and in the ground truth those pixels have valid depth values (see the attached image).

I wonder what the conventional way to handle this is at test time. I read a few codebases, but it seems no one tries to handle it. So is the network supposed to predict values for these pixels as well?

I am very new to this field, and it feels a little weird to ask the network to predict values in this meaningless region. Could anyone tell me the right way to do it? Crop, or mask out the region?

[attached image: NYU frame with white boundary]

How many trainable parameters?

Hello, in your article you state that the encoder has 51M trainable parameters. What about the total number of trainable parameters? Does that figure include both the "dense feature extractor" and the "scene understanding module"?

Tensorflow

Hi,
Thank you for sharing the inference code of your work.
Can you please implement it in the TensorFlow framework?

Thanks

Question about equation

In equation 4,

[screenshot of equation (4) from the paper]

What does the P(w,h) − 1 part mean? Why are you subtracting 1?

Thank you in advance
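A standard derivation, which is my reading rather than the authors' exact notation: if each threshold's probability P comes from a two-way softmax over two logits, then for a threshold the ground-truth label says is exceeded, the gradient of −log P with respect to the corresponding logit is P − 1. The "subtract 1" is just the usual softmax cross-entropy gradient for the true class:

```latex
% Two-way softmax for one threshold at pixel (w,h):
% logits x (class ``greater'') and x' (class ``not greater'').
P = \frac{e^{x}}{e^{x} + e^{x'}}
\qquad\Longrightarrow\qquad
\frac{\partial\,(-\log P)}{\partial x}
  = \frac{\partial}{\partial x}\bigl(\log(e^{x} + e^{x'}) - x\bigr)
  = P - 1 .
```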

The usage of SID

Since many issues here ask about SID, I'll share my implementation notes.
The alpha and beta are determined from the ground truth: compute the max and min depth values for a given dataset. When you fine-tune on a new dataset, don't forget to update them.
The choice of K is discussed in Section 4.2.3 of the paper; K = 80 works best.
Although I haven't reproduced the exact results on NYU v2 (my network architecture differs), this strategy does improve accuracy.
I also tried learnable alpha and beta values, but they vary a lot across scenes; even when alpha changes smoothly, I believe it suffers from overfitting.
Using dataset statistics for alpha and beta directly reduces the error, since you avoid values that are too small or too large. I consider it an interesting trick for reducing error, but I haven't tried whether loosening alpha and beta to more extreme values, like 0 and 10, is helpful.
