dorn's People

Contributors

hufu6371


dorn's Issues

Performance on unseen dataset

Hi, I've read that supervised learning methods tend to overfit the data they are trained on. For example, if you train on Cityscapes but evaluate on KITTI, you get bad results. I wonder how your models perform in this kind of situation (different data at test time)? In the paper, there are results only for models trained and tested on the same dataset.

Thank you very much.

Typo in paper equation

Hi again,

When I was reading the paper a while ago, part of the loss function didn't make sense to me. I think there is a potentially important typo there. Please confirm whether I am right.

In equation (2), the second term should be:
Σ (1 − log(P_i)) instead of Σ log(1 − P_i)

Suppose the true depth falls in bin k (under a uniform discretization); then you want:

P_0 = 1
P_1 = 1
…
P_(k−1) = 1

and

P_k = 0
P_(k+1) = 0
…
P_N = 0

This suggests there is a typo, because the equation as written gives log(0) for P_i when i > k − 1, which makes the loss infinite.
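For reference, a minimal NumPy sketch of the per-pixel loss as printed in equation (2), assuming P_k denotes the network's predicted probability (strictly between 0 and 1) that the true bin index exceeds k. With such predictions the log(1 − P_k) term stays finite; the infinity only appears if hard 0/1 targets are plugged in as P. The function name and toy values are mine, not from the repo:

```python
import numpy as np

def ordinal_loss_pixel(probs, label):
    """Per-pixel ordinal loss as printed in equation (2):
    -sum_{k < label} log(P_k) - sum_{k >= label} log(1 - P_k).
    probs[k] is the predicted probability that the depth bin exceeds k."""
    probs = np.asarray(probs, dtype=np.float64)
    first = np.sum(np.log(probs[:label])) if label > 0 else 0.0
    second = np.sum(np.log(1.0 - probs[label:]))
    return -(first + second)

# Predictions strictly inside (0, 1) give a finite, positive loss:
p = np.array([0.9, 0.8, 0.6, 0.3, 0.1])
loss = ordinal_loss_pixel(p, label=3)
```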

Questions about the paper

Hi,

I was just trying to understand this paper and had some questions. Since I am not from a CS background, the questions might be a bit dumb; sorry for that. It would be great if anyone could help me understand the paper.

  1. Why are there 2K weight vectors? Where does the number 2K come from?
  2. If I understand Y correctly, it is the set of ordinal outputs for each spatial location. What does 'spatial location' mean here? Is it each pixel? If so, why is the size of Y taken as W × H × 2K? As the paper says, K is the number of sub-intervals between alpha and beta, so there can be K ordinal outputs per pixel, and to my understanding Y could be of size W × H × K. I may be misunderstanding this.
  3. The paper says "ˆl(w,h) is the estimated discrete value decoding from y(w,h)". What does 'estimated discrete value decoding' mean? Is it the depth ordinal predicted for each pixel by the proposed architecture? If so, how is ˆl(w,h) different from the Y value for each pixel? Is Y the ground truth and ˆl(w,h) the prediction?

Thanks.
Best Regards
Ajay
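On question 1 (and partly 3), my understanding, which is an assumption and not the authors' statement, is that the 2K channels come from implementing each of the K "is the depth beyond threshold k?" decisions as its own 2-way softmax, and that ˆl(w,h) is the integer decoded from those K probabilities. A hypothetical sketch for a single pixel (names are mine):

```python
import numpy as np

def ordinal_probs(logits_2k):
    """Turn a 2K-channel output into K binary probabilities.
    Channels (2k, 2k+1) form a 2-way softmax for the k-th
    'is depth greater than threshold k?' classifier."""
    K = logits_2k.shape[0] // 2
    pairs = logits_2k.reshape(K, 2)
    e = np.exp(pairs - pairs.max(axis=1, keepdims=True))
    sm = e / e.sum(axis=1, keepdims=True)
    return sm[:, 1]  # probability of the 'greater than' class

def decode_label(probs):
    """Decoded discrete label: count of thresholds with P_k > 0.5."""
    return int(np.sum(probs > 0.5))
```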

Evaluation on the Eigen split

Hi,

I was wondering if you could share your evaluation code, or tell me which code you used for evaluation. The official KITTI evaluation?
Did you use raw LiDAR data or the post-processed ground truth provided by KITTI?

No module named 'ordinal_decode_layer', any ideas?

I0627 14:09:56.936329 3141 layer_factory.hpp:77] Creating layer decode_ord
ImportError: No module named 'ordinal_decode_layer'
Traceback (most recent call last):
File "demo_nyuv2.py", line 15, in
net = caffe.Net('models/NYUV2/deploy.prototxt', 'models/NYUV2/cvpr_nyuv2.caffemodel', caffe.TEST)
SystemError: <Boost.Python.function object at 0x1d53a50> returned NULL without setting an error

Kitti Benchmarking

Hello,

First, congratulations on your results. I'm also working with Monocular Depth Estimation and I have some questions about the metrics used in the Kitti Depth Prediction Benchmark.

  1. Does your Network predict depth in meters (m)?

  2. If yes, did you change anything for applying the following metrics?

SILog: Scale invariant logarithmic error [log(m)*100]
iRMSE: Root mean squared error of the inverse depth [1/km]

I'm asking because they use these different units: [log(m)*100)] and [1/km].
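For what it's worth, here is how I read the two benchmark definitions (my interpretation of the units, not an official implementation): SILog is the square root of a variance-style term over log-depth differences, scaled by 100, and iRMSE converts inverse depth from 1/m to 1/km by multiplying by 1000. If the network already outputs meters, only the metric code needs these units, not the predictions:

```python
import numpy as np

def silog(pred_m, gt_m):
    """Scale-invariant log error, reported as log(m) * 100.
    pred_m, gt_m: depth values in meters (valid pixels only)."""
    d = np.log(pred_m) - np.log(gt_m)
    var = np.mean(d ** 2) - np.mean(d) ** 2
    return np.sqrt(max(var, 0.0)) * 100.0  # clamp tiny negative round-off

def irmse(pred_m, gt_m):
    """RMSE of inverse depth; input in meters, output in 1/km."""
    inv_err = 1.0 / gt_m - 1.0 / pred_m  # 1/m
    return np.sqrt(np.mean(inv_err ** 2)) * 1000.0  # 1/m -> 1/km
```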

split images in training?

Hi, I see that you split a KITTI image into 4 slices by width in your demo_kitti.py. I wonder if you also split the input images during training?
BTW: could you please provide the detailed image-split files of the Eigen split? I am confused about building it myself from section 4.2 of Eigen's paper.
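I don't know the authors' training pipeline, but the slice-and-merge used at inference can be sketched generically like this (function and parameter names are mine, not from the repo): run the network on overlapping horizontal crops and average the predictions wherever crops overlap:

```python
import numpy as np

def split_merge_predict(img, predict, crop_w, n_crops):
    """Run `predict` on n_crops horizontal crops of width crop_w,
    evenly spaced across the image width, and average the overlaps.
    Hypothetical sketch of a slice-and-merge scheme."""
    h, w = img.shape[:2]
    starts = np.linspace(0, w - crop_w, n_crops).astype(int)
    out = np.zeros((h, w))
    counts = np.zeros((h, w))
    for s in starts:
        out[:, s:s + crop_w] += predict(img[:, s:s + crop_w])
        counts[:, s:s + crop_w] += 1
    return out / counts  # every column is covered by >= 1 crop
```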

Message type "caffe.LayerParameter" has no field named "bn_param"

Thanks for making the code open source.
When I run

python3 -m pudb demo_nyuv2.py --filename=./data/NYUV2/demo_01.png --outputroot=./result/NYUV2

I get this error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
W1117 10:42:23.021567 29694 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W1117 10:42:23.021591 29694 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W1117 10:42:23.021595 29694 _caffe.cpp:142] Net('models/NYUV2/deploy.prototxt', 1, weights='models/NYUV2/cvpr_nyuv2.caffemodel')
[libprotobuf ERROR /var/tmp/portage/dev-libs/protobuf-3.8.0/work/protobuf-3.8.0/src/google/protobuf/text_format.cc:317] Error parsing text-format caffe.NetParameter: 52:12: Message type "caffe.LayerParameter" has no field named "bn_param".
F1117 10:42:23.022581 29694 upgrade_proto.cpp:90] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: models/NYUV2/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

I need some hints on this error.

Message type "caffe.LayerParameter" has no field named "bn_param"

Could you please give some hints?

No module named ordinal_decode_layer

Hi, thanks for your awesome work. When I test demo.py, it shows:

ImportError: No module named ordinal_decode_layer

I built pycaffe successfully.

Depth decoding method in kitti_demo.py

I notice that in kitti_demo.py, you use

ord_score = ord_score/counts - 1.0
ord_score = (ord_score + 40.0)/25.0
ord_score = np.exp(ord_score)

to decode the ordinal regression result (index) to depth in meters. What is the relationship between this equation and the equations in the original paper, which are:
[screenshots of the paper's SID threshold and depth-decoding equations]

How do you merge these two equations into one, and what alpha and beta do you use?
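My guess, not confirmed by the authors: composing the paper's two equations (the SID thresholds t_i = exp(log(alpha) + i*log(beta/alpha)/K) and the decoding to a value inside the predicted bin) leaves depth as a single exponential of an affine function of the index, exp(a*i + b), which is exactly the shape of the demo's exp((index + c)/25). The demo constants would then jointly encode log(alpha), the log step log(beta/alpha)/K, and the within-bin offset. A sketch with placeholder alpha, beta, K values, not the authors' actual constants:

```python
import numpy as np

def sid_thresholds(alpha, beta, K):
    """SID thresholds from the paper:
    t_i = exp(log(alpha) + i * log(beta/alpha) / K), i = 0..K."""
    i = np.arange(K + 1)
    return np.exp(np.log(alpha) + i * np.log(beta / alpha) / K)

def decode_sid(index, alpha, beta, K):
    """Map an ordinal index to the log-space bin center: a single exp
    of an affine function of the index, same shape as the demo code."""
    log_step = np.log(beta / alpha) / K
    return np.exp(np.log(alpha) + (index + 0.5) * log_step)
```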

KITTI depth data used for training

Hi,
I just wanted to ask about the KITTI data: when training the model do you use sparse depth data from the LIDAR pointcloud, or do you use interpolated depthmaps as shown in figure 5 as GT?
Thanks!
Daniyar

Unable to reproduce "result/KITTI/demo_01_pred.png" exactly

Thank you for your contribution first of all.

After setting up the repo's pycaffe and downloading the models, I ran "demo_kitti.py" and "demo_nyuv2.py", which save the inference results as PNG files.

For NYUV2, I can recreate "result/NYUV2/demo_01_pred.png" exactly.
For KITTI however, the file "result/KITTI/demo_01_pred.png" is slightly different (see image below).

Can it be that the KITTI checkpoint model is not up to date?

P.S.: On a related note, when performing inference on the 697 images of the Eigen split using the Garg crop and evaluating, I get an abs-rel error of 0.098 instead of the 0.072 reported in the paper. I believe both symptoms may have the same root cause.

[comparison image attachment]

About resnet101 arch

Hey, I noticed that your ResNet model is a dilated version, so I wonder what architecture you use. Is it DRN, or some other architecture?

Thanks

Question about the gradient equation

Hi all,
When I tried to train the net, I ran into a problem with the gradient calculation in Equation (4). I tried to write the LossLayer, but I don't know how to write the backward pass. Which layer's output is the x(w,h) in the equation?

Thanks,

License

Hi,
Could you tell me what the license of this software is, please?
According to GitHub's policy, all repositories without an explicit license are considered copyrighted material. Do the authors intend to make this software free?
Thank you!

read deploy.prototxt failed

When I tried to run the demo_kitti.py, I failed with the error:

WARNING: Logging before InitGoogleLogging() is written to STDERR
W0705 09:34:31.505373 33699 _caffe.cpp:135] DEPRECATION WARNING - deprecated use of Python interface
W0705 09:34:31.505487 33699 _caffe.cpp:136] Use this instead (with the named "weights" parameter):
W0705 09:34:31.505496 33699 _caffe.cpp:138] Net('./models/KITTI/deploy.prototxt', 1, weights='./models/KITTI/cvpr_kitti.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 51:12: Message type "caffe.LayerParameter" has no field named "bn_param".
F0705 09:34:31.509136 33699 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ./models/KITTI/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

Then I replaced the "bn_param" with "batch_norm_param" and faced a new problem:

WARNING: Logging before InitGoogleLogging() is written to STDERR
W0705 12:38:15.151402 33968 _caffe.cpp:135] DEPRECATION WARNING - deprecated use of Python interface
W0705 12:38:15.151561 33968 _caffe.cpp:136] Use this instead (with the named "weights" parameter):
W0705 12:38:15.151607 33968 _caffe.cpp:138] Net('./models/KITTI/deploy.prototxt', 1, weights='./models/KITTI/cvpr_kitti.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 52:18: Message type "caffe.BatchNormParameter" has no field named "slope_filler".
F0705 12:38:15.156330 33968 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ./models/KITTI/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)

I have tried many times and I can't solve it. Could you please provide a solution or an idea? Thanks!

What are the details of augmentation?

Hi, I just read your paper and code. I was wondering what steps you took to augment your images. Would it be possible to share how you process images during training?

Why 2K?

Hello! Thanks for your outstanding work! But I have some questions about your paper.
When I read it, I wondered why the size of Y involves 2K, and what the doubled layers in the ordinal regression module stand for.

How to use the SID?

Hello.

Thanks for your awesome paper; I am preparing to reproduce your training code.

But I have a question about how to use SID. It seems we need to calculate the K thresholds via SID in the data-loading stage and pass them to the loss layer. Is that right?

Thanks.
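That matches my reading too, with one simplification: if the labels are quantized in the data loader, the loss layer only needs the integer bin per pixel, not the thresholds themselves. A hypothetical encoder (alpha, beta, K are placeholders you would set per dataset; the function name is mine):

```python
import numpy as np

def depth_to_ordinal_label(depth_m, alpha, beta, K):
    """Quantize continuous depth into one of K SID bins:
    l = floor(K * log(d/alpha) / log(beta/alpha)), clipped to [0, K-1].
    This runs once per pixel at data-loading time; the loss layer
    then consumes only the integer labels."""
    l = np.floor(K * np.log(depth_m / alpha) / np.log(beta / alpha))
    return np.clip(l, 0, K - 1).astype(int)
```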

KITTI model clarification

Hi,

thanks for sharing. Does the KITTI model use the KITTI depth dataset for training, or the Eigen split? Or is it the model you used for the Robust Vision Challenge?
If not, would it be possible to share the Eigen-split model you evaluated in the paper, and also the Robust Vision Challenge pretrained model?

Thanks

Tensorflow or Pytorch code

Hi,
Thank you for sharing the inference code of your work.
Could you release the same code in either TensorFlow or PyTorch?

Thanks.

How to change training label?

Hi, I see that you apply a post-processing step to the direct network output, ord_score = np.exp((ord_score + 40.0)/25.0). I want to know where this comes from.
What is more, do you process the training labels with the inverse mapping?
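I can't speak for the authors, but the quoted post-processing can be inverted algebraically, which is presumably what "the inverse way" would look like for producing training labels. This is my derivation, not code from the repo:

```python
import numpy as np

def decode_depth(score):
    """The demo's post-processing: network output -> depth in meters."""
    return np.exp((score + 40.0) / 25.0)

def encode_label(depth_m):
    """Algebraic inverse of decode_depth: meters -> network-output scale.
    A guess at how training labels would be produced, not confirmed."""
    return 25.0 * np.log(depth_m) - 40.0
```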

Question about the white boundary in NYU dataset

Hi,

I know this question may sound stupid, but I found that for the NYU dataset, the labeled images have a white boundary around them, and in the ground truth those pixels have valid depth values (see the attached image).

I wonder what the conventional way to handle this is at test time. I read a few codebases, but it seems no one tries to handle it. So is the network supposed to predict values for these pixels as well?

I am very new to this field, and it feels a little weird to ask the network to predict values in this meaningless region. Could anyone tell me the right way to do it? Crop, or mask out the region?

[attached image: NYU frame with white boundary]

How many trainable parameters?

Hello, in your article you state that the encoder has 51M trainable parameters. What about the total number of trainable parameters? Does that figure include both the "dense feature extractor" and the "scene understanding module"?

Tensorflow

Hi,
Thank you for sharing the inference code of your work.
Can you please implement it in the TensorFlow framework?

Thanks

Question about equation

In equation 4,

[screenshot of equation (4) from the paper]

What does the P(w,h) − 1 part mean? Why are you subtracting 1?

Thank you in advance
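A standard derivation, which is my reading rather than the authors' exact notation: if each threshold's probability P comes from a two-way softmax over two logits, then for a threshold the ground-truth label says is exceeded, the gradient of −log P with respect to the corresponding logit is P − 1. The "subtract 1" is just the usual softmax cross-entropy gradient for the true class:

```latex
% Two-way softmax for one threshold at pixel (w,h):
% logits x (class ``greater'') and x' (class ``not greater'').
P = \frac{e^{x}}{e^{x} + e^{x'}}
\qquad\Longrightarrow\qquad
\frac{\partial\,(-\log P)}{\partial x}
  = \frac{\partial}{\partial x}\bigl(\log(e^{x} + e^{x'}) - x\bigr)
  = P - 1 .
```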

The usage of SID

Since many issues here ask about SID, I'll share my implementation notes.
The alpha and beta are determined from the ground truth: compute the max and min depth values for a given dataset. When you fine-tune on a new dataset, don't forget to update them.
The choice of K is discussed in Section 4.2.3 of the paper; K = 80 works best.
Although I haven't reproduced the exact results on NYU v2 (my network architecture differs), this strategy does improve accuracy.
I also tried learnable alpha and beta values, but they vary a lot across scenes; even when alpha changes smoothly, I believe it suffers from overfitting.
Using dataset statistics for alpha and beta directly reduces the error, since you avoid values that are too small or too large. I consider it an interesting trick for reducing error, but I haven't tried whether loosening alpha and beta to more extreme values, like 0 and 10, is helpful.
