dorn's Issues
Performance on unseen dataset
Hi, I've read that supervised learning methods tend to overfit the data they are trained on. For example, if you train on Cityscapes but evaluate on KITTI, you get bad results. I wonder how your models perform in this kind of situation (different data at test time)? In the paper, there are only results for models trained and tested on the same dataset.
Thank you very much.
Typo in paper equation
Hi again,
When I was reading the paper a long time ago, a part of loss function didn't make sense to me. I think there is a potentially important typo there. Please confirm if I am correct or wrong.
In equation (2), I think the second term should be Σ (1 − log(P_i)) instead of Σ log(1 − P_i).
Let's assume the true depth label is k (with uniform discretization); then you want
P_0 = P_1 = … = P_(k−1) = 1
and
P_k = P_(k+1) = … = P_N = 0.
This suggests there is a typo, because your equation will give log(0) for P_i when i > k − 1, which makes the loss infinite.
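For what it's worth, here is a quick numeric check of equation (2) as printed (a sketch of my own, assuming P_i denotes the predicted probability that the true bin index exceeds i):

```python
import numpy as np

def ordinal_loss(P, l, eps=1e-12):
    """Equation (2) as printed: -(sum_{i<l} log P_i + sum_{i>=l} log(1 - P_i)).
    P: array of K probabilities, l: true bin index."""
    P = np.clip(P, eps, 1.0 - eps)  # guard against log(0) from saturated predictions
    return -(np.sum(np.log(P[:l])) + np.sum(np.log(1.0 - P[l:])))

# A "perfect" prediction for l = 3 with K = 6 bins:
P = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
print(ordinal_loss(P, 3))  # ~0: for i >= l the term is log(1 - 0) = log(1) = 0, not log(0)
```

With the target probabilities above, the second term evaluates log(1 − P_i) at P_i = 0, which is finite; log(0) only appears when a prediction saturates the wrong way, which the usual epsilon clipping handles.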
About the data you used in KITTI
Does the KITTI data you used contain both the left and right cameras, or just the left camera?
Preprocessing of training Depth maps
Hi,
I wonder how the depth maps are interpolated for training? There don't seem to be any details about this in the paper.
Thanks
Questions about the paper
Hi,
I was just trying to understand this paper. I had some questions regarding the paper. Since I am not from a CS domain, the questions might be a bit dumb. Sorry for that. It would be great if anyone could help me understand the paper.
- Why are there 2K weight vectors? How do we arrive at this number 2K?
- If I am understanding Y correctly, it is the set of ordinal outputs for each spatial location. In this case, what does 'spatial location' mean? Is it each pixel? If so, why is the size of Y taken as W × H × 2K? As the paper says, K is the number of sub-intervals between alpha and beta, so the number of ordinal outputs could be K for each pixel. To my understanding, Y could then be of size W × H × K, but I may be understanding it wrong as well.
- It is mentioned in the paper that " ˆl(w,h) is the estimated discrete value decoding from y(w,h)". What does 'estimated discrete value decoding' mean? Is it the predicted value of depth ordinal for each pixel by the developed architecture? If yes, how is ˆl(w,h) different from the Y value for each pixel? Is it that the Y value is the ground truth and ˆl(w,h) is the predicted value?
Thanks.
Best Regards
Ajay
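For reference, a minimal sketch of one common reading of the 2K outputs (an assumption on my part, not the authors' code): each of the K thresholds gets its own 2-way softmax, so the network emits 2K channels per pixel, and decoding counts how many thresholds report "depth is larger":

```python
import numpy as np

def decode_ordinal(logits):
    """logits: (2K, H, W) -- channel pairs (2k, 2k+1) hold the two logits
    for threshold k. Returns the per-pixel decoded label in [0, K]."""
    two_k, H, W = logits.shape
    K = two_k // 2
    pairs = logits.reshape(K, 2, H, W)
    # softmax over each pair: P_k = probability that depth exceeds threshold k
    e = np.exp(pairs - pairs.max(axis=1, keepdims=True))
    P = e[:, 1] / e.sum(axis=1)
    # decoded label = number of thresholds the pixel is predicted to exceed
    return (P > 0.5).sum(axis=0)

logits = np.zeros((8, 1, 1))         # K = 4 thresholds, one pixel
logits[1::2, 0, 0] = [5, 5, -5, -5]  # "exceeds" logits: yes, yes, no, no
print(decode_ordinal(logits)[0, 0])  # 2
```

Under this reading, Y holds the 2K softmax inputs per pixel, while the decoded label l̂(w,h) is the predicted bin index recovered from them; the ground truth is a separate label map.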
Will the training code be released in the future?
Evaluation on the Eigen split
Hi,
I was wondering if you could share your evaluation code, or tell me which code you used for evaluation? The official KITTI evaluation?
Did you use the raw LiDAR data or the post-processed ground truth provided by KITTI?
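For reference, the commonly used Eigen-style metrics can be computed along these lines (my own sketch of the standard formulas, not the authors' evaluation code):

```python
import numpy as np

def eigen_metrics(gt, pred):
    """Standard monocular-depth metrics over valid (gt > 0) pixels."""
    mask = gt > 0
    gt, pred = gt[mask], pred[mask]
    thresh = np.maximum(gt / pred, pred / gt)
    return {
        "abs_rel": np.mean(np.abs(gt - pred) / gt),
        "sq_rel": np.mean(((gt - pred) ** 2) / gt),
        "rmse": np.sqrt(np.mean((gt - pred) ** 2)),
        "rmse_log": np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2)),
        "delta1": np.mean(thresh < 1.25),  # fraction within threshold 1.25
    }
```

Whether raw LiDAR or the post-processed ground truth is used changes which pixels pass the validity mask, which is exactly why the question matters.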
2K
No module named 'ordinal_decode_layer', any ideas?
I0627 14:09:56.936329 3141 layer_factory.hpp:77] Creating layer decode_ord
ImportError: No module named 'ordinal_decode_layer'
Traceback (most recent call last):
  File "demo_nyuv2.py", line 15, in <module>
    net = caffe.Net('models/NYUV2/deploy.prototxt', 'models/NYUV2/cvpr_nyuv2.caffemodel', caffe.TEST)
SystemError: <Boost.Python.function object at 0x1d53a50> returned NULL without setting an error
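This usually means Caffe's Python layer machinery cannot find the module on `sys.path`. Assuming `ordinal_decode_layer.py` ships somewhere in this repository (I have not verified its exact location), something along these lines should help:

```shell
# Make the directory containing ordinal_decode_layer.py visible to pycaffe
# (adjust the path to wherever the file actually lives in this repo).
export PYTHONPATH=/path/to/DORN:$PYTHONPATH
python demo_nyuv2.py
```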
Kitti Benchmarking
Hello,
First, congratulations on your results. I'm also working with Monocular Depth Estimation and I have some questions about the metrics used in the Kitti Depth Prediction Benchmark.
- Does your network predict depth in meters (m)?
- If yes, did you change anything when applying the following metrics?
  SILog: scale-invariant logarithmic error [log(m)*100]
  iRMSE: root mean squared error of the inverse depth [1/km]
I'm asking because they use these different units: [log(m)*100] and [1/km].
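If it helps, my understanding of those two metrics (a sketch based on the units given, not the official devkit) is the following; a network predicting in meters needs no change, since the unit conversions happen inside the metric:

```python
import numpy as np

def silog(gt, pred):
    """Scale-invariant log error; depths in m, reported in log(m) * 100."""
    d = np.log(pred) - np.log(gt)
    var = np.mean(d ** 2) - np.mean(d) ** 2
    return np.sqrt(max(var, 0.0)) * 100.0  # clamp tiny negatives from floating point

def irmse(gt, pred):
    """RMSE of inverse depth; depths in m, result in 1/km (1/m * 1000)."""
    return np.sqrt(np.mean((1000.0 / gt - 1000.0 / pred) ** 2))
```

Note that SILog is invariant to a global scale: multiplying all predictions by a constant shifts every log-difference equally, so the variance term is unchanged.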
split images in training?
Hi, I find that you split a KITTI image into 4 slices by width in demo_kitti.py. I wonder if you also split the input images during training?
BTW, could you please provide the detailed image split files for the Eigen split? I am confused about creating the split myself according to section 4.2 of Eigen's paper.
Message type "caffe.LayerParameter" has no field named "bn_param"
Thanks for making the code open source.
When I run
python3 -m pudb demo_nyuv2.py --filename=./data/NYUV2/demo_01.png --outputroot=./result/NYUV2
I get this error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
W1117 10:42:23.021567 29694 _caffe.cpp:139] DEPRECATION WARNING - deprecated use of Python interface
W1117 10:42:23.021591 29694 _caffe.cpp:140] Use this instead (with the named "weights" parameter):
W1117 10:42:23.021595 29694 _caffe.cpp:142] Net('models/NYUV2/deploy.prototxt', 1, weights='models/NYUV2/cvpr_nyuv2.caffemodel')
[libprotobuf ERROR /var/tmp/portage/dev-libs/protobuf-3.8.0/work/protobuf-3.8.0/src/google/protobuf/text_format.cc:317] Error parsing text-format caffe.NetParameter: 52:12: Message type "caffe.LayerParameter" has no field named "bn_param".
F1117 10:42:23.022581 29694 upgrade_proto.cpp:90] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: models/NYUV2/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)
Could you please give me some hints on this error?
No module named ordinal_decode_layer
Hi, thanks for your awesome work. When I try to run demo.py, it shows:
ImportError: No module named ordinal_decode_layer
I built pycaffe successfully.
Depth decoding method in kitti_demo.py
I notice that in kitti_demo.py you use
ord_score = ord_score/counts - 1.0
ord_score = (ord_score + 40.0)/25.0
ord_score = np.exp(ord_score)
to decode the ordinal regression result (an index) into depth in meters. What is the relationship between this formula and the decoding equations in the original paper? How do you merge those two equations into one, and what alpha and beta do you use?
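One possible reading (my own inference; the exact alpha, beta, and K behind the constants 40 and 25 are not stated in the demo): exp((l + 40)/25) = exp(1.6 + l/25), which has exactly the form of the SID thresholds t_l = exp(log α + l·(log β − log α)/K), with log α = 1.6 and a per-bin log-step of 1/25:

```python
import numpy as np

def sid_threshold(l, log_alpha, log_beta, K):
    """SID threshold t_l = exp(log_alpha + l * (log_beta - log_alpha) / K)."""
    return np.exp(log_alpha + l * (log_beta - log_alpha) / K)

def demo_decode(label):
    """The decoding used in demo_kitti.py (after the counts/-1.0 averaging step)."""
    return np.exp((label + 40.0) / 25.0)

# The demo formula is a SID with log_alpha = 40/25 = 1.6 and log-step 1/25;
# e.g. K = 80 bins would imply log_beta = 1.6 + 80/25 (an inferred value, not confirmed).
l = np.arange(5)
print(np.allclose(demo_decode(l), sid_threshold(l, 1.6, 1.6 + 80 / 25.0, 80)))  # True
```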
The values of running variance from pretrained file are negative
Ordinal Regression in cpp
What makes your Python layer different from https://github.com/luoyetx/OrdinalRegression ?
KITTI depth data used for training
Hi,
I just wanted to ask about the KITTI data: when training the model, do you use the sparse depth data from the LiDAR point cloud, or do you use interpolated depth maps, as shown in figure 5, as ground truth?
Thanks!
Daniyar
Unable to reproduce "result/KITTI/demo_01_pred.png" exactly
Thank you for your contribution, first of all.
After setting up the repo's pycaffe and downloading the models, I ran "demo_kitti.py" and "demo_nyuv2.py", which save the inference results as PNGs.
For NYUv2, I can recreate "result/NYUV2/demo_01_pred.png" exactly.
For KITTI, however, the file "result/KITTI/demo_01_pred.png" is slightly different.
Can it be that the KITTI checkpoint model is not up to date?
P.S., on a related note: when performing inference on the 697 images from the Eigen split using the Garg crop and evaluating, I get an abs-rel error of 0.098 instead of the 0.072 reported in the paper. I believe both symptoms may have the same root cause?
About resnet101 arch
Hey, I noticed that your ResNet model is a dilated version, so I wonder which architecture you use. Do you use this DRN, or some other architecture?
Thanks
Question about the gradient equation
Hi, dear all,
When I tried to train the net, I ran into a problem with the gradient calculation in equation (4). I tried to write the loss layer, but I don't know how to write the backward pass. Which layer's output is the x(w,h) in the equation?
Thanks,
License
Hi,
Could you tell me what the license of this software is, please?
According to GitHub's policy, all repositories without an explicit license are considered copyrighted material. Do the authors intend to make this software free?
Thank you!
read deploy.prototxt failed
When I tried to run the demo_kitti.py, I failed with the error:
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0705 09:34:31.505373 33699 _caffe.cpp:135] DEPRECATION WARNING - deprecated use of Python interface
W0705 09:34:31.505487 33699 _caffe.cpp:136] Use this instead (with the named "weights" parameter):
W0705 09:34:31.505496 33699 _caffe.cpp:138] Net('./models/KITTI/deploy.prototxt', 1, weights='./models/KITTI/cvpr_kitti.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 51:12: Message type "caffe.LayerParameter" has no field named "bn_param".
F0705 09:34:31.509136 33699 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ./models/KITTI/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)
Then I replaced "bn_param" with "batch_norm_param" and faced a new problem:
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0705 12:38:15.151402 33968 _caffe.cpp:135] DEPRECATION WARNING - deprecated use of Python interface
W0705 12:38:15.151561 33968 _caffe.cpp:136] Use this instead (with the named "weights" parameter):
W0705 12:38:15.151607 33968 _caffe.cpp:138] Net('./models/KITTI/deploy.prototxt', 1, weights='./models/KITTI/cvpr_kitti.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 52:18: Message type "caffe.BatchNormParameter" has no field named "slope_filler".
F0705 12:38:15.156330 33968 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: ./models/KITTI/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)
I have tried many times and I can't solve it. Could you please provide a solution or an idea? Thanks!
What are the details of augmentation?
Hi, I just read your paper and code. I have been wondering what steps you took to augment your images. Would it be possible to share how you process images during training?
Why is 2K?
Hello! Thanks for your outstanding work, but I have some questions about your paper.
When I read it, I wondered why the size of Y is 2K, and what the two layers in the ordinal regression module stand for.
How to interpolate the sparse Kitti depth image?
I want to turn the sparse KITTI depth images into dense ones. Could someone give me some guidance?
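One simple option (my own sketch, not necessarily what the authors did; the paper's ground truth may use a more sophisticated interpolation) is nearest-neighbor filling via a distance transform:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_sparse_depth(depth):
    """Fill zero (missing) pixels with the nearest valid depth value."""
    invalid = depth <= 0
    # indices of the nearest valid pixel for every location
    idx = distance_transform_edt(invalid, return_distances=False, return_indices=True)
    return depth[tuple(idx)]

sparse = np.zeros((4, 4))
sparse[0, 0], sparse[3, 3] = 2.0, 10.0
dense = fill_sparse_depth(sparse)
print(dense.min() > 0)  # True: every pixel now has a depth value
```

Nearest-neighbor filling produces blocky results at depth discontinuities; smoother alternatives such as cross-bilateral filtering or colorization-based inpainting are common in the literature.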
How to use the SID?
Hello.
Thanks for your awesome paper and I'd prepared to reproduce your training code.
But I have a question about how to use the SID. It seems we need to calculate the K thresholds via SID in the data-loading stage and pass them to the loss layer. Is that right?
Thanks.
How to train
Please provide some instructions to train.
KITTI model clarification
Hi,
thanks for sharing. Does the KITTI model use the KITTI depth dataset for training, or does it use the Eigen split, or is it the model you used for the Robust Vision Challenge?
If not, is it possible to share the Eigen model you evaluated in the paper, and also the Robust Vision Challenge pretrained model as well?
Thanks
Tensorflow or Pytorch code
Hi,
Thank you for sharing the inference code of your work.
Can you release the same code in either TensorFlow or PyTorch?
Thanks.
How to change training label?
Hi, I find that you apply a post-processing step to the direct network output: ord_score = np.exp((ord_score + 40.0)/25.0). I want to know how this comes about.
What's more, do you process the training labels in the inverse way?
Question about the white boundary in NYU dataset
Hi,
I know this question may sound stupid, but I found that for the NYU dataset, the labeled images have a white boundary around them, and in the ground truth those pixels have valid depth values (as in the attached image).
I wonder what the conventional way to handle this is at test time. I read a few codebases, but it seems no one tries to handle it. So is the network supposed to predict values for these pixels as well?
I am very new to this field, and it feels a little weird to ask the network to predict values in these meaningless regions. Could anyone tell me the right way to do it? Crop or mask out the region?
How many trainable parameters?
Hello, in your article you state that the encoder used has 51M trainable parameters. What is the total number of trainable parameters? Does this value include both the "dense feature extractor" and the "scene understanding modular" parameters?
Tensorflow
Hi,
Thank you for sharing the inference code of your work.
Can you please implement it in the TensorFlow framework?
Thanks
Did the code also generate point cloud?
I want to know whether the code only generates the depth image from RGB, or whether it also generates a point cloud from the RGB and depth.
Depth image to point clouds with rgb
Hi,
I'm interested in converting the depth image into a point cloud with RGB info.
I found a script on the internet, but I think I am missing some information, such as the focal length and the scaling factor.
Thanks for your help
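For reference, back-projecting a depth map with a pinhole camera model only needs the intrinsics; a minimal sketch (the intrinsics below are placeholders, not values from this repo):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (in meters) to an (H*W, 3) point cloud."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Placeholder intrinsics -- use your camera's calibration instead.
depth = np.full((2, 2), 5.0)
pts = depth_to_points(depth, fx=500.0, fy=500.0, cx=0.0, cy=0.0)
print(pts[0])  # pixel (0, 0) lies at the principal point here -> [0, 0, 5]
```

To attach RGB, reshape the color image to (-1, 3) and concatenate it column-wise with the points. The scaling factor only matters if the depth PNG stores scaled integers rather than meters; check how the file was written before back-projecting.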
Question about equation
The usage of SID
Since many issues ask about SID, I offer my implementation notes here.
The alpha and beta are determined from the ground truth: compute the max and min depth values for the dataset at hand. When you fine-tune on a new dataset, don't forget to update these values.
The K setting is discussed in Section 4.2.3 of the paper; 80 works best.
Though I haven't reproduced the exact results on NYU v2 (because of a different network architecture), this strategy does improve accuracy.
I also tried learnable alpha and beta values, but they vary a lot across different scenes. Even when alpha changes stably, I believe it suffers from overfitting.
Using statistics of the ground truth for alpha and beta directly reduces the error, since you will not get values that are too small or too large. I think it can be deemed an interesting trick to reduce the error, though I haven't tried whether relaxing alpha and beta to freer values like 0 and 10 helps.
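The steps above can be sketched as follows (my own implementation, assuming the log-spaced SID from the paper):

```python
import numpy as np

def sid_thresholds(alpha, beta, K):
    """K+1 log-spaced SID thresholds between alpha and beta."""
    i = np.arange(K + 1)
    return np.exp(np.log(alpha) + i * (np.log(beta) - np.log(alpha)) / K)

def depth_to_label(depth, thresholds):
    """Discretize depth values into bin indices 0..K-1."""
    return np.clip(np.digitize(depth, thresholds) - 1, 0, len(thresholds) - 2)

# alpha/beta from dataset statistics (min/max GT depth), K = 80 as in Sec. 4.2.3
t = sid_thresholds(alpha=1.0, beta=80.0, K=80)
print(depth_to_label(np.array([1.0, 10.0, 80.0]), t))
```

The thresholds are computed once from the dataset statistics and passed to the loss layer, matching the data-loading workflow described above.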