sgm's People

Contributors

bill2239, dhernandez0

sgm's Issues

Black images with 2 neighbours

Hello,
I have been trying out this project on a Jetson TX1 with CUDA 8.0, and in most cases obtained great results and performance. However, when computing the stereo matching with only 2 neighbours, the program returned completely black images.
This behavior is strange considering that the algorithm runs at around 80 fps with this option, which is consistent with the expected performance.

Thank you very much, and I will remain available for any enquiry.

Windows port

Has anybody successfully made, or found, a Windows port?

execution error

When I run this code with "./sgm dir p1 p2", I get the error "Error: no kernel image is available for execution on the device 48". What happened when I ran it? How can I solve this problem? Thank you!
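
A note on this error: it commonly means that the binary contains no code compiled for the GPU's compute capability, so the architecture passed to nvcc at build time is worth checking. The sketch below is a hypothetical helper (gencode_flag is not part of the repository) that builds the nvcc -gencode option for a given compute capability. How the project's build files pass such flags is not shown in this thread, so treat that part as an assumption to verify.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the nvcc -gencode option for one compute
// capability, e.g. major = 5, minor = 3 for a Jetson TX1.
std::string gencode_flag(int major, int minor) {
    assert(major > 0 && minor >= 0 && minor <= 9);
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode=arch=compute_" + cc + ",code=sm_" + cc;
}
```

For example, a Jetson TX1 has compute capability 5.3 and a Jetson TX2 has 6.2, so gencode_flag(5, 3) and gencode_flag(6, 2) give the options for those two boards.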

How are matching cost computation and horizontal path aggregation fused

Dear @dhernandez0 ,

Thank you for this great implementation of SGM. I'm relatively new to CUDA and got a bit confused while studying your code. As mentioned in the paper, the computation of the initial matching cost is fused with horizontal path aggregation. In the code, I can see that the matching cost is computed on stream1, and the leftToRight and rightToLeft computations are done on stream2 and stream3 respectively, which means they run in parallel. I'm wondering how this is possible. Shouldn't the matching cost be completed before the path aggregation starts, since the latter depends on the former? Or is there some sort of synchronization between these two parts in the code that I didn't discover?

Sorry for disturbing you with this trivial question. I would appreciate it a lot if you can help me understand it.

help to understand "recompute" and use of rp0,rp1,rp2,rp3

Hi dhernandez,
It is so far the best GPU implementation I have found for SGM. Good going. I am trying to implement SGM as one part of my project thesis. I have tried to understand your code and I have a few questions.

  1. Why is this recompute required in the first place? Can't the LEFT-RIGHT aggregation be similar to the UP_DOWN cost aggregation?
  2. In LEFT_RIGHT cost aggregation, recompute is ON and it uses the rp0, rp1, rp2, rp3 registers. Can you explain how these are used?
  3. In cost_aggregation.h, line 106: the else block is not used at all.

Any help would be appreciated. Thank you.

Something wrong with the processing speed

Hi Daniel Hernandez-Juarez,
Thanks for your code.
It helps me a lot to understand SGM.
I have run your code on a Jetson TX2 smoothly.
But I have some trouble with the processing speed.
I changed the input images to 640*480 and got the following processing speeds:
4 path directions: 20 fps
8 path directions: 9 fps
Both are about half of your results.
I don't know the reason for that. Maybe the Pascal architecture cannot run your code very well?
If you know something about that, could you help me?
Thank you!

GpuMat input to compute results invalid output

I am trying to change the compute function to work with cv::cuda::GpuMat directly instead of cv::Mat.

But I am having issues with cudaMemcpyAsync:
CUDA_CHECK_RETURN(cudaMemcpyAsync(d_im0, left.ptr<uint8_t>(), sizeof(uint8_t)*size, cudaMemcpyDeviceToDevice, stream1));

In the above code, "left" is a GpuMat. I have tried various methods to overcome this issue, but I was not successful.

Find the faulty image attached.

[attached image: aftermemcpyasync]

How do I copy the GpuMat data to "d_im0" properly?

PS: A similar issue is discussed in http://answers.opencv.org/question/167465/how-to-pass-an-already-stored-data-in-gpu-by-gpumat-to-a-kernel/. Kindly let us know if this could be the reason for the issue, and whether there is a workaround.
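
One possible cause, offered as an assumption rather than a diagnosis: a cv::cuda::GpuMat is usually allocated with row padding, so its step (bytes per row) can be larger than its width, and a single flat copy of rows*cols bytes then reads padding into the destination. The sketch below is a host-side C++ analogue with a hypothetical function name. It copies a padded buffer into a dense one row by row. On the device, the corresponding call would be cudaMemcpy2DAsync with left.step as the source pitch and the image width as the destination pitch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Host-side illustration (hypothetical helper): copy an image whose rows are
// padded to `src_step` bytes into a dense buffer with `cols` bytes per row.
std::vector<uint8_t> copy_pitched_to_dense(const uint8_t* src, std::size_t src_step,
                                           std::size_t rows, std::size_t cols) {
    assert(src_step >= cols);
    std::vector<uint8_t> dense(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        // Skip the padding at the end of each source row.
        std::memcpy(dense.data() + r * cols, src + r * src_step, cols);
    }
    return dense;
}
```

If left.isContinuous() returns true, the flat copy and the row-wise copy are equivalent, so checking that flag is a quick way to test this hypothesis.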

The processing speed is different.

Hi.
Thanks for your code.
I ran the code on a GTX TITAN X and a TX2.
For 4 paths, the speed on image 2 is roughly 232 fps and 11 fps respectively.
For 8 paths, the speed on image 2 is roughly 123 fps and 5 fps respectively.
I cannot find the reason. Could you tell me?

Disparity layers

Hi,
I've used the algorithm and got really good results. However, once I reprojected the 2D points to 3D using the calculated disparity values, I got the following point cloud. At the end of the SGM implementation we have an array with aggregated costs for each pixel, and the disparity is the index with the lowest cost value. I think this results in individual layers in the point cloud, i.e. round surfaces are cut into many layers (see example point cloud).
Does anyone have an idea why such layers appear? In my stereo vision setup they are 5 mm apart.

Thank you for your help!
Arrfou
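
The layering described above is what integer disparities produce under the standard pinhole stereo relation Z = f * B / d: every integer d maps to exactly one depth, so a smooth surface is snapped to discrete depth planes. The sketch below computes the gap between neighbouring planes. The function names and the focal length and baseline in the usage note are illustrative assumptions, not values from this thread.

```cpp
#include <cassert>

// Depth in metres from disparity in pixels, Z = f * B / d.
double depth_from_disparity(double focal_px, double baseline_m, double disparity_px) {
    assert(disparity_px > 0.0);
    return focal_px * baseline_m / disparity_px;
}

// Gap in depth between two neighbouring integer disparities d and d + 1.
double depth_step(double focal_px, double baseline_m, int d) {
    return depth_from_disparity(focal_px, baseline_m, d)
         - depth_from_disparity(focal_px, baseline_m, d + 1);
}
```

With an assumed focal length of 1000 px and a baseline of 0.1 m, depth_step(1000.0, 0.1, 100) is just under 1 cm. Sub-pixel disparity estimation is the usual way to reduce such gaps.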

How to set p1, p2 value

Hi,

First of all, thanks for sharing your code with the community.

I have tried to run your code with ./sgm dir p1=1000 p2=2000 on the second example image set. However, what I got is the disparity image below. I guess it may be caused by setting the p1 and p2 values incorrectly? Would you please also provide some guidance on how to set the p1 and p2 values?

Mike

[attached image: 0000000000]
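
For context on what p1 and p2 control: in the standard SGM formulation, P1 penalises a disparity change of one level between neighbouring pixels on a path and P2 penalises any larger jump. The sketch below implements one step of that textbook recurrence for a single path. It is not the repository's kernel, and aggregate_step is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One step of textbook SGM aggregation along a single path:
// L(p,d) = C(p,d) + min(L(q,d), L(q,d-1)+P1, L(q,d+1)+P1, min_k L(q,k)+P2) - min_k L(q,k)
// where q is the previous pixel on the path.
std::vector<int> aggregate_step(const std::vector<int>& prev_L,
                                const std::vector<int>& cost,
                                int p1, int p2) {
    assert(!prev_L.empty() && prev_L.size() == cost.size());
    const int prev_min = *std::min_element(prev_L.begin(), prev_L.end());
    std::vector<int> L(cost.size());
    for (std::size_t d = 0; d < cost.size(); ++d) {
        int best = std::min(prev_L[d], prev_min + p2);       // same disparity, or a large jump
        if (d > 0) best = std::min(best, prev_L[d - 1] + p1); // one level down
        if (d + 1 < cost.size()) best = std::min(best, prev_L[d + 1] + p1); // one level up
        L[d] = cost[d] + best - prev_min;
    }
    return L;
}
```

Because the penalties are added directly to the matching costs, they only act as a gentle prior when they are on the same scale as those costs. Penalties far larger than any matching cost make every disparity change prohibitively expensive.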

Indexing bug at cost_aggregation.h:49

@dhernandez0
At cost_aggregation.h line 49,
49 if(add_index > 0 && *col > cols) {
Sorry if I am wrong but shouldn't this be (*col > cols - 1) or (*col >= cols)?
If it is the way it is right now, the value at col would reach W, whereas it should be from 0 to W-1.
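
The difference between the two comparisons can be shown in isolation. The sketch below uses hypothetical helper names and does not reproduce the surrounding code from cost_aggregation.h, so it cannot tell whether the original comparison is intentional there. It only shows that col > cols still lets col == cols through.

```cpp
#include <cassert>

// A column index is valid only in [0, cols - 1].
bool col_in_range(int col, int cols) {
    assert(cols > 0);
    return col >= 0 && col < cols;
}

// The comparison as quoted in the issue: true only once col has passed cols.
bool past_end_as_quoted(int col, int cols) { return col > cols; }

// The comparison the reporter proposes: true as soon as col leaves the valid range.
bool past_end_proposed(int col, int cols) { return col >= cols; }
```

For cols = 960, col = 960 is out of range, yet past_end_as_quoted(960, 960) is false while past_end_proposed(960, 960) is true.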

Disparity image looks very bad

Hey,
I am using a GTX 730 graphics card, which has a compute capability of 3.0+.
I built the code with some changes to get it running (most of the errors came from __restrict__, and when I removed __restrict__ it compiled and ran properly). I didn't change any of the parameters. I am attaching the output files. Could you tell me which parameters I need to change? The image looks fragmented into many pieces. I used 2.png from the examples to make the disparity map.

This one is with 4 paths:
[attached image: 1]
This one is with 8 paths:
[attached image: 3]

How to get stable depth map

Hey Hernandez, great CUDA code by the way. I am testing your code with a stereo camera. I saw some of your results in your paper and video, where the disparity maps are very smooth. But when I tested with a camera that is fixed and pointing down towards a static scene, the depth map is a bit unstable, especially in far-away areas. The depth value in a few areas varies between 1 meter and 20 meters. I played with the p1 and p2 values but still cannot get a stable depth map. Any suggestions? Thanks!

Clarification regarding Dataset and p1 and p2 value

Dear All

I built sgm on a Jetson TX2 and ran it on the sample dataset; the output and performance were really good. But when I tried my own dataset, taken with my own stereo camera, the output was noisy. Can someone help me clarify this? I am attaching the images below.

[attached image: 2_4path]

cone dataset

[attached image: 1]

my own dataset

[attached image: 1]

[attached image: 1]

Type of disparity image

Hello, in the final results the disparity image is 8-bit and the disparity range is 0-127, so the distance can take only 128 distinct values. In the 0.6-1 m range, reducing the disparity by 1 adds about 1 cm of distance, while at around 3 m, reducing the disparity by 1 adds about 10 cm. Is there a way to change the disparity image type to 16 bits?
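
Widening the output type only helps if the disparity itself is estimated with sub-pixel precision. A common storage convention, used for example by OpenCV's StereoSGBM output, is 16-bit fixed point with 4 fractional bits. The sketch below shows that encoding. The names are hypothetical, and whether this repository can produce sub-pixel disparities is not established in this thread.

```cpp
#include <cassert>
#include <cstdint>

constexpr int kFractionalBits = 4;            // 1/16 pixel resolution (assumed convention)
constexpr int kScale = 1 << kFractionalBits;  // 16

// Encode a sub-pixel disparity as 16-bit fixed point, rounding to nearest.
uint16_t encode_disparity(float disparity_px) {
    assert(disparity_px >= 0.0f && disparity_px <= 4095.0f);
    return static_cast<uint16_t>(disparity_px * kScale + 0.5f);
}

// Recover the disparity in pixels from its fixed-point encoding.
float decode_disparity(uint16_t fixed) {
    return static_cast<float>(fixed) / kScale;
}
```

With this encoding, a disparity of 63.25 px is stored as 1012, and the 128 integer levels of the 8-bit image become 2048 representable steps over the same range.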

Understanding d_cost arrangement in memory

Hi Mr Hernandez,

First, I just wanted to say great job with this repository. It's one of the best implementations of CUDA accelerated SGM I've come across. Thank you very much for this!

I'm trying to dig into this code and implement some extra functionality on top of it. I've been struggling to understand the representation of the d_cost volume.

I'm working with an image of width=960, height=800, disparity_levels=128. Therefore the volume of d_cost is 800*960*128 = 98304000, which makes sense. However, I'm not sure I understand the way the elements are arranged in this data structure. Is the volume arranged in row->column->disparity order?

i.e.,
If I wanted to find the cost associated with co-ordinate location (row=2,column=1,disparity_level=0),
would this global memory indexing scheme hold true?
total_columns = 960;
MAX_DISPARITY=128;
index = d_cost[row x total_columns+column+(total_rows x total_columns) x disparity_level]
index = d_cost[2 x 960+1+(800 x 960) x 0]
index = d_cost[1921]

And with this assumption, the same 2D co-ordinate location in the next disparity level would lie in index location d_cost[2 x 960+1+(800 x 960) x 1] = d_cost[768000+1921] = d_cost[769921]?

I tried using this method arrangement and the results don't seem to make sense. I would greatly appreciate any input regarding the way the d_cost data-structure is laid out in memory.

This isn't an issue regarding the code repository, so I understand if you wish to close this issue immediately.

Thanks in advance,
Shreyas
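
The two candidate layouts can be written out side by side. The first function below is the indexing scheme proposed in the question, one full image plane per disparity level. The second is an alternative in which all disparities of a pixel are contiguous, which is what "row->column->disparity order" would usually mean. Which one d_cost actually uses is not confirmed in this thread. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Layout proposed in the question: one full image plane per disparity level.
std::size_t index_disparity_major(std::size_t row, std::size_t col, std::size_t d,
                                  std::size_t rows, std::size_t cols) {
    assert(row < rows && col < cols);
    return d * rows * cols + row * cols + col;
}

// Alternative layout (assumption): all disparities of one pixel are contiguous.
std::size_t index_pixel_major(std::size_t row, std::size_t col, std::size_t d,
                              std::size_t cols, std::size_t max_disparity) {
    assert(col < cols && d < max_disparity);
    return (row * cols + col) * max_disparity + d;
}
```

For (row=2, column=1) in a 960x800 image with 128 disparity levels, the first layout gives 1921 and 769921 for disparity levels 0 and 1, matching the arithmetic in the question, while the second gives 245888 and 245889. Reading the same cost through both formulas and comparing against a known pixel is one way to find out which layout applies.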

Compile at TK1 jetson

I tried to compile on my Jetson TK1.
But some errors occurred.


/home/ubuntu/SOUNANSU/sgm/util.h(142): error: asm operand type size(4) does not match type/size implied by constraint 'l'

/home/ubuntu/SOUNANSU/sgm/util.h(148): error: asm operand type size(4) does not match type/size implied by constraint 'l'

/home/ubuntu/SOUNANSU/sgm/util.h(157): error: asm operand type size(4) does not match type/size implied by constraint 'l'

So, I changed util.h as below,


diff --git a/util.h b/util.h
index 1b0a76c..f4dc940 100644
--- a/util.h
+++ b/util.h
@@ -139,22 +139,22 @@ __inline__ __device__ int shfl_xor_32(int scalarValue, const int n) {

 __device__ __forceinline__ uint32_t ld_gbl_ca(const __restrict__ uint32_t *addr) {
        uint32_t return_value;
-       asm("ld.global.ca.u32 %0, [%1];" : "=r"(return_value) : "l"(addr));
+       asm("ld.global.ca.u32 %0, [%1];" : "=r"(return_value) : "r"(addr));
        return return_value;
 }

 __device__ __forceinline__ uint32_t ld_gbl_cs(const __restrict__ uint32_t *addr) {
        uint32_t return_value;
-       asm("ld.global.cs.u32 %0, [%1];" : "=r"(return_value) : "l"(addr));
+       asm("ld.global.cs.u32 %0, [%1];" : "=r"(return_value) : "r"(addr));
        return return_value;
 }

 __device__ __forceinline__ void st_gbl_wt(const __restrict__ uint32_t *addr, const uint32_t value) {
-       asm("st.global.wt.u32 [%0], %1;" :: "l"(addr), "r"(value));
+       asm("st.global.wt.u32 [%0], %1;" :: "r"(addr), "r"(value));
 }

 __device__ __forceinline__ void st_gbl_cs(const __restrict__ uint32_t *addr, const uint32_t value) {
-       asm("st.global.cs.u32 [%0], %1;" :: "l"(addr), "r"(value));
+       asm("st.global.cs.u32 [%0], %1;" :: "r"(addr), "r"(value));
 }

 __device__ __forceinline__ uint32_t gpu_get_sm_idx(){

Are these changes correct?

Maximum disparity has to be 128

Hello Daniel,
I want to increase the disparity level to more than 256, but I assume there is a limitation (I guess it is the GPU memory size). Is it possible to increase the disparity level, even with a low image resolution?
Thanks in advance
Arrfou

Performance drops on Jetson TX1 and TX2

We just did some experiments on both Jetson TX1 and TX2 boards, but the fps results were considerably lower than expected. Our measurements are:

  • Jetson TX1: 13.4 fps
  • Jetson TX2: 27 fps

We set the maximum power mode on both boards and used CUDA 9 and Ubuntu 16 to run the experiment.
Please check and help us find the cause of the problem.

Parameter setting on Kitti dataset

Hi,

Thanks for sharing your code with the community!

I found that you tested the algorithm on the KITTI dataset in the paper. Would you mind sharing the parameters (e.g. the p1 and p2 values and the resized image size) you used on KITTI?

Thanks a lot.

Mike

Strange disparity map result

I tested on Ubuntu with CUDA 10.2, and I may have gotten a wrong disparity map: it is different from the disparity map of other SGM implementations.
Is it a mistake in how I set 'p1' or 'p2'?
Thanks

Same code different machines, different results

I have tested your code on two different machines. One is running on a Maxwell card, the other on a Pascal card. The one on Maxwell runs fine; the one on Pascal gives me a bad disparity. Any idea why? Could you give me some guidelines for debugging? I enclosed the bad one.
[attached image]

SGM on OpenCV

Hello,
I am trying to use your GPU SGM code on a Jetson TX1 with OpenCV4Tegra and CUDA 8, but the results are black images with an unreasonable processing time of around 0.1 ms! Could you please tell me what configuration (i.e. OpenCV version, CUDA version, and JetPack version for the TX1) you used on the Tegra X1/Jetson TX1?
However, SGM compiles and runs successfully on a PC (GTX 980, OpenCV 2 and CUDA 8.0).
Thanks in Advance.

Disparity value 64 instead of 128

Hello All

Is it possible to change the maximum disparity value from 128 to 64? I saw the closed issue #4 regarding the disparity value and tried changing uint32_t to uint16_t, but I am not sure I am doing it correctly. Can someone help me change the maximum disparity value to 64?

Build error

Hello,
I am trying to use your SGM code, but I can't build it because I get "undefined reference to" errors. I use Ubuntu 16.04 LTS, OpenCV 3.3.1 (included in ROS), CMake 3.5.1, CUDA 8.0 and an NVIDIA GTX 1080. After several hours of searching on Google and trying to solve this error, I still can't build your code. It seems that the problem is in OpenCV. I am a newbie in Linux, CV and CUDA, so I need help.

Here is my build log:

anton@smartcar:~/sgm/build$ cmake ..

-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenCV: /opt/ros/kinetic (found version "3.3.1")
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /usr/local/cuda (found version "8.0")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/anton/sgm/build

anton@smartcar:~/sgm/build$ make
[ 14%] Building NVCC (Device) object CMakeFiles/sgm.dir/sgm_generated_costs.cu.o
[ 28%] Building NVCC (Device) object CMakeFiles/sgm.dir/sgm_generated_main.cu.o
[ 42%] Building NVCC (Device) object CMakeFiles/sgm.dir/sgm_generated_median_filter.cu.o
[ 57%] Building NVCC (Device) object CMakeFiles/sgm.dir/sgm_generated_hamming_cost.cu.o
[ 71%] Building NVCC (Device) object CMakeFiles/sgm.dir/sgm_generated_disparity_method.cu.o
/home/anton/sgm/disparity_method.cu(40): warning: variable "d_L7" was declared but never referenced

/home/anton/sgm/disparity_method.cu(40): warning: variable "d_L7" was declared but never referenced

/home/anton/sgm/disparity_method.cu(40): warning: variable "d_L7" was declared but never referenced

/home/anton/sgm/disparity_method.cu(40): warning: variable "d_L7" was declared but never referenced

/home/anton/sgm/disparity_method.cu(40): warning: variable "d_L7" was declared but never referenced

[ 85%] Building NVCC (Device) object CMakeFiles/sgm.dir/sgm_generated_debug.cu.o
Scanning dependencies of target sgm
[100%] Linking CXX executable sgm
CMakeFiles/sgm.dir/sgm_generated_main.cu.o: In function `disparity_errors(cv::Mat, char const*, int*, int*)':
/home/anton/sgm/main.cu:54: undefined reference to `cv::imread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
CMakeFiles/sgm.dir/sgm_generated_main.cu.o: In function `main':
/home/anton/sgm/main.cu:164: undefined reference to `cv::imread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
/home/anton/sgm/main.cu:169: undefined reference to `cv::imread(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)'
/home/anton/sgm/main.cu:177: undefined reference to `cv::_OutputArray::_OutputArray(cv::Mat&)'
/home/anton/sgm/main.cu:177: undefined reference to `cv::_InputArray::_InputArray(cv::Mat const&)'
/home/anton/sgm/main.cu:181: undefined reference to `cv::_OutputArray::_OutputArray(cv::Mat&)'
/home/anton/sgm/main.cu:181: undefined reference to `cv::_InputArray::_InputArray(cv::Mat const&)'
/home/anton/sgm/main.cu:211: undefined reference to `cv::_InputArray::_InputArray(cv::Mat const&)'
/home/anton/sgm/main.cu:211: undefined reference to `cv::imwrite(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cv::_InputArray const&, std::vector<int, std::allocator<int> > const&)'
/home/anton/sgm/main.cu:220: undefined reference to `cv::_InputArray::_InputArray(cv::Mat const&)'
/home/anton/sgm/main.cu:220: undefined reference to `cv::imwrite(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, cv::_InputArray const&, std::vector<int, std::allocator<int> > const&)'
collect2: error: ld returned 1 exit status
CMakeFiles/sgm.dir/build.make:169: recipe for target 'sgm' failed
make[2]: *** [sgm] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/sgm.dir/all' failed
make[1]: *** [CMakeFiles/sgm.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Best regards, Anton.

"shadow" appears on disparity result

Dear @dhernandez0 ,

Thank you very much for the awesome SGM algorithms on GPU.

I'm testing your repository and the results look very good.
However, a shadow appears on the disparity results. Configuration: PATH_AGGREGATION = 8

[attached image: Screenshot from 2021-03-07 11-08-38]

I used Kalibr for the calibration step, tested with OpenCV's SGBM and got good results.

What do you think about these results? Could you guide me on how to overcome this issue?

Many thanks in advance.

Best regards,
Hiep

NO makefile

I have no Makefile after running cmake, and I can't find the reason. Thank you.

How to filter sky noise?

Hello, I have a problem. There is a lot of sky noise (sky matching errors) when I test SGM outdoors. Have you encountered this situation? Do you know how to filter the sky noise (or detect the sky and then remove it) on the GPU?

casting with uint8_t instead of float

Hello,
I have encountered this problem before and I guess line 204 in main.cu could be
const float d = disparity_im.at<uint8_t >(i, j);
Otherwise the output images in CV_16UC1 format are blank.

Thanks
Arrfou

disparity is quite noisy with own data

Thanks for sharing your work. The speed is really fast.

But I got a much noisier disparity with my own camera data (a much more complex scene);
the disparity of the provided example stereo images is smoother.
Any idea?

TX2 and 8 paths does not work

Obviously, when you were building the software there was no TX2, so you did not prepare the code for it, but I am letting you know that there is an issue when the 8 paths option is set:

cudaDeviceSynchronize() returned unspecified launch failure(4) at disparity_method.cu:150

sgm in jetson tx2

Hello All

I am new to Jetson and CUDA. When I build the sgm code on the Jetson and run it, I get the error below.
"Error: invalid device function 8"
It would be really helpful if someone could suggest where I am going wrong.

How to set the value of P1 and P2?

Thank you for sharing your code and thesis. I am not using the KITTI dataset. Can you tell me how to set the values of P1 and P2? I am looking forward to your reply.
