
localexpstereo's Introduction

Continuous 3D Label Stereo Matching using Local Expansion Moves

Local Expansion Moves

This is an implementation of the stereo matching method described in

@article{Taniai18,
  author    = {Tatsunori Taniai and
               Yasuyuki Matsushita and
               Yoichi Sato and
               Takeshi Naemura},
  title     = {{Continuous 3D Label Stereo Matching using Local Expansion Moves}},
  journal   = {{IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)}},
  year      = {2018},
  volume    = {40},
  number    = {11},
  pages     = {2725--2739},
  doi       = {10.1109/TPAMI.2017.2766072},
}

[Project Site] [IEEE preprint] [arXiv preprint (supplemented)].

The code is for research purposes only. If you use our code, please cite the above paper. Along with our TPAMI paper, we also encourage citing the following conference paper, where we describe the fundamental idea of our optimization technique and propose a new MRF stereo model (used in both our CVPR and TPAMI papers) that effectively combines the slanted patch matching (Bleyer et al., 2011) and curvature regularization (Olsson et al., 2013) terms.

@inproceedings{Taniai14,
  author    = {Tatsunori Taniai and
               Yasuyuki Matsushita and
               Takeshi Naemura},
  title     = {{Continuous Stereo Matching using Locally Shared Labels}},
  booktitle = {{IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}},
  year      = {2014},
  pages     = {1613--1620},
}

Running environment

  • Visual Studio 2017 Community (installed with the VC++ 2015 vc140 toolset if using the following OpenCV build)
  • OpenCV 3 (OpenCV 3.1.0 package will be automatically installed via NuGet upon the initial build)
  • Maxflow code by Boykov and Kolmogorov [Code v3.01] [Code v3.04]

How to Run?

  1. Download and extract the maxflow source code into the "maxflow" directory. Modify instances.inc to add the following line:
template class Graph<float,float,double>;
  2. Download and extract an example dataset (see Adirondack below) into "data/MiddV3/trainingH/Adirondack".
  3. Build the solution in release mode. (This will automatically install the OpenCV 3 package via NuGet. If it does not, you need to manually install OpenCV 3 binaries for the corresponding version of the platform toolset. For the platform toolset vc140, I installed OpenCV by running "Install-Package opencvcontrib -Version 3.1.0" in the Package Manager console of VS2017.)
  4. Run the demo.bat file. Results will be saved in "results/cones", "results/teddy", and "results/Adirondack".

Options

  • -mode MiddV2: Use settings for Middlebury V2.
  • -mode MiddV3: Use settings for Middlebury V3. Assume MC-CNN matching cost files (im0.acrt, im1.acrt) in targetDir.
  • -targetDir {string}: Directory that contains target image pairs.
  • -outputDir {string}: Directory for saving results. disp0.pfm is the primary result. Intermediate results are also saved in "debug" sub-directory.
  • -doDual {0,1}: Estimate left and right disparities and do post-processing using consistency check.
  • -iterations {int}: Number of main iterations.
  • -pmIterations {int}: Number of initial iterations performed before main iterations without smoothness terms (this accelerates inference).
  • -ndisp {int}: Define the disparity range [0, ndisp-1]. If not specified, the program tries to retrieve it from files (calib.txt or info.txt).
  • -smooth_weight {float}: Smoothness weight (lambda in the paper).
  • -filterRadious {int}: The radius of matching windows (i.e., filterRadious/2 is the kernel radius of the guided image filter).
  • -mc_threshold {float}: Parameter tau_cnn in the paper that truncates MC-CNN matching cost values.

Updates

  • The initial-iterations feature (option: pmIterations) has been added to accelerate the inference.
  • The implementation of the guided image filter has been improved over the paper's description, reducing the running time of our method by half.

Pre-computed MC-CNN matching costs

We use matching cost volumes computed by MC-CNN-acrt. We provide pre-computed matching cost data for the 30 test and training image pairs of the Middlebury benchmark V3. For a demonstration, please use Adirondack below, which contains image pairs, calibration data, and ground truth.

Remarks:

  • Only the left volume data (im0.acrt) is provided. The right volume data can be recovered from the left one.
  • These matching costs are raw outputs from the CNNs, without cross-based cost filtering or SGM aggregation.
  • We also provide MC-CNN-Chainer, pre-trained MC-CNN models in Chainer, for easily producing these data on your own.
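The left-to-right recovery mentioned above can be sketched as follows (a minimal illustration assuming the d-major memory layout described in the Tips section; `recoverRight` is a hypothetical helper, not part of the repository):

```cpp
#include <cstddef>
#include <vector>

// volume0[d][y][x] stores the cost of matching im0(x, y) with im1(x - d, y).
// The right-based volume is then volume1[d][y][x] = volume0[d][y][x + d],
// i.e. the cost of matching im1(x, y) with im0(x + d, y).
// Entries with x + d outside the image are left at zero here.
std::vector<float> recoverRight(const std::vector<float>& volume0,
                                int ndisp, int height, int width)
{
    std::vector<float> volume1(volume0.size(), 0.f);
    for (int d = 0; d < ndisp; ++d)
        for (int y = 0; y < height; ++y)
            for (int x = 0; x + d < width; ++x)
            {
                std::size_t src = (std::size_t(d) * height + y) * width + (x + d);
                std::size_t dst = (std::size_t(d) * height + y) * width + x;
                volume1[dst] = volume0[src];
            }
    return volume1;
}
```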

Tips

Test your own matching costs

By replacing the MC-CNN matching cost data files (im0.acrt and im1.acrt), you can easily plug in your own matching costs without changing the code. These files directly store a 3D float volume and can be read as follows.

// Note: this requires ndisp, height, and width to be compile-time constants
// (or a C compiler with VLA support); otherwise allocate the buffer on the heap.
float volume0[ndisp][height][width];
FILE *file = fopen("im0.acrt", "rb");
fread(volume0, sizeof(float), (size_t)ndisp * height * width, file);
fclose(file);

Here, the float value volume0[d][y][x] stores the matching cost between two (left and right) image patches centered at im0(x, y) and im1(x-d, y). The value of ndisp is loaded from calib.txt, or can be specified by the -ndisp argument. Note that values of volume0[d][y][x] for x-d < 0 (i.e., where im1(x-d, y) is outside the image domain) are ignored and filled with volume0[d][y][x+d]. If your matching costs provide valid values for these regions, you should turn off this interpolation by disabling the fillOutOfView function.

During the inference, the algorithm computes the data term D_p(a,b,c) at p = (x, y) by local 3D aggregation using a plane label (a,b,c) as below.

D_p(a,b,c) = sum_{s=(u,v) in window W_p} w_ps * min(volume0[u*a + v*b + c][v][u], options.mc_threshold)
(Equation (6) in the paper)

Here, w_ps is the filter kernel of guided image filtering, and volume0[ua + vb + c][v][u] is computed with linear interpolation in d-space (because ua + vb + c is generally not an integer).
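The aggregation above can be sketched as follows (a self-contained illustration, not the repository's implementation: `dataTerm` is a hypothetical name, and a uniform kernel stands in for the guided-filter weights w_ps):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Sketch of the data term D_p(a,b,c): aggregate truncated, d-interpolated
// matching costs over a square window W_p centered at p = (px, py).
// Out-of-range disparities are charged the truncation value mcThreshold.
float dataTerm(const std::vector<float>& volume0,
               int ndisp, int height, int width,
               int px, int py, int radius,
               float a, float b, float c, float mcThreshold)
{
    float sum = 0.f;
    for (int v = std::max(0, py - radius); v <= std::min(height - 1, py + radius); ++v)
        for (int u = std::max(0, px - radius); u <= std::min(width - 1, px + radius); ++u)
        {
            float d = a * u + b * v + c;            // disparity of plane (a,b,c) at (u,v)
            if (d < 0 || d > ndisp - 1) { sum += mcThreshold; continue; }
            int d0 = int(std::floor(d));
            int d1 = std::min(d0 + 1, ndisp - 1);
            float t = d - d0;                       // linear interpolation in d-space
            auto at = [&](int dd) {
                return volume0[(std::size_t(dd) * height + v) * width + u];
            };
            float cost = (1.f - t) * at(d0) + t * at(d1);
            sum += std::min(cost, mcThreshold);     // w_ps taken as 1 in this sketch
        }
    return sum;
}
```

In the actual method, the uniform weight would be replaced by the guided-filter kernel w_ps of Equation (6).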

Possible extensions and references

We suggest possible extensions of the algorithm and list related references.

Use superpixels instead of grid cells

We currently use only regular grid cells for defining local expansion moves. When extending the method to use superpixels, the following papers will be useful.

  • Taniai et al., "Joint Recovery of Dense Correspondence and Cosegmentation in Two Images" (CVPR 2016)
  • Li et al., "PMSC: PatchMatch-Based Superpixel Cut for Accurate Stereo Matching" (IEEE Trans. Circuits Syst. Video Technol.)
  • Hur and Roth, "MirrorFlow: Exploiting Symmetries in Joint Optical Flow and Occlusion Estimation" (ICCV 2017)

Use simultaneous fusion moves (fusion space) instead of expansion moves

Our local expansion move method is currently based on binary fusion, which combines the current solution with a single candidate label in one fusion operation. This operation can be replaced with simultaneous fusion (or fusion space), which combines the current solution with multiple candidate labels in one fusion operation. This technique leads to better avoidance of local minima. It can be implemented using TRW-S, as shown in the following papers.

  • Ulén and Olsson, "Simultaneous Fusion Moves for 3D-Label Stereo" (EMMCVPR 2013)
  • Liu et al., "Layered Scene Decomposition via the Occlusion-CRF" (CVPR 2016)

Apply to problems other than stereo

Our local expansion move method is also useful for high-dimensional label estimation in optical flow and other dense correspondence problems. In the first paper below, we infer 4-DoF motion labels (similarity transformations) using additional candidate label proposers. The second paper estimates 8-DoF motion labels (homographies) using our method.

  • Taniai et al., "Joint Recovery of Dense Correspondence and Cosegmentation in Two Images" (CVPR 2016)
  • Hur and Roth, "MirrorFlow: Exploiting Symmetries in Joint Optical Flow and Occlusion Estimation" (ICCV 2017)

localexpstereo's People

Contributors

anna-szal, t-taniai, tackson


localexpstereo's Issues

How to get the first correct plane after Label Randomization?

Dear Professor,

I also have a question:
How can the algorithm know that the plane represented by a label is correct?
Labels after the first correct one can be evaluated using the energy function, but the first label has no reference.

I have read your code, but I cannot understand that part of it very well.

Thanks for your reply!
Wei Xue

How do you build a graph

Dear author:
The function expansionMoveBK in FastGCStereo.h uses the functions add_tweights and add_edge of the Graph class. In my opinion, add_tweights is the way to set a t-link in the graph, and add_edge is the way to set an n-link. So it confuses me that the code mixes the two functions in expansionMoveBK. The paper doesn't explain how to build the graph, so I can't follow the code. My specific questions are written as comments in the code below.

The part of source code of expansionMoveBK:
double expansionMoveBK(updateMask, Plane label1, region, proposalCosts, mode = 0):
~~
~~
	for (int y = 0; y < region.height; y++){
		for (int x = 0; x < region.width; x++){
			int s = y*region.width + x;
			// I think this builds two links from node s to the source and the terminal.
			graph.add_tweights(s, subCurrent.at(y, x), proposalCosts.at(y, x));

			bool x0 = x == 0;
			bool x1 = x == region.width - 1;
			bool y0 = y == 0;
			bool y1 = y == region.height - 1;

			if (x0 || x1 || y0 || y1)
			{
				cv::Point ps = cv::Point(x, y) + region.tl();
				for (int k = 0; k < stereoEnergy->neighbors.size(); k++)
				{
					cv::Point pt = ps + stereoEnergy->neighbors[k];
					if (region.contains(pt))
						continue;
					if (imageDomain.contains(pt) == false)
						continue;

					// pt is always label0; why is pt always label0?
					float _cost00 = stereoEnergy->computeSmoothnessTerm(currentLabeling.at<Plane>(ps), currentLabeling.at<Plane>(pt), ps, k, mode);
					float _cost10 = stereoEnergy->computeSmoothnessTerm(label1, currentLabeling.at<Plane>(pt), ps, k, mode);

					graph.add_tweights(s, _cost00, _cost10); // I don't understand why we have to build two t-links again.
				}
			}
		}
	}

	// ee <-> ge
	// ***
	// **@
	// ***
	for (int y = 0; y < region.height; y++){
		for (int x = 0; x < region.width - 1; x++){
			int i = y*region.width + x;
			int j = y*region.width + x + 1;
			float B = cost10[StereoEnergy::NB_GE].at<float>(y, x);
			float C = cost01[StereoEnergy::NB_GE].at<float>(y, x);
			float D = cost00[StereoEnergy::NB_GE].at<float>(y, x);
			graph.add_edge(i, j, std::max(0.f, B + C - D), 0); // I think this builds the n-links, but I don't know why the value is std::max(0.f, B + C - D).
			graph.add_tweights(i, C, 0); // The old question: why do we build the t-links again, and how is the value chosen?
			graph.add_tweights(j, D - C, 0);
		}
	}

~~
~~

Regarding Building the solution

What do we have to build in release mode? I mean, which files? If I use main.cpp, it gives an error, as it cannot find the LocalExpansionStereo.exe file.

(screenshot of the error)

Different results with Middlebury results

Hi,
I tried to reproduce the Middlebury evaluation results, but I got slightly different results than the Middlebury ones. Did you fine-tune the parameters, or do anything else not mentioned?
Edit: I used your costs provided on Drive.
My result: (image: local_exp)
Expected result: (image: local_exp2)

the definition of edges variables

dear author:
There is still one question after I've read the paper "What Energy Functions Can Be Minimized via Graph Cuts?". The paper tells us how to decompose the edges, and its Table 3 defines the variables A, B, C, D as: A = Eij(0,0) = V(fp,fq); B = Eij(0,1) = V(fp,α); C = Eij(1,0) = V(α,fq); D = Eij(1,1) = V(α,α).
But the code gives a different definition of the variable D.
The function computeSmoothnessTermsExpansion in class StereoEnergy calculates cost00, cost01, and cost10:

void computeSmoothnessTermsExpansion(const cv::Mat& labeling0_m, Plane label1, cv::Rect region, std::vector<cv::Mat>& cost00, std::vector<cv::Mat>& cost01, std::vector<cv::Mat>& cost10, bool onlyForward = false, int mode = 0) const
{
cv::Rect rect_ee = cv::Rect(M + region.x, M + region.y, region.width, region.height);
cv::Mat label0_ee = labeling0_m(rect_ee);
cv::Mat coord_ee = coordinates_m(rect_ee);
cv::Scalar sc = label1.toScalar();
cv::Mat disp0_of_ee_at_ee = cvutils::channelDot(label0_ee, coord_ee);
cv::Mat disp1_at_ee = cvutils::channelSum(coord_ee.mul(sc));
//cv::Mat disp1_at_ee = label1.toDispMap(region); // This changes results due to small numerical differences.

	if (disp0_of_ee_at_ee.depth() != CV_32F){
		disp0_of_ee_at_ee.convertTo(disp0_of_ee_at_ee, CV_32F);
		disp1_at_ee.convertTo(disp1_at_ee, CV_32F);
	}
	cost00 = std::vector<cv::Mat>(neighbors.size());
	cost01 = std::vector<cv::Mat>(neighbors.size());
	cost10 = std::vector<cv::Mat>(neighbors.size());
	for (int i = 0; i < neighbors.size(); i++){
		if (onlyForward && (neighbors[i].y * width + neighbors[i].x <= 0))
			continue;
		cv::Rect rect_le = rect_ee + neighbors[i];
		cv::Mat label0_le = labeling0_m(rect_le);
		cv::Mat coord_le = coordinates_m(rect_le);
		std::cout<<label0_le.at<cv::Vec4f>(0,0)<<std::endl;
		cv::Mat disp0_of_le_at_ee = cvutils::channelDot(label0_le, coord_ee);
		cv::Mat disp0_of_ee_at_le = cvutils::channelDot(label0_ee, coord_le);
		cv::Mat disp0_of_le_at_le = cvutils::channelDot(label0_le, coord_le);
		cv::Mat disp1_at_le = cvutils::channelSum(coord_le.mul(sc));
		if (disp0_of_le_at_ee.depth() != CV_32F){
			disp0_of_le_at_ee.convertTo(disp0_of_le_at_ee, CV_32F);
			disp0_of_ee_at_le.convertTo(disp0_of_ee_at_le, CV_32F);
			disp0_of_le_at_le.convertTo(disp0_of_le_at_le, CV_32F);
			disp1_at_le.convertTo(disp1_at_le, CV_32F);
		}
		cv::Mat smoothnessCoeffL_nb = smoothnessCoeff[mode][i](rect_ee);
		// cost00: |dp(fp)−dp(fq)| + |dq(fq)−dq(fp)| <=> V(fp,fq) = A
		cost00[i] = cv::abs(disp0_of_ee_at_ee - disp0_of_le_at_ee) + cv::abs(disp0_of_ee_at_le - disp0_of_le_at_le);
		cv::threshold(cost00[i], cost00[i], params.th_smooth, 0, cv::THRESH_TRUNC);
		cost00[i] = cost00[i].mul(smoothnessCoeffL_nb, params.lambda);
		// cost01: |dp(fp)−dp(fα)| + |dq(fp)−dq(fα)| <=> V(fp,α) = B
		cost01[i] = cv::abs(disp0_of_ee_at_ee - disp1_at_ee) + cv::abs(disp0_of_ee_at_le - disp1_at_le);
		cv::threshold(cost01[i], cost01[i], params.th_smooth, 0, cv::THRESH_TRUNC);
		cost01[i] = cost01[i].mul(smoothnessCoeffL_nb, params.lambda);
		// cost10: |dp(fα)−dp(fq)| + |dq(fq)−dq(fα)| <=> V(fq,α) = C
		cost10[i] = cv::abs(disp1_at_ee - disp0_of_le_at_ee) + cv::abs(disp1_at_le - disp0_of_le_at_le);
		cv::threshold(cost10[i], cost10[i], params.th_smooth, 0, cv::THRESH_TRUNC);
		cost10[i] = cost10[i].mul(smoothnessCoeffL_nb, params.lambda);
	}

The definition of cost00 corresponds to A in the table, cost01 to B, and cost10 to C. But the code uses cost00 as D, as in the following part of expansionMoveBK:
// ee <-> gg
// ***
// ***
// **@
for (int y = 0; y < region.height - 1; y++){
for (int x = 0; x < region.width - 1; x++){
int i = y*region.width + x;
int j = (y + 1)*region.width + x + 1;
float B = cost10[StereoEnergy::NB_GG].at<float>(y, x);
float C = cost01[StereoEnergy::NB_GG].at<float>(y, x);
float D = cost00[StereoEnergy::NB_GG].at<float>(y, x);
graph.add_edge(i, j, std::max(0.f, B + C - D), 0);
graph.add_tweights(i, C, 0);
graph.add_tweights(j, D - C, 0);
}
}
}
The variable D should relate to α: D = V(α,α), while the D in the code is unrelated to α. cost01 and cost10 comply with the paper. Could you please explain the definition of D in the code?
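One plausible reconciliation, sketched here under the standard construction of Kolmogorov and Zabih (this is an assumption about the code's intent, not an authoritative account of it): the code's variable D (cost00) plays the role of A = V(fp,fq) in the table, while the table's D = V(α,α) is identically zero for this smoothness term, because the disparity difference of two identical plane labels vanishes.

```latex
% Pairwise term with A = E_{pq}(0,0),\; B = E_{pq}(0,1),\; C = E_{pq}(1,0),\; D = E_{pq}(1,1):
E_{pq}(x_p, x_q) = A + (C - A)\,x_p + (D - C)\,x_q + (B + C - A - D)\,(1 - x_p)\,x_q
% Representable as a graph iff B + C \ge A + D (submodularity).
% In an expansion move, x = 1 means "take the candidate plane \alpha",
% so D = V(\alpha, \alpha) = 0, and the n-link weight B + C - A - D
% reduces to V(f_p,\alpha) + V(\alpha,f_q) - V(f_p,f_q).
```

Under the code's naming (B = cost10, C = cost01, D = cost00), that n-link weight is cost01 + cost10 − cost00, which would be exactly what std::max(0.f, B + C - D) computes, making the construction consistent with the paper after all.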

Is there a little bug in Plane.h?

Plane(float a, float b, float c, float y) : a(a), b(b), c(c), v(v){}

Above is the code in Plane.h line 12.
Maybe the float y should be float v.

what's the mean of ‘ee'? such as: "cv::Mat IL_ee = I_m(rect_ee);"

Dear Professor,
I am an undergraduate, and I cannot understand the following (in the file 'StereoEnergy.h'):

1. Which words is 'ee' short for?
So I cannot understand these objects: disp0_of_le_at_ee, disp0_of_ee_at_le, disp0_of_le_at_le ...

2. Why is the cost computed 4 times, as 'cost00', 'cost01', 'cost10' and 'cost11'?

Thanks for your time!
Wei Xue

Assertion Failed when run on 'cones' data

Hi, when I compiled and ran it in debug mode, it gave the following error. What might be the reason? Thanks.

----------- parameter settings -----------
mode : MiddV2
outputDir : ../results
targetDir : ../data/MiddV2/cones
threadNum : -1
doDual : 0
pmIterations : 2
iterations : 5
ndisp : 100
filterRadious : 20
smooth_weight : 1.000000
mc_threshold : 0.500000

Running by Middlebury V2 mode.
ndisp = 100
0 0.0 14625435998 14625397810 38188 98.62 98.63
OpenCV Error: Assertion failed (elemSize() == (((((DataType<_Tp>::type) & ((512 - 1) << 3)) >> 3) + 1) << ((((sizeof(size_t)/4+1)*16384|0x3a50) >> ((DataType<_Tp>::type) & ((1 << 3) - 1))*2) & 3))) in cv::Mat::at, file c:\users\shufei.fan\workspace\projects\localexpstereo\packages\opencvcontrib.3.1.0\build\native\include\opencv2\core\mat.inl.hpp, line 962
Op

Evaluation failure

Hi

I was giving this project a go and tried evaluating it against the Bicycle2 sample. I'm using the matching costs provided in the Adirondack link. However, the output I get is the following:

(screenshot of the output)

I'm already using the corresponding calib.txt:

cam0=[1948.17 0 532.418; 0 1948.17 488.228; 0 0 1]
cam1=[1948.17 0 614.35; 0 1948.17 488.228; 0 0 1]
doffs=81.931
baseline=173.557
width=1426
height=976
ndisp=125

The options I use are:

"%bin%" -targetDir "%datasetroot%\test2" -outputDir "%resultsroot%\test2" -mode MiddV3 -smooth_weight 0.5

This is the debug output:

Time	Eng	Data	Smooth	all	nonocc
0.000000	-nan(ind)	-nan(ind)	138868.715077	-nan(ind)	99.667810
34.431000	-nan(ind)	-nan(ind)	211043.893761	-nan(ind)	99.978780
68.153000	-nan(ind)	-nan(ind)	268027.314525	-nan(ind)	99.978780
129.158000	-nan(ind)	-nan(ind)	3541.673733	-nan(ind)	99.913212
189.342000	-nan(ind)	-nan(ind)	4004.365720	-nan(ind)	90.420596
250.601000	-nan(ind)	-nan(ind)	4623.280927	-nan(ind)	99.924280
311.898000	-nan(ind)	-nan(ind)	2288.273245	-nan(ind)	100.000000
373.796000	-nan(ind)	-nan(ind)	1892.510750	-nan(ind)	100.000000

Adirondack works fine though. I haven't tried every sample, but I get similar results from Classroom2 and Jadeplant as well. Does anyone know why this might be the case? Am I missing something?

Thanks
