
dkm's Introduction

DKM: Dense Kernelized Feature Matching for Geometry Estimation
Johan Edstedt, Ioannis Athanasiadis, Mårten Wadenbäck, Michael Felsberg
CVPR 2023

WARNING: DKM is trained at a specific resolution and is sensitive to the image resolution used. Feeding images of a different resolution (higher or lower) may give significantly worse performance. If you find DKM giving poor results, please contact me; it is probably something to do with the input.

How to Use?

Our model produces a dense (per-pixel) warp and certainty.

Warp: [B,H,W,4]. For each image in a batch of size B and each pixel in the HxW grid, we output the input and matching coordinates in the normalized grids [-1,1]x[-1,1].

Certainty: [B,H,W]. A number per pixel indicating its matchability.
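
As a concrete illustration, here is a minimal sketch of turning the warp into pixel-space matches. It assumes the DKMv3_outdoor entry point and the match/sample API used in the demos; the pixel-coordinate convention (shown here as a hypothetical to_pixel helper) may differ slightly from the code.

import torch
from dkm import DKMv3_outdoor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DKMv3_outdoor(device=device)

# warp: [H, W, 4] normalized coords (x_A, y_A, x_B, y_B); certainty: [H, W]
warp, certainty = model.match("im_A.jpg", "im_B.jpg", device=device)

# Sample sparse matches weighted by certainty.
matches, match_certainty = model.sample(warp, certainty)

# Map normalized [-1, 1] coordinates back to pixels; W_A, H_A, W_B, H_B
# are the original image sizes (assumed known).
def to_pixel(coords, w, h):
    x = (coords[..., 0] + 1) / 2 * (w - 1)
    y = (coords[..., 1] + 1) / 2 * (h - 1)
    return torch.stack((x, y), dim=-1)

kpts_A = to_pixel(matches[:, :2], W_A, H_A)
kpts_B = to_pixel(matches[:, 2:], W_B, H_B)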

See demo for two demos of DKM.

See api.md for API.

Qualitative Results

mount_rushmore.mp4
milan_cathedral.mp4
piazza_san_marco.mp4
tower_of_london.mp4

Benchmark Results

Megadepth1500

Method             AUC@5  AUC@10  AUC@20
DKMv1               54.5    70.7    82.3
DKMv2               56.8    72.3    83.2
DKMv3 (paper)       60.5    74.9    85.1
DKMv3 (this repo)   60.0    74.6    84.9

Megadepth 8 Scenes

Method             AUC@5  AUC@10  AUC@20
DKMv3 (paper)       60.5    74.5    84.2
DKMv3 (this repo)   60.4    74.6    84.3

ScanNet1500

Method             AUC@5  AUC@10  AUC@20
DKMv1               24.8    44.4    61.9
DKMv2               28.2    49.2    66.6
DKMv3 (paper)       29.4    50.7    68.3
DKMv3 (this repo)   29.8    50.8    68.3

Navigating the Code

Install

Run pip install -e .

Demo

A demonstration of our method can be run by:

python demo_match.py

This runs our model trained on MegaDepth on two images taken from Sacré Coeur.

Benchmarks

See Benchmarks for details.

Training

See Training for details.

Reproducing Results

Given that the required benchmark or training dataset has been downloaded and unpacked, results can be reproduced by running the experiments in the experiments folder.

Using DKM matches for estimation

We recommend using the excellent Graph-Cut RANSAC algorithm: https://github.com/danini/graph-cut-ransac

Method             AUC@5  AUC@10  AUC@20
DKMv3 (RANSAC)      60.5    74.9    85.1
DKMv3 (GC-RANSAC)   65.5    78.0    86.7
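
For reference, below is a hedged sketch of feeding sampled DKM matches to a robust estimator. It is not the authors' evaluation code: OpenCV's MAGSAC++ (cv2.findFundamentalMat with cv2.USAC_MAGSAC) is used as a stand-in, while the GC-RANSAC repository linked above provides its own Python bindings. W_A, H_A, W_B, H_B are the original image sizes (assumed known).

import cv2
import numpy as np

# matches: [N, 4] normalized coords from model.sample(warp, certainty).
m = matches.cpu().numpy()

def to_px(c, w, h):
    # Map normalized [-1, 1] coordinates to pixel coordinates.
    return np.stack(((c[:, 0] + 1) / 2 * (w - 1), (c[:, 1] + 1) / 2 * (h - 1)), axis=-1)

kpts_A = to_px(m[:, :2], W_A, H_A)
kpts_B = to_px(m[:, 2:], W_B, H_B)

# Robustly fit a fundamental matrix; inlier_mask flags the surviving matches.
F, inlier_mask = cv2.findFundamentalMat(kpts_A, kpts_B, cv2.USAC_MAGSAC, 1.0, 0.9999)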

Acknowledgements

We have used code and been inspired by https://github.com/PruneTruong/DenseMatching, https://github.com/zju3dv/LoFTR, and https://github.com/GrumpyZhou/patch2pix. We additionally thank the authors of ECO-TR for providing their benchmark.

BibTeX

If you find our models useful, please consider citing our paper!

@inproceedings{edstedt2023dkm,
title={{DKM}: Dense Kernelized Feature Matching for Geometry Estimation},
author={Edstedt, Johan and Athanasiadis, Ioannis and Wadenbäck, Mårten and Felsberg, Michael},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
year={2023}
}


dkm's Issues

About the previous SOTA, PDC-Net+

Why does PDC-Net perform reasonably well in PCK but much worse in two-view geometry estimation?
[PDC-Net PCK result screenshots omitted]
Is it because the confidence it predicts is learned in a self-supervised manner? Also, is it fair to compare DKM with PDC-Net without retraining it fully supervised?
If I have misunderstood something, please point it out. Thank you! :)

Match Score

Hey!
First of all, great work! I was wondering if we can get a match score for two images from this model?

Can inference speed be optimized?

Great job! I run inference on an RTX 3090, and matching two images takes about 0.7 seconds. Besides reducing the image resolution, what other operations can be used to accelerate inference?
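
(A generic PyTorch timing sketch, not DKM-specific guidance: mixed-precision autocast is one common acceleration to try, though whether DKM runs correctly under float16 is an assumption to verify. model, im1 and im2 are set up as in demo_match.py.)

import time
import torch

torch.cuda.synchronize()
start = time.time()
with torch.inference_mode(), torch.autocast("cuda", dtype=torch.float16):
    warp, certainty = model.match(im1, im2)
torch.cuda.synchronize()
print(f"match took {time.time() - start:.3f} s")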

Logic in function "upsample_preds" hard to understand

Thanks for your amazing work!

While trying to use it, there is one implementation detail in the code that I cannot understand.

In function "upsample_preds" (used only in the function "match"), the final flow is further refined using "self.conv_refiner"; however, the estimated residual flow is not added to the original final flow. Instead, a re-sampling is executed.

Related Codes:

# Build a pixel-centered query grid in normalized [-1, 1] coordinates.
query_coords = torch.meshgrid((
    torch.linspace(-1 + 1 / h, 1 - 1 / h, h, device="cuda:0"),
    torch.linspace(-1 + 1 / w, 1 - 1 / w, w, device="cuda:0"),
))
query_coords = torch.stack((query_coords[1], query_coords[0]))
query_coords = query_coords[None].expand(b, 2, h, w)
# Predict a residual flow with the refiner ...
delta_certainty, delta_flow = self.conv_refiner["1"](query, support, flow)
# ... but instead of adding it to `flow`, displace the query grid and
# re-sample the original flow at the displaced locations.
displaced_query_coords = torch.stack((
    query_coords[:, 0] + delta_flow[:, 0] / (4 * w),
    query_coords[:, 1] + delta_flow[:, 1] / (4 * h),
), dim=1)
flow = F.grid_sample(flow, displaced_query_coords.permute(0, 2, 3, 1))

Training Code

Congratulations on the great work!
Do you have plans to release the training code? I'm thinking of fine-tuning your weights for more specific tasks.

Questions about the key points on Megadepth test images

Hi,

Thanks for providing such elegant code. I ran the MegaDepth test script and saved the matches to files. I found that the keypoints' locations are shifted relative to the input images. Do you have any idea how to solve this?

[screenshot omitted]

Best,
Yongqing

about image resolution

Hi! In Table 8 of the original paper, do you keep the test image resolution the same as the training image resolution? For example, when training on 384x512 image pairs, do you also resize all the test image pairs to 384x512 for testing? Actually, I am following RoMa, but I found this dense pipeline is a bit sensitive to the resolution setting, so I want to find a way to make the method generalize well to different resolutions. :)

Unexpected Match Point with Transparent Background

Thank you for your outstanding work!!!
I'm working on my final project, where I'm trying to estimate the rotation angle between two identical objects. Initially I tried the SIFT algorithm, but the results were terrible, so I'm exploring deep learning methods to calculate the angles.
However, I'm encountering an issue that I hope you can help me with: I get match points regardless of the background color I set. The original image on the left has no background, but it seems like the algorithm is automatically adding a background. I'd like to know if there's a parameter to avoid this behavior.
Steps to reproduce:

  1. Load the original image with a transparent background (I tried with OpenCV and PIL).
  2. Set the background color to any value (e.g., white, black, etc.).
  3. Run the algorithm and observe the unexpected matches.

Thanks
Qidi

When will the DKMv3 demo be released?

Hi, your work is really amazing! I can't wait for the full release of the DKMv3 demo. When do you intend to do that?

How to use GC-RANSAC for pose estimation?

I noticed that you use GC-RANSAC for pose estimation. I tried to look into the GC-RANSAC repository but couldn't get it to work. I was hoping to take a look at your code, and to ask which version of GC-RANSAC you are using.

About testing results

Hi Johan,

After training for about 160k steps, I took the checkpoint for testing and obtained different testing results on each run. For example, 1st run: 58.345 (AUC@5), 2nd run: 59.449, 3rd run: 59.853.
I think this is caused by the randomness of sampling. Will this randomness be alleviated as the number of iterations increases?

Thank you so much for your help!
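
(A generic reproducibility sketch, assuming the run-to-run variance comes from RNG-based match sampling: fixing the seeds before each evaluation makes runs comparable without changing the expected score.)

import random
import numpy as np
import torch

def seed_everything(seed: int = 0) -> None:
    # Fix every RNG the sampling code might draw from.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)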

np.random.choice for good_samples

Hi Johan, thanks for your excellent work!

I wonder why np.random.choice is used for selecting good_samples instead of ordering by confidence. Is it for selecting sparser keypoints? The two alternatives are illustrated below.
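
(An illustration of the two alternatives on made-up confidences, not the repository's code:)

import numpy as np

conf = np.random.rand(10000)   # per-match confidences
k = 1000

# Deterministic: the k most confident matches, which tend to cluster spatially.
top_k = np.argsort(-conf)[:k]

# Confidence-weighted sampling without replacement: trades a little
# confidence for a better spatial spread of keypoints.
weighted = np.random.choice(len(conf), size=k, replace=False, p=conf / conf.sum())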

what does low_res_certainty do in dkm.py?

Great work!
I wonder what the effect of

low_res_certainty = factor * low_res_certainty * (low_res_certainty < cert_clamp)
...
dense_certainty = dense_certainty - low_res_certainty

in dkm/models/dkm.py is. Also, shouldn't this

query_coords = torch.meshgrid((
    torch.linspace(-1 + 1 / hs, 1 - 1 / hs, hs, device=device),
    torch.linspace(-1 + 1 / ws, 1 - 1 / ws, ws, device=device),
))

be

query_coords = torch.meshgrid((
    torch.linspace(-1 + 1 / (2 * hs), 1 - 1 / (2 * hs), hs, device=device),
    torch.linspace(-1 + 1 / (2 * ws), 1 - 1 / (2 * ws), ws, device=device),
))

?
Thanks!

question about internal resolution and homography estimation

Hello author, thank you for your outstanding work! My input image resolution is (3648, 5742), and the homography matrix I obtain is not ideal. I tried to change the resolution in the "__init__.py" file, setting it to "return DKMv3(weights, 3648, 5472, upsample_preds = False, device= device)", but this still fails to approach the true geometric transformation. How can I improve it?

Question about Coordinate Embeddings implementations

Hi Johan,

First of all, I want to express my gratitude for your recent contributions to the field, particularly your papers DKM and RoMa. They have been a great source of inspiration and motivation for my own research endeavors.

However, while reviewing the accompanying code implementation, I noticed some discrepancies regarding the coordinate embedding with random Fourier features, due to the default behaviour of PyTorch.

Issue Details

In the paper, it is stated that the entries of the Fourier basis frequencies are sampled from a normal distribution. The embedding is implemented using a Conv2d layer in PyTorch; however, the default initialization for convolutional weights is Kaiming uniform.

In addition, I found that in a previous work the Fourier basis frequencies are trainable parameters rather than a fixed sample.

I've visualized the histogram of the pretrained pos_conv weights [histogram omitted]

with the following code:

import torch
from torch import nn

from dkm import DKMv3_outdoor
from roma import roma_outdoor

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dkm_weight=DKMv3_outdoor(device=device).decoder.gps['16'].pos_conv.weight.detach().cpu().numpy()
roma_weight=roma_outdoor(device=device, coarse_res=560, upsample_res=(864, 1152)).decoder.gps['16'].pos_conv.weight.detach().cpu().numpy()
random_weight=nn.Conv2d(2, 256, kernel_size=(1, 1), stride=(1, 1)).weight.detach().cpu().numpy()

plt.figure(figsize=(8,5))
plt.hist(random_weight.flatten(), bins=25, alpha=0.5,density=True, label='Init',range=(-1.5,1.5))
plt.hist(roma_weight.flatten(), bins=25, alpha=0.5,density=True, label='ROMA',range=(-1.5,1.5))
plt.hist(dkm_weight.flatten(), bins=25, alpha=0.5,density=True, label='DKM',range=(-1.5,1.5))
plt.legend()
plt.title('Histogram of the weights of pos_conv')
plt.show()

The resulting distribution appears more Gaussian, possibly due to the weight decay in the AdamW optimizer, which assumes a normal prior for parameters.

Questions

  1. Would it make a difference to use a different distribution for initialization?
  2. Is it necessary to train the Fourier basis instead of fixing it?

Based on insights from a previous work, I suspect that the initialization distribution might not matter greatly, with only the standard deviation being crucial. However, I'm curious about whether training could lead to better Fourier basis frequencies. I would greatly appreciate it if you could further investigate these aspects.
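
(A hypothetical sketch of the fixed, normally-initialized variant discussed here; the 2-to-256 channel shape is taken from the snippet above, everything else is illustrative:)

import torch
from torch import nn

pos_conv = nn.Conv2d(2, 256, kernel_size=1)
nn.init.normal_(pos_conv.weight, std=1.0)   # sample the Fourier basis from a normal distribution
pos_conv.weight.requires_grad_(False)       # keep the basis fixed instead of training it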

Thank you once again for your exceptional contributions to the field.

Questions about global matcher

Hi Johan,
Thanks for your great contribution,

I noticed that you use Gaussian processes to encode the feature maps in the global matcher. We find this approach very novel and completely different from the global 4D correlation volume used in previous methods.

We wonder what motivated you to use Gaussian processes here, and why the Gaussian process is suitable for this warp-prediction problem.

Best wishes,
Weiguang Zhang

Error when using option `do_pred_in_og_res=True`

When using

dense_matches, dense_certainty = model.match(img1PIL, img2PIL, check_cycle_consistency=False, do_pred_in_og_res=True)

I got:

    738             if do_pred_in_og_res:  # Will assume that there is no batching going on.
    739                 og_query, og_support = self.og_transforms((im1, im2))
--> 740                 query_to_support, dense_certainty = self.matcher.upsample_preds(
    741                     query_to_support,
    742                     dense_certainty,

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in __getattr__(self, name)
   1129                 return modules[name]
   1130         raise AttributeError("'{}' object has no attribute '{}'".format(
-> 1131             type(self).__name__, name))
   1132 
   1133     def __setattr__(self, name: str, value: Union[Tensor, 'Module']) -> None:

AttributeError: 'RegressionMatcher' object has no attribute 'matcher'

I believe you need to rename the variable self.matcher to self.decoder in line 740 of dkm.py.

With Synthetic Dataset Training Codes

Hi!

While trying to reproduce the results on Mega + Synthetic data using "train_mega_synthetic.py", I noticed that in the training code the model is set to DKM (version 1), in lines 31-33. Does this indicate that "train_mega_synthetic.py" is designed for DKM v1, or is the script suitable for both versions?

Thanks!

About the pretrained model with resnet18

Hi Johan, I really appreciate your great work. You have provided the pretrained model with a ResNet-50 backbone, and I noticed from the code that you also have a model with ResNet-18. Is it possible to provide the pretrained ResNet-18 model? Thank you very much!

About the pretrained model

Hi Johan, I really appreciate your great work. Is it possible to provide the pretrained model here? Thank you very much!

Inference time

Thanks for your great work and impressive results!
Since your results are so impressive, it is natural to consider using your work as part of a downstream task such as visual odometry, so I am curious about the inference time of DKM compared to other methods like LoFTR or the optical-flow method RAFT. Have you done any experiments like this? I haven't seen any report of inference time in the paper, so it would be great if you could help.

question about loading the training images

Thanks for your excellent work.

I noticed a confusing detail. When you load images from the MegaDepth dataset, you resize the image to a fixed ht and wt, which changes the original aspect ratio (some images may have been taken vertically). I'm wondering why you do not maintain the aspect ratio and pad the image to the specified size (as sketched below), which is common practice in other computer vision tasks. Is it because the padded areas significantly interfere with the estimation of the warp? If so, would masking out the warp generated by the padded areas be a good solution?

Thanks again!
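
(A generic sketch of the resize-and-pad alternative described in the question, with a validity mask that could be used to ignore the padded warp; illustrative only, not DKM's loader:)

import torch
import torch.nn.functional as F

def resize_and_pad(img: torch.Tensor, ht: int, wt: int):
    # img: [C, H, W]. Scale to fit inside (ht, wt) while keeping the aspect
    # ratio, then zero-pad the remainder and return a valid-pixel mask.
    c, h, w = img.shape
    scale = min(ht / h, wt / w)
    nh, nw = max(round(h * scale), 1), max(round(w * scale), 1)
    resized = F.interpolate(img[None], size=(nh, nw), mode="bilinear", align_corners=False)[0]
    out = img.new_zeros(c, ht, wt)
    mask = torch.zeros(ht, wt, dtype=torch.bool)
    out[:, :nh, :nw] = resized
    mask[:nh, :nw] = True   # True where pixels are real image content
    return out, mask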

3d point projection, best way to fetch matches

Hi,
I have known camera extrinsics/intrinsics and I would like to 3D-project the pixels of two images matched by DKM. What would be a good point in the pipeline to get the matches between the images?
Thanks!
Daniel

A small detail question about dense-flow

This work is very exciting! I have a question about the scaling in the following code: why divide by 4 here? Isn't ins * displacement[] supposed to restore the flow to the original image scale, with /w normalizing the result?
dense_flow = torch.stack((
    dense_flow[:, 0] + ins * displacement[:, 0] / (4 * w),
    dense_flow[:, 1] + ins * displacement[:, 1] / (4 * h),
), dim=1)

Keep aspect ratio of images?

I noticed that DKM resizes the image to a fixed size and aspect ratio (horizontal). I think preserving the original aspect ratio of the images could improve performance, especially for extreme image pairs like this one:

[example image pair omitted: BARCELONA-VUNAV-1024x682]

Is there any tweak (code modification) to allow this with the current DKM weights?

purpose of conf_thresh parameter in sample function is confusing

Hello. I'm not completely sure whether this is an issue, but I am a bit confused about what the conf_thresh parameter should be used for. I'm referring to this line in dkm.py. Shouldn't this line come after the filtering with relative_confidence_threshold? Is it correct to say that conf_thresh filters out outliers and the inliers are all given the same probability?

DKM:sample

In DKM's sample method, do you intend to set any certainty greater than 0.05 to 1, instead of, perhaps, setting any certainty less than 0.05 to 0?

dense_certainty[dense_certainty > upper_thresh] = 1

I ask because you then go on to sample from the matches based on that certainty, which can result in points with certainty less than 0.05 being sampled. In fact, if all the matches have certainty below 0.05, then expansion_factor*num matches would still be selected. The two behaviors are contrasted below.
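
(A small illustration of the two behaviors being contrasted, on made-up values, not the repository's code:)

import torch

cert = torch.rand(10)
upper_thresh = 0.05

clamped = cert.clone()
clamped[clamped > upper_thresh] = 1    # current behavior: confident matches are equalized

zeroed = cert.clone()
zeroed[zeroed < upper_thresh] = 0      # alternative: low-confidence matches excluded from sampling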

Pretrained weights licensing

Dear @Parskatt, dear authors, thank you for this repo! Great work!
Could you please clarify on the license of the provided DKMv3 model's weights for indoor and outdoor? Were they trained on datasets that imply open license?
Thank you in advance!

e_R reaches 180°

I noticed that the model's rotation-angle estimation error reaches the maximum of 180° for quite a lot of images. Do you have any idea what might be the reason for this?

scannet test code

Hi,

Thank you for sharing your work. It's been helpful for my research.
However, I'm quite new to the dense matching method and have some simple questions regarding the scannet data test code.

I noticed that the error computation is iterated five times. Is this a standard protocol in other dense matching works like LoFTR or PDC-Net+?
https://github.com/Parskatt/DKM/blob/main/dkm/benchmarks/scannet_benchmark.py#L98

for _ in range(5):
    shuffling = np.random.permutation(np.arange(len(kpts1)))
    ...

Additionally, I saw another line that appends the errors again after the loop, which might include the same error twice.
https://github.com/Parskatt/DKM/blob/main/dkm/benchmarks/scannet_benchmark.py#L120

for _ in range(5):
    ...
    tot_e_t.append(e_t)
    tot_e_R.append(e_R)
    tot_e_pose.append(e_pose)
# After the loop, the last iteration's errors are appended a second time:
tot_e_t.append(e_t)
tot_e_R.append(e_R)
tot_e_pose.append(e_pose)

Thanks in advance.

torch.linalg.inv

Thanks for your excellent work! May I ask for a possible solution to the problem shown below? Thank you so much!

Traceback (most recent call last):
  File "experiments/dkm/train_DKMv3_outdoor.py", line 259, in <module>
    train(args)
  File "experiments/dkm/train_DKMv3_outdoor.py", line 250, in train
    wandb.log(megadense_benchmark.benchmark(model))
  File "/mnt/data-disk-1/home/cpii.local/wtwang/IM/codes/DKM/dkm/benchmarks/megadepth_dense_benchmark.py", line 72, in benchmark
    matches, certainty = model.match(im1, im2, batched=True)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/IM/codes/DKM/dkm/models/dkm.py", line 695, in match
    dense_corresps = self.forward(batch, batched = True)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/IM/codes/DKM/dkm/models/dkm.py", line 631, in forward
    dense_corresps = self.decoder(f_q_pyramid, f_s_pyramid)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/miniconda3/envs/im/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/IM/codes/DKM/dkm/models/dkm.py", line 494, in forward
    new_stuff = self.gps[new_scale](f1_s, f2_s, dense_flow=dense_flow)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/miniconda3/envs/im/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/data-disk-1/home/cpii.local/wtwang/IM/codes/DKM/dkm/models/dkm.py", line 360, in forward
    K_yy_inv = torch.linalg.inv(K_yy + sigma_noise)
torch._C._LinAlgError: linalg.inv: (Batch element 0): The diagonal element 512 is zero, the inversion could not be completed because the input matrix is singular.
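
(A common numerical-stability workaround for this kind of failure, sketched as an assumption rather than an official fix: add a larger diagonal jitter before inverting.)

import torch

def stable_inv(K: torch.Tensor, jitter: float = 1e-4) -> torch.Tensor:
    # K: [..., N, N]. Adding jitter * I keeps near-singular kernels invertible.
    eye = torch.eye(K.shape[-1], device=K.device, dtype=K.dtype)
    return torch.linalg.inv(K + jitter * eye)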

Training on Kitti

Hi Johan,

Thank you for your wonderful work.

I am attempting to fine-tune the outdoor weights on the KITTI dataset. I followed the code from train_DKMv3_outdoor.py and adapted the MegadepthBuilder and MegadepthDenseBenchmark to the KITTI odometry dataset. After training for only 1,000 steps, I observed that the AUC@5 result dropped from 0.7 to 0.3. Could there be an issue with my modifications, or are there specific training details I should pay attention to (such as resolution, depth, etc.)?

Thank you for your assistance!

Inference in batched mode: all values in certainty is zeros

Hi, I have this problem:

image1.shape, image2.shape : 3, 256, 256

dkm_model = DKMv3_outdoor(device=device)
dkm_model.w_resized = 256
dkm_model.h_resized = 256
dkm_model.upsample_preds = False
warp, certainty = dkm_model.match(image1, image2, batched=True, device=device)

Then all values in the 'certainty' tensor are zeros (torch.sum(certainty) is 0.0), so the line "matches, certainty = dkm_model.sample(warp, certainty)" will cause an error:
603 expansion_factor = 4 if "balanced" in self.sample_mode else 1
--> 604 good_samples = torch.multinomial(certainty,
605 num_samples = min(expansion_factor*num, len(certainty)),
606 replacement=False)
607 good_matches, good_certainty = matches[good_samples], certainty[good_samples]
608 if "balanced" not in self.sample_mode:

RuntimeError: invalid multinomial distribution (sum of probabilities <= 0)
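
(A defensive sketch for this failure mode, an illustrative workaround rather than a repository patch: fall back to a uniform distribution when every certainty is zero, so torch.multinomial receives a valid input.)

if certainty.sum() <= 0:
    certainty = torch.ones_like(certainty)   # uniform fallback over all pixels
matches, certainty = dkm_model.sample(warp, certainty)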

Issues encountered during training the network

It's great work!!! But I met a problem: when I ran your training code, the weights file named "train_DKMv3_outdoor_latest.pth" was 852 MB, whereas the model you provide alongside the training code is only 258 MB. I can successfully run the provided model, but I encounter errors when using my own trained model. Where is the problem, and how can I obtain a trained model of the same size that runs properly like the one you provided?

    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for RegressionMatcher:
Missing key(s) in state_dict: "encoder.net.conv1.weight", "encoder.net.bn1.weight", "encoder.net.bn1.bias", "encoder.net.bn1.running_mean", "encoder.net.bn1.running_var", "encoder.net.layer1.0.conv1.weight",
encoder.net.layer1.0.bn1.weight", "encoder.net.layer1.0.bn1.bias", "encoder.net.layer1.0.bn1.running_mean", "encoder.net.layer1.0.bn1.running_var", "encoder.net.layer1.0.conv2.weight", "encoder.net.layer1.0.bn2.weight", 
......

.conv_refiner.1.disp_emb.weight", "decoder.conv_refiner.1.disp_emb.bias". 
	Unexpected key(s) in state_dict: "model", "n", "optimizer", "lr_scheduler". 
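
(A hedged sketch based on the unexpected keys above: the training checkpoint appears to wrap the weights under "model" alongside optimizer state, which would also explain the larger file, so loading the nested entry may resolve the mismatch.)

import torch

ckpt = torch.load("train_DKMv3_outdoor_latest.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])   # "n", "optimizer", "lr_scheduler" are training state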

the structure and information of 'warp'

I am performing an image-matching test with demo_match.py. What is the structure of the warp obtained in the code 'warp, certainty = dkm_model.match(im1_path, im2_path, device=device)', and what information is stored in it? I want to get the pixel-matching relationship between the two pictures through 'warp'; how should I do that? Thank you very much!

synthetic dataset

Can you post the part of the code that works on the synthetic dataset? I want to try it on cross-spectrum data.
