
mickey's Introduction

Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences

Axel Barroso-Laguna · Sowmya Munukutla · Victor Adrian Prisacariu · Eric Brachmann

CVPR 2024 (Oral)

This is the reference implementation of the paper "Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences" presented at CVPR 2024.

The paper introduces Metric Keypoints (MicKey), a feature detection pipeline that regresses keypoint positions in camera space. MicKey presents a differentiable approach to establish metric correspondences via descriptor matching. From the metric correspondences, MicKey recovers metric relative poses. MicKey is trained in an end-to-end fashion using differentiable pose optimization and requires only image pairs and their ground truth relative poses for supervision.
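To build intuition for the "keypoints in camera space" idea, the sketch below back-projects 2D keypoints with predicted metric depths through the camera intrinsics. This is only a minimal illustration of the geometry; the function and tensor names are ours, not the repository's API:

import torch

def backproject_keypoints(kps: torch.Tensor, depth: torch.Tensor, K: torch.Tensor) -> torch.Tensor:
    # Lift 2D keypoints (N, 2) with per-keypoint metric depth (N,)
    # to 3D camera coordinates (N, 3).
    ones = torch.ones(kps.shape[0], 1, dtype=kps.dtype, device=kps.device)
    pix_h = torch.cat([kps, ones], dim=1)   # homogeneous pixel coordinates, (N, 3)
    rays = (K.inverse() @ pix_h.T).T        # back-projected rays in camera space
    return rays * depth.unsqueeze(1)        # scale by metric depth -> metric 3D points

Because every operation here is differentiable, gradients from a pose loss can flow back into both the keypoint positions and the depths, which is the property MicKey's end-to-end training relies on.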

teaser

Setup

Assuming a fresh Anaconda distribution, you can install dependencies with:

conda env create -f resources/environment.yml
conda activate mickey

We ran our experiments with PyTorch 2.0.1, CUDA 11.6, Python 3.8.17 and Debian GNU/Linux 11.
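Optionally, you can sanity-check the environment before running anything (a quick snippet of ours, not part of the repository):

import torch

print(torch.__version__)          # we used 2.0.1
print(torch.version.cuda)         # we used 11.6
print(torch.cuda.is_available())  # should be True to run on GPU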

Evaluating MicKey

MicKey addresses the problem of instant Augmented Reality (AR) introduced in the Map-free benchmark. In the Map-free setup, instead of building 3D maps from hundreds of images and scale calibration, a single photo of a scene serves as the map. The Map-free benchmark then evaluates how accurate the estimated metric relative pose is between the reference image (the map) and the query image (the user).

Download the Map-free dataset

You can find the Map-free dataset on its project page. Extract the test.zip file into data/mapfree. Optionally, if you want to train MicKey, also download the train and val zip files.
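After extraction, the directory should look roughly like the sketch below (scene names are placeholders; consult the Map-free documentation for the exact structure):

data/mapfree/
    test/
        <scene_id>/
            intrinsics.txt
            seq0/    # the single reference image (the map)
            seq1/    # the query images
    train/           # optional, for training
    val/             # optional, for validation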

Pre-trained Models

We provide two MicKey models.

  • mickey.ckpt: The default MicKey weights, trained without the overlapping scores provided in the Map-free dataset, following the curriculum learning strategy described in the paper.
  • mickey_sc.ckpt: The weights obtained when training MicKey with the min and max overlapping scores defined in Map-free.

Extract mickey_weights.zip into weights/. In the zip file, we also provide the default configuration needed to run the evaluation.

Run the submission script

Similar to the Map-free code base, we provide a submission script to generate submission files:

python submission.py --config path/to/config --checkpoint path/to/checkpoint --o results/your_method

The resulting file results/your_method/submission.zip can be uploaded to the Map-free online benchmark website and compared against existing methods in the leaderboard.

Run the local evaluation

The Map-free benchmark does not provide ground-truth poses for the test set, but we can still evaluate our method locally on the validation set. Generate a validation submission with:

python submission.py --config path/to/config --checkpoint path/to/checkpoint --o results/your_method --split val

and evaluate it as:

python -m benchmark.mapfree --submission_path results/your_method/submission.zip --split val

Download MicKey correspondences and depth files

We provide the depth maps and correspondences computed by MicKey.

Refer to the Map-free benchmark to learn how to load precomputed correspondences and depth maps in their feature matching pipeline.

Running MicKey on custom images

We provide a demo script to run the relative pose estimation pipeline on custom image pairs. As an example, we store two images with their respective intrinsics in data/toy_example. The script computes their metric relative pose and saves the corresponding depth and keypoint score maps in the image folder. Run the demo script as:

python demo_inference.py --im_path_ref data/toy_example/im0.jpg \
                         --im_path_dst data/toy_example/im1.jpg \
                         --intrinsics data/toy_example/intrinsics.txt \
                         --checkpoint path/to/checkpoint \
                         --config path/to/config

To generate the 3D assets as on MicKey's webpage, turn on the --generate_3D_vis flag. This generates a rendered image with the input images, their computed 3D camera positions, and the set of 3D point inliers.

Training MicKey

Besides the evaluation scripts, we also provide the code to train MicKey.

We provide two default configurations in config/MicKey/:

  • curriculum_learning.yaml: This configuration follows the curriculum learning approach detailed in the paper. It hence uses no image-overlap information during training, only ground-truth relative poses.
  • overlap_score.yaml: This configuration relies on the image-overlap information to select only solvable image pairs during training.

To train the default MicKey model, use:

python train.py --config config/MicKey/curriculum_learning.yaml \
                --dataset_config config/datasets/mapfree.yaml \
                --experiment experiment_name \
                --path_weights path/to/checkpoint/folder

Resume training from a checkpoint by adding --resume {path_to_checkpoint}.
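For example, combining the flags documented above:

python train.py --config config/MicKey/curriculum_learning.yaml \
                --dataset_config config/datasets/mapfree.yaml \
                --experiment experiment_name \
                --path_weights path/to/checkpoint/folder \
                --resume path/to/checkpoint.ckpt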

The top models according to the validation loss, the VCRE metric, and the pose AUC score are saved during training. TensorBoard logs and checkpoints are written to the folder dir/to/weights/experiment_name.

Note that, by default, the configuration is set to use 4 GPUs. You can reduce the number of GPUs in the config file (e.g., NUM_GPUS: 1).

Changelog

  • 13 August 2024: Added visualization code.
  • 7 June 2024: Added precomputed depth maps and keypoint correspondences.

BibTeX

If you use this code in your research, please consider citing our paper:

@inproceedings{barroso2024mickey,
  title={Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences},
  author={Barroso-Laguna, Axel and Munukutla, Sowmya and Prisacariu, Victor and Brachmann, Eric},
  booktitle={CVPR},
  year={2024}
}

License

Copyright © Niantic, Inc. 2024. Patent Pending. All rights reserved. This code is for non-commercial use. Please see the license file for terms.

Acknowledgements

We use parts of code from different repositories and thank their authors and maintainers.


mickey's Issues

Cross-domain image matching

Thanks for your work!
I would like to know whether this work can handle cross-domain image matching, for example between visible-light and infrared images.

How to perform image pair matching

Thank you for your work. If the input is an image pair, one reference image and one source image, can the source image be converted to the perspective of the reference image with the MicKey model, i.e., by solving the transformation matrix between the two? Please tell me how to implement this in the code: the demo seems geared towards outputting the depth map and the confidence score map, but it does not give the conversion result. Looking forward to your reply!
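For reference, the demo's output dictionary does expose the estimated relative pose under the keys 'R' and 't' (see the key listing in the pose-confidence issue below). A minimal sketch of assembling them into a 4×4 transform, assuming 'R' is a 3×3 rotation and 't' a 3-vector in metric units; the direction of the transform (reference-to-query or the reverse) should be verified in the code:

import numpy as np

def to_homogeneous(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    # Pack a rotation and a translation into a single 4x4 rigid transform.
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = np.asarray(t).reshape(3)
    return T

Points can then be mapped between the two views with X' = T[:3, :3] @ X + T[:3, 3]; note that a full image warp additionally requires per-pixel depth.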

Errors in multi-gpu training

When I ran multi-GPU training of MicKey on 4×3090 GPUs, I hit the errors below. I never see such problems when using a single GPU. It seems something is wrong with the JPEG images, but the Map-free datasets were downloaded without any processing.

Training with 0.00/1.00 image overlap
Premature end of JPEG file (repeated 8 times)
Traceback (most recent call last):
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/train.py", line 99, in <module>
    train_model(args)
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/train.py", line 89, in train_model
    trainer.fit(model, datamodule_end, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/call.py", line 43, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 105, in launch
    return function(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 987, in _run
    results = self._run_stage()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/trainer.py", line 1033, in _run_stage
    self.fit_loop.run()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/fit_loop.py", line 201, in run
    self.on_run_start()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/fit_loop.py", line 324, in on_run_start
    self.epoch_loop.val_loop.setup_data()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 166, in setup_data
    dataloaders = _request_dataloader(source)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 342, in _request_dataloader
    return data_source.dataloader()
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/connectors/data_connector.py", line 309, in dataloader
    return call._call_lightning_datamodule_hook(self.instance.trainer, self.name)
  File "/usr/local/lib/python3.8/dist-packages/pytorch_lightning/trainer/call.py", line 179, in _call_lightning_datamodule_hook
    return fn(*args, **kwargs)
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/datamodules.py", line 107, in val_dataloader
    dataset = self.dataset_type(self.cfg, 'val')
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/mapfree.py", line 190, in __init__
    data_srcs = [
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/mapfree.py", line 191, in <listcomp>
    MapFreeScene(
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/mapfree.py", line 26, in __init__
    self.K, self.K_ori = self.read_intrinsics(self.scene_root, resize)
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/mapfree.py", line 47, in read_intrinsics
    K = correct_intrinsic_scale(K, resize[0] / W, resize[1] / H)
  File "/opt/data/private/zhanghaolin_project/local_feature/mickey-main/lib/datasets/utils.py", line 92, in correct_intrinsic_scale
    transform = torch.eye(3)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 2655) is killed by signal: Killed.
[rank: 3] Child process with PID 27 terminated with code -9. Forcefully terminating all other processes to avoid zombies 🧟
./train.sh: line 1: 23 Killed python3 train.py

Could you give me any instructions?

Thanks for your time!

Haolin

Training cost

Hi,

Could you please estimate the rough time and number of GPUs used to train your models?

KeyError in backward_step Method Due to Missing depth0 in batch Dictionary

I have found an issue in the code; here is the line:

elif batch['depth0'].requires_grad:

There is a condition that checks batch['depth0'].requires_grad before performing the backward pass. However, the batch dictionary does not always contain a depth0 key, which can cause a KeyError.
Here's the problematic code segment:

elif batch['depth0'].requires_grad:
    torch.autograd.backward((torch.log(batch['final_scores'] + 1e-16),
                             batch['depth_kp0'], batch['depth_kp1']),
                            (probs_grad[0], outputs['depth0'].grad, outputs['depth1'].grad))

Is this step necessary? The default training implementation does not seem to execute this segment of code. Please let me know if I am mistaken.
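One possible guard (an untested sketch; the maintainers may prefer a different fix) is to check for the key before accessing it:

elif 'depth0' in batch and batch['depth0'].requires_grad:
    torch.autograd.backward((torch.log(batch['final_scores'] + 1e-16),
                             batch['depth_kp0'], batch['depth_kp1']),
                            (probs_grad[0], outputs['depth0'].grad, outputs['depth1'].grad))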

Basic question on resizing images

I'm a beginner with ML models for perception, so apologies if this question is basic.

In demo_inference.py, images are read and resized to a default size of (540, 720). There is also a read_intrinsics function with a default resize parameter.

My question is: if the input images do not have a size of (540, 720), does the resizing need to be applied to the intrinsics too? If not, when would one use it?
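For context: when an image is resized, the intrinsics do need to be scaled by the same factors, because the focal lengths and the principal point are expressed in pixels. A minimal standalone sketch of the adjustment (the repository's own version lives in lib/datasets/utils.py as correct_intrinsic_scale):

import numpy as np

def scale_intrinsics(K: np.ndarray, sx: float, sy: float) -> np.ndarray:
    # Rescale a 3x3 pinhole intrinsic matrix after resizing the image by (sx, sy).
    K = K.copy()
    K[0, 0] *= sx   # fx
    K[0, 2] *= sx   # cx
    K[1, 1] *= sy   # fy
    K[1, 2] *= sy   # cy
    return K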

Run on images from different cameras?

Hi, can this code be used to calculate the extrinsics between two images from different camera types?

E.g., a DSLR and a GoPro.

Or does it assume the same camera and lens for an image pair?

Thanks!

Visualization

Hello, could you provide the visualization code for generating Fig. 6 in the paper?

Predicting X and Y coordinates directly.

Hi,

Really exciting work!

I had a quick question - did you consider directly predicting the X and Y coordinates in 3D instead of the offsets and using the intrinsics to project them?

Question about the evaluated results

Dear authors,

Thanks for your excellent work!

After running the evaluation with your released checkpoint (mickey.ckpt), I got the following results:
{
    "Average Median Translation Error": 1.9044957215846383,
    "Average Median Rotation Error": 37.06174682877119,
    "Average Median Reprojection Error": 142.83572575743437,
    "Precision @ Pose Error < (25.0cm, 5deg)": 0.09847359178711333,
    "AUC @ Pose Error < (25.0cm, 5deg)": 0.2579417106896131,
    "Precision @ VCRE < 90px": 0.45265432932594896,
    "AUC @ VCRE < 90px": 0.7204040026315026,
    "Estimates for % of frames": 1.0
}

However, they are not comparable with the results in your paper.
For example, "Average Median Reprojection Error": 142.83572575743437 (ckpt) vs 126.9 (paper), "Average Median Translation Error": 1.9044957215846383 (ckpt) vs 1.59 (paper), "Average Median Rotation Error": 37.06174682877119 (ckpt) vs 25.9 (paper).
Could you please offer any guidance?

Thanks for your time.

Haolin

Obtaining pose confidence measurements

Hi, thanks for the interesting work. We'd like to use this in the context of detecting loop closures for visual SLAM.

As the dataset we are using does not line up well with what MicKey was trained on (indoor scenes of partially constructed buildings), I don't expect MicKey to perform well out of the box. However, I'd like to quantify how bad the domain gap is.

On the website, you show a confidence metric between two images.

Is it possible to obtain this number from the pose matcher? The output of the model from demo_inference.py is:

Data keys: dict_keys(['image0', 'image1', 'K_color0', 'K_color1', 'kps0_shape', 
'kps1_shape', 'depth0_map', 'depth1_map', 'down_factor', 'kps0', 'depth_kp0', 
'scr0', 'kps1', 'depth_kp1', 'scr1', 'scores', 'dsc0', 'dsc1', 'kp_scores', 'final_scores', 
'R', 't', 'inliers', 'inliers_list'])

Is the confidence in one of these output variables?

(a more general question: What do all of these mean?)
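One readily available proxy (our suggestion, not an official confidence score) is the inlier information the model already returns; a rough sketch, assuming 'inliers' holds the solver's inlier count or score and 'inliers_list' the per-correspondence inlier set:

# data: the output dictionary printed above
confidence_proxy = float(data['inliers'])                       # assumption: scalar inlier count/score
inlier_ratio = len(data['inliers_list']) / max(len(data['kps0']), 1)  # assumption: inliers vs. detected keypoints

Higher inlier counts generally indicate a more reliable relative pose, though the exact semantics of 'inliers' vs 'inliers_list' should be checked in the code.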

Query regarding the Multi-frame Map-free benchmark

Hi,

Thank you again for the work. I have a query regarding the benchmark. MicKey only takes in two images and provides the corresponding output. However, I see that MicKey is part of the leaderboard in the Multi-frame, Map-free Visual Relocalization benchmark (link). According to a previous comment on one of the issues, MicKey has not yet been tested on multiple views (link).

Could you explain how MicKey is part of that leaderboard and what the benchmark means?

Thanks,
Suraj

A naive question about depth prediction

I am new to MVS research, so pardon me if the question is naive.
How do you ensure that the predicted depth is correct metric depth? In other words, how do you ensure the depth predictions have the correct scale? AFAIK, you do not supervise the depth maps directly.

How to get poses for more than 2 images (in the same world coordinate frame)

Hi, thank you for the work and for making the code available. For a use case, I would like to run the model on more than two images and obtain the camera poses. The naive approach was to keep the reference image fixed while varying the destination image, but that did not work. Could you comment on how I could run the model on more than two images and use the output poses?
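One common way to do this (a sketch under the assumption that each MicKey call returns the pose of the second image relative to the first; this is not an officially supported multi-view workflow and does not resolve inconsistencies between independent pairwise estimates) is to chain the pairwise 4×4 transforms:

import numpy as np

def compose(T_ab: np.ndarray, T_bc: np.ndarray) -> np.ndarray:
    # Chain two relative poses: frame a -> frame b -> frame c.
    return T_bc @ T_ab

# With pairwise estimates T_01 (image0 -> image1) and T_12 (image1 -> image2),
# the pose of image2 with respect to image0 is:
# T_02 = compose(T_01, T_12)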
