
IMP: Iterative Matching and Pose Estimation with Adaptive Pooling

In this paper we propose an iterative matching and pose estimation framework (IMP) leveraging the geometric connections between the two tasks: a few good matches are enough for a roughly accurate pose estimation; a roughly accurate pose can be used to guide the matching by providing geometric constraints. To this end, we implement a geometry-aware recurrent attention-based module which jointly outputs sparse matches and camera poses. Specifically, for each iteration, we first implicitly embed geometric information into the module via a pose-consistency loss, allowing it to predict geometry-aware matches progressively. Second, we introduce an efficient IMP, called EIMP, to dynamically discard keypoints without potential matches, avoiding redundant updating and significantly reducing the quadratic time complexity of attention computation in transformers.

With this code, you can train your own matcher from scratch with better performance than SuperGlue. Since a trained model supports different numbers of iterations (self/cross attention), you can choose a light version with fewer layers for easy tasks, e.g., VO/SLAM, and a heavy version with more layers for tough tasks such as long-term relocalization.
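
For intuition, the per-iteration flow can be sketched as below. This is a minimal, self-contained illustration, not the repo's actual API: the function name, the attention layers (stand-ins for the learned recurrent module), the mutual nearest-neighbour matching, and the threshold tau are all hypothetical choices; only the overall loop structure follows the description above.

# Illustrative sketch of the iterative matching + pose loop. All names and
# thresholds here are hypothetical; this is not the code used in this repo.
import cv2
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F


@torch.no_grad()
def iterative_match_and_pose(kpts0, kpts1, desc0, desc1, K, n_iters=3, tau=0.7):
    """kpts*: (N, 2) float tensors of pixel coords, desc*: (N, D) descriptors
    (D divisible by 4 here), K: 3x3 intrinsics as a numpy array."""
    dim = desc0.shape[1]
    self_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
    cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
    idx0, idx1 = torch.arange(len(kpts0)), torch.arange(len(kpts1))  # pooled -> original ids
    d0, d1 = desc0.clone(), desc1.clone()
    matches, R, t = torch.empty(0, 2, dtype=torch.long), None, None
    for _ in range(n_iters):
        if len(d0) < 8 or len(d1) < 8:
            break
        # 1) recurrent self/cross attention update (stand-in for the learned module)
        d0 = d0 + self_attn(d0[None], d0[None], d0[None])[0][0]
        d1 = d1 + self_attn(d1[None], d1[None], d1[None])[0][0]
        d0 = d0 + cross_attn(d0[None], d1[None], d1[None])[0][0]
        d1 = d1 + cross_attn(d1[None], d0[None], d0[None])[0][0]
        # 2) mutual nearest-neighbour matching on the refined descriptors
        sim = F.normalize(d0, dim=1) @ F.normalize(d1, dim=1).t()
        nn01, nn10 = sim.argmax(dim=1), sim.argmax(dim=0)
        ids = torch.arange(len(d0))
        mutual = nn10[nn01] == ids
        m = torch.stack([ids[mutual], nn01[mutual]], dim=1)
        if len(m) >= 8:
            # 3) pose from the current matches; the RANSAC inlier mask feeds back
            #    into which correspondences are reported (geometry-aware matching)
            p0 = kpts0[idx0[m[:, 0]]].numpy().astype(np.float64)
            p1 = kpts1[idx1[m[:, 1]]].numpy().astype(np.float64)
            E, inl = cv2.findEssentialMat(p0, p1, K, cv2.RANSAC, 0.999, 1.0)
            if E is not None and E.shape == (3, 3):
                inl = inl.ravel().astype(bool)
                all_m = torch.stack([idx0[m[:, 0]], idx1[m[:, 1]]], dim=1)
                matches = all_m[torch.from_numpy(inl)]
                _, R, t, _ = cv2.recoverPose(E, p0[inl], p1[inl], K)
        # 4) adaptive pooling: drop keypoints without a promising match candidate,
        #    so the next attention iteration runs on fewer points (lower cost)
        keep0 = sim.max(dim=1).values > tau
        keep1 = sim.max(dim=0).values > tau
        d0, idx0 = d0[keep0], idx0[keep0]
        d1, idx1 = d1[keep1], idx1[keep1]
    return matches, R, t  # match indices refer to the original keypoint lists

In the actual module, the attention, matching, and pose feedback are learned jointly and supervised with the pose-consistency loss described above.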

Dependencies

  • Python==3.9
  • PyTorch==1.12
  • opencv-contrib-python==4.5.5.64
  • opencv-python==4.5.5.64
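
The dependencies above can be installed with pip, for example (the exact PyTorch/CUDA build depends on your system):

pip install torch==1.12.0 opencv-python==4.5.5.64 opencv-contrib-python==4.5.5.64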

Data preparation

Please download the preprocessed data of Megadepth (scene_info and Undistorted_SfM) from here.

The data structure of Megadepth should look like this:

- Megadepth
    - phoenix
    - scene_info
        - 0000.0.npz
        - ...
    - Undistorted_SfM
        - 0000
            - images
            - sparse
            - stereo

Then use the following command to extract local features (spp/sift) and build correspondences for training:

python3 -m dump.dump_megadepth --feature_type spp --base_path  path_of_megadepth  --save_path your_save_path

The data structure of the generated training samples should look like this:

- your_save_path
    - keypoints_spp
        - 0000
            - 3409963756_f34ab1229a_o.jpg_spp.npy
    - matches_spp # not used in the training process
        - 0000
            - 0.npy
    - matches_sep # used for multi-threaded data loading (h5py was tried but did not work)
        - 0000
            - 0.npy
    - nmatches_spp # contains the number of valid matches (used for random sampling in the training process)
        - 0000_spp.npy 
    - mega_scene_nmatches_spp.npy # merged info of all scenes in nmatches_spp
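
To sanity-check the dumped data, the .npy files can be inspected with numpy. Their internal layout is produced by dump.dump_megadepth and is not documented here, so the snippet below only prints whatever is stored:

# Quick sanity check of the dumped training samples; we make no assumption
# about the internal structure, only that np.load can read the files.
import numpy as np

kp = np.load("your_save_path/keypoints_spp/0000/3409963756_f34ab1229a_o.jpg_spp.npy",
             allow_pickle=True)
print(type(kp), getattr(kp, "shape", None))

m = np.load("your_save_path/matches_sep/0000/0.npy", allow_pickle=True)
print(type(m), getattr(m, "shape", None))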

Instead of generating training samples offline, you can also do it online and apply augmentations (e.g., perspective transformations, illumination changes) to further improve the robustness of the model. Since this process is time-consuming and the code may still contain bugs, it is best to first test dumping and training on the scenes listed in assets/megadepth_scenes_debug.txt.
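
If you go the online route, the augmentations mentioned above can be as simple as a random homography plus a brightness/contrast jitter applied with OpenCV. A minimal sketch follows; the ranges and the function name are arbitrary choices, not the repo's or the paper's settings:

# Minimal online augmentation sketch: random perspective warp + illumination
# jitter. Ranges are illustrative; the repo's online pipeline may differ.
import cv2
import numpy as np


def augment(image, max_offset=0.15):
    h, w = image.shape[:2]
    # random perspective transform: jitter the four image corners
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = (np.random.uniform(-max_offset, max_offset, (4, 2)) * [w, h]).astype(np.float32)
    H = cv2.getPerspectiveTransform(src, src + jitter)
    warped = cv2.warpPerspective(image, H, (w, h))
    # illumination change: random gain (contrast) and bias (brightness)
    gain = np.random.uniform(0.7, 1.3)
    bias = np.random.uniform(-30, 30)
    warped = cv2.convertScaleAbs(warped, alpha=gain, beta=bias)
    return warped, H  # H lets you warp keypoints/correspondences consistently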

Training

Please modify save_path and base_path in configs/config_train_megadepth.json. Then start the training as:

python3 train.py --config configs/config_train_megadepth.json

The base_path in configs/config_train_megadepth.json should be the same as the save_path used in dump_megadepth. Training with a batch size of 16 requires four 2080 Ti/1080 Ti GPUs or two 3090 GPUs.
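
If you prefer not to edit the JSON by hand, the two paths can also be patched programmatically. The sketch below assumes only that the config contains save_path and base_path keys, as stated above; all other keys are left untouched:

# Patch the two dataset paths in the training config without touching other keys.
import json

cfg_file = "configs/config_train_megadepth.json"
with open(cfg_file) as f:
    cfg = json.load(f)

cfg["base_path"] = "/data/megadepth_dump"  # the save_path used by dump_megadepth
cfg["save_path"] = "/experiments/imp"      # output directory (assumed to hold checkpoints/logs)

with open(cfg_file, "w") as f:
    json.dump(cfg, f, indent=2)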

Results

  1. Download the pretrained weights from here and put them in the weights directory.

  2. Prepare the testing data from the YFCC and ScanNet datasets.

  • Download YFCC dataset:
   bash download_data.sh raw_data raw_data_yfcc.tar.gz 0 8
   tar -xvf raw_data_yfcc.tar.gz
   
  • Update the following entries in dump/configs/yfcc_sp.yaml and dump/configs/yfcc_root.yaml
    • rawdata_dir: path for yfcc rawdata
    • feature_dump_dir: dump path for extracted features
    • dataset_dump_dir: dump path for generated dataset
    • extractor: configuration for keypoint extractor
cd dump
python3 dump.py --config_path configs/yfcc_sp.yaml # copied from SGMNet

This will generate an hdf5 file (yfcc_sp_2000.hdf5) at dataset_dump_dir. Please also update the rawdata_dir and dataset_dir in configs/yfcc_eval_gm.yaml and configs/yfcc_eval_gm_sift.yaml for evaluation.
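
You can quickly verify the dumped file with h5py before running evaluation (assuming h5py is installed; the group layout is produced by the SGMNet-style dump code, so the snippet just walks whatever is inside):

# Walk the generated hdf5 file and print its groups/datasets.
import h5py

with h5py.File("yfcc_sp_2000.hdf5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))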

  • Download the preprocessed ScanNet evaluation data from here.

  • Update the following entries in dump/configs/scannet_sp.yaml and dump/configs/scannet_root.yaml

    • rawdata_dir: path for scannet rawdata
    • feature_dump_dir: dump path for extracted features
    • dataset_dump_dir: dump path for generated dataset
    • extractor: configuration for keypoint extractor
cd dump
python3 dump.py --config_path configs/scannet_sp.yaml  # copied from SGMNet

This will generate an hdf5 file (scannet_sp_1000.hdf5) at dataset_dump_dir. Please also update the rawdata_dir and dataset_dir in configs/scannet_eval_gm.yaml and configs/scannet_eval_gm_sift.yaml for evaluation.

  3. Run the following command for evaluation:
python3 -m eval.eval_imp --matching_method IMP --dataset yfcc

You should get results like these on the YFCC dataset:

Model            @5      @10     @20
imp              38.45   58.52   74.67
imp_iterative    39.4    59.62   75.28
eimp             36.96   56.76   73.29
eimp_iterative   38.98   58.95   74.81
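
These columns are typically the pose-error AUC (%) at 5°/10°/20° thresholds used for YFCC evaluation. For reference, such AUC values are commonly computed from per-pair pose errors as sketched below (this mirrors the SuperGlue/SGMNet convention; the repo's exact evaluation code may differ):

# Pose AUC at several angular-error thresholds, computed from per-pair pose
# errors (in degrees). Returns fractions; multiply by 100 for percentages.
import numpy as np


def pose_auc(errors, thresholds=(5, 10, 20)):
    errors = np.sort(np.asarray(errors, dtype=np.float64))
    recall = (np.arange(len(errors)) + 1) / len(errors)
    errors = np.concatenate(([0.0], errors))
    recall = np.concatenate(([0.0], recall))
    aucs = []
    for t in thresholds:
        last = np.searchsorted(errors, t)
        x = np.concatenate((errors[:last], [t]))
        y = np.concatenate((recall[:last], [recall[last - 1]]))
        aucs.append(np.trapz(y, x) / t)
    return aucs


print(pose_auc([1.2, 3.4, 7.8, 25.0]))  # AUC@5/10/20 for four example pairs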

BibTeX Citation

If you use any ideas from the paper or code in this repo, please consider citing:

@inproceedings{xue2022imp,
  author    = {Fei Xue and Ignas Budvytis and Roberto Cipolla},
  title     = {IMP: Iterative Matching and Pose Estimation with Adaptive Pooling},
  booktitle = {CVPR},
  year      = {2023}
}

Acknowledgements

Part of the code is adapted from previous excellent works, including SuperPoint, SuperGlue and SGMNet. You can find more details in their released repositories if you are interested in their work.

