Giter Club home page Giter Club logo

dmsr's Introduction

DMSR

This repository contains code and resources related to the paper RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery by Jiaxin Wei, Xibin Song, Weizhe Liu, Laurent Kneip, Hongdong Li, and Pan Ji.

Illustration of our pipeline for RGB-based category-level object pose estimation.

Abstract

While showing promising results, recent RGB-D camera-based category-level object pose estimation methods have restricted applications due to the heavy reliance on depth sensors. RGB-only methods provide an alternative to this problem yet suffer from inherent scale ambiguity stemming from monocular observations. In this paper, we propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations. Specifically, we leverage a pre-trained monocular estimator to extract local geometric information, mainly facilitating the search for inlier 2D-3D correspondence. Meanwhile, a separate branch is designed to directly recover the metric scale of the object based on category-level statistics. Finally, we advocate using the RANSAC-PnP algorithm to robustly solve for 6D object pose. Extensive experiments have been conducted on both synthetic and real datasets, demonstrating the superior performance of our method over previous state-of-the-art RGB-based approaches, especially in terms of rotation accuracy.

Requirements

We have tested our code with the following configurations

  • Ubuntu 20.04
  • PyTorch 1.8.0
  • torchvision 0.9.0
  • CUDA 11.1

Installation

Create virtual environment

conda create -n dmsr python=3.6
conda activate dmsr
pip install -r requirements.txt

Build PointNet++ (source code has been modified to adapt to higher versions of PyTorch)

cd DMSR/pointnet2/pointnet2
python setup.py install

Build nn_distance

cd DMSR/lib/nn_distance
python setup.py install

Dataset

Please follow the instructions in object-deformnet to prepare the NOCS dataset. The segmentation results from Mask R-CNN and predictions of NOCS can also be found here.

Important: delete obj_models/val/02876657/d3b53f56b4a7b3b3c9f016d57db96408 before preprocessing the datasets, otherwise it will cause errors.

The extended dataset containing depth and normal predictions can be downloaded here. Please use the following command to merge the files in each folder after downloading:

cat <folder_name>* > <folder_name>.tar.gz
tar xzvf <folder_name>.tar.gz

Now the DMSR/datasets directory is organized as follows:

datasets
├── NOCS
│   ├── CAMERA
│   │   ├── train
│   │   └── val
│   ├── Real
│   │   ├── train
│   │   └── test
│   ├── gts
│   │   ├── val
│   │   └── real_test
│   └── obj_models
│       ├── train
│       ├── val
│       ├── real_train
│       └── real_test
├── dpt_output
│   ├── CAMERA
│   │   ├── train
│   │   └── val
│   └── Real
│       ├── train
│       └── test
└── results
    ├── mrcnn_results
    │   ├── real_test
    │   └── val
    └── nocs_results
        ├── real_test
        └── val

Training

python train.py --dataset CAMERA --result_dir checkpoints/camera

or

python train.py --dataset CAMERA+Real --result_dir checkpoints/real

Evaluation

Please download our trained models here and put it in the DMSR/pretrained directory.

To evaluate CAMERA data, run

python evaluate.py --data val --model ./pretrained/camera_model.pth --result_dir ./results/eval_camera

To evaluate REAL data, run

python evaluate.py --data real_test --model ./pretrained/real_model.pth --result_dir ./results/eval_real

We also provide processed files for fast verification of our results reported in the paper. Please download our processed files here and put it in the DMSR/results directory. Then, you can run the above commands to see the results shown in the terminal.

Video

Please check our video for more qualitative results.

Qualitative results of our method on CAMERA25 and REAL275.

The predictions and ground truths are shown by red and green bounding boxes, respectively.

Citation

If you find our work useful in your research, please cite our paper:

@misc{wei2023rgbbased,
      title={RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery}, 
      author={Jiaxin Wei and Xibin Song and Weizhe Liu and Laurent Kneip and Hongdong Li and Pan Ji},
      year={2023},
      eprint={2309.10255},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

If you have any questions, please feel free to contact Jiaxin Wei ([email protected]).

Acknowledgment

Our code is developed based on SGPA and we extend the NOCS dataset using Omnidata models for training.

dmsr's People

Contributors

goldoak avatar

Stargazers

Jian Liu avatar  avatar Weihang Li avatar leanAI avatar  avatar  avatar Jeff Carpenter avatar Tnut avatar Dr. Yifu Wang avatar Alberto Remus avatar Benedictus Kent avatar Steve avatar Xiao Xuan avatar hiyyg avatar Rudy avatar

Watchers

 avatar Dr. Yifu Wang avatar

dmsr's Issues

How to find the best model?

Hello, I use the pre -training model you provided to achieve the effect of approaching your original text. But I trained your code and finally got a total of 120 models. When I used the 120th model to verify, I found that the results were poor, so I would like to ask if there is any way to find the best model to verify? Will the result of the training model converged? I need to verify each model by myself to get the best results?

reference of the monocular estimator

Hi there, I found the reference [26] of monocular estimator in your paper was wrong. Could you please tell me which article you referred?
Thank you!

Metircs

Hi, thanks for sharing this work!
Could you please evaluate again using the SSC-6D metrics?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.