VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks

VOOM is a real-time visual SLAM library that uses high-level objects and low-level points as hierarchical landmarks in a coarse-to-fine manner. It computes the camera trajectory and a sparse 3D reconstruction.

This work has been accepted by ICRA 2024 🎉 [pdf] [video].

Abstract

We propose a Visual Object Odometry and Mapping framework (VOOM) using high-level objects and low-level points as the hierarchical landmarks in a coarse-to-fine manner instead of directly using object residuals in bundle adjustment. Firstly, we introduce an improved observation model and a novel data association method for dual quadrics, employed to represent physical objects. It facilitates the creation of a 3D map that closely reflects reality. Next, we use object information to enhance the data association of feature points and consequently update the map. In our visual object odometry backend, the updated map is employed to further optimize the camera pose and the objects. At the same time, local bundle adjustment is performed utilizing the objects and points-based covisibility graphs in our visual object mapping process. Our experiments demonstrate that the localization accuracy of the proposed VOOM not only exceeds that of other object-oriented SLAM but also surpasses that of feature points SLAM systems such as ORB-SLAM2. The videos of the results can be found at: https://www.bilibili.com/video/BV1w14y1C7Jb/ .
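To make the dual quadric representation concrete, here is a minimal sketch of the textbook projection of a dual quadric Q* into an image as a dual conic, C* = P Q* Pᵀ. This is the standard relation from multiple-view geometry, not VOOM's improved observation model. The helper names (`sphere_dual_quadric`, `project_dual_quadric`, `conic_center`) and the camera values are hypothetical and chosen only for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def sphere_dual_quadric(center, radius):
    """Dual quadric Q* = Z diag(r^2, r^2, r^2, -1) Z^T of a sphere.

    Z is the 4x4 homogeneous transform that places the sphere at `center`.
    """
    cx, cy, cz = center
    r2 = radius * radius
    z = [[1, 0, 0, cx], [0, 1, 0, cy], [0, 0, 1, cz], [0, 0, 0, 1]]
    d = [[r2, 0, 0, 0], [0, r2, 0, 0], [0, 0, r2, 0], [0, 0, 0, -1]]
    return matmul(matmul(z, d), transpose(z))


def project_dual_quadric(p, q_star):
    """Project a 4x4 dual quadric with a 3x4 camera matrix: C* = P Q* P^T."""
    return matmul(matmul(p, q_star), transpose(p))


def conic_center(c_star):
    """Center of the ellipse described by the 3x3 dual conic C*."""
    return (c_star[0][2] / c_star[2][2], c_star[1][2] / c_star[2][2])


# Hypothetical pinhole camera at the origin: focal length 500 px,
# principal point (320, 240), looking down the +z axis.
P = [[500, 0, 320, 0], [0, 500, 240, 0], [0, 0, 1, 0]]

# A unit sphere 5 m in front of the camera, on the optical axis.
Q = sphere_dual_quadric((0, 0, 5), 1)
C = project_dual_quadric(P, Q)
center = conic_center(C)  # the ellipse is centered on the principal point
```

In an object SLAM system, an ellipse predicted this way is what gets compared with the ellipse fitted to a 2D detection.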

Prerequisites

Need to install

Included in the Thirdparty folder

  • DBoW2 and g2o: We use modified versions of the DBoW2 library for place recognition and the g2o library for non-linear optimization. Both modified libraries (which are BSD-licensed) are included in the Thirdparty folder.
  • Json for reading and writing JSON files.
  • Osmap for map saving/loading. We use a modified version that handles objects.

Compilation

  1. Clone the repository recursively:

    git clone --recursive https://github.com/yutongwangBIT/VOOM.git VOOM

  2. Build:

    sh build.sh

Data

  1. TUM RGBD
  2. LM Data Diamond sequences

Our system takes instance segmentation results as input. We provide the detections as JSON files in the Data folder. They were produced with an off-the-shelf version of YOLOv8; the Python script that prepares the JSON files is in the PythonScripts folder. The camera parameters are available in the Cameras folder.
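As a rough illustration of what such a preparation step can look like, the sketch below turns per-frame detections into JSON records that carry an ellipse alongside each bounding box. The field names (`file_name`, `detections`, `category_id`, `detection_score`, `bbox`, `ellipse`) and the choice of an axis-aligned ellipse inscribed in the box are assumptions made for this example; the actual schema is defined by the script in the PythonScripts folder and the files in the Data folder.

```python
import json


def bbox_to_ellipse(x1, y1, x2, y2):
    """Axis-aligned ellipse inscribed in the box (x1, y1, x2, y2)."""
    return {
        "center": [(x1 + x2) / 2.0, (y1 + y2) / 2.0],
        "axes": [(x2 - x1) / 2.0, (y2 - y1) / 2.0],  # semi-axes along x and y
        "angle": 0.0,  # no rotation, because the ellipse is axis-aligned
    }


def make_frame_record(file_name, detections):
    """Build one JSON-serializable record for a single image.

    `detections` is a list of dicts with hypothetical keys
    "category_id", "score" and "bbox" (x1, y1, x2, y2 in pixels).
    """
    return {
        "file_name": file_name,
        "detections": [
            {
                "category_id": det["category_id"],
                "detection_score": det["score"],
                "bbox": list(det["bbox"]),
                "ellipse": bbox_to_ellipse(*det["bbox"]),
            }
            for det in detections
        ],
    }


# Made-up detection values, for illustration only.
record = make_frame_record(
    "rgb/0001.png",
    [{"category_id": 56, "score": 0.9, "bbox": [100, 50, 300, 150]}],
)
json_text = json.dumps([record], indent=2)
```

An ellipse fitted to the segmentation mask itself would follow the object outline more closely; the inscribed ellipse is used here only because it needs nothing beyond the box corners.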

Run our system

All command lines can be found at https://github.com/yutongwangBIT/VOOM/blob/main/script

Example usage on the TUM fr2_desk sequence:

cd bin/
./rgbd_tum_with_ellipse ../Vocabulary/ORBvoc.txt ../Cameras/TUM2.yaml PATH_TO_DATASET ../Data/fr2_desk/fr2_desk.txt ../Data/fr2_desk/detections_yolov8x_seg_tum_rgbd_fr2_desk_with_ellipse.json points fr2_desk
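
When running several sequences, it can be convenient to assemble this argument list programmatically. The helper below is a minimal sketch that reproduces the example command. The function name and the parameter names (`dataset_path`, `associations`, `detections`, `mode`, `output_name`) are our reading of the positional arguments in the example, not documented option names.

```python
import shlex


def build_rgbd_tum_command(dataset_path, sequence_dir, associations, detections,
                           camera_yaml="TUM2.yaml", mode="points",
                           output_name="fr2_desk"):
    """Return the argument list for ./rgbd_tum_with_ellipse, run from bin/."""
    return [
        "./rgbd_tum_with_ellipse",
        "../Vocabulary/ORBvoc.txt",          # ORB vocabulary
        f"../Cameras/{camera_yaml}",         # camera parameters
        dataset_path,                        # path to the downloaded sequence
        f"../Data/{sequence_dir}/{associations}",
        f"../Data/{sequence_dir}/{detections}",
        mode,                                # "points" in the example above
        output_name,                         # last argument in the example above
    ]


cmd = build_rgbd_tum_command(
    dataset_path="PATH_TO_DATASET",
    sequence_dir="fr2_desk",
    associations="fr2_desk.txt",
    detections="detections_yolov8x_seg_tum_rgbd_fr2_desk_with_ellipse.json",
)
command_line = shlex.join(cmd)  # a shell-ready string, quoted where needed
```

The list form can be passed directly to `subprocess.run(cmd, cwd="bin", check=True)`, which avoids shell quoting problems when the dataset path contains spaces.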

License

VOOM is released under the GPLv3 license. For a list of all code/library dependencies (and associated licenses), please see Dependencies.md.

Citation

If you use VOOM in academic work, please cite our paper:

@inproceedings{wang2024icra,
	author = {Yutong Wang and Chaoyang Jiang and Xieyuanli Chen},
	title = {{VOOM: Robust Visual Object Odometry and Mapping using Hierarchical Landmarks}},
	booktitle = {Proc. of the IEEE Intl. Conf. on Robotics \& Automation (ICRA)},
	year = 2024
}
