
Comments (2)

borongyuan commented on July 24, 2024

How to update efficiently online the 3D reconstruction to match the new optimized graph after a loop closure?

This is also something I have been thinking about. OpenVDB is often used for simulating and rendering sparse volumetric data such as water, fire, smoke, and clouds, so it should certainly allow fast re-generation. VDBFusion can't do that yet, though; its current API only allows adding data. I am not familiar enough with OpenVDB itself yet, so integrating VDBFusion first is a good starting point, and we can see how to improve it later. What I do know about OpenVDB is that it is well suited to data that is globally sparse but locally dense, which is exactly the structure of SLAM data.
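For reference, a minimal sketch of the add-only integration loop described above, assuming VDBFusion's Python bindings (`VDBVolume` with `integrate()` and `extract_triangle_mesh()`); the voxel parameters and the `scans_with_poses` input are placeholders:

```python
import numpy as np
from vdbfusion import VDBVolume  # assumed Python bindings of VDBFusion

def fuse_scans(scans_with_poses, voxel_size=0.05, sdf_trunc=0.15):
    """Fuse (Nx3 point array, 4x4 pose) pairs into a sparse OpenVDB-backed TSDF.
    Parameter values are illustrative only."""
    volume = VDBVolume(voxel_size, sdf_trunc, space_carving=False)
    for points, pose in scans_with_poses:
        # integrate() only *adds* observations; there is currently no call to
        # re-integrate a scan under a corrected pose after a loop closure.
        volume.integrate(points.astype(np.float64), pose.astype(np.float64))
    # Mesh the fused volume.
    vertices, triangles = volume.extract_triangle_mesh()
    return vertices, triangles
```

Because integration is append-only, correcting poses after a loop closure currently means rebuilding the volume from the cached scans.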

Another question I have been thinking about is what kind of representation is most suitable for SLAM maps. Different types of environment representation seem to be needed for map visualization, robot navigation, and human-robot interaction. Is it possible to design a universal intermediate representation that can be converted quickly into the other types?

When I was working on local/global descriptors, I realized that those models actually provide sparse image embeddings (only at keypoints). So it naturally occurred to me that the next step is dense image embeddings (one per pixel). Perhaps SAM is a foundation model worth trying here. This seems to imply a path towards semantic SLAM. In the past, people have focused on label-based semantics, but labels have semantic-granularity issues and introduce ambiguity: they cannot describe objects, parts, and subparts well. At the embedding level there is no such problem, and embeddings from multiple frames can be fused.

That is why I now feel that semantic SLAM and 3D reconstruction should be considered together. I hope to build an intermediate representation that contains information such as color, shape, and semantics, and that can be updated incrementally. Many models now have an encoder-decoder structure, and a lightweight decoder can convert the intermediate representation into the required output format. So I'm going to try changing the pipeline to Encoder -> SLAM/Fusion -> Decoder, as sketched below.
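Purely as an illustration of that Encoder -> SLAM/Fusion -> Decoder idea, here is a conceptual sketch; every name in it (`EmbeddingMap`, the encoder/projector/decoder callables, the running-average fusion) is hypothetical and not an existing API:

```python
import numpy as np

class EmbeddingMap:
    """Hypothetical intermediate representation: per-voxel embedding + weight,
    fused with a simple running average."""
    def __init__(self, dim):
        self.dim = dim
        self.embeddings = {}   # voxel id -> (embedding vector, weight)

    def fuse(self, voxel_ids, pixel_embeddings):
        for v, e in zip(voxel_ids, pixel_embeddings):
            if v in self.embeddings:
                old, w = self.embeddings[v]
                self.embeddings[v] = ((old * w + e) / (w + 1), w + 1)
            else:
                self.embeddings[v] = (e, 1)

def slam_fusion_step(emb_map, rgbd_frame, pose, encoder, projector):
    # Encoder: dense per-pixel embeddings (e.g. from a SAM-like backbone).
    pixel_embeddings = encoder(rgbd_frame.rgb)                 # H x W x D
    # SLAM/Fusion: project pixels into voxels using depth + optimized pose.
    voxel_ids, flat = projector(rgbd_frame.depth, pose, pixel_embeddings)
    emb_map.fuse(voxel_ids, flat)

def decode(emb_map, decoder_head):
    # Decoder: a lightweight head turns fused embeddings into the needed
    # output (labels for navigation, colors for visualization, ...).
    return {v: decoder_head(e) for v, (e, _) in emb_map.embeddings.items()}
```

The running average is only there to show that embeddings from multiple frames can be fused into the same voxel, which is harder to do consistently with discrete labels.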


matlabbe commented on July 24, 2024

It sure depends on what the end goal is. From a robotics point of view, one major issue with OctoMap is not really the update step, but that on loop closure detection (when the map's graph is optimized), we need to re-generate the whole OctoMap from scratch (see the sketch below). I tried some TSDF approaches in the past (open_chisel, which was originally created for Google Tango, and cpu_tsdf), but always got stuck on this question: "How to update efficiently online the 3D reconstruction to match the new optimized graph after a loop closure?" Because of that question, these 3D reconstruction approaches are mostly used only offline with RTAB-Map (options available when doing File->Export Clouds...). If re-generation with OpenVDB is fast (e.g., faster than OctoMap), it could indeed be useful to integrate.

Note that if we only want to use a 3D reconstruction in localization mode (knowing that we won't update the global map), it could indeed be useful for 3D route planning indoors (e.g., for a drone). Approaches derived from ElasticFusion may have some answers to the question above; it showed some results for deforming a TSDF after loop closure detection, but I am not sure it scales well (and it requires quite a good computer/GPU to run).
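A minimal sketch of what that from-scratch re-generation amounts to, assuming the octomap-python bindings and their `insertPointCloud(points, origin)` call; the containers of cached scans and optimized poses are hypothetical:

```python
import numpy as np
import octomap  # octomap-python bindings (assumed API; adjust to your wrapper)

def rebuild_octomap(optimized_poses, raw_scans, resolution=0.05):
    """Naive re-generation after a loop closure: since already-integrated
    occupancy data cannot be 'moved', every cached scan is re-inserted under
    its newly optimized pose."""
    tree = octomap.OcTree(resolution)
    for node_id, pose in optimized_poses.items():          # pose: 4x4 matrix
        scan = raw_scans[node_id]                          # Nx3, sensor frame
        points_map = (pose[:3, :3] @ scan.T).T + pose[:3, 3]
        origin = pose[:3, 3]
        tree.insertPointCloud(points_map.astype(np.float64),
                              origin.astype(np.float64))
    return tree
```

The cost grows with the number of graph nodes and scan sizes, which is why this re-generation step, rather than the incremental update, dominates after a loop closure.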

Thanks for the link to STVL; getting a speed boost over the default costmap voxel layer is very interesting.

For example, returning to a visited location from the opposite direction has always been a problem for appearance-based loop closure detection.

Yes, it is. New approaches like SuperGlue can match two images looking at the same thing from very different points of view. However, it could also be interesting to have a photo-realistic 3D reconstruction from which we could synthesize an image (a combination of multiple images of the area taken from points of view different from the robot's current position) and compare it with the robot's actual view. NeRF volumes can be very realistic: assuming the robot knows roughly where it is in the environment, it could localize by rendering an image from the model at that position and comparing it with the camera image (a toy sketch of this follows below). While I see great potential for robot localization, I don't yet see how to use NeRF online in SLAM mode.
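As a toy illustration of that render-and-compare idea (not an existing RTAB-Map or NeRF API), here is a brute-force pose-scoring sketch; `render_from_model()` and the perturbation set are hypothetical:

```python
import numpy as np

def localize_by_rendering(camera_image, pose_guess, render_from_model,
                          perturbations):
    """Score candidate poses around a guess by photometric error against
    images rendered from a NeRF-like model. render_from_model(pose) is a
    hypothetical renderer returning an image with the camera's intrinsics."""
    best_pose, best_err = pose_guess, np.inf
    for delta in perturbations:                  # small 4x4 pose offsets
        candidate = pose_guess @ delta
        rendered = render_from_model(candidate)
        err = np.mean((rendered.astype(np.float32) -
                       camera_image.astype(np.float32)) ** 2)
        if err < best_err:
            best_pose, best_err = candidate, err
    return best_pose, best_err
```

A real system would need much cheaper scoring than rendering a full image per candidate pose, which is part of why using NeRF online in SLAM mode is still unclear.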

