
Comments (2)

borongyuan commented on July 24, 2024

How to update efficiently online the 3D reconstruction to match the new optimized graph after a loop closure?

This is also something I have been thinking about. OpenVDB is often used for simulating and rendering sparse volumetric data such as water, fire, smoke, and clouds, so it should certainly allow fast re-generation. VDBFusion can't do that yet, though; its current API only allows adding data. I am not familiar enough with OpenVDB itself yet, so integrating VDBFusion first is a good starting point, and we can see how to improve it later. What I do know about OpenVDB is that it is well suited to data that is globally sparse but locally dense, which is exactly the structure of SLAM data.
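For reference, a minimal sketch of the add-only integration loop described above, assuming VDBFusion's Python bindings (`VDBVolume` with `integrate()` and `extract_triangle_mesh()`); the voxel parameters and the `scans_with_poses` input are placeholders:

```python
import numpy as np
from vdbfusion import VDBVolume  # assumed Python bindings of VDBFusion

def fuse_scans(scans_with_poses, voxel_size=0.05, sdf_trunc=0.15):
    """Fuse (Nx3 point array, 4x4 pose) pairs into a sparse OpenVDB-backed TSDF.
    Parameter values are illustrative only."""
    volume = VDBVolume(voxel_size, sdf_trunc, space_carving=False)
    for points, pose in scans_with_poses:
        # integrate() only *adds* observations; there is currently no call to
        # re-integrate a scan under a corrected pose after a loop closure.
        volume.integrate(points.astype(np.float64), pose.astype(np.float64))
    # Mesh the fused volume.
    vertices, triangles = volume.extract_triangle_mesh()
    return vertices, triangles
```

Because integration is append-only, correcting poses after a loop closure currently means rebuilding the volume from the cached scans.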

Another question I have been thinking about is what kind of representation is most suitable for SLAM maps. Different types of environment representation seem to be needed for map visualization, robot navigation, and human-robot interaction. Is it possible to design a universal intermediate representation that can be converted quickly into the other types?

When I was working on local/global descriptors, I realized that those models actually provide sparse image embeddings (only at keypoints). So it naturally occurred to me that the next step is dense image embeddings (one per pixel). Perhaps SAM is a foundation model worth trying here. This seems to imply a path towards semantic SLAM. In the past, people have focused on label-based semantics, but labels have semantic-granularity issues and introduce ambiguity: they cannot describe objects, parts, and subparts well. At the embedding level there is no such problem, and embeddings from multiple frames can be fused.

That is why I now feel that semantic SLAM and 3D reconstruction should be considered together. I hope to build an intermediate representation that contains information such as color, shape, and semantics, and that can be updated incrementally. Many models now have an encoder-decoder structure, and a lightweight decoder can convert the intermediate representation into the required output format. So I'm going to try changing the pipeline to Encoder -> SLAM/Fusion -> Decoder, as sketched below.
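Purely as an illustration of that Encoder -> SLAM/Fusion -> Decoder idea, here is a conceptual sketch; every name in it (`EmbeddingMap`, the encoder/projector/decoder callables, the running-average fusion) is hypothetical and not an existing API:

```python
import numpy as np

class EmbeddingMap:
    """Hypothetical intermediate representation: per-voxel embedding + weight,
    fused with a simple running average."""
    def __init__(self, dim):
        self.dim = dim
        self.embeddings = {}   # voxel id -> (embedding vector, weight)

    def fuse(self, voxel_ids, pixel_embeddings):
        for v, e in zip(voxel_ids, pixel_embeddings):
            if v in self.embeddings:
                old, w = self.embeddings[v]
                self.embeddings[v] = ((old * w + e) / (w + 1), w + 1)
            else:
                self.embeddings[v] = (e, 1)

def slam_fusion_step(emb_map, rgbd_frame, pose, encoder, projector):
    # Encoder: dense per-pixel embeddings (e.g. from a SAM-like backbone).
    pixel_embeddings = encoder(rgbd_frame.rgb)                 # H x W x D
    # SLAM/Fusion: project pixels into voxels using depth + optimized pose.
    voxel_ids, flat = projector(rgbd_frame.depth, pose, pixel_embeddings)
    emb_map.fuse(voxel_ids, flat)

def decode(emb_map, decoder_head):
    # Decoder: a lightweight head turns fused embeddings into the needed
    # output (labels for navigation, colors for visualization, ...).
    return {v: decoder_head(e) for v, (e, _) in emb_map.embeddings.items()}
```

The running average is only there to show that embeddings from multiple frames can be fused into the same voxel, which is harder to do consistently with discrete labels.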


matlabbe commented on July 24, 2024

It sure depends on what the end goal is. From a robotics point of view, one major issue with OctoMap is not really the update step, but that on loop closure detection (when the map's graph is optimized), we need to re-generate the whole OctoMap from scratch (see the sketch below). I tried some TSDF approaches in the past (open_chisel, which was originally created for Google Tango, and cpu_tsdf), but always got stuck on this question: "How to update efficiently online the 3D reconstruction to match the new optimized graph after a loop closure?" Because of that question, these 3D reconstruction approaches are mostly used only offline with RTAB-Map (options available when doing File->Export Clouds...). If re-generation with OpenVDB is fast (e.g., faster than OctoMap), it could indeed be useful to integrate.

Note that if we only want to use a 3D reconstruction in localization mode (knowing that we won't update the global map), it could indeed be useful for 3D route planning indoors (e.g., for a drone). Approaches derived from ElasticFusion may have some answers to the question above; it showed some results for deforming a TSDF after loop closure detection, but I am not sure it scales well (and it requires quite a good computer/GPU to run).
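A minimal sketch of what that from-scratch re-generation amounts to, assuming the octomap-python bindings and their `insertPointCloud(points, origin)` call; the containers of cached scans and optimized poses are hypothetical:

```python
import numpy as np
import octomap  # octomap-python bindings (assumed API; adjust to your wrapper)

def rebuild_octomap(optimized_poses, raw_scans, resolution=0.05):
    """Naive re-generation after a loop closure: since already-integrated
    occupancy data cannot be 'moved', every cached scan is re-inserted under
    its newly optimized pose."""
    tree = octomap.OcTree(resolution)
    for node_id, pose in optimized_poses.items():          # pose: 4x4 matrix
        scan = raw_scans[node_id]                          # Nx3, sensor frame
        points_map = (pose[:3, :3] @ scan.T).T + pose[:3, 3]
        origin = pose[:3, 3]
        tree.insertPointCloud(points_map.astype(np.float64),
                              origin.astype(np.float64))
    return tree
```

The cost grows with the number of graph nodes and scan sizes, which is why this re-generation step, rather than the incremental update, dominates after a loop closure.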

Thanks for the link to STVL; getting a speed boost over the default costmap voxel layer is very interesting.

For example, returning to a visited location from the opposite direction has always been a problem for appearance-based loop closure detection.

Yes, it is. New approaches like SuperGlue can match two images looking at the same thing from very different points of view. However, it could also be interesting to have a photo-realistic 3D reconstruction from which we could synthesize an image (a combination of multiple images of the area taken from points of view different from the robot's current position) and compare it with the robot's actual view. NeRF volumes can be very realistic: assuming the robot knows roughly where it is in the environment, it could localize by rendering an image from the model at that position and comparing it with the camera image (a toy sketch of this follows below). While I see great potential for robot localization, I don't yet see how to use NeRF online in SLAM mode.
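As a toy illustration of that render-and-compare idea (not an existing RTAB-Map or NeRF API), here is a brute-force pose-scoring sketch; `render_from_model()` and the perturbation set are hypothetical:

```python
import numpy as np

def localize_by_rendering(camera_image, pose_guess, render_from_model,
                          perturbations):
    """Score candidate poses around a guess by photometric error against
    images rendered from a NeRF-like model. render_from_model(pose) is a
    hypothetical renderer returning an image with the camera's intrinsics."""
    best_pose, best_err = pose_guess, np.inf
    for delta in perturbations:                  # small 4x4 pose offsets
        candidate = pose_guess @ delta
        rendered = render_from_model(candidate)
        err = np.mean((rendered.astype(np.float32) -
                       camera_image.astype(np.float32)) ** 2)
        if err < best_err:
            best_pose, best_err = candidate, err
    return best_pose, best_err
```

A real system would need much cheaper scoring than rendering a full image per candidate pose, which is part of why using NeRF online in SLAM mode is still unclear.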

