
Comments (3)

kxhit avatar kxhit commented on May 30, 2024

Hi,

  1. I'm not quite sure which part is wrong, but I suggest you accumulate the depth point clouds (or run TSDF fusion) and visualise the result to check that the depth scale, camera intrinsics, and camera poses are correct. For our evaluation on TUM RGB-D, we didn't use a dataloader to read frames. Instead, we play the rosbag and use orb-slam-ros-wrapper to process the frames online; vMAP subscribes to the rostopic from ORB-SLAM to obtain the posed frames and updated keyframe poses (usually after ORB-SLAM's local BA and global BA). So the mapping on TUM RGB-D data is real-time and finishes right after the rosbag playback ends.
  2. For integrating Detic, I think you can follow this demo; the "instance" info is returned in the results Python dictionary. After that, you should keep the semantic and instance info and build the data association (based on semantic consistency and 3D overlap) by following this function. Alternatively, I recently tried video segmentation, e.g. XMem, as the data-association tracker, which also works well.
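The sanity check suggested in point 1 can be sketched without any SLAM code: back-project each depth frame with the pinhole model and accumulate the clouds in world coordinates. This is only a minimal illustration, not vMAP's code; the intrinsics below (fx, fy, cx, cy) are placeholders for your own camera's values, and depth_scale=5000 is the TUM RGB-D factor.

```python
import numpy as np

def backproject(depth_png, fx, fy, cx, cy, depth_scale=5000.0):
    """Back-project a 16-bit depth image into a camera-frame point cloud.

    depth_scale=5000 matches TUM RGB-D (a pixel value of 5000 is 1 metre).
    """
    z = depth_png.astype(np.float64) / depth_scale
    v, u = np.nonzero(z)                  # valid pixels (0 means no depth)
    z = z[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def accumulate(frames, poses):
    """Merge per-frame clouds into world coordinates via 4x4 c2w poses."""
    clouds = [pts @ T_wc[:3, :3].T + T_wc[:3, 3]
              for pts, T_wc in zip(frames, poses)]
    return np.concatenate(clouds, axis=0)
```

If the accumulated cloud of a room-scale scene spans tens of metres, or the per-frame clouds do not overlap where they should, the depth scale, intrinsics, or poses are wrong.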

Hope this helps, and please let me know if it still doesn't work well!
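The data-association idea in point 2 (match a new detection to an existing object only if the class agrees and the 3D regions overlap) can be sketched as follows. This is a simplified stand-in for the referenced function, with assumed dictionary-based data structures and an arbitrary IoU threshold:

```python
import numpy as np

def bbox_iou_3d(b1, b2):
    """IoU of two axis-aligned 3D boxes, each given as (min_xyz, max_xyz)."""
    lo = np.maximum(b1[0], b2[0])
    hi = np.minimum(b1[1], b2[1])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol1 = np.prod(b1[1] - b1[0])
    vol2 = np.prod(b2[1] - b2[0])
    return inter / (vol1 + vol2 - inter + 1e-12)

def associate(detection, objects, iou_thresh=0.25):
    """Return the id of the best matching existing object, or None.

    Semantic consistency (same class) is required first; among candidates,
    the one with the highest 3D bounding-box overlap wins.
    """
    best_id, best_iou = None, iou_thresh
    for obj_id, obj in objects.items():
        if obj["cls"] != detection["cls"]:
            continue                       # semantic consistency gate
        iou = bbox_iou_3d(detection["bbox"], obj["bbox"])
        if iou > best_iou:
            best_id, best_iou = obj_id, iou
    return best_id                         # None => spawn a new object
```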

from vmap.

bilibilijin avatar bilibilijin commented on May 30, 2024


Thanks for your comments. I changed the depth scale in the TUM RGB-D dataset from 1000 to 5000 and got good results in the subsequent reconstruction, as shown below. I'm not sure whether this reconstruction quality matches the results shown in your paper.
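For reference, the effect of the depth-scale fix is easy to see in isolation: TUM RGB-D depth PNGs are 16-bit with 5000 units per metre, so dividing by 1000 inflates every depth by 5x.

```python
import numpy as np

# Two raw TUM RGB-D depth readings (16-bit PNG values).
raw = np.array([5000, 10000], dtype=np.uint16)

depth_wrong = raw / 1000.0   # mistaken factor: 5.0 m and 10.0 m
depth_right = raw / 5000.0   # correct factor:  1.0 m and  2.0 m
```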

[image: reconstruction result]

In addition, I still have some doubts. I tried to capture my own dataset with an RGB-D camera, used COLMAP to estimate the camera extrinsics, and used Track-Anything to prepare the object masks, but reconstruction with both iMAP and vMAP fell short of the expected quality. What I want to ask is:

  1. For the scene mentioned in the paper, how many RGB images generally need to be prepared to reconstruct it?
    [image: the scene from the paper]

  2. Do the poses need to be particularly accurate? Can the poses estimated by COLMAP meet the requirements of vMAP?


kxhit avatar kxhit commented on May 30, 2024

Hi @bilibilijin, thanks for your updates.
Yeah, I think the results you showed match.

  1. For Replica and ScanNet we process all the frames sequentially. For TUM RGB-D and our own Kinect recordings, we play the rosbag in real time and report the reconstruction results when the sequence ends. The real-time processing runs at around 5 Hz, i.e. a frame is taken roughly every 0.2 s, so the total number of frames will be around sequence_time / 0.2 s. For example, our Kinect recordings are around 170 s long, so around 700 frames are processed in total.
  2. I don't think the poses need to be super accurate, as we adopt continually updated poses from ORB-SLAM3. COLMAP is definitely accurate enough.
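The frame-count estimate in point 1 is just a rate-times-duration calculation. A strict 5 Hz over 170 s would give 850 frames; the quoted figure of around 700 corresponds to a slightly lower effective rate (about 4 Hz), which is plausible once dropped or skipped frames are accounted for:

```python
def expected_frames(sequence_time_s, rate_hz):
    """Approximate number of frames processed when playing a rosbag in real time."""
    return sequence_time_s * rate_hz

strict = expected_frames(170, 5.0)        # 850 frames at exactly 5 Hz
effective = expected_frames(170, 700 / 170)  # ~700 frames, as reported
```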

