
Comments (3)

kxhit avatar kxhit commented on May 30, 2024

Hi,

  1. I'm not quite sure which part is wrong, but I suggest you accumulate the depth point clouds (or run TSDF fusion) and visualise the result to check that the depth scale, camera intrinsics, and camera poses are correct. For our evaluation on TUM RGB-D, we didn't use a dataloader to read frames. Instead, we play the rosbag and use orb-slam-ros-wrapper to process the frames online; vMAP subscribes to the rostopic from ORB-SLAM to obtain the posed frames and updated keyframe poses (usually after ORB-SLAM's local BA and global BA). So the mapping on TUM RGB-D data is real-time and finishes right after the rosbag playback ends.
  2. For integrating Detic, I think you can follow this demo; the "instance" info is returned in the results Python dictionary. After that, you should keep the semantic and instance info and build the data association (based on semantic consistency and 3D overlap) by following this function. Alternatively, I recently tried video segmentation, e.g. XMem, as the data-association tracker, which also works well.
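The sanity check suggested in point 1 can be sketched without any SLAM code: back-project each depth frame with the pinhole model and accumulate the clouds in world coordinates. This is only a minimal illustration, not vMAP's code; the intrinsics below (fx, fy, cx, cy) are placeholders for your own camera's values, and depth_scale=5000 is the TUM RGB-D factor.

```python
import numpy as np

def backproject(depth_png, fx, fy, cx, cy, depth_scale=5000.0):
    """Back-project a 16-bit depth image into a camera-frame point cloud.

    depth_scale=5000 matches TUM RGB-D (a pixel value of 5000 is 1 metre).
    """
    z = depth_png.astype(np.float64) / depth_scale
    v, u = np.nonzero(z)                  # valid pixels (0 means no depth)
    z = z[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def accumulate(frames, poses):
    """Merge per-frame clouds into world coordinates via 4x4 c2w poses."""
    clouds = [pts @ T_wc[:3, :3].T + T_wc[:3, 3]
              for pts, T_wc in zip(frames, poses)]
    return np.concatenate(clouds, axis=0)
```

If the accumulated cloud of a room-scale scene spans tens of metres, or the per-frame clouds do not overlap where they should, the depth scale, intrinsics, or poses are wrong.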

Hope this helps, and please let me know if it still doesn't work well!
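The data-association idea in point 2 (match a new detection to an existing object only if the class agrees and the 3D regions overlap) can be sketched as follows. This is a simplified stand-in for the referenced function, with assumed dictionary-based data structures and an arbitrary IoU threshold:

```python
import numpy as np

def bbox_iou_3d(b1, b2):
    """IoU of two axis-aligned 3D boxes, each given as (min_xyz, max_xyz)."""
    lo = np.maximum(b1[0], b2[0])
    hi = np.minimum(b1[1], b2[1])
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol1 = np.prod(b1[1] - b1[0])
    vol2 = np.prod(b2[1] - b2[0])
    return inter / (vol1 + vol2 - inter + 1e-12)

def associate(detection, objects, iou_thresh=0.25):
    """Return the id of the best matching existing object, or None.

    Semantic consistency (same class) is required first; among candidates,
    the one with the highest 3D bounding-box overlap wins.
    """
    best_id, best_iou = None, iou_thresh
    for obj_id, obj in objects.items():
        if obj["cls"] != detection["cls"]:
            continue                       # semantic consistency gate
        iou = bbox_iou_3d(detection["bbox"], obj["bbox"])
        if iou > best_iou:
            best_id, best_iou = obj_id, iou
    return best_id                         # None => spawn a new object
```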

from vmap.

bilibilijin avatar bilibilijin commented on May 30, 2024


Thanks for your comments. I changed the depth scale in the TUM RGB-D dataset from 1000 to 5000 and got good results in the subsequent reconstruction, as shown below. I'm not sure whether this reconstruction quality matches the results shown in your paper.
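For reference, the effect of the depth-scale fix is easy to see in isolation: TUM RGB-D depth PNGs are 16-bit with 5000 units per metre, so dividing by 1000 inflates every depth by 5x.

```python
import numpy as np

# Two raw TUM RGB-D depth readings (16-bit PNG values).
raw = np.array([5000, 10000], dtype=np.uint16)

depth_wrong = raw / 1000.0   # mistaken factor: 5.0 m and 10.0 m
depth_right = raw / 5000.0   # correct factor:  1.0 m and  2.0 m
```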

[image: reconstruction result]

In addition, I still have some doubts. I tried to capture my own dataset with an RGB-D camera, used COLMAP to estimate the camera extrinsics, and used Track-Anything to prepare the object masks, but reconstruction with both iMAP and vMAP fell short of the expected quality. What I want to ask is:

  1. For the scene mentioned in the paper, how many RGB images generally need to be prepared to reconstruct it?
    [image: the scene from the paper]

  2. Do the poses need to be particularly accurate? Can the poses estimated by COLMAP meet the requirements of vMAP?


kxhit avatar kxhit commented on May 30, 2024

Hi @bilibilijin, thanks for your updates.
Yeah, I think the results you showed match.

  1. For Replica and ScanNet we process all the frames sequentially. For TUM RGB-D and our own Kinect recordings, we play the rosbag in real time and report the reconstruction results when the sequence ends. The real-time processing runs at around 5 Hz, i.e. a frame is taken roughly every 0.2 s, so the total number of frames will be around sequence_time / 0.2 s. For example, our Kinect recordings are around 170 s long, so around 700 frames are processed in total.
  2. I don't think the poses need to be super accurate, as we adopt continually updated poses from ORB-SLAM3. COLMAP is definitely accurate enough.
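The frame-count estimate in point 1 is just a rate-times-duration calculation. A strict 5 Hz over 170 s would give 850 frames; the quoted figure of around 700 corresponds to a slightly lower effective rate (about 4 Hz), which is plausible once dropped or skipped frames are accounted for:

```python
def expected_frames(sequence_time_s, rate_hz):
    """Approximate number of frames processed when playing a rosbag in real time."""
    return sequence_time_s * rate_hz

strict = expected_frames(170, 5.0)        # 850 frames at exactly 5 Hz
effective = expected_frames(170, 700 / 170)  # ~700 frames, as reported
```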

