kxhit / vmap
[CVPR 2023] vMAP: Vectorised Object Mapping for Neural Field SLAM
Home Page: https://kxhit.github.io/vMAP
License: Other
Thanks for such excellent work!
I have a question regarding synthesizing a novel view, which involves both RGB and depth data.
Specifically, I'm curious about how to accomplish this based on a given camera position after completing the training of all the data. Unfortunately, I couldn't find the relevant code in this repo. Could you please provide some guidance on this matter?
Thank you in advance for your help!
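Since the repo does not seem to expose a dedicated novel-view entry point, here is a minimal NeRF-style rendering sketch for a given camera-to-world pose `T_wc` and pinhole intrinsics. `field` is a hypothetical stand-in returning `(rgb, sigma)` per point; with vMAP you would composite the background and per-object MLPs instead, so treat this as the general recipe, not the repo's API:

```python
import torch

@torch.no_grad()
def render_novel_view(field, T_wc, fx, fy, cx, cy, H, W,
                      n_samples=64, near=0.1, far=6.0):
    # build a ray per pixel in camera coordinates
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    dirs = torch.stack([(u - cx) / fx, (v - cy) / fy, torch.ones_like(u)], -1)
    dirs = dirs @ T_wc[:3, :3].T                    # rotate rays into world frame
    origins = T_wc[:3, 3].expand_as(dirs)
    # sample depths along each ray and query the trained field
    z = torch.linspace(near, far, n_samples)
    pts = origins[..., None, :] + dirs[..., None, :] * z[:, None]  # (H, W, n, 3)
    rgb, sigma = field(pts)                          # hypothetical: (H,W,n,3), (H,W,n)
    # standard alpha compositing for colour and expected depth
    delta = (far - near) / n_samples
    alpha = 1 - torch.exp(-sigma * delta)
    T_acc = torch.cumprod(torch.cat([torch.ones_like(alpha[..., :1]),
                                     1 - alpha + 1e-10], -1), -1)[..., :-1]
    w = alpha * T_acc
    rgb_img = (w[..., None] * rgb).sum(-2)           # (H, W, 3)
    depth_img = (w * z).sum(-1)                      # (H, W) expected depth
    return rgb_img, depth_img
```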
Hi! Thanks for the steps you gave in the previous question (#14). I tried following those suggestions, but some problems are still not solved.
Question 1: I tried to reconstruct the TUM_RGBD dataset in iMAP mode. I referred to the handling of this dataset in nice-slam, since there is no related processing for it in vMAP, so I am not sure whether my processing is correct. The result contains a lot of overlapping shadows, and I don't know which step is wrong.
Here is the code:

```python
import os

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset
from torchvision import transforms

# helpers from the vMAP codebase (exact import paths may differ):
# image_transforms, get_bbox2d_batch, enlarge_bbox, readEXR_onlydepth


class TUM_RGBD(Dataset):
    def __init__(self, cfg):
        self.imap_mode = cfg.imap_mode
        self.root_dir = cfg.dataset_dir
        self.color_paths, self.depth_paths, self.poses = self.loadtum(
            self.root_dir, frame_rate=32)
        self.inst_path = '../Detic/datasets/TUM_RGBD/output_dataset_freiburg2_xyz_instance/'
        self.n_img = len(self.color_paths)
        self.depth_transform = transforms.Compose(
            [image_transforms.DepthScale(cfg.depth_scale),   # depth * 1/1000.0
             image_transforms.DepthFilter(cfg.max_depth)])   # depths > max_depth (6.0) are far_mask; set them to 0
        self.distortion = np.array([0.2312, -0.7849, -0.0033, -0.0001, 0.9172])
        self.W = cfg.W
        self.H = cfg.H
        self.fx = cfg.fx
        self.fy = cfg.fy
        self.cx = cfg.cx
        self.cy = cfg.cy
        self.edge = cfg.mw
        self.crop_size = None
        self.scale = 1.0
        # background semantic classes: undefined--1, undefined-0 etc.
        ################## vmap part ##################
        self.background_cls_list = [255]
        self.bbox_scale = 0.2
        self.inst_dict = {}
        ###############################################

    def parse_list(self, filepath, skiprows=0):
        """Read list data."""
        data = np.loadtxt(filepath, delimiter=' ',
                          dtype=np.unicode_, skiprows=skiprows)
        return data

    def associate_frames(self, tstamp_image, tstamp_depth, tstamp_pose, max_dt=0.08):
        """Pair images, depths, and poses by nearest timestamp."""
        associations = []
        for i, t in enumerate(tstamp_image):
            if tstamp_pose is None:
                j = np.argmin(np.abs(tstamp_depth - t))
                if np.abs(tstamp_depth[j] - t) < max_dt:
                    associations.append((i, j))
            else:
                j = np.argmin(np.abs(tstamp_depth - t))
                k = np.argmin(np.abs(tstamp_pose - t))
                if (np.abs(tstamp_depth[j] - t) < max_dt) and \
                   (np.abs(tstamp_pose[k] - t) < max_dt):
                    associations.append((i, j, k))
        return associations

    def loadtum(self, datapath, frame_rate=-1):
        """Read video data in TUM-RGBD format."""
        if os.path.isfile(os.path.join(datapath, 'groundtruth.txt')):
            pose_list = os.path.join(datapath, 'groundtruth.txt')
        elif os.path.isfile(os.path.join(datapath, 'pose.txt')):
            pose_list = os.path.join(datapath, 'pose.txt')

        image_list = os.path.join(datapath, 'rgb.txt')
        depth_list = os.path.join(datapath, 'depth.txt')

        image_data = self.parse_list(image_list)
        depth_data = self.parse_list(depth_list)
        pose_data = self.parse_list(pose_list, skiprows=1)
        pose_vecs = pose_data[:, 1:].astype(np.float64)

        tstamp_image = image_data[:, 0].astype(np.float64)
        tstamp_depth = depth_data[:, 0].astype(np.float64)
        tstamp_pose = pose_data[:, 0].astype(np.float64)
        associations = self.associate_frames(
            tstamp_image, tstamp_depth, tstamp_pose)

        # subsample associated frames to the requested frame rate
        indices = [0]
        for i in range(1, len(associations)):
            t0 = tstamp_image[associations[indices[-1]][0]]
            t1 = tstamp_image[associations[i][0]]
            if t1 - t0 > 1.0 / frame_rate:
                indices += [i]

        images, poses, depths = [], [], []
        inv_pose = None
        for ix in indices:
            (i, j, k) = associations[ix]
            images += [os.path.join(datapath, image_data[i, 1])]
            depths += [os.path.join(datapath, depth_data[j, 1])]
            c2w = self.pose_matrix_from_quaternion(pose_vecs[k])
            if inv_pose is None:
                # express all poses relative to the first frame
                inv_pose = np.linalg.inv(c2w)
                c2w = np.eye(4)
            else:
                c2w = inv_pose @ c2w
            # c2w[:3, 1] *= -1
            # c2w[:3, 2] *= -1
            c2w = torch.from_numpy(c2w).float()
            poses += [c2w]
        return images, depths, poses

    def __len__(self):
        return self.n_img

    def as_intrinsics_matrix(self, intrinsics):
        """Get matrix representation of intrinsics."""
        K = np.eye(3)
        K[0, 0] = intrinsics[0]
        K[1, 1] = intrinsics[1]
        K[0, 2] = intrinsics[2]
        K[1, 2] = intrinsics[3]
        return K

    def __getitem__(self, index):
        color_path = self.color_paths[index]
        depth_path = self.depth_paths[index]
        _, tempfilename = os.path.split(color_path)
        inst_path = self.inst_path + tempfilename

        color_data = cv2.imread(color_path)
        if '.png' in depth_path:
            depth_data = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)
        elif '.exr' in depth_path:
            depth_data = readEXR_onlydepth(depth_path)
        if self.distortion is not None:
            K = self.as_intrinsics_matrix(
                [self.fx, self.fy, self.cx + self.edge, self.cy + self.edge])
            # undistortion is only applied to the color image, not the depth!
            color_data = cv2.undistort(color_data, K, self.distortion)

        color_data = cv2.cvtColor(color_data, cv2.COLOR_BGR2RGB).transpose(1, 0, 2)
        depth_data = depth_data.astype(np.float32).transpose(1, 0)
        depth_data = np.nan_to_num(depth_data, nan=0.)
        inst = cv2.imread(inst_path, cv2.IMREAD_UNCHANGED).astype(np.int32).transpose(1, 0)
        bbox_scale = self.bbox_scale

        T = None
        if self.poses is not None:
            T = self.poses[index]
            T = T.numpy()
            if np.any(np.isinf(T)):
                if index + 1 == self.__len__():
                    print("pose inf!")
                    return None
                return self.__getitem__(index + 1)

        H, W = depth_data.shape
        if self.edge:
            # crop the image edge; there are invalid values on the border of the color image
            edge = self.edge
            color_data = color_data[edge:-edge, edge:-edge]
            depth_data = depth_data[edge:-edge, edge:-edge]
        if self.depth_transform:
            depth_data = self.depth_transform(depth_data)

        bbox_dict = {}
        if self.imap_mode:
            obj = np.zeros_like(depth_data)
        else:
            inst_list = []
            batch_masks = []
            if self.edge:
                edge = self.edge
                inst = inst[edge:-edge, edge:-edge]
            obj_ = np.zeros_like(inst)
            for inst_id in np.unique(inst):
                if inst_id in self.background_cls_list:
                    continue
                obj_mask = inst == inst_id
                batch_masks.append(obj_mask)
                inst_list.append(inst_id)
            if len(batch_masks) > 0:
                batch_masks = torch.from_numpy(np.stack(batch_masks))
                cmins, cmaxs, rmins, rmaxs = get_bbox2d_batch(batch_masks)
                for i in range(batch_masks.shape[0]):
                    w = rmaxs[i] - rmins[i]
                    h = cmaxs[i] - cmins[i]
                    if w <= 10 or h <= 10:  # skip boxes that are too small  # TODO
                        continue
                    bbox_enlarged = enlarge_bbox(
                        [rmins[i], cmins[i], rmaxs[i], cmaxs[i]], scale=bbox_scale,
                        w=inst.shape[1], h=inst.shape[0])
                    inst_id = inst_list[i]
                    obj_[batch_masks[i]] = 1  # mark pixels covered by this mask in obj_
                    # bbox_dict maps inst_id -> bbox (order: cmin, cmax, rmin, rmax)
                    bbox_dict.update({inst_id: torch.from_numpy(np.array(
                        [bbox_enlarged[1], bbox_enlarged[3], bbox_enlarged[0],
                         bbox_enlarged[2]]))})
            inst[obj_ == 0] = 255  # mark unassigned pixels as background
            obj = inst  # obj holds per-pixel instance ids; background is 255
        # full-frame bbox for the background (id 0)
        bbox_dict.update({0: torch.from_numpy(np.array(
            [int(0), int(obj.shape[0]), 0, int(obj.shape[1])]))})

        # wrap data into a frame dict
        T_obj = np.eye(4)
        depth_data = torch.from_numpy(depth_data) * self.scale
        color_data = torch.from_numpy(color_data)
        sample = {"image": color_data.type(torch.uint8), "depth": depth_data,
                  "T": T, "T_obj": T_obj, "obj": obj, "bbox_dict": bbox_dict,
                  "frame_id": index}
        if color_data is None or depth_data is None:
            print(color_path)
            print(depth_path)
            raise ValueError
        return sample

    def pose_matrix_from_quaternion(self, pvec):
        """Convert a (t, q) vector to a 4x4 pose matrix."""
        from scipy.spatial.transform import Rotation
        pose = np.eye(4)
        pose[:3, :3] = Rotation.from_quat(pvec[3:]).as_matrix()
        pose[:3, 3] = pvec[:3]
        return pose
```
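For context, a hedged usage sketch of the class above; `cfg` is assumed to carry the TUM intrinsics and crop settings loaded from a config json, and the `None` return for inf poses would need handling before batching:

```python
from torch.utils.data import DataLoader

dataset = TUM_RGBD(cfg)                     # cfg: hypothetical config object
loader = DataLoader(dataset, batch_size=1, shuffle=False, num_workers=2)
for sample in loader:
    rgb, depth, T = sample["image"], sample["depth"], sample["T"]
```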
Question 2: It is about obtaining the instance id and semantic id in Detic. I have debugged the Detic code, but I can only get the semantic id of each object, i.e. the category id. How should I process the output to get the instance id?
For the above problems, I hope you can give me some corrective suggestions or steps. I look forward to your reply and would be very grateful.
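On Question 2, a hedged sketch assuming Detectron2-style outputs from Detic: each detection already is one instance, so one can mint per-frame instance ids simply by enumerating detections and painting their masks into an id map. Note these ids are only consistent within a frame; associating them across frames would need a tracker or 3D matching.

```python
import numpy as np

def instances_to_id_map(instances, H, W, background_id=255):
    """instances: a detectron2 Instances object with pred_masks / pred_classes."""
    id_map = np.full((H, W), background_id, dtype=np.int32)
    masks = instances.pred_masks.cpu().numpy()       # (N, H, W) bool
    classes = instances.pred_classes.cpu().numpy()   # (N,) semantic class ids
    sem_of_inst = {}
    for inst_id, (mask, cls) in enumerate(zip(masks, classes)):
        id_map[mask] = inst_id                       # instance id = detection index
        sem_of_inst[inst_id] = int(cls)              # keep the semantic id alongside
    return id_map, sem_of_inst
```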
Hello, thank you for sharing the code for this amazing research!
I have two questions:
Thank you!
Dear authors,
Thank you for sharing your outstanding work!
I am wondering how I can replicate the Depth L1 results in Table C of the supplementary material.
Any guidance would be highly appreciated.
Thanks in advance for your time in reading and responding to my question.
Regards,
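For reference, a hedged sketch of what a Depth-L1 style metric typically computes (mean absolute error between rendered and ground-truth depth over valid pixels); this is an assumption about the metric, not the authors' evaluation script:

```python
import torch

def depth_l1(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Mean |pred - gt| over pixels with valid (nonzero) ground-truth depth."""
    valid = gt > 0
    return (pred[valid] - gt[valid]).abs().mean()
```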
I see in the configs that the image is cropped according to self.edge (e.g. mw or mh).
Is the crop size mw (or mh) always 10 for the ScanNet dataset and 0 for the Replica dataset?
Thanks for your excellent work and congratulations on the acceptance! I'm trying to reproduce the result on the TUM dataset. Here is my process:
Despite following these steps, I am still unable to obtain meaningful reconstruction results. I have a couple of questions that I hope you can help me with:
Thank you in advance for your help.
First of all, thank you for sharing the great work you've done!!!
I was just wondering how to solve a sudden crash happening in WSL2.
The crash point is as follows
Computer setup is
Thx!
Let me start by congratulating you for the great work and thanking you for the well-organized and easy-to-follow repo.
I am wondering about the steps to test vMAP on a live stream data from an intel realsense camera or a Microsoft Kinect camera.
Any suggestions or recommendations will be highly appreciated.
Regards,
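A hedged sketch of grabbing aligned RGB-D frames from a RealSense camera with pyrealsense2; feeding them into vMAP would additionally need live camera poses (e.g. from a tracking front-end) and instance masks, which this snippet does not provide:

```python
import numpy as np
import pyrealsense2 as rs

pipe, cfg = rs.pipeline(), rs.config()
cfg.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
cfg.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipe.start(cfg)
align = rs.align(rs.stream.color)          # align depth to the color frame
try:
    frames = align.process(pipe.wait_for_frames())
    depth = np.asanyarray(frames.get_depth_frame().get_data())  # uint16, millimetres
    color = np.asanyarray(frames.get_color_frame().get_data())  # uint8, BGR
finally:
    pipe.stop()
```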
Thank you for your excellent work!
I would like to ask whether this parallel implementation of vMAP can only be used with a simple MLP, and whether using something like Instant-NGP would be feasible.
Look forward to your reply!
Thanks for your work!
In the Replica dataset you provide, we can see the visualised instance images; could you upload the visualization code?
How did you get the mesh for each object (for TSDF-Fusion and iMAP)?
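On the visualization question, a hedged sketch of one common way to render an instance-id image (this is not the authors' code): map each id to a pseudo-random colour via a lookup table, assuming ids fit in a byte.

```python
import numpy as np

def colorize_instances(inst: np.ndarray, background_id: int = 255) -> np.ndarray:
    """inst: (H, W) integer id map with ids < 256; returns an (H, W, 3) RGB image."""
    rng = np.random.default_rng(0)
    lut = rng.integers(0, 255, size=(256, 3), dtype=np.uint8)
    lut[background_id] = 0                 # render background as black
    return lut[inst.astype(np.uint8)]
```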
Hi,
Thanks for releasing the code! I have some questions about the ground-truth object meshes in the Replica dataset:
1. In lines 27-28 of data_generation/extract_inst_obj.py, why is sub_mesh_indices[object_id] appended twice?
Hello. I appreciate you sharing code of this impressive research.
I'm now interested in evaluating the performance of vMAP in different replica scenes, so I'd like to ask about the data generation process.
First, how were the trajectories for model training (trajectory 00) and novel view synthesis (trajectory 01) generated?
Second, how can object-level meshes be obtained (to evaluate the 3D reconstruction performance at the object level)? The original Replica dataset seems to provide only scene-level meshes.
Thank you!
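On the second question, a hedged sketch (not the authors' pipeline): given the Replica scene mesh plus a per-face object-id array (which one would derive from Replica's semantic mesh annotations), one can cut out a single object's faces with trimesh.

```python
import numpy as np
import trimesh

def extract_object_mesh(scene_mesh: trimesh.Trimesh,
                        face_object_ids: np.ndarray, object_id: int) -> trimesh.Trimesh:
    """face_object_ids: (n_faces,) object id per face; returns the object submesh."""
    face_idx = np.where(face_object_ids == object_id)[0]
    return scene_mesh.submesh([face_idx], append=True)
```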
Hello,
in the config file for the Replica dataset the following camera parameters are used:

```json
"camera": {
    "w": 1200,
    "h": 680,
    "fx": 600.0,
    "fy": 600.0,
    "cx": 599.5,
    "cy": 339.5,
    "mw": 0,
    "mh": 0
}
```
But actually the width of the image is 680 and the height is 1200. I also notice the images are rotated counter-clockwise by 90 degrees.
Can you explain why the camera config doesn't match the actual image size?
Thank you
Hi! I have a question regarding the semantic_instance files!
What do we need them for?!
They seem to be all black...!!
Thank you in advance!!
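A hedged sanity check: instance-id PNGs hold small integer labels, so they look black in an ordinary image viewer even though they carry per-pixel ids. The path below is an example placeholder.

```python
import cv2
import numpy as np

inst = cv2.imread("semantic_instance/0.png", cv2.IMREAD_UNCHANGED)  # example path
print(inst.dtype, np.unique(inst))     # the distinct instance ids present in the frame
```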
After I run `python2 reader.py --filename ~/data/ScanNet/scannet/scans/scene0024_00/scene0024_00.sens --output_path ~/data/ScanNet/objnerf/ --export_depth_images --export_color_images --export_poses --export_intrinsics`, there are /color, /depth, /intrinsic, and /pose folders in the output path, but no instance-filt or label-filt folder, which running vMAP on the ScanNet dataset requires. How can I solve this? Hoping for your reply, thanks!
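A hedged note: as far as I know, instance-filt and label-filt are not produced by reader.py but come as separate ScanNet downloads (scene*_2d-instance-filt.zip and scene*_2d-label-filt.zip) that you extract yourself. A minimal sketch, with example paths and assuming those zips have been downloaded next to the .sens file (the folder layout inside the zips may include the scene name):

```python
import pathlib
import zipfile

scan = pathlib.Path("~/data/ScanNet/scannet/scans/scene0024_00").expanduser()
out = pathlib.Path("~/data/ScanNet/objnerf").expanduser()
for name in ["scene0024_00_2d-instance-filt.zip", "scene0024_00_2d-label-filt.zip"]:
    with zipfile.ZipFile(scan / name) as zf:
        zf.extractall(out)                 # yields instance-filt/ and label-filt/
```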
Hi!!
I have questions regarding the vectorised training.....!!
As far as I understand, the vectorised training is somewhat similar to KiloNeRF,
so I was wondering which part of the code is related to the vectorised training!
I am a bit confused by the KiloNeRF comparison, since there the scene is uniformly divided into n even blocks,
so I was wondering whether the render_rays.make_3D_grid part is related to vectorised training....! If not, could you tell me which part of the code is related to vectorised training?
Thank you!!
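For context, the paper describes batching the many small per-object MLPs and training them in a single vectorised step (it mentions functorch). A minimal sketch of that general technique using modern torch.func (stacked parameters + torch.vmap), not the repo's exact code; all sizes here are hypothetical:

```python
import copy
import torch
from torch.func import functional_call, stack_module_state

class TinyMLP(torch.nn.Module):
    """Stand-in for a small per-object field (hypothetical sizes)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(3, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 4))          # e.g. RGB + density
    def forward(self, x):
        return self.net(x)

models = [TinyMLP() for _ in range(10)]          # one tiny MLP per object
params, buffers = stack_module_state(models)     # stack all weights along dim 0

base = copy.deepcopy(models[0]).to("meta")       # stateless template module
def fmodel(p, b, x):
    return functional_call(base, (p, b), (x,))

x = torch.randn(10, 128, 3)                      # per-object batches of 3D points
out = torch.vmap(fmodel)(params, buffers, x)     # one batched forward: (10, 128, 4)
```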
Thanks for your nice work. When I run `python ./train.py --config ./configs/Replica/config_replica_room0_vMAP.json --logdir ./logs/vMAP/room0 --save_ckpt True`, I can get meshes of the different objects, but the scene mesh shown in the video is missing. How can I get it?
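A hedged workaround: one way to get a whole-scene mesh is simply to concatenate the per-object meshes the run already saved. The glob pattern and extension below are examples, not the repo's actual output layout:

```python
import glob
import trimesh

paths = sorted(glob.glob("./logs/vMAP/room0/**/*.obj", recursive=True))  # example pattern
scene = trimesh.util.concatenate([trimesh.load(p) for p in paths])
scene.export("./logs/vMAP/room0/scene_mesh.ply")
```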
Let me start by congratulating you for the great work and thanking you for the well-organized and easy-to-follow repo.
But I don't seem to see the part of the code that tests on TUM-RGBD.
I want to know how to run vMAP on the TUM-RGBD dataset; if possible, could you give the relevant steps or detailed code? I would appreciate it.
Looking forward to your reply as soon as possible!
Hi, thanks for open sourcing your code!
I'm just trying to get iMAP running and I have two questions:
Thank you
Excuse me, when I run the vMAP code you provided, the 3D mesh visualisation window pops up and remains unresponsive. What other environment configuration is required? Thank you very much! :)