kxhit / vmap
[CVPR 2023] vMAP: Vectorised Object Mapping for Neural Field SLAM
Home Page: https://kxhit.github.io/vMAP
License: Other
Thanks for such excellent work!
I have a question regarding synthesizing a novel view, which involves both RGB and depth data.
Specifically, I'm curious about how to accomplish this based on a given camera position after completing the training of all the data. Unfortunately, I couldn't find the relevant code in this repo. Could you please provide some guidance on this matter?
Thank you in advance for your help!
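Since the repo does not seem to expose a dedicated novel-view entry point, here is a minimal NeRF-style rendering sketch for a given camera-to-world pose `T_wc` and pinhole intrinsics. `field` is a hypothetical stand-in returning `(rgb, sigma)` per point; with vMAP you would composite the background and per-object MLPs instead, so treat this as the general recipe, not the repo's API:

```python
import torch

@torch.no_grad()
def render_novel_view(field, T_wc, fx, fy, cx, cy, H, W,
                      n_samples=64, near=0.1, far=6.0):
    # build a ray per pixel in camera coordinates
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    dirs = torch.stack([(u - cx) / fx, (v - cy) / fy, torch.ones_like(u)], -1)
    dirs = dirs @ T_wc[:3, :3].T                    # rotate rays into world frame
    origins = T_wc[:3, 3].expand_as(dirs)
    # sample depths along each ray and query the trained field
    z = torch.linspace(near, far, n_samples)
    pts = origins[..., None, :] + dirs[..., None, :] * z[:, None]  # (H, W, n, 3)
    rgb, sigma = field(pts)                          # hypothetical: (H,W,n,3), (H,W,n)
    # standard alpha compositing for colour and expected depth
    delta = (far - near) / n_samples
    alpha = 1 - torch.exp(-sigma * delta)
    T_acc = torch.cumprod(torch.cat([torch.ones_like(alpha[..., :1]),
                                     1 - alpha + 1e-10], -1), -1)[..., :-1]
    w = alpha * T_acc
    rgb_img = (w[..., None] * rgb).sum(-2)           # (H, W, 3)
    depth_img = (w * z).sum(-1)                      # (H, W) expected depth
    return rgb_img, depth_img
```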
Hi! Thanks for the steps you gave in the previous question (#14). I tried following those suggestions, but some problems are still not solved.
Question 1: I tried to reconstruct the TUM_RGBD dataset in iMAP mode. I referred to the handling of this dataset in nice-slam, since there is no related processing for it in vMAP, so I am not sure whether my processing is correct. The result contains a lot of overlapping shadows, and I don't know which step is wrong.
Here is the code:

```python
import os

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset
from torchvision import transforms

# helpers from the vMAP codebase (exact import paths may differ):
# image_transforms, get_bbox2d_batch, enlarge_bbox, readEXR_onlydepth


class TUM_RGBD(Dataset):
    def __init__(self, cfg):
        self.imap_mode = cfg.imap_mode
        self.root_dir = cfg.dataset_dir
        self.color_paths, self.depth_paths, self.poses = self.loadtum(
            self.root_dir, frame_rate=32)
        self.inst_path = '../Detic/datasets/TUM_RGBD/output_dataset_freiburg2_xyz_instance/'
        self.n_img = len(self.color_paths)
        self.depth_transform = transforms.Compose(
            [image_transforms.DepthScale(cfg.depth_scale),   # depth * 1/1000.0
             image_transforms.DepthFilter(cfg.max_depth)])   # depths > max_depth (6.0) are far_mask; set them to 0
        self.distortion = np.array([0.2312, -0.7849, -0.0033, -0.0001, 0.9172])
        self.W = cfg.W
        self.H = cfg.H
        self.fx = cfg.fx
        self.fy = cfg.fy
        self.cx = cfg.cx
        self.cy = cfg.cy
        self.edge = cfg.mw
        self.crop_size = None
        self.scale = 1.0
        # background semantic classes: undefined--1, undefined-0 etc.
        ################## vmap part ##################
        self.background_cls_list = [255]
        self.bbox_scale = 0.2
        self.inst_dict = {}
        ###############################################

    def parse_list(self, filepath, skiprows=0):
        """Read list data."""
        data = np.loadtxt(filepath, delimiter=' ',
                          dtype=np.unicode_, skiprows=skiprows)
        return data

    def associate_frames(self, tstamp_image, tstamp_depth, tstamp_pose, max_dt=0.08):
        """Pair images, depths, and poses by nearest timestamp."""
        associations = []
        for i, t in enumerate(tstamp_image):
            if tstamp_pose is None:
                j = np.argmin(np.abs(tstamp_depth - t))
                if np.abs(tstamp_depth[j] - t) < max_dt:
                    associations.append((i, j))
            else:
                j = np.argmin(np.abs(tstamp_depth - t))
                k = np.argmin(np.abs(tstamp_pose - t))
                if (np.abs(tstamp_depth[j] - t) < max_dt) and \
                   (np.abs(tstamp_pose[k] - t) < max_dt):
                    associations.append((i, j, k))
        return associations

    def loadtum(self, datapath, frame_rate=-1):
        """Read video data in TUM-RGBD format."""
        if os.path.isfile(os.path.join(datapath, 'groundtruth.txt')):
            pose_list = os.path.join(datapath, 'groundtruth.txt')
        elif os.path.isfile(os.path.join(datapath, 'pose.txt')):
            pose_list = os.path.join(datapath, 'pose.txt')

        image_list = os.path.join(datapath, 'rgb.txt')
        depth_list = os.path.join(datapath, 'depth.txt')

        image_data = self.parse_list(image_list)
        depth_data = self.parse_list(depth_list)
        pose_data = self.parse_list(pose_list, skiprows=1)
        pose_vecs = pose_data[:, 1:].astype(np.float64)

        tstamp_image = image_data[:, 0].astype(np.float64)
        tstamp_depth = depth_data[:, 0].astype(np.float64)
        tstamp_pose = pose_data[:, 0].astype(np.float64)
        associations = self.associate_frames(
            tstamp_image, tstamp_depth, tstamp_pose)

        # subsample associated frames to the requested frame rate
        indices = [0]
        for i in range(1, len(associations)):
            t0 = tstamp_image[associations[indices[-1]][0]]
            t1 = tstamp_image[associations[i][0]]
            if t1 - t0 > 1.0 / frame_rate:
                indices += [i]

        images, poses, depths = [], [], []
        inv_pose = None
        for ix in indices:
            (i, j, k) = associations[ix]
            images += [os.path.join(datapath, image_data[i, 1])]
            depths += [os.path.join(datapath, depth_data[j, 1])]
            c2w = self.pose_matrix_from_quaternion(pose_vecs[k])
            if inv_pose is None:
                # express all poses relative to the first frame
                inv_pose = np.linalg.inv(c2w)
                c2w = np.eye(4)
            else:
                c2w = inv_pose @ c2w
            # c2w[:3, 1] *= -1
            # c2w[:3, 2] *= -1
            c2w = torch.from_numpy(c2w).float()
            poses += [c2w]
        return images, depths, poses

    def __len__(self):
        return self.n_img

    def as_intrinsics_matrix(self, intrinsics):
        """Get matrix representation of intrinsics."""
        K = np.eye(3)
        K[0, 0] = intrinsics[0]
        K[1, 1] = intrinsics[1]
        K[0, 2] = intrinsics[2]
        K[1, 2] = intrinsics[3]
        return K

    def __getitem__(self, index):
        color_path = self.color_paths[index]
        depth_path = self.depth_paths[index]
        _, tempfilename = os.path.split(color_path)
        inst_path = self.inst_path + tempfilename

        color_data = cv2.imread(color_path)
        if '.png' in depth_path:
            depth_data = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)
        elif '.exr' in depth_path:
            depth_data = readEXR_onlydepth(depth_path)
        if self.distortion is not None:
            K = self.as_intrinsics_matrix(
                [self.fx, self.fy, self.cx + self.edge, self.cy + self.edge])
            # undistortion is only applied to the color image, not the depth!
            color_data = cv2.undistort(color_data, K, self.distortion)

        color_data = cv2.cvtColor(color_data, cv2.COLOR_BGR2RGB).transpose(1, 0, 2)
        depth_data = depth_data.astype(np.float32).transpose(1, 0)
        depth_data = np.nan_to_num(depth_data, nan=0.)
        inst = cv2.imread(inst_path, cv2.IMREAD_UNCHANGED).astype(np.int32).transpose(1, 0)
        bbox_scale = self.bbox_scale

        T = None
        if self.poses is not None:
            T = self.poses[index]
            T = T.numpy()
            if np.any(np.isinf(T)):
                if index + 1 == self.__len__():
                    print("pose inf!")
                    return None
                return self.__getitem__(index + 1)

        H, W = depth_data.shape
        if self.edge:
            # crop the image edge; there are invalid values on the border of the color image
            edge = self.edge
            color_data = color_data[edge:-edge, edge:-edge]
            depth_data = depth_data[edge:-edge, edge:-edge]
        if self.depth_transform:
            depth_data = self.depth_transform(depth_data)

        bbox_dict = {}
        if self.imap_mode:
            obj = np.zeros_like(depth_data)
        else:
            inst_list = []
            batch_masks = []
            if self.edge:
                edge = self.edge
                inst = inst[edge:-edge, edge:-edge]
            obj_ = np.zeros_like(inst)
            for inst_id in np.unique(inst):
                if inst_id in self.background_cls_list:
                    continue
                obj_mask = inst == inst_id
                batch_masks.append(obj_mask)
                inst_list.append(inst_id)
            if len(batch_masks) > 0:
                batch_masks = torch.from_numpy(np.stack(batch_masks))
                cmins, cmaxs, rmins, rmaxs = get_bbox2d_batch(batch_masks)
                for i in range(batch_masks.shape[0]):
                    w = rmaxs[i] - rmins[i]
                    h = cmaxs[i] - cmins[i]
                    if w <= 10 or h <= 10:  # skip boxes that are too small  # TODO
                        continue
                    bbox_enlarged = enlarge_bbox(
                        [rmins[i], cmins[i], rmaxs[i], cmaxs[i]], scale=bbox_scale,
                        w=inst.shape[1], h=inst.shape[0])
                    inst_id = inst_list[i]
                    obj_[batch_masks[i]] = 1  # mark pixels covered by this mask in obj_
                    # bbox_dict maps inst_id -> bbox (order: cmin, cmax, rmin, rmax)
                    bbox_dict.update({inst_id: torch.from_numpy(np.array(
                        [bbox_enlarged[1], bbox_enlarged[3], bbox_enlarged[0],
                         bbox_enlarged[2]]))})
            inst[obj_ == 0] = 255  # mark unassigned pixels as background
            obj = inst  # obj holds per-pixel instance ids; background is 255
        # full-frame bbox for the background (id 0)
        bbox_dict.update({0: torch.from_numpy(np.array(
            [int(0), int(obj.shape[0]), 0, int(obj.shape[1])]))})

        # wrap data into a frame dict
        T_obj = np.eye(4)
        depth_data = torch.from_numpy(depth_data) * self.scale
        color_data = torch.from_numpy(color_data)
        sample = {"image": color_data.type(torch.uint8), "depth": depth_data,
                  "T": T, "T_obj": T_obj, "obj": obj, "bbox_dict": bbox_dict,
                  "frame_id": index}
        if color_data is None or depth_data is None:
            print(color_path)
            print(depth_path)
            raise ValueError
        return sample

    def pose_matrix_from_quaternion(self, pvec):
        """Convert a (t, q) vector to a 4x4 pose matrix."""
        from scipy.spatial.transform import Rotation
        pose = np.eye(4)
        pose[:3, :3] = Rotation.from_quat(pvec[3:]).as_matrix()
        pose[:3, 3] = pvec[:3]
        return pose
```
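For context, a hedged usage sketch of the class above; `cfg` is assumed to carry the TUM intrinsics and crop settings loaded from a config json, and the `None` return for inf poses would need handling before batching:

```python
from torch.utils.data import DataLoader

dataset = TUM_RGBD(cfg)                     # cfg: hypothetical config object
loader = DataLoader(dataset, batch_size=1, shuffle=False, num_workers=2)
for sample in loader:
    rgb, depth, T = sample["image"], sample["depth"], sample["T"]
```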
Question 2: It is about obtaining the instance id and semantic id in Detic. I have debugged the Detic code, but I can only get the semantic id of each object, i.e. the category id. How should I process the output to get the instance id?
For the above problems, I hope you can give me some corrective suggestions or steps. I look forward to your reply and would be very grateful.
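On Question 2, a hedged sketch assuming Detectron2-style outputs from Detic: each detection already is one instance, so one can mint per-frame instance ids simply by enumerating detections and painting their masks into an id map. Note these ids are only consistent within a frame; associating them across frames would need a tracker or 3D matching.

```python
import numpy as np

def instances_to_id_map(instances, H, W, background_id=255):
    """instances: a detectron2 Instances object with pred_masks / pred_classes."""
    id_map = np.full((H, W), background_id, dtype=np.int32)
    masks = instances.pred_masks.cpu().numpy()       # (N, H, W) bool
    classes = instances.pred_classes.cpu().numpy()   # (N,) semantic class ids
    sem_of_inst = {}
    for inst_id, (mask, cls) in enumerate(zip(masks, classes)):
        id_map[mask] = inst_id                       # instance id = detection index
        sem_of_inst[inst_id] = int(cls)              # keep the semantic id alongside
    return id_map, sem_of_inst
```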
Hello, thank you for sharing the code for this amazing research!
I have two questions:
Thank you!
Dear authors,
Thank you for sharing your outstanding work!
I am wondering how I can replicate the Depth L1 results in Table C of the supplementary material.
Any guidance would be highly appreciated.
Thanks in advance for your time in reading and responding to my question.
Regards,
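For reference, a hedged sketch of what a Depth-L1 style metric typically computes (mean absolute error between rendered and ground-truth depth over valid pixels); this is an assumption about the metric, not the authors' evaluation script:

```python
import torch

def depth_l1(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Mean |pred - gt| over pixels with valid (nonzero) ground-truth depth."""
    valid = gt > 0
    return (pred[valid] - gt[valid]).abs().mean()
```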
I see in the configs that the image is cropped according to self.edge (e.g. mw or mh).
Is the crop size mw (or mh) always 10 for the ScanNet dataset and 0 for the Replica dataset?
Thanks for your excellent work and congratulations on the acceptance! I'm trying to reproduce the result on the TUM dataset. Here is my process:
Despite following these steps, I am still unable to obtain meaningful reconstruction results. I have a couple of questions that I hope you can help me with:
Thank you in advance for your help.
First of all, thank you for sharing the great work you've done!!!
I was just wondering how to solve a sudden crash happening in WSL2.
The crash point is as follows
Computer setup is
Thx!
Let me start by congratulating you for the great work and thanking you for the well-organized and easy-to-follow repo.
I am wondering about the steps to test vMAP on a live stream data from an intel realsense camera or a Microsoft Kinect camera.
Any suggestions or recommendations will be highly appreciated.
Regards,
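A hedged sketch of grabbing aligned RGB-D frames from a RealSense camera with pyrealsense2; feeding them into vMAP would additionally need live camera poses (e.g. from a tracking front-end) and instance masks, which this snippet does not provide:

```python
import numpy as np
import pyrealsense2 as rs

pipe, cfg = rs.pipeline(), rs.config()
cfg.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
cfg.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
pipe.start(cfg)
align = rs.align(rs.stream.color)          # align depth to the color frame
try:
    frames = align.process(pipe.wait_for_frames())
    depth = np.asanyarray(frames.get_depth_frame().get_data())  # uint16, millimetres
    color = np.asanyarray(frames.get_color_frame().get_data())  # uint8, BGR
finally:
    pipe.stop()
```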
Thank you for your excellent work!
I would like to ask whether this parallel implementation of vMAP can only be used with a simple MLP, and whether using something like Instant-NGP would be feasible.
Look forward to your reply!
Thanks for your work!
In the Replica dataset you provide, we can see the visualised instance images; could you upload the visualization code?
How did you get the mesh for each object (for TSDF-Fusion and iMAP)?
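On the visualization question, a hedged sketch of one common way to render an instance-id image (this is not the authors' code): map each id to a pseudo-random colour via a lookup table, assuming ids fit in a byte.

```python
import numpy as np

def colorize_instances(inst: np.ndarray, background_id: int = 255) -> np.ndarray:
    """inst: (H, W) integer id map with ids < 256; returns an (H, W, 3) RGB image."""
    rng = np.random.default_rng(0)
    lut = rng.integers(0, 255, size=(256, 3), dtype=np.uint8)
    lut[background_id] = 0                 # render background as black
    return lut[inst.astype(np.uint8)]
```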
Hi,
Thanks for releasing the code! I have some questions about the ground-truth object meshes in the Replica dataset:
1. In lines 27-28 of data_generation/extract_inst_obj.py, why is sub_mesh_indices[object_id] appended twice?
Hello. I appreciate you sharing code of this impressive research.
I'm now interested in evaluating the performance of vMAP in different replica scenes, so I'd like to ask about the data generation process.
First, how were the trajectories for model training (trajectory 00) and novel view synthesis (trajectory 01) generated?
Second, how can object-level meshes be obtained (to evaluate the 3D reconstruction performance at the object level)? The original Replica dataset seems to provide only scene-level meshes.
Thank you!
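On the second question, a hedged sketch (not the authors' pipeline): given the Replica scene mesh plus a per-face object-id array (which one would derive from Replica's semantic mesh annotations), one can cut out a single object's faces with trimesh.

```python
import numpy as np
import trimesh

def extract_object_mesh(scene_mesh: trimesh.Trimesh,
                        face_object_ids: np.ndarray, object_id: int) -> trimesh.Trimesh:
    """face_object_ids: (n_faces,) object id per face; returns the object submesh."""
    face_idx = np.where(face_object_ids == object_id)[0]
    return scene_mesh.submesh([face_idx], append=True)
```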
Hello,
in the config file for the Replica dataset the following camera parameters are used:

```json
"camera": {
    "w": 1200,
    "h": 680,
    "fx": 600.0,
    "fy": 600.0,
    "cx": 599.5,
    "cy": 339.5,
    "mw": 0,
    "mh": 0
}
```
But actually the width of the image is 680 and the height is 1200. I also notice the images are rotated counter-clockwise by 90 degrees.
Can you explain why the camera config doesn't match the actual image size?
Thank you
Hi! I have a question regarding the semantic_instance files!
What do we need them for?!
They seem to be all black...!!
Thank you in advance!!
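A hedged sanity check: instance-id PNGs hold small integer labels, so they look black in an ordinary image viewer even though they carry per-pixel ids. The path below is an example placeholder.

```python
import cv2
import numpy as np

inst = cv2.imread("semantic_instance/0.png", cv2.IMREAD_UNCHANGED)  # example path
print(inst.dtype, np.unique(inst))     # the distinct instance ids present in the frame
```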
After I run `python2 reader.py --filename ~/data/ScanNet/scannet/scans/scene0024_00/scene0024_00.sens --output_path ~/data/ScanNet/objnerf/ --export_depth_images --export_color_images --export_poses --export_intrinsics`, there are /color, /depth, /intrinsic, and /pose folders in the output path, but no instance-filt or label-filt folder, which running vMAP on the ScanNet dataset requires. How can I solve this? Hoping for your reply, thanks!
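A hedged note: as far as I know, instance-filt and label-filt are not produced by reader.py but come as separate ScanNet downloads (scene*_2d-instance-filt.zip and scene*_2d-label-filt.zip) that you extract yourself. A minimal sketch, with example paths and assuming those zips have been downloaded next to the .sens file (the folder layout inside the zips may include the scene name):

```python
import pathlib
import zipfile

scan = pathlib.Path("~/data/ScanNet/scannet/scans/scene0024_00").expanduser()
out = pathlib.Path("~/data/ScanNet/objnerf").expanduser()
for name in ["scene0024_00_2d-instance-filt.zip", "scene0024_00_2d-label-filt.zip"]:
    with zipfile.ZipFile(scan / name) as zf:
        zf.extractall(out)                 # yields instance-filt/ and label-filt/
```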
Hi!!
I have questions regarding the vectorised training.....!!
As far as I understand, the vectorised training is somewhat similar to KiloNeRF,
so I was wondering which part of the code is related to the vectorised training!
I am a bit confused by the KiloNeRF comparison, since there the scene is uniformly divided into n even blocks,
so I was wondering whether the render_rays.make_3D_grid part is related to vectorised training....! If not, could you tell me which part of the code is related to vectorised training?
Thank you!!
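For context, the paper describes batching the many small per-object MLPs and training them in a single vectorised step (it mentions functorch). A minimal sketch of that general technique using modern torch.func (stacked parameters + torch.vmap), not the repo's exact code; all sizes here are hypothetical:

```python
import copy
import torch
from torch.func import functional_call, stack_module_state

class TinyMLP(torch.nn.Module):
    """Stand-in for a small per-object field (hypothetical sizes)."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(3, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 4))          # e.g. RGB + density
    def forward(self, x):
        return self.net(x)

models = [TinyMLP() for _ in range(10)]          # one tiny MLP per object
params, buffers = stack_module_state(models)     # stack all weights along dim 0

base = copy.deepcopy(models[0]).to("meta")       # stateless template module
def fmodel(p, b, x):
    return functional_call(base, (p, b), (x,))

x = torch.randn(10, 128, 3)                      # per-object batches of 3D points
out = torch.vmap(fmodel)(params, buffers, x)     # one batched forward: (10, 128, 4)
```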
Thanks for your nice work. When I run `python ./train.py --config ./configs/Replica/config_replica_room0_vMAP.json --logdir ./logs/vMAP/room0 --save_ckpt True`, I can get meshes of the different objects, but the scene mesh shown in the video is missing. How can I get it?
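A hedged workaround: one way to get a whole-scene mesh is simply to concatenate the per-object meshes the run already saved. The glob pattern and extension below are examples, not the repo's actual output layout:

```python
import glob
import trimesh

paths = sorted(glob.glob("./logs/vMAP/room0/**/*.obj", recursive=True))  # example pattern
scene = trimesh.util.concatenate([trimesh.load(p) for p in paths])
scene.export("./logs/vMAP/room0/scene_mesh.ply")
```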
Let me start by congratulating you for the great work and thanking you for the well-organized and easy-to-follow repo.
But I don't seem to see the part of the code that tests on TUM-RGBD.
I want to know how to run vMAP on the TUM-RGBD dataset; if possible, could you give the relevant steps or detailed code? I would appreciate it.
Looking forward to your reply as soon as possible!
Hi, thanks for open sourcing your code!
I'm just trying to get iMAP running and I have two questions:
Thank you
Excuse me, when I run the vMAP code you provided, the 3D mesh visualisation window pops up and remains unresponsive. What other environment configuration is required? Thank you very much! :)