
dngaussian's People

Contributors

fictionarry, penguin-jpg


dngaussian's Issues

ModuleNotFoundError: No module named '_shencoder'

Hello, thanks for this great work!
I have encountered some problems when running this code. There is an error at line 31 of shencoder/backend.py.

Actually, I have already installed Ninja, and I didn't run into any problem when installing gridencoder.

So could you please give some recommendations for solving this problem? Thank you very much.
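
A workaround that may help (strictly an assumption, not an official fix): pre-build the extension instead of relying on the JIT path in shencoder/backend.py, assuming the shencoder folder ships a torch-ngp-style setup.py, then verify the compiled module in a fresh interpreter.

# Minimal sketch of a workaround (assumptions: shencoder contains a torch-ngp-style setup.py
# and this is run from the repository root). Pre-building the CUDA extension avoids the Ninja
# JIT path in shencoder/backend.py that fails with ModuleNotFoundError: '_shencoder'.
import subprocess, sys

subprocess.check_call([sys.executable, "setup.py", "install"], cwd="shencoder")
# Verify in a fresh interpreter so the newly installed package is picked up.
subprocess.check_call([sys.executable, "-c", "import _shencoder; print('_shencoder ok')"])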

Use_sh in Train_blender.sh

Dear authors,

Thanks for your work!

I noticed that you are using the training_sh function when training Blender, but not for the DTU and LLFF datasets. Could you explain the difference between this function and the regular training function?

Best regards.

Question about the SfM points used for 3DGS on DTU

Thanks for your great work! Could you please share your setup for acquiring the initial SfM points on DTU with 3 input views for the 3DGS evaluation? Did you run COLMAP on the original-resolution (1600 × 1200) images or on the evaluation-resolution (400 × 300) images? Hoping to get an answer from you!
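
For reference, the generic COLMAP pipeline for such a 3-view scan looks like the sketch below; the paths, the single-camera flag, and above all the resolution of the images fed to COLMAP are assumptions here, which is exactly the part the question asks the authors to confirm.

# Minimal sketch (assumptions: COLMAP is on PATH, the 3 training views of one DTU scan sit in
# <scene>/images, and all views share one pinhole camera). Not the authors' setting.
import os, subprocess

scene = "dtu_scan8"                      # hypothetical scene folder
db = os.path.join(scene, "database.db")
os.makedirs(os.path.join(scene, "sparse"), exist_ok=True)

subprocess.check_call(["colmap", "feature_extractor",
                       "--database_path", db,
                       "--image_path", os.path.join(scene, "images"),
                       "--ImageReader.single_camera", "1"])
subprocess.check_call(["colmap", "exhaustive_matcher", "--database_path", db])
subprocess.check_call(["colmap", "mapper",
                       "--database_path", db,
                       "--image_path", os.path.join(scene, "images"),
                       "--output_path", os.path.join(scene, "sparse")])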

Experimental configuration of the "More Input Views"

Hi! Thank you for your great work. I have questions about the "More Input Views" setting mentioned in the supplementary materials. I found it hard to reach the reported performance in Table 11 using the train_llff.sh configuration with 6 or 9 input views on the LLFF dataset. I wonder if it would be possible to supply the experimental configuration for "More Input Views".

Question about colmap on DTU dataset

Thanks for your fancy work~
I met some problems when reproducing your work on the DTU dataset. Specifically, COLMAP failed on scan30, scan63 and scan82: most images showed "Could not register, trying another image", and I only get something like "Reconstruction with 2 images and 180 points". Is that normal, or in which step did I make an error? It would be great if you could provide the COLMAP results for the DTU dataset.
Thanks again and looking forward to your reply~

the

Where is the _C module?

Request DTU dataset

Very outstanding and wonderful work! May I ask if it would be possible to provide the DTU dataset used in the paper, as the performance decreased when I ran COLMAP on DTU myself? Thank you a lot!

Can depth regularization be used in a multi-view setting?

Hello, excellent job! I know this is an algorithm for few-shot rendering, but I want to apply the depth regularization to a task where the input is many images. However, my experimental results are somewhat degraded compared to 3DGS. Do you think the depth regularization in this paper can be useful in a multi-view setting? Or could you give me some insights? Thank you!

Questions about initial point cloud

Hi, Jiahe. I found that before obtaining the randomly initialized point cloud, the file 'sparse/0/points3D.bin' is read. How is 'points3D.bin' generated?
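
For context, points3D.bin is not produced by DNGaussian itself: it is the sparse point cloud written by COLMAP's mapper step. A minimal inspection sketch, assuming the repo keeps 3DGS's standard scene/colmap_loader.py helper and using a hypothetical LLFF path:

# Minimal sketch (assumptions: 3DGS's scene/colmap_loader.py is unchanged in this repo, and
# the path below is just an example). points3D.bin is written by COLMAP's mapper.
from scene.colmap_loader import read_points3D_binary

xyz, rgb, err = read_points3D_binary("data/llff/fern/sparse/0/points3D.bin")
print(xyz.shape, rgb.shape)   # (N, 3) SfM point positions and their (N, 3) colors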

Questions about freezing scaling and rotation for depth normalization and further freezing center for soft depth

Hello, thanks for sharing your excellent work. I have two questions.

  1. In the paper, you mention that during depth regularization we should freeze the scaling $s$ and rotation $q$, and further freeze the center $\mu$ for soft depth normalization. But I don't understand how these parameters are frozen in the code, e.g. by setting the corresponding learning rate to 0. Could you please point out the corresponding code for me?

  2. Another question: the following loss terms do not seem to be mentioned in the paper. Could you please explain what their function is?

train_dtu.py
# Reg
loss_reg = torch.tensor(0., device=loss.device)
# Penalize anisotropic Gaussians: ratio of the largest to the smallest scaling axis.
shape_pena = (gaussians.get_scaling.max(dim=1).values / gaussians.get_scaling.min(dim=1).values).mean()
# scale_pena = (gaussians.get_scaling.max(dim=1).values).std()
# Penalize overly large Gaussians: squared largest scaling axis.
scale_pena = ((gaussians.get_scaling.max(dim=1, keepdim=True).values)**2).mean()
# Push opacities toward 0 or 1: values above 0.2 toward 1, values below 0.2 toward 0.
opa_pena = 1 - (opacity[opacity > 0.2]**2).mean() + ((1 - opacity[opacity < 0.2])**2).mean()

Thanks in advance.
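
On question 1, one common mechanism (an assumption about the idea, not a pointer to the actual lines in this repo) is to detach the tensors for the depth rendering pass rather than touch any learning rate, so gradients from the depth loss simply never reach the frozen attributes; a minimal sketch:

# Minimal sketch (assumption: "freezing" = detaching tensors for the depth pass so the depth
# loss cannot update them; render_fn and its keyword names are hypothetical).
def render_depth_frozen(gaussians, render_fn, freeze_center=False):
    scaling = gaussians.get_scaling.detach()    # s frozen: no gradient flows back
    rotation = gaussians.get_rotation.detach()  # q frozen
    xyz = gaussians.get_xyz
    if freeze_center:                           # additionally freeze the center mu (soft depth)
        xyz = xyz.detach()
    return render_fn(xyz=xyz, scaling=scaling, rotation=rotation)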

The dimension is wrong when calculating the loss

Thanks for your fancy work~
I met some problems when reproducing your work on the LLFF dataset: the dimension is wrong when calculating the loss.
Generating random point cloud (5024)... [24/03 11:26:18]
Loading Training Cameras [1.0] [24/03 11:26:18]
Loading Test Cameras [1.0] [24/03 11:26:20]
Loading Eval Cameras [1.0] [24/03 11:26:22]
Number of points at initialisation : 5024 [24/03 11:26:22]
Reading camera 180/180 [24/03 11:26:22]
Loading Render Cameras [1.0] [24/03 11:26:22]
Training progress: 0%| | 0/6000 [00:00<?, ?it/s]Traceback (most recent call last):
File "train_llff.py", line 406, in
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from, args.near)
File "train_llff.py", line 161, in training
loss = Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image))
File "/mnt/star/3DGS/DNGaussian-main/utils/loss_utils.py", line 130, in ssim
return _ssim(img1, img2, window, window_size, channel, size_average)
File "/mnt/star/3DGS/DNGaussian-main/utils/loss_utils.py", line 134, in _ssim
mu1 = F.conv2d(img1, window, padding=window_size // 2, groups=channel)
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [3, 1, 11, 11], but got 3-dimensional input of size [3, 378, 504] instead
Training progress: 0%|
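
The traceback shows that _ssim received a 3-dimensional (C, H, W) tensor where F.conv2d wants a batched (N, C, H, W) one, which tends to surface on PyTorch builds that reject unbatched conv2d inputs; a hedged local fix (my patch of the quoted line in train_llff.py, not the authors') is to add the batch dimension before calling ssim:

# Minimal sketch of a local shape fix (assumption: image and gt_image are (3, H, W) tensors,
# as in the traceback). unsqueeze(0) gives F.conv2d the (N, C, H, W) batch it expects.
loss = Ll1 + opt.lambda_dssim * (1.0 - ssim(image.unsqueeze(0), gt_image.unsqueeze(0)))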

Result Visualization

Thanks for the great work! I'm wondering whether it is possible to load the training results into a viewer like the SIBR viewer from the original Gaussian Splatting. I found that the point cloud format is not compatible with it. How can I modify the .ply files?
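
One way to narrow this down is to compare the per-vertex attributes in the exported file with those a vanilla 3DGS export carries (x, y, z, normals, f_dc_*, f_rest_*, opacity, scale_*, rot_*), which is what the SIBR viewer reads; a small inspection sketch, assuming the plyfile package and a hypothetical checkpoint path:

# Minimal sketch (assumptions: plyfile is installed and the path below points at the exported
# checkpoint). Listing the vertex properties shows which fields differ from a 3DGS export.
from plyfile import PlyData

ply = PlyData.read("output/fern/point_cloud/iteration_6000/point_cloud.ply")
print([prop.name for prop in ply["vertex"].properties])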

custom dataset

When I try to build my own dataset, I first use python convert.py -s data to get the COLMAP data,
then I use python imgs2poses.py data to get poses_bounds.npy,
and then I use bash scripts/run_llff.sh to train, but the results are worse than 3DGS.
The depth maps are good.
Any advice?

Custom Dataset

How do I run this on my own scenes? I tried running train_blender with a COLMAP dataset (convert.py) with "depth_maps" inside for 6000 iterations, but it produces just a random black blob. Am I missing something? Also, how would I run this without COLMAP?

Training stops at "Number of points at initialisation" and does not continue

Thank you for DNGaussian, I like it very much!

Windows 11, CUDA 11.8, PyTorch 2.2

python train_llff.py  ^
-s E:\AI\A28\240316\win_cuda118\input\TaiDi_1280_35=5x7 ^
--model_path E:\AI\A28\240316\win_cuda118\input\TaiDi_1280_35=5x7\output ^
-r 1 --n_sparse -1  --rand_pcd --iterations 6000 --lambda_dssim 0.2 ^
--densify_grad_threshold 0.0013 --prune_threshold 0.01 --densify_until_iter 6000 --percent_dense 0.01 ^
--position_lr_init 0.016 --position_lr_final 0.00016 --position_lr_max_steps 5500 --position_lr_start 500 ^
--split_opacity_thresh 0.1 --error_tolerance 0.001 --scaling_lr 0.003 --shape_pena 0.002 --opa_pena 0.001 --near 10

Launch TensorBoard
Optimizing E:\AI\A28\240316\win_cuda118\input\TaiDi_1280_35=5x7\output
Output folder: E:\AI\A28\240316\win_cuda118\input\TaiDi_1280_35=5x7\output [25/03 13:45:41]
Reading camera 35/35 [25/03 13:45:42]
train [ all images..] [25/03 13:45:42]
eval [] [25/03 13:45:42]
Init random point cloud. [25/03 13:45:42]
[14.11190049 15.89008596 25.18116638] [-43.88950791 -33.10698416 -11.5085673 ] [25/03 13:45:42]
[58.00140841 48.99707012 36.68973367] [25/03 13:45:42]
Generating random point cloud (2900)... [25/03 13:45:42]
Loading Training Cameras [1.0] [25/03 13:45:42]
Loading Test Cameras [1.0] [25/03 13:45:45]
Loading Eval Cameras [1.0] [25/03 13:45:45]
Number of points at initialisation :  2900 [25/03 13:45:45]
....

Training stops and does not continue.

Why isn't ddepth_dmean considered during the backward pass?

In the backward preprocessCUDA, the effect of depth on the mean is not considered. Maybe adding the following code would be better.

	// the w must be equal to 1 for view^T * [x,y,z,1]
	float3 m_view = transformPoint4x3(m, view);
	
	// Compute loss gradient w.r.t. 3D means due to gradients of depth
	// from rendering procedure
	glm::vec3 dL_dmean2;
	float mul3 = view[2] * m.x + view[6] * m.y + view[10] * m.z + view[14];
	dL_dmean2.x = (view[2] - view[3] * mul3) * dL_ddepth[idx];
	dL_dmean2.y = (view[6] - view[7] * mul3) * dL_ddepth[idx];
	dL_dmean2.z = (view[10] - view[11] * mul3) * dL_ddepth[idx];
	
	// That's the third part of the mean gradient.
	dL_dmeans[idx] += dL_dmean2;

The code is cited from this repo.

Questions about test

I noticed that in metrics.py, the evaluation data, which contains only 3 images, is used to obtain the quantitative results. Could you please explain why the test data is not used for these results? For example, "fern" has 17 images available for testing.

Thank you for your time and assistance.

apply hard depth regularization to dynamic Gaussians

Dear author,
Hi!
If I need to apply hard depth regularization to dynamic Gaussians (4D-GS), how should I set the opacity in render_for_depth_sh?
I greatly appreciate your help and look forward to your response.
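
For reference, the paper describes hard depth as a rendering in which the learned opacities are replaced so that the nearest Gaussians dominate the composite; a minimal sketch of that idea for any renderer that accepts an opacity override (my reading of the mechanism, not a 4D-GS-specific answer, and the constant may differ from the one used in render_for_depth_sh):

# Minimal sketch (assumption: hard depth = render depth with the learned opacities replaced by
# a detached constant near 1, so the depth loss moves Gaussian positions rather than
# opacities; the exact constant used in render_for_depth_sh may differ).
import torch

def hard_depth_opacity(gaussians, value=0.99):
    opacity = gaussians.get_opacity
    return torch.full_like(opacity, value).detach()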

why depth_mono = 255.0 - viewpoint_cam.depth_mono

Could you tell me why "depth_mono = 255.0 - viewpoint_cam.depth_mono" is used, as at line 102 of train_dtu.py? I see that the depth is first loaded with PILToTorch and then transformed with "depth_mono = 255.0 - viewpoint_cam.depth_mono", which I don't quite understand. Thank you!
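
One possible reading (an assumption, not a confirmed answer): DPT/MiDaS predicts inverse-depth-like maps in which larger values mean closer, and once those are stored as 8-bit images, 255.0 - depth_mono flips the convention so that larger means farther, like a rendered depth. The value ranges before and after the flip can be checked quickly:

# Minimal sketch (assumptions: the monocular map was saved as an 8-bit image in which larger
# values mean closer, as MiDaS-style inverse depth typically does; the path is hypothetical).
import numpy as np
from PIL import Image

disp = np.asarray(Image.open("depth_maps/0000.png").convert("L"), dtype=np.float32)
depth_like = 255.0 - disp      # flipped: larger now means farther, like a rendered depth
print(disp.min(), disp.max(), "->", depth_like.min(), depth_like.max())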

Questions about the size and value range of the ground truth depth and rendered depth

Hi, thanks again for sharing the amazing work. I have questions about the size and value range of the rendered and ground truth depth.

  1. Is it ok for the rendered depth and the ground truth depth to have very different value ranges, like one in [0,10] and the other one in [0,70000]?
  2. Is the shape of the rendered depth and ground truth depth (1, height, width)?

Thanks in advance.
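
On question 1, very different absolute ranges are usually reconciled by normalizing both maps before computing the loss, so that only relative structure matters; a generic illustration of the idea (not the repo's exact normalization):

# Minimal sketch (assumption: both depths are (1, H, W) tensors; the mean/std whitening below
# is generic and only illustrates why the absolute ranges need not match).
import torch

def normalize_depth(d, eps=1e-6):
    return (d - d.mean()) / (d.std() + eps)

rendered = torch.rand(1, 120, 160) * 10.0        # e.g. values in [0, 10]
mono     = torch.rand(1, 120, 160) * 70000.0     # e.g. values in [0, 70000]
loss = (normalize_depth(rendered) - normalize_depth(mono)).abs().mean()
print(loss.item())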

Optimal Parameters for 360 scenes

Thank you for the great work! I was wondering what the optimal parameters would be for 360-degree captures of scene-level data with relatively high-frequency detail. I also intend to train for the full 30k iterations. Thanks in advance!

the results are bad

Hi, I find the training results are not good using the default parameter settings. Can you give me some advice on how to adjust the parameters?

error during training

Traceback (most recent call last):
File "train_llff.py", line 400, in
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from, args.near)
File "train_llff.py", line 95, in training
render_pkg = render_for_depth(viewpoint_cam, gaussians, pipe, background)
File "/media/public/disk5/gaohc/huanghy/DNGaussian-main/gaussian_renderer/init.py", line 178, in render_for_depth
rendered_image, radii, rendered_depth, rendered_alpha = rasterizer(
ValueError: not enough values to unpack (expected 4, got 2)

Why did this happen? I followed your settings exactly...

Question on experiment data

When training vanilla 3DGS on LLFF (3000 iterations), I get a PSNR of 19.22, but the number reported in this work is lower than that. Why is this?

need for qualitative results

Thanks for your great work!

Could you please provide the qualitative results of the methods evaluated and compared in your paper, as well as the code for evaluating rendering speed (FPS)?

Looking forward to your response.
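
In the meantime, a minimal FPS-measurement sketch (my own; it assumes a render(camera, gaussians, pipe, background) call matching the repo's gaussian_renderer interface and a CUDA device):

# Minimal sketch for timing rendering speed (assumptions: `render`, `cameras`, `gaussians`,
# `pipe`, `background` follow the usual gaussian_renderer interface; CUDA is available).
import time
import torch

def measure_fps(render, cameras, gaussians, pipe, background, warmup=5):
    for cam in cameras[:warmup]:                  # warm-up to exclude CUDA init cost
        render(cam, gaussians, pipe, background)
    torch.cuda.synchronize()
    start = time.time()
    for cam in cameras:
        render(cam, gaussians, pipe, background)
    torch.cuda.synchronize()                      # wait for all kernels before stopping the clock
    return len(cameras) / (time.time() - start)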

Question about the difference between rendered_depth and depth_mono.

Hello, jiahe.
Thanks for your amazing work!

I'm a little confused about the depth loss part. I followed the guide and got the 'drum' dataset and its depth_mono. At around iteration 5500, I found that rendered_depth differed a lot from depth_mono.
It seems that across the diagonal edge, the rendered depth goes from 30 to 10, while depth_mono goes from 30 to 150. Somehow they feel complementary to each other? (It feels like the two depths differ by an order of magnitude, and their near/far conventions seem to be reversed.)

Are those numbers reasonable, or might I have prepared the dataset incorrectly? If they are right, I wonder why. Thank you so much!!

Thank you very much!

Questions about the nearest Gaussians

Thanks for your work!
In the hard depth optimization, the Gaussians nearest to the camera are mentioned. How are the Gaussians other than the nearest ones handled? Do they remain throughout the optimization, or are they deleted during the subsequent densification because their opacity is too low?

Error when generating monocular depths by DPT for the LLFF dataset

Hello, I installed the environment by following the instructions and tried to generate the depths for the LLFF dataset by running python get_depth_map_for_llff_dtu.py --root_path ../data/llff/nerf_llff_data/ --benchmark LLFF, but I got the following error:

Using cache found in /local/home/lijiaj/.cache/torch/hub/intel-isl_MiDaS_master
/local/home/lijiaj/miniconda3/envs/dngaussian/lib/python3.7/site-packages/timm/models/_factory.py:121: UserWarning: Mapping deprecated model name vit_base_resnet50_384 to current vit_base_r50_s16_384.orig_in21k_ft_in1k.
  **kwargs,
Using cache found in /local/home/lijiaj/.cache/torch/hub/intel-isl_MiDaS_master
image_paths: [['../data/llff/nerf_llff_data/fern/images/IMG_4026.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4027.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4028.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4029.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4030.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4031.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4032.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4033.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4034.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4035.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4036.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4037.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4038.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4039.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4040.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4041.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4042.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4043.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4044.JPG', '../data/llff/nerf_llff_data/fern/images/IMG_4045.JPG'], []]
k, img.shape: 0 (3024, 4032, 3)
Traceback (most recent call last):
  File "get_depth_map_for_llff_dtu.py", line 84, in <module>
    prediction = midas(input_batch)
  File "/local/home/lijiaj/miniconda3/envs/dngaussian/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/local/home/lijiaj/.cache/torch/hub/intel-isl_MiDaS_master/midas/dpt_depth.py", line 166, in forward
    return super().forward(x).squeeze(dim=1)
  File "/local/home/lijiaj/.cache/torch/hub/intel-isl_MiDaS_master/midas/dpt_depth.py", line 114, in forward
    layers = self.forward_transformer(self.pretrained, x)
  File "/local/home/lijiaj/.cache/torch/hub/intel-isl_MiDaS_master/midas/backbones/vit.py", line 13, in forward_vit
    return forward_adapted_unflatten(pretrained, x, "forward_flex")
  File "/local/home/lijiaj/.cache/torch/hub/intel-isl_MiDaS_master/midas/backbones/utils.py", line 86, in forward_adapted_unflatten
    exec(f"glob = pretrained.model.{function_name}(x)")
  File "<string>", line 1, in <module>
  File "/local/home/lijiaj/.cache/torch/hub/intel-isl_MiDaS_master/midas/backbones/vit.py", line 68, in forward_flex
    x = blk(x)
  File "/local/home/lijiaj/miniconda3/envs/dngaussian/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/local/home/lijiaj/miniconda3/envs/dngaussian/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 164, in forward
    x = x + self.drop_path1(self.ls1(self.attn(self.norm1(x))))
  File "/local/home/lijiaj/miniconda3/envs/dngaussian/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/local/home/lijiaj/miniconda3/envs/dngaussian/lib/python3.7/site-packages/timm/models/vision_transformer.py", line 86, in forward
    qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4)
  File "/local/home/lijiaj/miniconda3/envs/dngaussian/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/local/home/lijiaj/miniconda3/envs/dngaussian/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
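
To separate a MiDaS/environment problem from a script problem, it may help to run the DPT model on a single image in isolation using the documented torch.hub interface; a minimal sketch (the image path is just an example):

# Minimal sketch (assumptions: internet access for torch.hub, a CUDA device, and one example
# image path). If this fails with the same CUBLAS error, the problem lies in the environment
# rather than in get_depth_map_for_llff_dtu.py.
import cv2
import torch

midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large").cuda().eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").dpt_transform

img = cv2.cvtColor(cv2.imread("fern/images/IMG_4026.JPG"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    prediction = midas(transform(img).cuda())
print(prediction.shape)       # relative inverse depth, shape (1, H', W')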

About the nearest Gaussians mentioned in the paper

Hi, I noticed that the paper says depth regularization mainly targets the nearest Gaussians along the ray, with the aim of encouraging these nearby Gaussians to move, i.e.:

we render a "hard depth" that mainly consists of the nearest Gaussians on the ray shot from camera center o and across the pixel

However, in the code I don't seem to see a step that extracts the nearest Gaussians based on the camera pose and the pixel.

Have I missed something, or is this constraint on the nearest Gaussians mainly implicit in the depth loss, so that no additional explicit extraction is needed?

Thanks for your help.
