
mvsgaussian's Introduction

MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo

1Huazhong University of Science and Technology  2S-Lab, Nanyang Technological University
3Great Bay University  4Shanghai AI Laboratory

TL;DR: MVSGaussian is a Gaussian-based method designed for efficient reconstruction of unseen scenes from sparse views in a single forward pass. It offers high-quality initialization for fast training and real-time rendering.

⚡ Updates

  • [2024.07.16] The latest updated code supports multi-batch training (details) and inference, and a single RTX 3090 GPU is sufficient to reproduce all of our experimental results.
  • [2024.07.16] Added a Demo (Custom Data) that only requires multi-view images as input.
  • [2024.07.10] Code and checkpoints are released.
  • [2024.07.01] Our work is accepted by ECCV2024.
  • [2024.05.21] Project Page | arXiv | YouTube released.

🌟 Abstract

We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian parameters. 2) To further enhance performance, we propose a hybrid Gaussian rendering that integrates an efficient volume rendering design for novel view synthesis. 3) To support fast fine-tuning for specific scenes, we introduce a multi-view geometric consistent aggregation strategy to effectively aggregate the point clouds generated by the generalizable model, serving as the initialization for per-scene optimization. Compared with previous generalizable NeRF-based methods, which typically require minutes of fine-tuning and seconds of rendering per image, MVSGaussian achieves real-time rendering with better synthesis quality for each scene. Compared with the vanilla 3D-GS, MVSGaussian achieves better view synthesis with less training computational cost. Extensive experiments on DTU, Real Forward-facing, NeRF Synthetic, and Tanks and Temples datasets validate that MVSGaussian attains state-of-the-art performance with convincing generalizability, real-time rendering speed, and fast per-scene optimization.

🔨 Installation

Clone our repository

git clone https://github.com/TQTQliu/MVSGaussian.git
cd MVSGaussian

Set up the python environment

conda create -n mvsgs python=3.7.13
conda activate mvsgs
pip install -r requirements.txt
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 -f https://download.pytorch.org/whl/torch_stable.html
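
Before building the CUDA extensions below, it can help to confirm that this PyTorch build actually sees your GPU. A quick sanity check (not part of the original instructions):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"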

Install Gaussian Splatting renderer

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
pip install gaussian-splatting/submodules/diff-gaussian-rasterization
pip install gaussian-splatting/submodules/simple-knn
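
If both extensions build successfully, an import check along these lines should pass (the module and symbol names below follow the upstream gaussian-splatting submodules and are assumptions if your local copies differ):

python -c "from diff_gaussian_rasterization import GaussianRasterizer; from simple_knn._C import distCUDA2; print('ok')"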

🤗 Demo (Custom Data)

Inference

First, prepare the multi-view image data, and then run COLMAP. Here, we take examples/scene1 (example data) as an example:

python lib/colmap/imgs2poses.py -s examples/scene1
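
This step assumes COLMAP is installed and available on your PATH, since the script drives the standard COLMAP pipeline. A quick check:

colmap -h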

Tip: If you already have sparse reconstruction results, i.e. sparse/0/cameras.bin, sparse/0/images.bin, and sparse/0/points3D.bin, and want to skip the COLMAP reconstruction step of the script, place the sparse folder in the examples/scene1 directory and run the same command. The script detects that sparse reconstruction results already exist, automatically skips the COLMAP reconstruction phase, and simply organizes the existing results into the required poses_bounds.npy.
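
For reference, the expected scene layout in that case is roughly as follows (the images folder name follows the usual COLMAP/LLFF convention and is an assumption; only the sparse/0 files are strictly required in addition to your images):

examples/scene1
├── images          # multi-view input images
└── sparse
    └── 0
        ├── cameras.bin
        ├── images.bin
        └── points3D.bin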

Then execute the following command to obtain novel views:

python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root examples/scene1

or videos:

python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root examples/scene1 save_video True

Train on your own data

If you want to train our model on your own data, execute the following command:

python train_net.py --cfg_file configs/mvsgs/colmap_eval.yaml train_dataset.data_root examples/scene1 test_dataset.data_root examples/scene1

You can specify the GPUs and modify the exp_name in configs/mvsgs/dtu_pretrain.yaml. Before training, the code first checks whether a checkpoint exists in trained_model/mvsgs/exp_name and, if so, loads the latest one. During training, the TensorBoard logs are saved in record/mvsgs/exp_name, trained checkpoints are saved in trained_model/mvsgs/exp_name, and rendering results are saved in result/mvsgs/exp_name.
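
For example, assuming TensorBoard is installed in your environment, you can monitor training with:

tensorboard --logdir record/mvsgs/<exp_name>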

Per-scene optimization

For per-scene optimization, first run the generalizable model to obtain the point cloud as initialization for subsequent optimization.

python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root examples/scene1 save_ply True dir_ply <path to save ply>

The point cloud will be saved in <path to save ply>/scene1/scene1.ply. Note that this is a regular geometric point cloud, not a Gaussian point cloud, and you can open it with MeshLab.

Then run the 3DGS optimization:

python lib/train.py --eval --iterations <iter> -s examples/scene1 -p <path to save ply>

The optimized Gaussian point cloud will be saved in output/scene1/point_cloud/iteration_<iter>/point_cloud.ply, and you can open it with the 3DGS viewer.
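
As a rough sketch, if you have built the SIBR viewers from the gaussian-splatting repository (a separate build not covered by the installation steps above), the optimized Gaussians can be viewed with something like:

<SIBR install dir>/bin/SIBR_gaussianViewer_app -m output/scene1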

Run the following commands to synthesize target views and calculate metrics:

python lib/render.py -c -m output/scene1 --iteration <iter> -p <path to save ply>
python lib/metrics.py -m output/scene1

Add -v to obtain the rendered video:

python lib/render.py -c -m output/scene1 -p <path to save ply> -v

📦 Datasets

🚂 Training

Train generalizable model

To train a generalizable model from scratch on DTU, specify data_root in configs/mvsgs/dtu_pretrain.yaml first and then run:

python train_net.py --cfg_file configs/mvsgs/dtu_pretrain.yaml train.batch_size 4

You can specify the GPUs and modify the exp_name in configs/mvsgs/dtu_pretrain.yaml. Before training, the code first checks whether a checkpoint exists in trained_model/mvsgs/exp_name and, if so, loads the latest one. During training, the TensorBoard logs are saved in record/mvsgs/exp_name, trained checkpoints are saved in trained_model/mvsgs/exp_name, and rendering results are saved in result/mvsgs/exp_name.

Our code also supports multi-GPU training. The released pretrained model (paper) was trained on 4 RTX 3090 GPUs with a batch size of 1 per GPU:

python -m torch.distributed.launch --nproc_per_node=4 train_net.py --cfg_file configs/mvsgs/dtu_pretrain.yaml distributed True gpus 0,1,2,3 train.batch_size 1

You can also use 4 GPUs, with a batch size of 4 for each GPU:

python -m torch.distributed.launch --nproc_per_node=4 train_net.py --cfg_file configs/mvsgs/dtu_pretrain.yaml distributed True gpus 0,1,2,3 train.batch_size 4

We provide the results as a reference below:

| GPU number | Batch size | Checkpoint | DTU (PSNR / SSIM / LPIPS) | Real Forward-facing (PSNR / SSIM / LPIPS) | NeRF Synthetic (PSNR / SSIM / LPIPS) | Tanks and Temples (PSNR / SSIM / LPIPS) | Training time (per epoch) | Training memory |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 4 | 1gpu_4batch | 28.23 / 0.963 / 0.075 | 24.19 / 0.860 / 0.164 | 26.57 / 0.948 / 0.070 | 23.50 / 0.879 / 0.137 | ~12 min | ~22 GB |
| 4 | 1 | 4gpu_1batch (paper) | 28.21 / 0.963 / 0.076 | 24.07 / 0.857 / 0.164 | 26.46 / 0.948 / 0.071 | 23.29 / 0.878 / 0.139 | ~5 min | ~7 GB |
| 4 | 4 | 4gpu_4batch | 28.56 / 0.964 / 0.073 | 24.02 / 0.858 / 0.165 | 26.28 / 0.947 / 0.072 | 23.14 / 0.876 / 0.147 | ~14 min | ~23 GB |

Per-scene optimization

One strategy is to optimize only the initial Gaussian point cloud provided by the generalizable model.

bash scripts/mvsgs/llff_ft.sh
bash scripts/mvsgs/nerf_ft.sh
bash scripts/mvsgs/tnt_ft.sh

We provide optimized Gaussian point clouds for each scene here.

You can also run the following commands to get the results of vanilla 3D-GS, whose initialization is obtained via COLMAP.

bash scripts/3dgs/llff_ft.sh
bash scripts/3dgs/nerf_ft.sh
bash scripts/3dgs/tnt_ft.sh

Note that for the LLFF dataset, the point cloud shipped with the original dataset is obtained using all views. For a fair comparison, we recompute the point cloud using only the training views, so we recommend downloading the LLFF dataset we processed.

(Optional) Another approach is to optimize the entire pipeline, similar to NeRF-based methods.

Here we take the fern scene from the LLFF dataset as an example:

cd ./trained_model/mvsgs
mkdir llff_ft_fern
cp dtu_pretrain/latest.pth llff_ft_fern
cd ../..
python train_net.py --cfg_file configs/mvsgs/llff/fern.yaml

🎯 Evaluation

Evaluation on DTU

Download the pretrained model and place it at trained_model/mvsgs/dtu_pretrain/latest.pth.

Use the following command to evaluate the pretrained model on DTU:

python run.py --type evaluate --cfg_file configs/mvsgs/dtu_pretrain.yaml mvsgs.cas_config.render_if False,True mvsgs.cas_config.volume_planes 48,8 mvsgs.eval_depth True

The rendered images will be saved in result/mvsgs/dtu_pretrain.

Evaluation on Real Forward-facing

python run.py --type evaluate --cfg_file configs/mvsgs/llff_eval.yaml

Evaluation on NeRF Synthetic

python run.py --type evaluate --cfg_file configs/mvsgs/nerf_eval.yaml

Evaluation on Tanks and Temples

python run.py --type evaluate --cfg_file configs/mvsgs/tnt_eval.yaml

Render videos

Add the save_video True argument to save videos, such as:

python run.py --type evaluate --cfg_file configs/mvsgs/llff_eval.yaml save_video True

For optimized Gaussians, add -v to save videos, such as:

python lib/render.py -m output/$scene -p $dir_ply -v

See scripts/mvsgs/nerf_ft.sh for $scene and $dir_ply.

📝 Citation

If you find our work useful for your research, please cite our paper.

@article{liu2024mvsgaussian,
    title={MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo},
    author={Liu, Tianqi and Wang, Guangcong and Hu, Shoukang and Shen, Liao and Ye, Xinyi and Zang, Yuhang and Cao, Zhiguo and Li, Wei and Liu, Ziwei},
    journal={arXiv preprint arXiv:2405.12218},
    year={2024}
}

😃 Acknowledgement

This project is built on source code shared by Gaussian-Splatting, ENeRF, MVSNeRF, and LLFF. Many thanks for their excellent contributions!

📧 Contact

If you have any questions, please feel free to contact Tianqi Liu (tq_liu at hust.edu.cn).


mvsgaussian's Issues

Hello! I encountered an error!

(mvsgs) PS E:\3dgs\MVSGaussian> pip install gaussian-splatting/submodules/diff-gaussian-rasterization
Processing e:\3dgs\mvsgaussian\gaussian-splatting\submodules\diff-gaussian-rasterization
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: diff-gaussian-rasterization
  Building wheel for diff-gaussian-rasterization (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [11 lines of output]
      running bdist_wheel
      D:\anconda\envs\mvsgs\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      running build
      running build_py
      running build_ext
      D:\anconda\envs\mvsgs\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
        warnings.warn(f'Error checking compiler version for {compiler}: {error}')
      building 'diff_gaussian_rasterization._C' extension
      "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc" -c cuda_rasterizer/backward.cu -o build\temp.win-amd64-cpython-37\Release\cuda_rasterizer/backward.obj -ID:\anconda\envs\mvsgs\lib\site-packages\torch\include -ID:\anconda\envs\mvsgs\lib\site-packages\torch\include\torch\csrc\api\include -ID:\anconda\envs\mvsgs\lib\site-packages\torch\include\TH -ID:\anconda\envs\mvsgs\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -ID:\anconda\envs\mvsgs\include -ID:\anconda\envs\mvsgs\Include "-ID:\Visual Studio 2022 Community\VC\Tools\MSVC\14.38.33130\include" "-ID:\Visual Studio 2022 Community\VC\Tools\MSVC\14.38.33130\ATLMFC\include" "-ID:\Visual Studio 2022 Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -IE:\3dgs\MVSGaussian\gaussian-splatting\submodules\diff-gaussian-rasterization\third_party/glm/ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --use-local-env
      error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc.exe' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for diff-gaussian-rasterization
  Running setup.py clean for diff-gaussian-rasterization
Failed to build diff-gaussian-rasterization
Installing collected packages: diff-gaussian-rasterization
  Running setup.py install for diff-gaussian-rasterization ... error
  error: subprocess-exited-with-error

  × Running setup.py install for diff-gaussian-rasterization did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      running install
      D:\anconda\envs\mvsgs\lib\site-packages\setuptools\command\install.py:37: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        setuptools.SetuptoolsDeprecationWarning,
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-37
      creating build\lib.win-amd64-cpython-37\diff_gaussian_rasterization
      copying diff_gaussian_rasterization\__init__.py -> build\lib.win-amd64-cpython-37\diff_gaussian_rasterization
      running build_ext
      D:\anconda\envs\mvsgs\lib\site-packages\torch\utils\cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      D:\anconda\envs\mvsgs\lib\site-packages\torch\utils\cpp_extension.py:358: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
        warnings.warn(f'Error checking compiler version for {compiler}: {error}')
      building 'diff_gaussian_rasterization._C' extension
      creating build\temp.win-amd64-cpython-37
      creating build\temp.win-amd64-cpython-37\Release
      creating build\temp.win-amd64-cpython-37\Release\cuda_rasterizer
      "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc" -c cuda_rasterizer/backward.cu -o build\temp.win-amd64-cpython-37\Release\cuda_rasterizer/backward.obj -ID:\anconda\envs\mvsgs\lib\site-packages\torch\include -ID:\anconda\envs\mvsgs\lib\site-packages\torch\include\torch\csrc\api\include -ID:\anconda\envs\mvsgs\lib\site-packages\torch\include\TH -ID:\anconda\envs\mvsgs\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\include" -ID:\anconda\envs\mvsgs\include -ID:\anconda\envs\mvsgs\Include "-ID:\Visual Studio 2022 Community\VC\Tools\MSVC\14.38.33130\include" "-ID:\Visual Studio 2022 Community\VC\Tools\MSVC\14.38.33130\ATLMFC\include" "-ID:\Visual Studio 2022 Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcompiler /EHsc -Xcompiler /wd4190 -Xcompiler /wd4018 -Xcompiler /wd4275 -Xcompiler /wd4267 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4819 -Xcompiler /MD -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -IE:\3dgs\MVSGaussian\gaussian-splatting\submodules\diff-gaussian-rasterization\third_party/glm/ -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --use-local-env
      error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\nvcc.exe' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> diff-gaussian-rasterization

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

per-scene

Hello, author. Does per-scene optimization mean taking the 3DGS point cloud obtained from the pretrained model and then optimizing it via the Gaussian model's densification (growth) and pruning operations?

Some issues during the per-scene optimization process.

Thank you very much for this excellent work! I have encountered some issues during the per-scene optimization process.

  1. While reviewing the code, I noticed that in per-scene optimization (where only 3D Gaussians are optimized), 3DGS is initialized using the point cloud (including xyz and rgb) provided by MVSGaussian, rather than the 3D Gaussians (including xyz, rgb, rotation, scale, and opacity) from MVSGaussian. In theory, initializing with 3D Gaussians instead of a point cloud would result in better initialization. Is my understanding correct?
  2. I am having trouble loading the dataset for per-scene optimization. Following the release code, I multiplied the Translation T by the scale_factor when loading the colmap pose, like this:
    T = T * scale_factor
    but this approach fails during the optimization process, preventing the loss from decreasing. However, when I set the scale_factor to 1, the loss decreases and normal scene optimization can be performed.
  3. I found that when MVSGaussian uses different scale_factors for inference, although larger scale_factors yield better rendered images, the number and shape of the points remain very similar, with only the scale differing. I would like to know whether using different scale_factors during inference results in point clouds of different scales, and whether the final results of the per-scene optimization are then similar. From my understanding, if the initial point cloud distributions are similar, the final optimization results should be close.

depth

Hello, author! This is excellent work.
image
I would like to ask: is the depth map finally obtained by the pipeline the depth at line 92 of network.py?
If not, could you point me to where the final depth map from the MVS part is produced? Many thanks.

Dataset location

How can I use the pretrained models to obtain point clouds for datasets such as TNT and LLFF? Where should these datasets be placed?

Issues on two-stage cascaded framework

Great work! Are the number of sampling points in your NeRF module the same as the number of points in 3DGS, or are they the same points? Is the number of sampling points in the final level 2? Is the first level used only for depth estimation and does not introduce 3DGS? How do you handle the density of Gaussian points—are they predicted through MLP or mapped using PDF?

Code Release

Hi, I'm very interested in your MVSGaussian, and when do you plan to release your code?

Weird results on a custom Blender-synthetic dataset

Hi there! Thanks for your great work! I've encountered some unexpected behaviors when evaluating the model pretrained on the DTU dataset (the default one). My dataset consists of ~200 views of a Blender-synthesized LEGO object, with some of the images shown below:
image
The original sizes of the images are 800x800. The poses are specified in a transforms_train.json file just like the NeRF-synthetic dataset. I've combined the dataset modules for colmap and nerf to obtain that of this dataset. Here is my config file:

parent_cfg: configs/mvsgs/dtu_pretrain.yaml

train_dataset_module: lib.datasets.mydataset.mvsgs
test_dataset_module: lib.datasets.mydataset.mvsgs

mvsgs:
    bg_color: [1, 1, 1]
    test_input_views: 3
    eval_center: True
    reweighting: True
    scale_factor: 12
    cas_config:
        render_if: [False, True]
        volume_planes: [16, 8]

train_dataset:
    data_root: 'examples'
    split: 'train'
    input_h_w: [640, 640]
    input_ratio: 1.

test_dataset:
    data_root: 'examples'
    split: 'test'
    input_h_w: [640, 640]
    input_ratio: 1.

The reason why I use 640 for the input size is because 800 will cause out-of-memory. I failed to run COLMAP on this dataset due to lack of features, so I used the same near-far settings as the nerf dataset:

        H, W = tar_img.shape[:2]
        near_far = np.array([2.5 * self.scale_factor, 5.5 * self.scale_factor]).astype(np.float32)
        ret.update({'near_far': np.array(near_far).astype(np.float32)})

The near-far settings seem to play a crucial role since when I set near=0.1 and far=10, the PSNR is only ~8.

Also I noticed that when the input size is set to 400, I got this error:

  File ".../MVSGaussian/lib/networks/mvsgs/cost_reg_net.py", line 80, in forward
    x = conv2 + self.conv9(x)
RuntimeError: The size of tensor a (25) must match the size of tensor b (26) at non-singleton dimension 4

Anyway, when input size = 640, I got PSNR of about 14 and these test results:

image
image
image
image

Have you ever encountered similar problems? Should I try other near-far settings or condition-views selection approaches?

Additionally, in case of need, here is my dataset module:

import numpy as np
import os
from lib.config import cfg
import imageio
import cv2
import random
from lib.utils import data_utils
import torch
import json
from lib.datasets import mvsgs_utils
from lib.utils.video_utils import *

class Dataset:
    def __init__(self, **kwargs):
        super(Dataset, self).__init__()
        self.data_root: str = os.path.join(cfg.workspace, kwargs['data_root'])
            # workspace: .
            # data_root: just data root
        self.split = kwargs['split']
        self.input_h_w = kwargs['input_h_w']
        self.scale_factor = cfg.mvsgs.scale_factor # can be 12
        self.build_metas()
        self.zfar = 100.0
        self.znear = 0.01
        self.trans = [0.0, 0.0, 0.0]
        self.scale = 1.0

    def build_metas(self):
        self.scene_infos = {}
        self.metas = []
        self.data_root = self.data_root.strip()
        if self.data_root.endswith('/'):
            self.data_root = self.data_root[:-1]
        scene = os.path.basename(self.data_root)
        json_info = json.load(open(os.path.join(self.data_root, 'transforms_train.json')))
        b2c = np.array([[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]])
        scene_info = {'ixts': [], 'exts': [], 'img_paths': []}
        frames: list = json_info['frames']
        frames.sort(key=lambda x: int(os.path.basename(x['file_path'])))
        for idx, frame in enumerate(frames):
            c2w = np.array(frame['transform_matrix'])
            c2w = c2w @ b2c
            ext = np.linalg.inv(c2w)
            ixt = np.eye(3)
            ixt[0][2], ixt[1][2] = 400, 400
            focal = .5 * 800 / np.tan(.5 * json_info['camera_angle_x'])
            ixt[0][0], ixt[1][1] = focal, focal
            scene_info['ixts'].append(ixt.astype(np.float32))
            scene_info['exts'].append(ext.astype(np.float32))
            img_path = os.path.join(self.data_root, frame['file_path'] + '.png')
            scene_info['img_paths'].append(img_path)
        img_len = len(frames)
        #
        render_ids = [j for j in range(img_len//8, img_len, img_len//4)] # test views
        train_ids = [j for j in range(img_len) if j not in render_ids]
        #
        if self.split == 'train':
            render_ids = train_ids
        c2ws = np.stack([np.linalg.inv(scene_info['exts'][idx]) for idx in train_ids])
        scene_info['c2ws'] = c2ws.astype(np.float32)
        self.scene_infos[scene] = scene_info

        for i in render_ids: # condition views
            c2w = scene_info['c2ws'][i]
            distance = np.linalg.norm((c2w[:3, 3][None] - c2ws[:, :3, 3]), axis=-1)
            argsorts = distance.argsort()
            argsorts = argsorts[1:] if i in train_ids else argsorts
            if self.split == 'train':
                src_views = [train_ids[j] for j in argsorts[:cfg.mvsgs.train_input_views[1]+1]]
            else:
                src_views = [train_ids[j] for j in argsorts[:cfg.mvsgs.test_input_views]]
            self.metas += [(scene, i, src_views)]
    
    def get_video_rendering_path(self, ref_poses, mode, near_far, train_c2w_all, n_frames=60, rads_scale=1.25):
        # loop over batch
        poses_paths = []
        ref_poses = ref_poses[None]
        for batch_idx, cur_src_poses in enumerate(ref_poses):
            if mode == 'interpolate':
                # convert to c2ws
                pose_square = torch.eye(4).unsqueeze(0).repeat(cur_src_poses.shape[0], 1, 1)
                cur_src_poses = torch.from_numpy(cur_src_poses)
                pose_square[:, :3, :] = cur_src_poses[:,:3]
                cur_c2ws = pose_square.double().inverse()[:, :3, :].to(torch.float32).cpu().detach().numpy()
                cur_path = get_interpolate_render_path(cur_c2ws, n_frames)
            elif mode == 'spiral':
                cur_c2ws_all = train_c2w_all
                cur_near_far = near_far.tolist()
                # rads_scale=...?
                cur_path = get_spiral_render_path(cur_c2ws_all, cur_near_far, rads_scale=rads_scale, N_views=n_frames)
            else:
                raise Exception(f'Unknown video rendering path mode {mode}')

            # convert back to extrinsics tensor
            cur_w2cs = torch.tensor(cur_path).inverse()[:, :3].to(torch.float32)
            poses_paths.append(cur_w2cs)

        poses_paths = torch.stack(poses_paths, dim=0)
        return poses_paths

    def __getitem__(self, index_meta):
        index, input_views_num = index_meta
        scene, tar_view, src_views = self.metas[index]
        if self.split == 'train':
            if np.random.random() < 0.1:
                src_views = src_views + [tar_view]
            src_views = random.sample(src_views, input_views_num)
        scene_info = self.scene_infos[scene]
        tar_img, tar_mask, tar_ext, tar_ixt = self.read_tar(scene_info, tar_view)
        src_inps, src_exts, src_ixts = self.read_src(scene_info, src_views)
        ret = {'src_inps': src_inps.transpose(0, 3, 1, 2),
               'src_exts': src_exts,
               'src_ixts': src_ixts}
        ret.update({'tar_ext': tar_ext,
                    'tar_ixt': tar_ixt})
        if self.split != 'train':
            ret.update({'tar_img': tar_img,
                        'tar_mask': tar_mask})

        H, W = tar_img.shape[:2]
        near_far = np.array([2.5 * self.scale_factor, 5.5 * self.scale_factor]).astype(np.float32)
        ret.update({'near_far': np.array(near_far).astype(np.float32)})
        ret.update({'meta': {'scene': scene, 'tar_view': tar_view, 'frame_id': 0}})

        for i in range(cfg.mvsgs.cas_config.num):
            rays, rgb, msk = mvsgs_utils.build_rays(tar_img, tar_ext, tar_ixt, tar_mask, i, self.split)
            ret.update({f'rays_{i}': rays, f'rgb_{i}': rgb.astype(np.float32), f'msk_{i}': msk})
            s = cfg.mvsgs.cas_config.volume_scale[i]
            ret['meta'].update({f'h_{i}': int(H*s), f'w_{i}': int(W*s)})

        R = np.array(tar_ext[:3, :3], np.float32).reshape(3, 3).transpose(1, 0)
        T = np.array(tar_ext[:3, 3], np.float32)
        for i in range(cfg.mvsgs.cas_config.num):
            h, w = H*cfg.mvsgs.cas_config.render_scale[i], W*cfg.mvsgs.cas_config.render_scale[i]
            tar_ixt_ = tar_ixt.copy()
            tar_ixt_[:2,:] *= cfg.mvsgs.cas_config.render_scale[i]
            FovX = data_utils.focal2fov(tar_ixt_[0, 0], w)
            FovY = data_utils.focal2fov(tar_ixt_[1, 1], h)
            projection_matrix = data_utils.getProjectionMatrix(znear=self.znear, zfar=self.zfar, K=tar_ixt_, h=h, w=w).transpose(0, 1)
            world_view_transform = torch.tensor(data_utils.getWorld2View2(R, T, np.array(self.trans), self.scale)).transpose(0, 1)
            full_proj_transform = (world_view_transform.unsqueeze(0).bmm(projection_matrix.unsqueeze(0))).squeeze(0)
            camera_center = world_view_transform.inverse()[3, :3]
            novel_view_data = {
                'FovX':  torch.FloatTensor([FovX]),
                'FovY':  torch.FloatTensor([FovY]),
                'width': w,
                'height': h,
                'world_view_transform': world_view_transform,
                'full_proj_transform': full_proj_transform,
                'camera_center': camera_center
            }
            ret[f'novel_view{i}'] = novel_view_data    
        
        if cfg.save_video:
            rendering_video_meta = []
            render_path_mode = 'spiral'
            train_c2w_all = np.linalg.inv(src_exts)
            poses_paths = self.get_video_rendering_path(src_exts, render_path_mode, near_far, train_c2w_all, n_frames=60)
            for pose in poses_paths[0]:
                R = np.array(pose[:3, :3], np.float32).reshape(3, 3).transpose(1, 0)
                T = np.array(pose[:3, 3], np.float32)
                FovX = data_utils.focal2fov(tar_ixt[0, 0], W)
                FovY = data_utils.focal2fov(tar_ixt[1, 1], H)
                projection_matrix = data_utils.getProjectionMatrix(znear=self.znear, zfar=self.zfar, K=tar_ixt, h=H, w=W).transpose(0, 1)
                world_view_transform = torch.tensor(data_utils.getWorld2View2(R, T, np.array(self.trans), self.scale)).transpose(0, 1)
                full_proj_transform = (world_view_transform.unsqueeze(0).bmm(projection_matrix.unsqueeze(0))).squeeze(0)
                camera_center = world_view_transform.inverse()[3, :3]
                rendering_meta = {
                    'FovX':  torch.FloatTensor([FovX]),
                    'FovY':  torch.FloatTensor([FovY]),
                    'width': W,
                    'height': H,
                    'world_view_transform': world_view_transform,
                    'full_proj_transform': full_proj_transform,
                    'camera_center': camera_center,
                    'tar_ext': pose
                }
                for i in range(cfg.mvsgs.cas_config.num):
                    tar_ext[:3] = pose
                    rays, _, _ = mvsgs_utils.build_rays(tar_img, tar_ext, tar_ixt, tar_mask, i, self.split)
                    rendering_meta.update({f'rays_{i}': rays})
                rendering_video_meta.append(rendering_meta)
            ret['rendering_video_meta'] = rendering_video_meta
        return ret

    def read_src(self, scene, src_views):
        src_ids = src_views
        ixts, exts, imgs = [], [], []
        for idx in src_ids:
            img, orig_size = self.read_image(scene, idx)
            imgs.append(((img/255.)*2-1).astype(np.float32))
            ixt, ext = self.read_cam(scene, idx, orig_size)
            ixts.append(ixt)
            exts.append(ext)
        return np.stack(imgs), np.stack(exts), np.stack(ixts)

    def read_tar(self, scene, view_idx):
        img, orig_size = self.read_image(scene, view_idx)
        img = (img/255.).astype(np.float32)
        ixt, ext = self.read_cam(scene, view_idx, orig_size)
        mask = np.ones_like(img[..., 0]).astype(np.uint8)
        return img, mask, ext, ixt

    def read_cam(self, scene, view_idx, orig_size):
        ext = scene['exts'][view_idx].astype(np.float32)
        ext[:3,3] *= self.scale_factor 
        ixt = scene['ixts'][view_idx]
        # ixt[0, 2] = self.input_h_w[1] / 2
        # ixt[1, 2] = self.input_h_w[0] / 2
        ixt[0] *= self.input_h_w[1] / orig_size[0]
        ixt[1] *= self.input_h_w[0] / orig_size[1]
        return ixt, ext

    def read_image(self, scene, view_idx):
        img_path = scene['img_paths'][view_idx]
        img = (np.array(imageio.imread(img_path))).astype(np.float32)
        orig_size = img.shape[:2][::-1]
        img = cv2.resize(img, self.input_h_w[::-1], interpolation=cv2.INTER_AREA)
        return np.array(img), orig_size

    def __len__(self):
        return len(self.metas)

def get_K_from_params(params):
    K = np.zeros((3, 3)).astype(np.float32)
    K[0][0], K[0][2], K[1][2] = params[:3]
    K[1][1] = K[0][0]
    K[2][2] = 1.
    return K

train colmap data?

image
After the demo (Custom Data), I obtained the following data. How do I execute train_net.py? Your subsequent examples are a bit unclear.

Question about training on the LLFF dataset

Hello, and please forgive my inexperience. I would like to ask how to train on the LLFF dataset, since the tutorial you provide only contains a training example for DTU.
image
If I want to train on the flower scene of LLFF, should I run flower.yaml directly, or llff_eval.yaml?
image

I am also curious how the qualitative comparison images in your paper were produced. Do you train on a certain dataset in advance and then use the pretrained model to render unseen scenes? Or do you pick the corresponding scene and train with very few views, e.g., for a scene with 35 images, train with 3 of them and use the trained model to render and compare the rest? For example, for the view of the fern scene below, which datasets was the model pretrained on? Was it trained on all the scenes (such as flower) except fern, and then, after training, used to render the entire fern dataset?
image

how to save Gaussian point cloud results

Hello, I wonder how to save Gaussian point cloud results after training my own dataset with the generalizable model.
I tried using the following command to obtain Gaussian point cloud of the example dataset:
python run.py --type evaluate --cfg_file configs/mvsgs/colmap_eval.yaml test_dataset.data_root examples/scene1 save_ply True dir_ply mvsgs_pointcloud
However, the result cannot be opened in the Gaussian viewer. Could you please give me some advice? Thanks a lot!

“win_size exceeds image extent”

Traceback (most recent call last):
File "/mnt/sda/xdh/MVS-GS/MVSGaussian/run.py", line 98, in
globals()['run_' + args.type]()
File "/mnt/sda/xdh/MVS-GS/MVSGaussian/run.py", line 82, in run_evaluate
evaluator.evaluate(output, batch)
File "lib/evaluators/mvsgs.py", line 73, in evaluate
ssim_item = ssim(gt_rgb[b], pred_rgb[b], multichannel=True)
File "/mnt/sda/xdh/env/mvsgs/lib/python3.10/site-packages/skimage/metrics/_structural_similarity.py", line 178, in structural_similarity
raise ValueError(
ValueError: win_size exceeds image extent. Either ensure that your images are at least 7x7; or pass win_size explicitly in the function call, with an odd value less than or equal to the smaller side of your images. If your images are multichannel (with color channels), set channel_axis to the axis number corresponding to the channels.

I can't run the code successfully, please take a look for me

How can I find the suitable numbers for "depth", "range" in the .txt file and "zfar", "znear" for my custom dataset?

Thanks for your amazing work,

But I have one problem
When I tested with your dataset, the results were pretty good, but when I tested with my custom dataset I had trouble with "depth" and "range" in the *.txt file; I cannot find suitable values for those variables, and therefore the results are not good (see the result images below).
Now I'm using depth and range = 425 - 905 and "znear" - "zfar" = 0.01 - 100 (which are the defaults for the DTU dataset in your code).
Can you help me with this question? Thank you so much.
image

OOM when training

Hello, author! This is really excellent work! I ran into some GPU memory issues when training on a custom dataset; do you have any suggestions?
Specifically, with single-machine multi-GPU training, GPU memory usage is uneven across cards and also unstable; it often goes OOM after a few epochs of training, right after validation finishes. Do you have any good suggestions about this? Many thanks!
image

The results are poor

Thanks for the updated readme and demo for custom datasets!

The results are poor. My dataset is an inward-facing 360° ring of cameras around an object, not simply looking at the object from one direction and moving the camera around on a plane.

Is this code capable of producing results from images taken fully around an object?

About some detailed steps of the workflow

Thank you very much for sharing such great work. I have a small suggestion:
The workflow could be described in more detail, i.e., the train/test steps, including how the pretrained generalizable model can be used. Thank you very much.

Problem during training

Hello, author! The following problem occurred when I trained on a single dataset.
image
This is the command I ran:
python train_net.py --cfg_file configs/mvsgs/llff/flower.yaml train.batch_size 2
This is my dataset. I want to train on a street-scene dataset, and I renamed it to flower.
image
image

Perhaps the scene's image size differs from flower's, because no ipdb prompt appeared when I trained on flower.

Code Release

Great work! When do you plan to release your code?

Question about depth estimation

Hello, author. Thank you very much for your excellent work. I would like to ask why your paper does not construct the depth map by accumulating the 3D Gaussians, as shown in the images below:
image
image
image

Compared with computing the depth map through the cost volume, which of these two ways of computing depth is better?

How to run on Custom data?

Hey,

thanks for your amazing work. I was wondering how can I run your method on the output from Colmap?

Basically I have data in the following format:
data
├── images/
└── sparse/0/ -- cameras.bin, images.bin, points3D.bin

About the training process under the train_net.py

Hi, sir! I'm sorry to bother you again. While running train_net.py on my own dataset, I find that the log file to which I redirect the output keeps growing, as shown below:
image
What confuses me is whether the training process really needs such a long time to complete.
I would appreciate any suggestions about this.
Looking forward to your earliest reply!

code issue

I used the open-source code you provided to run the MVSGaussian model and, unfortunately, under your experimental configuration, training on the DTU dataset for 300 epochs only achieved a performance of 27.58 dB. I noticed that in the code you released, the sampling points for both levels are the same (one sample), and the rotation network of the Gaussian module has not been unlocked for training (you set it to a fixed value). I am not sure whether the issue lies with my environment or with the released code. If you could address my questions, I would be extremely grateful.

How to deal with the

Hello, when I run the command python train_net.py --cfg_file configs/mvsgs/colmap_eval.yaml train_dataset.data_root dataset/<> test_dataset.data_root dataset/<> on my own dataset, the error is like:

Traceback (most recent call last):
  File "train_net.py", line 151, in <module>
    main()
  File "train_net.py", line 143, in main
    train(cfg, network)
  File "train_net.py", line 51, in train
    trainer.train(epoch, train_loader, optimizer, recorder)
  File "/home/sunzhenyu/Projects/MVSGaussian/lib/train/trainers/trainer.py", line 49, in train
    for iteration, batch in enumerate(data_loader):
  File "/home/sunzhenyu/anaconda3/envs/mvsgs/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/sunzhenyu/anaconda3/envs/mvsgs/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/home/sunzhenyu/anaconda3/envs/mvsgs/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/home/sunzhenyu/anaconda3/envs/mvsgs/lib/python3.7/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/home/sunzhenyu/anaconda3/envs/mvsgs/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/sunzhenyu/anaconda3/envs/mvsgs/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/sunzhenyu/anaconda3/envs/mvsgs/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "lib/datasets/colmap/mvsgs.py", line 95, in __getitem__
    src_views = random.sample(src_views, input_views_num)
  File "/home/sunzhenyu/anaconda3/envs/mvsgs/lib/python3.7/random.py", line 321, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative

How can I fix the problem?
