
taichi_3d_gaussian_splatting's Introduction

taichi_3d_gaussian_splatting

An unofficial implementation of the paper 3D Gaussian Splatting for Real-Time Radiance Field Rendering, written in Taichi Lang.

What does 3D Gaussian Splatting do?

Training:

The algorithm takes images from multiple views, a sparse point cloud, and camera poses as input, uses a differentiable rasterizer to train the point cloud, and outputs a dense point cloud with extra features (covariance, color information, etc.).

If we view the training process as a module, it can be described as:

graph LR
    A[ImageFromMultiViews] --> B((Training))
    C[sparsePointCloud] --> B
    D[CameraPose] --> B
    B --> E[DensePointCloudWithExtraFeatures]

Inference:

The algorithm takes the dense point cloud with extra features and any camera pose as input, and uses the same rasterizer to render an image from that camera pose.

graph LR
    C[DensePointCloudWithExtraFeatures] --> B((Inference))
    D[NewCameraPose] --> B
    B --> E[Image]
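For intuition, here is what the rasterizer does for a single pixel once the 3D Gaussians have been projected to the image: sort them by depth and alpha-composite them front to back. The following is an illustrative numpy sketch with made-up names, not the repo's tile-based Taichi kernel.

import numpy as np

def composite_pixel(pixel_xy, means2d, inv_covs2d, opacities, colors, depths):
    """Front-to-back alpha compositing of projected Gaussians at one pixel."""
    order = np.argsort(depths)      # nearest Gaussians are composited first
    color = np.zeros(3)
    transmittance = 1.0
    for i in order:
        d = pixel_xy - means2d[i]
        # Gaussian falloff exp(-0.5 * d^T Sigma^-1 d) scaled by the point's opacity
        alpha = min(opacities[i] * np.exp(-0.5 * d @ inv_covs2d[i] @ d), 0.999)
        if alpha < 1.0 / 255.0:
            continue
        color += transmittance * alpha * colors[i]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:    # early exit once the pixel is nearly opaque
            break
    return color

# Two Gaussians covering the same pixel, nearest one red, farther one green.
print(composite_pixel(
    np.array([10.0, 10.0]),
    means2d=np.array([[10.0, 10.0], [11.0, 10.0]]),
    inv_covs2d=np.array([np.eye(2) * 0.5, np.eye(2) * 0.5]),
    opacities=np.array([0.8, 0.9]),
    colors=np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]),
    depths=np.array([1.0, 2.0]),
))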

An example of an inference result:

tat_truck_1.mp4

Because of the nice properties of point clouds, the algorithm handles scene/object merging easily compared to other NeRF-like algorithms.

tat_truck_with_boot_1.mp4
Other example results:

Top left: result from this repo (30k iterations); top right: ground truth; bottom left: normalized depth; bottom right: normalized number of points per pixel.

Why taichi?

  • Taichi is a language for high-performance computing. It is designed to close the gap between the productivity-focused Python language and the performance- and parallelism-focused C++/CUDA languages. By using Taichi, the repo stays pure Python and achieves the same or even better performance compared to the CUDA implementation. Also, the code is much easier to read and maintain.
  • Taichi provides various backends, including CUDA, OpenGL, Metal, etc. We do plan to change the backend to support various platforms, but currently the repo only supports the CUDA backend.
  • Taichi provides automatic differentiation; although the repo does not use it currently, it is a nice feature for future development.

Current status

The repo has now been tested with the dataset provided by the official implementation. For the Truck dataset, the repo achieves a slightly higher PSNR than the official implementation with only 1/5 to 1/4 the number of points. However, the training/inference speed is still slower than the official implementation.

The results for the official implementation and this implementation are tested on the same dataset. I notice that the result from the official implementation is slightly different from their paper; the reason may be a difference in testing resolution.

Dataset      Source                   PSNR    SSIM   #points
Truck (7k)   paper                    23.51   0.840  -
Truck (7k)   official implementation  23.22   -      1.73e6
Truck (7k)   this implementation      23.76   0.836  ~2.3e5
Truck (30k)  paper                    25.187  0.879  -
Truck (30k)  official implementation  24.88   -      2.1e6
Truck (30k)  this implementation      25.21   0.865  428687

Truck(30k)(recent best result):

train:iteration          30000.0
train:l1loss             0.02784738875925541
train:loss               0.04742341861128807
train:num_valid_points   428687.0
train:psnr               25.662137985229492
train:ssim               0.8742724657058716
train:ssimloss           0.12572753429412842
val:loss                 0.05369199812412262
val:psnr                 25.21463966369629
val:ssim                 0.8645088076591492

Installation

  1. Prepare an environment containing pytorch and torchvision.
  2. Clone the repo and cd into the directory.
  3. Run the following commands:
pip install -r requirements.txt
pip install -e .

All dependencies can be installed by pip; pytorch/torchvision can be installed by conda. The code is tested on Ubuntu 20.04.2 LTS with Python 3.10.10, an RTX 3090, and CUDA 12.1. The code is not tested on other platforms, but it should work with minor modifications.

Dataset

The algorithm requires a point cloud of the whole scene, camera parameters, and ground truth images. The point cloud is stored in parquet format. The camera parameters and ground truth images are stored in json format. The running config is stored in yaml format. A script to build a dataset from colmap output is provided. It is also possible to build a dataset from raw data.
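Since the point cloud is a plain parquet file, you can inspect it with pandas before training. A minimal sketch (the path is the Truck example used later in this README, and the exact column names depend on how the dataset was built):

import pandas as pd

# Example path; point it at your own dataset.
df = pd.read_parquet("data/tat_truck_every_8_test/point_cloud.parquet")
print(df.shape)    # number of points x number of columns
print(df.columns)  # column names (positions plus any extra features)
print(df.head())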

Train on Tanks and Temples Truck scene


**Disclaimer**: users are required to get permission from the original dataset provider. Any usage of the data must obey the license of the dataset owner.

The Truck scene in the Tanks and Temples dataset is the main dataset used to develop this repo. We use a downsampled version of the images in most experiments. The camera poses and the sparse point cloud can easily be generated by colmap. The preprocessed images, pregenerated camera poses, and point cloud for the Truck scene can be downloaded from this link.

Please download the images into a folder named image and put it under the root directory of this repo. The camera poses and sparse point cloud should be put under data/tat_truck_every_8_test. The folder structure should look like this:

├── data
│   ├── tat_truck_every_8_test
│   │   ├── train.json
│   │   ├── val.json
│   │   ├── point_cloud.parquet
├── image
│   ├── 000000.png
│   ├── 000001.png

The config file config/tat_truck_every_8_test.yaml is provided. The config file specifies the dataset paths, the training parameters, and the network parameters, and it is self-explanatory. Training can be started by running

python gaussian_point_train.py --train_config config/tat_truck_every_8_test.yaml

Train on Example Object(boot)


It is actually a random free mesh from the Internet; I believe it is free to use. BlenderNerf is used to generate the dataset. The preprocessed images, pregenerated camera poses, and point cloud for the boot scene can be downloaded from this link. Please download the images into a folder named image and put it under the root directory of this repo. The camera poses and sparse point cloud should be put under data/boots_super_sparse. The folder structure should look like this:

├── data
│   ├── boots_super_sparse
│   │   ├── boots_train.json
│   │   ├── boots_val.json
│   │   ├── point_cloud.parquet
├── image
│   ├── images_train
│   │   ├── COS_Camera.001.png
│   │   ├── COS_Camera.002.png
│   │   ├── ...

Note that because the images in this dataset have a higher resolution (1920x1080), training on it is actually slower than training on the Truck scene.

Train on dataset generated by colmap


  • Reconstruct using colmap: see https://colmap.github.io/tutorial.html. The images should be undistorted. Sparse reconstruction is usually enough.
  • Save as txt: the standard colmap txt output contains three files, cameras.txt, images.txt, and points3D.txt.
  • Transform the txt into json and parquet: see this file for how to prepare it (a sketch of the idea follows this list).
  • Prepare a config yaml: see this file as an example.
  • Run with the config.
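The linked preparation script handles the full conversion; purely as an illustration of the idea, here is a minimal sketch that reads colmap's points3D.txt into a parquet point cloud. The column names (x, y, z, r, g, b) are an assumption, so check the provided script for the exact schema the repo expects.

import pandas as pd

def colmap_points3d_to_parquet(points3d_txt: str, parquet_path: str) -> None:
    """Read colmap's points3D.txt (POINT3D_ID X Y Z R G B ERROR TRACK...) into a parquet table."""
    rows = []
    with open(points3d_txt) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue  # skip comments and empty lines
            fields = line.split()
            x, y, z = map(float, fields[1:4])
            r, g, b = map(int, fields[4:7])
            rows.append({"x": x, "y": y, "z": z, "r": r, "g": g, "b": b})
    pd.DataFrame(rows).to_parquet(parquet_path)

# colmap_points3d_to_parquet("sparse/points3D.txt", "point_cloud.parquet")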

Train on dataset with Instant-NGP format with extra mesh


  • A script to convert an Instant-NGP format dataset into the two required JSON files is provided. However, the algorithm requires an extra point cloud as input, which does not usually come with Instant-NGP format datasets. The script accepts a mesh file as input and generates a point cloud by sampling points on the mesh (a sketch of the sampling idea follows at the end of this section). The script is here.
  • Users can run the script with the following command:
python tools/prepare_InstantNGP_with_mesh.py \
    --transforms_train {path to train transform file} \
    --transforms_test {path to val transform file, if not provided, val will be sampled from train} \
    --mesh_path {path to mesh file} \
    --mesh_sample_points {number of points to sample on the mesh} \
    --val_sample {if sample val from train, sample by every n frames} \
    --image_path_prefix {path prefix to the image, usually the path to the folder containing the image folder} \
    --output_path {path to output folder}
  • Then, in the output folder, there will be two json files, train.json and val.json, and a point cloud file point_cloud.parquet.
  • Create a config yaml file similar to test_sagemaker.yaml; set train-dataset-json-path to the path of train.json, val-dataset-json-path to the path of val.json, and pointcloud-parquet-path to the path of point_cloud.parquet. Also set summary-writer-log-dir and output-model-dir to wherever you want to save the model and the tensorboard log.
  • Run with the config:
python gaussian_point_train.py --train_config {path to config yaml}
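The mesh-to-point-cloud step mentioned above boils down to surface sampling. Below is a minimal sketch using trimesh; it is not necessarily what the provided script does, and the parquet column names are an assumption.

import pandas as pd
import trimesh

def mesh_to_point_cloud(mesh_path: str, parquet_path: str, num_points: int = 500) -> None:
    """Sample points uniformly on a mesh surface and store them as a parquet point cloud."""
    mesh = trimesh.load(mesh_path, force="mesh")
    points, _face_ids = trimesh.sample.sample_surface(mesh, num_points)
    pd.DataFrame(points, columns=["x", "y", "z"]).to_parquet(parquet_path)

# mesh_to_point_cloud("boot.stl", "point_cloud.parquet", num_points=500)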

Train on dataset generated by BlenderNerf


BlenderNerf is a Blender plugin for generating NeRF datasets. A dataset generated by BlenderNerf can be in the Instant-NGP format, so we can use the script above to convert it into the required format, and the mesh can easily be exported from Blender. To generate the dataset:

  • Install Blender.
  • Import the mesh/scene you want into Blender.
  • Install BlenderNerf by following the README in BlenderNerf.
  • Configure BlenderNerf: make sure Train is selected and Test is not selected (Test seems to be buggy), File Format is NGP, and the save path is filled in.
  • Configure BlenderNerf Camera on Sphere: follow the BlenderNerf README to configure the camera (the default is enough for most cases). Then click PLAY COS.
  • A zip file will be generated in the save path. Unzip it; it should contain a folder named train and a file named transforms_train.json.
  • In Blender, use File->Export->Stl(.stl) to export the mesh as an stl file.
  • Run the script with the following command:
python tools/prepare_InstantNGP_with_mesh.py \
    --transforms_train {path to transforms_train.json} \
    --mesh_path {path to stl file} \
    --mesh_sample_points {number of points to sample on the mesh, default 500} \
    --val_sample {if sampling val from train, sample every n frames, default 8} \
    --image_path_prefix {absolute path of the directory containing the train dir} \
    --output_path {any path you want}
  • Then, in the output folder, there will be two json files, train.json and val.json, and a point cloud file point_cloud.parquet.
  • Create a config yaml file similar to test_sagemaker.yaml; set train-dataset-json-path to the path of train.json, val-dataset-json-path to the path of val.json, and pointcloud-parquet-path to the path of point_cloud.parquet. Also set summary-writer-log-dir and output-model-dir to wherever you want to save the model and the tensorboard log.
  • Run with the config:
python gaussian_point_train.py --train_config {path to config yaml}

Train on dataset generated by other methods


See this file for how to prepare the dataset.

Run

python gaussian_point_train.py --train_config {path to config file}

The training process works as follows:

stateDiagram-v2
    state WeightToTrain {
        sparsePointCloud
        pointCloudExtraFeatures
    }
    WeightToTrain --> Rasterizer: input
    cameraPose --> Rasterizer: input
    Rasterizer --> Loss: rasterized image
    ImageFromMultiViews --> Loss
    Loss --> Rasterizer: gradient
    Rasterizer --> WeightToTrain: gradient

The results are visualized in tensorboard. The tensorboard log is stored in the output directory specified in the config file. The trained point cloud with features is also stored as parquet in the output directory specified in the config file.

Run on Colab (to take advantage of Google-provided GPU accelerators)

You can find the related notebook here: /tools/run_3d_gaussian_splatting_on_colab.ipynb

  1. Set the hardware accelerator in Colab: "Runtime->Change Runtime Type->Hardware accelerator->select GPU->select T4".
  2. Upload this repo to the corresponding folder in your Google Drive.
  3. Mount your Google Drive in your notebook (see notebook).
  4. Install condacolab (see notebook).
  5. Install requirements.txt with pip (see notebook).
  6. Install pytorch, torchvision, pytorch-cuda, etc. with conda (see notebook).
  7. Prepare the dataset as instructed in https://github.com/wanmeihuali/taichi_3d_gaussian_splatting#dataset.
  8. Run the trainer with the correct config (see notebook).
  9. Check on the training process through tensorboard (see notebook).

Visualization

A simple visualizer is provided. The visualizer is implemented with the Taichi GUI, which limits the FPS to 60 (if anyone knows how to change this limitation, please ping me). The visualizer takes one or multiple parquet results. Example parquets can be downloaded here.

python3 visualizer.py --parquet_path_list <parquet_path_0> <parquet_path_1> ...

The visualizer merges multiple point clouds and displays them in the same scene.

  • Press 0 to select all point clouds (the default state).
  • Press 1 to 9 to select one of the point clouds.
  • When all point clouds are selected, use "WASD=-" to move the camera, use "QE" to rotate around the y-axis, or drag the mouse for free rotation.
  • When only one of the point clouds is selected, use "WASD=-" to move the object/scene, use "QE" to rotate the object/scene around the y-axis, or drag the mouse for free rotation around the center of the object.

How to contribute/Use CI to train on cloud

I've enabled CI and cloud-based training. The feature is not very stable yet. It enables anyone to contribute to this repo, even without a GPU. Generally, the workflow is:

  1. For any algorithm improvement, please create a new branch and make a pull request.
  2. Please @wanmeihuali in the pull request, and I will check the code and add the label need_experiment, need_experiment_garden, or need_experiment_tat_truck to the pull request.
  3. The CI will automatically build the docker image and upload it to AWS ECR, and then the cloud-based training will be triggered. The training result will be posted to the pull request as a comment, e.g. this PR. The dataset is generated by the default config of colmap. The training runs on a g4dn.xlarge Spot Instance (NVIDIA T4, a weaker GPU than a 3090/A6000) and usually takes 2-3 hours.
  4. Currently, the best training result in README.md is updated manually. I will try to automate this process in the future.

The current implementation is based on my understanding of the paper, and it will have some differences from the paper/official implementation (they plan to release their code in July). As a personal project, the parameters are not tuned well. I will try to improve performance in the future. Feel free to open an issue if you have any questions, and PRs are welcome, especially for any performance improvement.

TODO

Algorithm part

  • Fix the adaptive controller part; something is wrong with the densification process, and the description in the paper is very vague. Further experiments are needed to figure out the correct/better implementation.
    • Figure out whether densification should apply to all points or only to points in the current frame.
    • Figure out what "average magnitude of view-space position gradients" means: is it averaged across frames, or averaged across pixels?
    • Figure out the correct split policy. Where should the new point be located? Currently the location is the location before optimization. Would it be better to put it at a focus of the original ellipsoid? Use sampling of the pdf for over-reconstruction, and the position before optimization for under-reconstruction.
  • Add result scores/images in README.md
    • Try the same datasets as in the paper.
    • Fix issues in the current Blender plugin, and also make the plugin open source.
  • Camera pose optimization: compute the gradient of the camera pose and optimize it during training.
  • Dynamic rigid object support. The current implementation already supports multiple camera poses in one scene, so the movement of rigid objects should be expressible as movement of the camera. We need to find an SfM solution that can provide 6-DOF pose estimates for different objects, and modify the dataset code to test this.

Engineering part

  • Fix bug: crash when there is no point in the camera view.
  • Add an inference-only framework to support adding/moving objects in the scene, scene merging, scene editing, etc.
  • Add an install script/docker image.
  • Support batch training. Currently the code only supports single-image training and uses only a small part of the GPU memory.
  • Implement radix sort/cumsum in Taichi instead of torch; the torch-taichi tensor cast seems to be available only on CUDA devices, so if we want to switch to other devices, we need to get rid of torch.
  • Implement a Taichi-only inference rasterizer that uses only Taichi fields, and migrate it to macOS/Android/iOS.


taichi_3d_gaussian_splatting's Issues

Error running prepare_colmap.py

Error running prepare_colmap.py, any ideas?

(3, 191076)
(3, 191076)
Traceback (most recent call last):
  File "D:\.repos\taichi_3d_gaussian_splatting\tools\prepare_colmap.py", line 275, in <module>
    'camera_intrinsics': camera['K'].tolist(),
AttributeError: 'NoneType' object has no attribute 'tolist'

Is this related to the f and fx values? My colmap data uses SIMPLE_PINHOLE, so it only has f:

0 SIMPLE_PINHOLE 4112 3008 4665.1357862684463 2048.8621369116008 1539.5097867399454

How to accelerate the rendering process?

When I trained the model on an 8GB 1070 Ti and a 24GB 3090 Ti, the speed increase was not significant. Compared to the original version, the taichi version is 2-3 times slower; is it possible to speed up the taichi version further?

Point filtering

Thanks for this splatting implementation. It helps me learn it.
I found that in the point filtering (https://github.com/wanmeihuali/taichi_3d_gaussian_splatting/blob/main/taichi_3d_gaussian_splatting/GaussianPointCloudRasterisation.py#L74) you have the following lines:
pixel_u >= -TILE_WIDTH * BOUNDARY_TILES and pixel_u < camera_width + TILE_WIDTH * BOUNDARY_TILES and \
pixel_v >= -TILE_HEIGHT * BOUNDARY_TILES and pixel_v < camera_height + TILE_HEIGHT * BOUNDARY_TILES

Is this check necessary? In my understanding of camera intrinsics, it transforms points in the camera frame into pixel space such that their coordinates are in the range [0, camera_width] and [0, camera_height].
This check looks redundant to me. However, I am not excluding that I might be wrong.

Could you describe what the idea was?
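For reference, the standard pinhole projection the question refers to, as a generic sketch (not the repo's code). Note that nothing in this mapping restricts the result to the image bounds; points outside the field of view simply project outside [0, width) x [0, height).

import numpy as np

def project_pinhole(K: np.ndarray, point_camera: np.ndarray) -> np.ndarray:
    """Project a 3D point in camera coordinates to pixel coordinates (u, v)."""
    x, y, z = point_camera
    u = K[0, 0] * x / z + K[0, 2]
    v = K[1, 1] * y / z + K[1, 2]
    return np.array([u, v])

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# A point well off to the side projects far outside a 640x480 image.
print(project_pinhole(K, np.array([2.0, 0.1, 1.0])))  # approx [1320, 290]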

Rasterisation in validation is slower.

Hi guys,
Thank you for your excellent work. I am wondering why the rasterisation time during validation is so much higher than during training (~400 ms vs. ~10 ms)?

how to match quality of original implementation?

I have been playing with the truck sample in both this repository and in graphdeco-inria/gaussian-splatting.
(image: result from this repo)
(image: reference, trained with default parameters)

I can't seem to replicate the same reconstruction quality with this repository (note that the fence cannot be rendered clearly). I have tried to match the learning rates, making the following changes to the config. What is causing the difference?

--- a/config/tat_truck_every_8_test.yaml
+++ b/config/tat_truck_every_8_test.yaml
@@ -31,8 +31,8 @@ print-metrics-to-console: False
 enable_taichi_kernel_profiler: False
 log_taichi_kernel_profile_interval: 3000
 log_validation_image: False
-feature_learning_rate: 0.005
-position_learning_rateo: 0.00005
+feature_learning_rate: 0.0025
+position_learning_rate: 0.00016
 position_learning_rate_decay_rate: 0.9947
 position_learning_rate_decay_interval: 100
 loss-function-config:
@@ -45,8 +45,11 @@ rasterisation-config:
   depth-to-sort-key-scale: 10.0
   far-plane: 2000.0
   near-plane: 0.4
+  grad_s_factor: 2
+  grad_q_factor: 0.4
+  grad_alpha_factor: 20
 summary-writer-log-dir: logs/tat_truck_every_8_experiment
-output-model-dir: logs/tat_truck_every_8_experiment
+output-model-dir: logs/tat_truck_every_8_experiment_matched_lr

Parameter simplifications

Hi there,

Thanks for your great piece of work - I'm keen to try simplifying a few things and am interested in your opinion:

The first one is relatively simple: there's lots of code like this where there's a parameter:
in_camera_grad_color_buffer: ti.types.ndarray(ti.f32, ndim=2), # (M, 3)

Then some code later:

    point_grad_color = ti.math.vec3(
        in_camera_grad_color_buffer[idx, 0],
        in_camera_grad_color_buffer[idx, 1],
        in_camera_grad_color_buffer[idx, 2],
    )

But did you realise you can declare the parameter like this, directly creating the vec3 instead?

in_camera_grad_color_buffer: ti.types.ndarray(ti.math.vec3, ndim=1), # (M, 3)

The second one is potentially packing parameters into vectors, a little like how you did the Gaussian3D struct, so that instead of having a bunch of input parameters:

    point_uv: ti.types.ndarray(ti.f32, ndim=2),  # (M, 2)
    point_in_camera: ti.types.ndarray(ti.f32, ndim=2),  # (M, 3)
    point_uv_conic: ti.types.ndarray(ti.f32, ndim=2),  # (M, 3)
    point_alpha_after_activation: ti.types.ndarray(ti.f32, ndim=1),  # (M)
    point_color: ti.types.ndarray(ti.f32, ndim=2),  # (M, 3)

they could be packed into a ti.types.ndarray(vec12, ndim=1) and unpacked into a Gaussian2D struct; a few helper abstractions can simplify it and avoid creating even more boilerplate...

Thanks!
Oliver

Results for NeRF-synthetic dataset

Hi,

Thank you so much for the great work! I was wondering if you have also tried the NeRF-synthetic dataset, where the Gaussians are initialized randomly (in the original paper). If so, how are the results for this type of synthetic, bounded scene? Thanks in advance!

Use the taichi autodiff grad during backward

When I read the Taichi docs, I saw that Taichi supports autodiff via kernel_func.grad. So is it possible to directly use kernel_func.grad in the backward function of taichi-3dgs?
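For context, a minimal standalone example of Taichi's autodiff (independent of this repo, and not a statement about whether it fits the rasterizer's hand-written backward):

import taichi as ti

ti.init(arch=ti.cpu)

x = ti.field(dtype=ti.f32, shape=(), needs_grad=True)
loss = ti.field(dtype=ti.f32, shape=(), needs_grad=True)

@ti.kernel
def compute_loss():
    loss[None] = x[None] ** 2

x[None] = 3.0
with ti.ad.Tape(loss=loss):  # records compute_loss and runs its gradient kernel automatically
    compute_loss()
print(x.grad[None])  # d(x^2)/dx at x = 3 -> 6.0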

parquet_view

Can you help me convert file.parquet to file.obj, or to some other 3D file format?

Missing gaussians and rendering become transparent?

Hi, this is great work.
I have been playing with the code and found that for objects with uniform colors, it often fails to create Gaussians, which causes a transparent rendering effect. See an example below (left: official Gaussian splatting, right: taichi GS).

transparent.mp4

I am using this yaml config with commit 030e5ab

Any suggestions for countering this issue?

Torch not compiled with CUDA enabled

I am trying to run just the visualizer and am facing this error:

python visualizer.py --parquet_path_list boots_19000.parquet

[Taichi] version 1.6.0, llvm 15.0.1, commit f1c6fbbd, win, python 3.10.11
C:\Users\saeid\source\repos\taichi_3d_gaussian_splatting\taichi_3d_gaussian_splatting\GaussianPointCloudScene.py:6: DeprecationWarning: Please use cKDTree from the scipy.spatial namespace, the scipy.spatial.ckdtree namespace is deprecated.
from scipy.spatial.ckdtree import cKDTree
Traceback (most recent call last):
File "C:\Users\saeid\source\repos\taichi_3d_gaussian_splatting\visualizer.py", line 23, in
class GaussianPointVisualizer:
File "C:\Users\saeid\source\repos\taichi_3d_gaussian_splatting\visualizer.py", line 25, in GaussianPointVisualizer
class GaussianPointVisualizerConfig:
File "C:\Users\saeid\source\repos\taichi_3d_gaussian_splatting\visualizer.py", line 29, in GaussianPointVisualizerConfig
camera_intrinsics: torch.Tensor = torch.tensor(
File "C:\Users\saeid\source\repos\taichi_3d_gaussian_splatting\venv\lib\site-packages\torch\cuda_init_.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Visualization

Thank you for your amazing work. I have 2 questions regarding visualization.

  • Is there any way to visualize a .parquet file with a web-based viewer or locally on a CPU-only machine?
  • Is it possible to convert .parquet to some other known point cloud or mesh format?

CUDA/GL Rasterizer

I'm looking into working on a GL renderer for the parquets.
I see that the Inria repository, which has now been released, wrote the rasterizer completely in CUDA.
Do you have any plans to work on different renderers/rasterizers?
Also, about making the inference completely Taichi field-based, how much work do you believe is left there? Any starting points to work on?
@wanmeihuali

Object centric dataset such as boots dataset

Could you please provide the training dataset for the boot sequence? I would like to try object-centric datasets and improve from there. Thank you for your swift response. I just need the dataset; I can then run colmap myself if necessary to generate it.

Thank you a lot for the great work

prepare_InstantNGP_with_mesh.py error

I get an error running the script with my dataset; have I used the flags correctly?

(taichi_3d_gaussian_splatting) D:\.repos\taichi_3d_gaussian_splatting>python tools/prepare_InstantNGP_with_mesh.py  --transforms_train "D:\.repos\taichi_3d_gaussian_splatting_data\data\ed\ingp\transforms.json"     --mesh_path "D:\.repos\taichi_3d_gaussian_splatting_data\data\ed\ingp\mesh.ply"   --mesh_sample_points 100000     --image_path_prefix image    --output_path D:\.repos\taichi_3d_gaussian_splatting_data\data\ed\output
Traceback (most recent call last):
  File "D:\.repos\taichi_3d_gaussian_splatting\tools\prepare_InstantNGP_with_mesh.py", line 54, in <module>
    data_list = convert_json(input_json, args.image_path_prefix)
  File "D:\.repos\taichi_3d_gaussian_splatting\tools\prepare_InstantNGP_with_mesh.py", line 13, in convert_json
    [input_json["fl_x"], 0, input_json["cx"]],
KeyError: 'fl_x'

my transforms file attached
transforms.zip

Does it support the KITTI dataset?

Hello, I've seen references to KITTI in both "config" and "tools". I wanted to ask whether you've tested the KITTI dataset and whether you can provide the test data. Thanks!

WRONG COLOR WHEN CONVERTING PARQUET TO PLY

I get different colors between the taichi viewer and the PLY converted from the parquet. The colors look right when opened in the viewer, but the PLY has strange colors with more contrast.

conda activate my_environment

cd /media/ichsan/DISK/taichi_3d_gaussian_splatting/

python3 visualizer.py --parquet_path_list OUTPUT/truck/best_scene.parquet


#########
I convert to PLY using this code. The PLY can be opened by the Antimatter15 WebGL splat viewer.

conda activate my_environment

cd /media/ichsan/KERJAAN/Linux_World/taichi_3d_gaussian_splatting/

python3 parquet_to_ply.py --parquet_path OUTPUT/truck/best_scene.parquet --ply_path OUTPUT/truck/point_cloud.ply


########

Python version==3.10.10
CUDA version==12.1.105

taichi==1.7.0
taichi-3d-gaussian-splatting==0.0.1
torch==2.1.2
torchvision==0.16.2
plyfile==1.0.2

Segmentation fault (core dumped)

Hi, I am getting a segmentation fault. Any idea why?

python3 gaussian_point_train.py --train_config config/tat_truck_every_8_test.yaml

[Taichi] version 1.6.0, llvm 15.0.4, commit f1c6fbbd, linux, python 3.10.12
[Taichi] Starting on arch=cuda
0%| | 0/30001 [00:00<?, ?it/s]/home/ai/.local/lib/python3.10/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True).
warnings.warn(
/home/ai/.local/lib/python3.10/site-packages/taichi/lang/expr.py:101: DeprecationWarning: In future, it will be an error for 'np.bool_' scalars to be interpreted as an index
return Expr(_ti_core.make_const_expr_int(constant_dtype, val))
0%| | 11/30001 [00:04<1:50:52, 4.51it/s]Segmentation fault (core dumped)

Nans in camera gradient

Hi! Thanks for the awesome project! I found that there are sometimes NaNs or Infs in the camera pose gradients (around 1 in 10 iterations). I'm just wondering whether you have experienced these too? Also, I'm new to Taichi, but doesn't it support automatic differentiation? If so, why do you write the gradients by hand? Thanks again!

Spherical harmonics parametrization

In the original splatting repo, spherical harmonics parameters are represented as features_rest of shape (n, 15, 3) and features_dc of shape (n, 1, 3), where n is the number of points.

Here the feature vector has 56 dimensions by default: the first 4 dimensions for rotation, then 3 dimensions for scale, and 1 dimension for alpha. This leaves 48 dimensions for the spherical harmonics parameters.

How can I convert the optimized features from the original repo to match your features, preserving the correct order?
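For illustration, a minimal sketch of splitting one 56-dimensional feature vector according to the layout described above; the exact ordering and the 3x16 SH arrangement are assumptions that would need to be checked against the repo's code.

import numpy as np

def split_features(feature: np.ndarray):
    """Split one 56-d feature vector: 4 rotation + 3 scale + 1 alpha + 48 SH."""
    assert feature.shape == (56,)
    rotation = feature[0:4]            # quaternion
    scale = feature[4:7]
    alpha = feature[7]
    sh = feature[8:56].reshape(3, 16)  # assumed: 3 color channels x 16 SH coefficients
    return rotation, scale, alpha, sh

rotation, scale, alpha, sh = split_features(np.zeros(56))
print(rotation.shape, scale.shape, sh.shape)  # (4,) (3,) (3, 16)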

Custom datasets

@wanmeihuali Hi, this is really cool, very impressive!
I have tested with the datasets you provided and got everything working.
Next I would like to test with my own data of human scans, and I'm having trouble working out how to convert my own data into the right format. Could you go into more detail on this, please, or could we get in contact to discuss this further and other ideas?

Best
Henry

IndexError: tensors used as indices must be long, byte or bool tensors

[Taichi] version 1.6.0, llvm 15.0.4, commit f1c6fbbd, linux, python 3.9.18
[Taichi] Starting on arch=cuda
  0%|                                                                                                                                                                            | 0/30001 [00:00<?, ?it/s]/data/anaconda3/envs/3dgs/lib/python3.9/site-packages/taichi/lang/expr.py:101: DeprecationWarning: In future, it will be an error for 'np.bool_' scalars to be interpreted as an index
  return Expr(_ti_core.make_const_expr_int(constant_dtype, val))
  0%|                                                                                                                                                                            | 0/30001 [00:06<?, ?it/s]
Traceback (most recent call last):
  File "/data/zxc/code/git/taichi_3d_gaussian_splatting/gaussian_point_train.py", line 20, in <module>
    trainer.train()
  File "/data/zxc/code/git/taichi_3d_gaussian_splatting/taichi_3d_gaussian_splatting/GaussianPointTrainer.py", line 176, in train
    loss.backward()
  File "/data/anaconda3/envs/3dgs/lib/python3.9/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/data/anaconda3/envs/3dgs/lib/python3.9/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/data/anaconda3/envs/3dgs/lib/python3.9/site-packages/torch/autograd/function.py", line 267, in apply
    return user_fn(self, *args)
  File "/data/zxc/code/git/taichi_3d_gaussian_splatting/taichi_3d_gaussian_splatting/GaussianPointCloudRasterisation.py", line 1133, in backward
    grad_point_in_camera=grad_pointcloud[point_id_in_camera_list],
IndexError: tensors used as indices must be long, byte or bool tensors

Metal support for training/inference

Is there a timeline for adding a Metal backend to the Taichi training code? I have other GPUs, but Apple silicon Macs have a lot of unified memory and are very energy efficient. I think they would be a good long-term platform for experimenting with Gaussian splats.

define images location

Is there a way to define the image location for the dataset?

At the moment the images have to be in my root taichi_3d_gaussian_splatting folder if absolute paths are not used in the transforms.json file.

The image path in the truck dataset json file is image/; usually this location is resolved relative to the .json files, not the project root.

Provide dataset

Hi!
It's an amazing piece of work!
Can you provide some compatible datasets to test your program with first?
Thanks a lot!

Fix several issues with the code and make it work nicely on a large set of videos

Hi @wanmeihuali

I would like to start a PR effort on a few recent fixes we developed for Taichi Gaussian Splatting (Taichi-GS). I branched out from this commit ( f7631e3 ) and made a few fixes to make the current code work robustly on a wide range of examples (100+ scenes).

I tested the latest commit on the main branch ( 2cea5de ), which crashes on many videos; I don't really have clues as to why.

In particular, I made a set of changes that fixes several issues of the current project ( https://github.com/jb-ye/taichi_3d_gaussian_splatting/tree/eames ):
(1) It automatically scales the training images to no more than 1600 pixels (if needed). Training at higher resolution requires more fine-tuning of the existing parameters; otherwise, you may observe undesirable outcomes like the ones I mentioned in #144.

(2) It addresses the root cause of #119, an issue that was also reported by many other people on this forum who tried this repo.

The key idea to avoid numerical overflow is to always do a 2D convolution in pixel space after splatting the 3D Gaussians. In the official GS code, here and here, the 2D Gaussian is convolved with an isotropic 2D Gaussian of covariance 0.3·I to simulate pixel integration (this is not in the paper). However, even the official code does not do this properly. See also the discussion here.

Let us say you have a 2D Gaussian with an opacity of 1 at the center. When doing a convolution with another 2D Gaussian, if the opacity is left unchanged, the Gaussian becomes larger while remaining opaque and may obscure Gaussians that are behind it (we observed that on grid patterns).

When we implemented this strategy for Taichi GS (it was not part of the original implementation), we noticed that in one example the issue becomes quite obvious:

aliasing_like.mp4

When the render camera moves from far to close (near the captured distance), we observe that the color on the grid pattern of the acoustic amplifier changes and creates an aliasing-like effect (though it is not aliasing). This issue is usually not reflected in the standard metrics on validation images if those validation images are captured at a similar distance to the training images.

Instead, the opacity should be reduced so that the (2D) integrated opacity of the resulting Gaussian is the same as the original one. Thus, in the 3DGS code, the opacity should be multiplied by the factor sqrt(|Σ| / |Σ + 0.3·I|). We implemented this rescaling factor in Taichi GS and observed that the aliasing-like effect is gone:

no_aliasing_like.mp4

In our internal evaluation, I found this change also improves the standard metrics (PSNR, SSIM, LPIPS).
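For illustration, a minimal numpy sketch of the opacity rescaling described above, assuming Σ is the projected 2x2 covariance and the compensation keeps the integrated 2D opacity constant (this is not the exact Taichi-GS code):

import numpy as np

def convolve_and_compensate(alpha: float, cov2d: np.ndarray, filter_var: float = 0.3):
    """Add an isotropic pixel filter (variance filter_var) to a 2D covariance and
    rescale opacity so the integrated 2D opacity stays unchanged."""
    cov_filtered = cov2d + filter_var * np.eye(2)
    # integrated opacity ~ alpha * sqrt(det(cov)); keep it constant after filtering
    scale = np.sqrt(np.linalg.det(cov2d) / np.linalg.det(cov_filtered))
    return cov_filtered, alpha * scale

cov = np.array([[0.5, 0.1],
                [0.1, 0.2]])
print(convolve_and_compensate(0.9, cov))  # opacity shrinks as the footprint grows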

(3) We also reduced the parameter densification-view-space-position-gradients-threshold (3e-6) by half, and observed nice improvements in the standard metrics without the training becoming too slow.

Besides those fixes, I also implemented a simple offline renderer for rendering images along a custom trajectory.

I noticed a few more recent interesting changes (e.g. camera optimization and depth cov loss), but they are not of primary interest to us at the moment. Is there a way to move them to a separate experimental branch until they become mature and tested?

about coordinate system & camera poses

Hi there, thanks first of all for the work, it's great. I am looking to extend some rendering features, for example rendering a video clip along a given camera trajectory. But so far my rendered frames look weird, and it's likely that the transformation to camera coordinates is wrong.

What I do for now:

  • The data I am using is in Instant-NGP format & coordinate system
  • I first retrieve the 4x4 transformation matrix from Transform.json
  • Next I apply a flip_x transformation to it (I noticed you do this in your prepare_InstantNGP_with_mesh.py script, so I try the same)
  • Then I do a TRS decomposition of the transformation matrix
  • At last I pass R to q_pointcloud_camera and T to t_pointcloud_camera

Do you see any step I am missing or doing wrong? If not, I am wondering whether there is a coordinate mismatch, for example the coordinate handedness or the camera axes being defined differently. Please let me know if you have any ideas or comments.

Thanks!

batching

Thank you for the amazing code!

I just got it training today in Colab. I would like to try implementing batching; I saw it listed in the engineering TODOs. Did you try this at all yet? Anything to watch out for?

Thanks again for making splatting more accessible!

Speeding up training stage

I am new to Gaussian splatting and Taichi as well. Is there a quick way to speed up the training process without losing quality? Is it possible to do batch processing?

P.S. The training process takes more than 1 hour for 20k iterations on a Tesla T4.

Training times/license

Hello,

I was wondering how much slower the training times are than the official implementation? I am currently running your implementation, and it seems significantly slower.

The other question is about the license. This is under Apache 2.0, which allows commercial use, but the official implementation does not allow this. Why the difference?

Thank you for your work!

Some questions I would like to ask

Thank you for your excellent work. I am very interested in your Taichi-based implementation of 3D Gaussian splatting, so I would like to ask a few related questions:

  1. I want to add camera pose optimization on top of the official implementation, but I am not clear how the gradient propagates from the 3D Gaussians' covariance matrices to the viewpoint. Could you provide some help with the mathematical derivation for this part? I would be very grateful.
  2. For camera pose estimation, how do splatting-based methods differ from ray-based methods?
  3. I tried your implementation, but the training speed is less than 1/10 of the official implementation, and a lot of time is spent on the evaluation step. I am running experiments on a single 16 GB V100; do you know where the problem might be?
  4. I tried to do scene editing based on the official 3D Gaussian implementation, but I found that when I rotate a selected sub-scene (like the boots in your demo), some inconsistencies appear. I suspect that the SH-fitted colors are view-dependent: rotating the point cloud while keeping the viewpoint fixed changes the SH basis evaluation and leads to abnormal results. However, I noticed that your demo does not seem to have this problem. Do you have any suggestions for solving it?

I am new to the field of neural rendering and very much look forward to your reply. Thank you!

'use default_factory' error on python 3.11

Hi, when I first tried to run this I got the following error:

[Taichi] version 1.6.0, llvm 15.0.4, commit f1c6fbbd, linux, python 3.11.5
Traceback (most recent call last):
  File "/home/ashley/projects/taichi_3d_gaussian_splatting/gaussian_point_train.py", line 3, in <module>
    from taichi_3d_gaussian_splatting.GaussianPointTrainer import GaussianPointCloudTrainer
  File "/home/ashley/projects/taichi_3d_gaussian_splatting/taichi_3d_gaussian_splatting/GaussianPointTrainer.py", line 31, in <module>
    class GaussianPointCloudTrainer:
  File "/home/ashley/projects/taichi_3d_gaussian_splatting/taichi_3d_gaussian_splatting/GaussianPointTrainer.py", line 32, in GaussianPointCloudTrainer
    @dataclass
     ^^^^^^^^^
  File "/home/ashley/.pyenv/versions/3.11.5/lib/python3.11/dataclasses.py", line 1230, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/home/ashley/.pyenv/versions/3.11.5/lib/python3.11/dataclasses.py", line 1220, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ashley/.pyenv/versions/3.11.5/lib/python3.11/dataclasses.py", line 958, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ashley/.pyenv/versions/3.11.5/lib/python3.11/dataclasses.py", line 815, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'taichi_3d_gaussian_splatting.GaussianPointCloudRasterisation.GaussianPointCloudRasterisation.GaussianPointCloudRasterisationConfig'> for field rasterisation_config is not allowed: use default_factory

This only occurs with Python 3.11; 3.10 doesn't have any issues, so it's not a deal-breaker.

How to improve result?

Hello @wanmeihuali, I have started doing some experiments with the code. The implementation is really good and clean; I like it. I have a couple of questions regarding result quality.

  • The result is a bit blurry and missing some fine details.
  • Because of the black background, the PSNR and SSIM scores look good.

Left = rendered view, right = ground-truth view
Screenshot 2023-09-29 134311

Questions:

  • Which parameters affect image sharpness?
  • Is it possible to get a short description of each parameter?
  • Can we use a tool like Ray Tune or Optuna to tune the parameters?

P.S. I am ready to provide a PR if something improves the result.

parquet

What are the parameters for each column in the parquet file? Thank you for telling me!

Tiles-like artifacts

@wanmeihuali

I found tile-like artifacts when visualizing some parquet files, as shown in the attached image.

After a short inspection, I found that gaussian_alpha in the following lines takes the value inf at these pixels, which leads to a whole tile being occupied by one color.
https://github.com/wanmeihuali/taichi_3d_gaussian_splatting/blob/main/taichi_3d_gaussian_splatting/GaussianPointCloudRasterisation.py#L403-L407

Do you have any ideas about this issue or any suggestions for further inspection?

To reproduce the issue, please download the attached parquet file and run python3 visualizer.py --parquet_path_list refined.parquet.

By the way, thank you for sharing your great project! 😄

parquet.zip

Screenshot from 2023-09-05 11-36-17

what conic means

Hi!
This is amazing work!
I noticed that you use "conic" in many places in your code. I would like to understand the reasons for using "conic" and find some resources to help me understand why it is used. Thank you.
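For what it's worth, "conic" in the Gaussian/EWA splatting literature usually refers to the three coefficients (a, b, c) of the inverse 2D covariance matrix, which let the projected Gaussian be evaluated per pixel without re-inverting a matrix. A generic sketch (not the repo's code):

import numpy as np

def conic_from_cov2d(cov2d: np.ndarray):
    """Pack the inverse of a 2x2 covariance [[A, B], [B, C]] into (a, b, c)."""
    A, B, C = cov2d[0, 0], cov2d[0, 1], cov2d[1, 1]
    det = A * C - B * B
    return (C / det, -B / det, A / det)

def gaussian_weight(conic, dx: float, dy: float) -> float:
    """Evaluate exp(-0.5 * d^T Sigma^-1 d) using the conic coefficients."""
    a, b, c = conic
    return float(np.exp(-0.5 * (a * dx * dx + 2.0 * b * dx * dy + c * dy * dy)))

cov = np.array([[2.0, 0.3],
                [0.3, 1.0]])
print(gaussian_weight(conic_from_cov2d(cov), dx=1.0, dy=-0.5))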
