
ml-pgdvs's Introduction

Pseudo-Generalized Dynamic View Synthesis

ICLR 2024

Pseudo-Generalized Dynamic View Synthesis from a Video, ICLR 2024.
Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Ángel Bautista, Joshua M. Susskind, and Alexander G. Schwing.

Table of Contents

  • Environment Setup
  • Try PGDVS on a Video in the Wild
  • Benchmarking
  • Citation
  • License
  • Acknowledgements

Environment Setup

This code has been tested on Ubuntu 20.04 with CUDA 11.8 on an NVIDIA A100-SXM4-80GB GPU (driver 470.82.01).

We recommend using conda for virtual environment management and libmamba for faster dependency resolution.

# setup libmamba
conda install -n base conda-libmamba-solver -y
conda config --set solver libmamba

# create virtual environment
conda env create -f envs/pgdvs.yaml

conda activate pgdvs
conda install pytorch3d=0.7.4 -c pytorch3d -y
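
To verify the installation (an optional sanity check, not part of the original instructions):

python -c "import torch; import pytorch3d; print(pytorch3d.__version__)"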

[Optional] Run the following to install JAX if you want to:

  1. try TAPIR;
  2. evaluate with metrics computation from DyCheck.

conda activate pgdvs
pip install -r envs/requirements_jax.txt --verbose

To check that JAX is installed correctly, run the following. NOTE: the leading import torch is important since it makes sure that JAX finds the cuDNN installed by conda.

conda activate pgdvs
python -c "import torch; from jax import random; key = random.PRNGKey(0); x = random.normal(key, (10,)); print(x)"

Try PGDVS on a Video in the Wild

Download Checkpoints

# this environment variable is used for demonstration
cd /path/to/this/repo
export PGDVS_ROOT=$PWD

Since we use third parties' pretrained models, we provide two ways to download them:

  1. directly download from the official repositories;
  2. download from our copy, to reproduce the paper's results in case the official repositories' checkpoints are modified in the future.

FLAG_ORIGINAL=1  # set to 0 if you want to download from our copy
bash ${PGDVS_ROOT}/scripts/download_ckpts.sh ${PGDVS_ROOT}/ckpts ${FLAG_ORIGINAL}
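
After the script finishes, the checkpoints land under ${PGDVS_ROOT}/ckpts; a quick way to confirm (the exact subdirectory layout is determined by download_ckpts.sh):

ls ${PGDVS_ROOT}/ckpts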

Example of DAVIS

We use DAVIS as an example to illustrate how to render novel views from a monocular video in the wild. Please see IN_THE_WILD.md for details.
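
The exact commands are in IN_THE_WILD.md. For orientation only: rendering is driven by pgdvs/run.py through Hydra-style overrides. The sketch below is illustrative, borrowing override names from the benchmark configuration rather than from the DAVIS instructions:

conda activate pgdvs
python ${PGDVS_ROOT}/pgdvs/run.py \
  engine=visualizer_pgdvs \
  model=pgdvs_renderer \
  static_renderer=gnt \
  static_renderer.model_cfg.ckpt_path=${PGDVS_ROOT}/ckpts/gnt/model_720000.pth \
  dataset.data_root=${PGDVS_ROOT}/data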

Benchmarking

Please see BENCHMARK_NVIDIA.md and BENCHMARK_iPhone.md for details on reproducing the paper's results on the NVIDIA Dynamic Scenes and DyCheck's iPhone datasets.

Citation

Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Ángel Bautista, Joshua M. Susskind, and Alexander G. Schwing. Pseudo-Generalized Dynamic View Synthesis from a Video. ICLR 2024.

@inproceedings{Zhao2024PGDVS,
  title={{Pseudo-Generalized Dynamic View Synthesis from a Video}},
  author={Xiaoming Zhao and Alex Colburn and Fangchang Ma and Miguel Angel Bautista and Joshua M. Susskind and Alexander G. Schwing},
  booktitle={ICLR},
  year={2024},
}

License

This sample code is released under the LICENSE terms.

Acknowledgements

Our project would not be possible without third-party projects such as GNT, TAPIR, and DyCheck.

ml-pgdvs's People

Contributors

alexschwing, xiaoming-zhao


ml-pgdvs's Issues

Getting an error when trying to visualize NVIDIA data: Error executing job with overrides: []

Hello, when I try to run the "Spatial Temporal Interpolation Visualizations" step of the NVIDIA Dynamic Scenes benchmark, I get the following error:

gnt: 100%|██████████| 77/77 [3:12:23<00:00, 149.92s/it]
Error executing job with overrides:
['verbose=true', 'distributed=false', 'seed=0', 'resume=vis_wo_resume', 'resume_dir=null',
 'engine=visualizer_pgdvs', 'model=pgdvs_renderer', 'model.softsplat_metric_abs_alpha=100.0',
 'static_renderer=gnt', 'static_renderer.model_cfg.ckpt_path=/data/code/ml-pgdvs/ckpts/gnt/model_720000.pth',
 'series_eval=false', 'eval_batch_size=1', 'n_max_eval_data=-1', 'eval_save_individual=true',
 'engine.engine_cfg.render_cfg.render_stride=1', 'engine.engine_cfg.render_cfg.chunk_size=2048',
 'engine.engine_cfg.render_cfg.sample_inv_uniform=true', 'engine.engine_cfg.render_cfg.n_coarse_samples_per_ray=256',
 'engine.engine_cfg.render_cfg.n_fine_samples_per_ray=0', 'engine.engine_cfg.render_cfg.mask_oob_n_proj_thres=1',
 'engine.engine_cfg.render_cfg.mask_invalid_n_proj_thres=4', 'engine.engine_cfg.render_cfg.dyn_pcl_remove_outlier=true',
 'engine.engine_cfg.render_cfg.dyn_pcl_outlier_knn=50', 'engine.engine_cfg.render_cfg.dyn_pcl_outlier_std_thres=0.1',
 'engine.engine_cfg.render_cfg.gnt_use_dyn_mask=true', 'engine.engine_cfg.render_cfg.gnt_use_masked_spatial_src=false',
 'engine.engine_cfg.render_cfg.dyn_render_use_flow_consistency=false', 'dataset=combined',
 'dataset.dataset_list.train=[nvidia_eval]', 'dataset.dataset_list.eval=[nvidia_eval]',
 'dataset.dataset_list.vis=[nvidia_vis]', 'dataset.dataset_specifics.mono_vis.scene_ids=[Balloon1]',
 'dataset.data_root=/data/code/ml-pgdvs/data', 'n_dataloader_workers=1', 'dataset_max_hw=-1',
 'dataset.use_aug=false', 'dataset.dataset_list.vis=[nvidia_vis]',
 'dataset.dataset_specifics.nvidia_vis.scene_ids=[Balloon1]', 'vis_specifics.n_render_frames=400',
 'vis_specifics.vis_center_time=50', 'vis_specifics.vis_time_interval=50', 'vis_specifics.vis_bt_max_disp=32']
Traceback (most recent call last):
  File "/data/code/ml-pgdvs/pgdvs/run.py", line 267, in <module>
    cli()
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/data/code/ml-pgdvs/pgdvs/run.py", line 263, in cli
    run(cfg, hydra_config)
  File "/data/code/ml-pgdvs/pgdvs/run.py", line 192, in run
    return _distributed_worker(
  File "/data/code/ml-pgdvs/pgdvs/run.py", line 146, in _distributed_worker
    output = engine.run()
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/code/ml-pgdvs/pgdvs/engines/visualizer_pgdvs.py", line 26, in run
    self.vis_model()
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/code/ml-pgdvs/pgdvs/engines/visualizer_pgdvs.py", line 91, in vis_model
    ret_dict = self._get_model_module(self.model).forward(
  File "/data/code/ml-pgdvs/pgdvs/renderers/pgdvs_renderer.py", line 146, in forward
    (render_dyn_rgb, render_dyn_mask, render_dyn_info) = self.dyn_renderer(
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/code/ml-pgdvs/pgdvs/renderers/pgdvs_renderer_dyn.py", line 184, in forward
    splat_dyn_img_full, softsplat_metric_src1_to_src2 = self.softsplat_img(
  File "/data/code/ml-pgdvs/pgdvs/renderers/pgdvs_renderer_base.py", line 80, in softsplat_img
    splat_img_src1_to_tgt = softsplat.softsplat(
  File "/data/code/ml-pgdvs/pgdvs/utils/softsplat.py", line 311, in softsplat
    tenOut = softsplat_func.apply(tenIn, tenFlow)
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 121, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/data/code/ml-pgdvs/pgdvs/utils/softsplat.py", line 421, in forward
    assert False
AssertionError

I have followed the instructions at https://github.com/apple/ml-pgdvs/blob/main/docs/BENCHMARK_NVIDIA.md, and I just want to produce the visualization. I guess this may be related to the configuration process, but how can I solve this problem? Thank you!

Environment: CUDA 11.8, NVIDIA A800, without JAX.

Pretrained gnt model install

Hello, I noticed that your download_ckpts.sh downloads a pretrained GNT model like this:

if [ "${FLAG_ORIGINAL}" == "1" ]; then
# GNT
if [ ! -f ${DATA_ROOT}/gnt/generalized_model_720000.pth ]; then
gdown 1AMN0diPeHvf2fw53IO5EE2Qp4os5SkoX -O ${DATA_ROOT}/gnt/
fi

However, I couldn't find this file on Google Drive, and it reports that the file does not exist. How can I get it?
Thanks!

DyCheckCamera conventions

Hello,

I was trying to load the DyCheck dataset in another framework and stumbled upon your fixes to a DyCheckCamera class.

As I understand it, you assume OpenCV conventions everywhere: for example, calling cam.extrin returns the extrinsics matrix (world-to-camera, according to the comments).

However, note that the extrin property returns the translation as -orientation @ position, implying that the stored position is camera-to-world (the camera center in world coordinates) and that the world-to-camera translation is recovered as $-R^\top t$; coherently, the orientation would then be stored as camera-to-world, so that $R^\top$ is its inverse.

Have you confirmed this system is consistent with the expected trajectory, e.g., on the paper-windmill example from the iPhone dataset in dycheck-release?
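
For reference, one consistent reading of the above, as a minimal sketch (assuming DyCheck-style storage where orientation is the world-to-camera rotation R and position is the camera center c in world coordinates; the function name is illustrative, not the actual DyCheckCamera API):

import numpy as np

def extrin(orientation: np.ndarray, position: np.ndarray) -> np.ndarray:
    # orientation: 3x3 world-to-camera rotation R (OpenCV convention)
    # position: camera center c expressed in world coordinates
    # OpenCV world-to-camera extrinsics: x_cam = R @ x_world + t, with t = -R @ c
    w2c = np.eye(4)
    w2c[:3, :3] = orientation
    w2c[:3, 3] = -orientation @ position  # matches the -orientation @ position in question
    return w2c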
