
ml-pgdvs's Introduction

Pseudo-Generalized Dynamic View Synthesis

ICLR 2024

Pseudo-Generalized Dynamic View Synthesis from a Video, ICLR 2024.
Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Ángel Bautista, Joshua M. Susskind, and Alexander G. Schwing.

Table of Contents

  • Environment Setup
  • Try PGDVS on a Video in the Wild
  • Benchmarking
  • Citation
  • License
  • Acknowledgements

Environment Setup

This code has been tested on Ubuntu 20.04 with CUDA 11.8 on an NVIDIA A100-SXM4-80GB GPU (driver 470.82.01).

We recommend using conda for virtual environment management and libmamba for faster dependency resolution.

# setup libmamba
conda install -n base conda-libmamba-solver -y
conda config --set solver libmamba

# create virtual environment
conda env create -f envs/pgdvs.yaml

conda activate pgdvs
conda install pytorch3d=0.7.4 -c pytorch3d -y
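
To verify the installation (an optional sanity check, not part of the original instructions):

python -c "import torch; import pytorch3d; print(pytorch3d.__version__)"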

[Optional] Run the following to install JAX if you want to:

  1. try TAPIR;
  2. evaluate with metrics computation from DyCheck.

conda activate pgdvs
pip install -r envs/requirements_jax.txt --verbose

To check that JAX is installed correctly, run the following. NOTE: the leading import torch is important since it makes sure that JAX finds the cuDNN installed by conda.

conda activate pgdvs
python -c "import torch; from jax import random; key = random.PRNGKey(0); x = random.normal(key, (10,)); print(x)"

Try PGDVS on a Video in the Wild

Download Checkpoints

# this environment variable is used for demonstration
cd /path/to/this/repo
export PGDVS_ROOT=$PWD

Since we use third parties' pretrained models, we provide two ways to download them:

  1. directly download from the official repositories;
  2. download from our copy, to reproduce the paper's results in case the official repositories' checkpoints are modified in the future.

FLAG_ORIGINAL=1  # set to 0 if you want to download from our copy
bash ${PGDVS_ROOT}/scripts/download_ckpts.sh ${PGDVS_ROOT}/ckpts ${FLAG_ORIGINAL}
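
After the script finishes, the checkpoints land under ${PGDVS_ROOT}/ckpts; a quick way to confirm (the exact subdirectory layout is determined by download_ckpts.sh):

ls ${PGDVS_ROOT}/ckpts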

Example of DAVIS

We use DAVIS as an example to illustrate how to render novel views from a monocular video in the wild. Please see IN_THE_WILD.md for details.
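
The exact commands are in IN_THE_WILD.md. For orientation only: rendering is driven by pgdvs/run.py through Hydra-style overrides. The sketch below is illustrative, borrowing override names from the benchmark configuration rather than from the DAVIS instructions:

conda activate pgdvs
python ${PGDVS_ROOT}/pgdvs/run.py \
  engine=visualizer_pgdvs \
  model=pgdvs_renderer \
  static_renderer=gnt \
  static_renderer.model_cfg.ckpt_path=${PGDVS_ROOT}/ckpts/gnt/model_720000.pth \
  dataset.data_root=${PGDVS_ROOT}/data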

Benchmarking

Please see BENCHMARK_NVIDIA.md and BENCHMARK_iPhone.md for details on reproducing the paper's results on the NVIDIA Dynamic Scenes and DyCheck's iPhone datasets.

Citation

Xiaoming Zhao, Alex Colburn, Fangchang Ma, Miguel Ángel Bautista, Joshua M. Susskind, and Alexander G. Schwing. Pseudo-Generalized Dynamic View Synthesis from a Video. ICLR 2024.

@inproceedings{Zhao2024PGDVS,
  title={{Pseudo-Generalized Dynamic View Synthesis from a Video}},
  author={Xiaoming Zhao and Alex Colburn and Fangchang Ma and Miguel Angel Bautista and Joshua M. Susskind and Alexander G. Schwing},
  booktitle={ICLR},
  year={2024},
}

License

This sample code is released under the LICENSE terms.

Acknowledgements

Our project would not be possible without third-party projects such as GNT, TAPIR, and DyCheck.

ml-pgdvs's People

Contributors

alexschwing, xiaoming-zhao


ml-pgdvs's Issues

Getting an error when trying to visualize NVIDIA data: Error executing job with overrides: []

Hello, when I try to run the "Spatial Temporal Interpolation Visualizations" step of the NVIDIA Dynamic Scenes benchmark, I get the following error:

gnt: 100%|██████████| 77/77 [3:12:23<00:00, 149.92s/it]
Error executing job with overrides:
['verbose=true', 'distributed=false', 'seed=0', 'resume=vis_wo_resume', 'resume_dir=null',
 'engine=visualizer_pgdvs', 'model=pgdvs_renderer', 'model.softsplat_metric_abs_alpha=100.0',
 'static_renderer=gnt', 'static_renderer.model_cfg.ckpt_path=/data/code/ml-pgdvs/ckpts/gnt/model_720000.pth',
 'series_eval=false', 'eval_batch_size=1', 'n_max_eval_data=-1', 'eval_save_individual=true',
 'engine.engine_cfg.render_cfg.render_stride=1', 'engine.engine_cfg.render_cfg.chunk_size=2048',
 'engine.engine_cfg.render_cfg.sample_inv_uniform=true', 'engine.engine_cfg.render_cfg.n_coarse_samples_per_ray=256',
 'engine.engine_cfg.render_cfg.n_fine_samples_per_ray=0', 'engine.engine_cfg.render_cfg.mask_oob_n_proj_thres=1',
 'engine.engine_cfg.render_cfg.mask_invalid_n_proj_thres=4', 'engine.engine_cfg.render_cfg.dyn_pcl_remove_outlier=true',
 'engine.engine_cfg.render_cfg.dyn_pcl_outlier_knn=50', 'engine.engine_cfg.render_cfg.dyn_pcl_outlier_std_thres=0.1',
 'engine.engine_cfg.render_cfg.gnt_use_dyn_mask=true', 'engine.engine_cfg.render_cfg.gnt_use_masked_spatial_src=false',
 'engine.engine_cfg.render_cfg.dyn_render_use_flow_consistency=false', 'dataset=combined',
 'dataset.dataset_list.train=[nvidia_eval]', 'dataset.dataset_list.eval=[nvidia_eval]',
 'dataset.dataset_list.vis=[nvidia_vis]', 'dataset.dataset_specifics.mono_vis.scene_ids=[Balloon1]',
 'dataset.data_root=/data/code/ml-pgdvs/data', 'n_dataloader_workers=1', 'dataset_max_hw=-1',
 'dataset.use_aug=false', 'dataset.dataset_list.vis=[nvidia_vis]',
 'dataset.dataset_specifics.nvidia_vis.scene_ids=[Balloon1]', 'vis_specifics.n_render_frames=400',
 'vis_specifics.vis_center_time=50', 'vis_specifics.vis_time_interval=50', 'vis_specifics.vis_bt_max_disp=32']
Traceback (most recent call last):
  File "/data/code/ml-pgdvs/pgdvs/run.py", line 267, in <module>
    cli()
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/data/code/ml-pgdvs/pgdvs/run.py", line 263, in cli
    run(cfg, hydra_config)
  File "/data/code/ml-pgdvs/pgdvs/run.py", line 192, in run
    return _distributed_worker(
  File "/data/code/ml-pgdvs/pgdvs/run.py", line 146, in _distributed_worker
    output = engine.run()
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/code/ml-pgdvs/pgdvs/engines/visualizer_pgdvs.py", line 26, in run
    self.vis_model()
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/code/ml-pgdvs/pgdvs/engines/visualizer_pgdvs.py", line 91, in vis_model
    ret_dict = self._get_model_module(self.model).forward(
  File "/data/code/ml-pgdvs/pgdvs/renderers/pgdvs_renderer.py", line 146, in forward
    (render_dyn_rgb, render_dyn_mask, render_dyn_info) = self.dyn_renderer(
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data/code/ml-pgdvs/pgdvs/renderers/pgdvs_renderer_dyn.py", line 184, in forward
    splat_dyn_img_full, softsplat_metric_src1_to_src2 = self.softsplat_img(
  File "/data/code/ml-pgdvs/pgdvs/renderers/pgdvs_renderer_base.py", line 80, in softsplat_img
    splat_img_src1_to_tgt = softsplat.softsplat(
  File "/data/code/ml-pgdvs/pgdvs/utils/softsplat.py", line 311, in softsplat
    tenOut = softsplat_func.apply(tenIn, tenFlow)
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/data/apps/anaconda3/envs/pgdvs/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 121, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/data/code/ml-pgdvs/pgdvs/utils/softsplat.py", line 421, in forward
    assert False
AssertionError

I have followed the instructions at https://github.com/apple/ml-pgdvs/blob/main/docs/BENCHMARK_NVIDIA.md, and I just want to produce the visualization. I guess this may be related to the configuration process, but how can I solve this problem? Thank you!

Environment: CUDA 11.8, NVIDIA A800, without JAX.

Pretrained gnt model install

Hello, I noticed that your download_ckpts.sh downloads a pretrained GNT model like this:

if [ "${FLAG_ORIGINAL}" == "1" ]; then
# GNT
if [ ! -f ${DATA_ROOT}/gnt/generalized_model_720000.pth ]; then
gdown 1AMN0diPeHvf2fw53IO5EE2Qp4os5SkoX -O ${DATA_ROOT}/gnt/
fi

However, I couldn't find this file on Google Drive, and it reports that the file does not exist. How can I get it?
Thanks!

DyCheckCamera conventions

Hello,

I was trying to load the DyCheck dataset in another framework and stumbled upon your fixes to a DyCheckCamera class.

As I understand it, you assume OpenCV conventions everywhere: for example, calling cam.extrin returns the extrinsics matrix (world-to-camera, according to the comments).

However, note that the extrin property returns the translation as -orientation @ position, implying that the stored position is camera-to-world (the camera center in world coordinates) and that the world-to-camera translation is recovered as $-R^\top t$; coherently, the orientation would then be stored as camera-to-world, so that $R^\top$ is its inverse.

Have you confirmed this system is consistent with the expected trajectory, e.g., on the paper-windmill example from the iPhone dataset in dycheck-release?
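
For reference, one consistent reading of the above, as a minimal sketch (assuming DyCheck-style storage where orientation is the world-to-camera rotation R and position is the camera center c in world coordinates; the function name is illustrative, not the actual DyCheckCamera API):

import numpy as np

def extrin(orientation: np.ndarray, position: np.ndarray) -> np.ndarray:
    # orientation: 3x3 world-to-camera rotation R (OpenCV convention)
    # position: camera center c expressed in world coordinates
    # OpenCV world-to-camera extrinsics: x_cam = R @ x_world + t, with t = -R @ c
    w2c = np.eye(4)
    w2c[:3, :3] = orientation
    w2c[:3, 3] = -orientation @ position  # matches the -orientation @ position in question
    return w2c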
