
SRT: Scene Representation Transformer

This is an independent PyTorch implementation of SRT, as presented in the paper "Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations" by Sajjadi et al., CVPR 2022.

The authors have kindly reviewed this code and confirmed that it appears to match their results. All credit for the model goes to the original authors.

New: This repo now also supports the improved version of SRT discussed in Appendix A.4 of the OSRT paper. It yields higher reconstruction accuracy, uses fewer parameters, and runs faster. An example checkpoint is provided below.

(Animations: NMR rotation, MSN rotation)

Setup

After cloning the repository and creating a new conda environment, the following steps will get you started:

Data

The code currently supports the following datasets. Simply download and place (or symlink) them in the data directory.

  • The 3D datasets introduced by ObSuRF.

  • The NMR multiclass ShapeNet dataset, hosted by Niemeyer et al. It may be downloaded here.

  • SRT's MultiShapeNet (MSN) dataset, specifically version 2.3.3. It may be downloaded via gsutil:

pip install gsutil
mkdir -p data/msn/multi_shapenet_frames/
gsutil -m cp -r gs://kubric-public/tfds/multi_shapenet_frames/2.3.3/ data/msn/multi_shapenet_frames/
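
Once the copy finishes, you can sanity-check that TFDS can read the dataset. The following is a minimal sketch, assuming the files landed under data/msn/multi_shapenet_frames/2.3.3/ as produced by the command above; tfds.builder_from_directory is a standard tensorflow_datasets call, not part of this repo:

# Minimal sketch: verify the MSN download is readable by TFDS.
import tensorflow_datasets as tfds

builder = tfds.builder_from_directory('data/msn/multi_shapenet_frames/2.3.3/')
print(builder.info.splits)  # should list the available splits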

Dependencies

This code requires at least Python 3.9 and PyTorch 1.11. Additional dependencies may be installed via pip install -r requirements.txt. Note that TensorFlow is required to load SRT's MultiShapeNet data, though the CPU version suffices.

Rendering videos additionally depends on ffmpeg>=4.3 being available in your $PATH.
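
If you are unsure whether your ffmpeg is recent enough, here is a quick check from Python (a minimal sketch, independent of this repo):

import shutil, subprocess

# Fail early if ffmpeg is missing, then print its version banner.
assert shutil.which('ffmpeg') is not None, 'ffmpeg not found in $PATH'
print(subprocess.run(['ffmpeg', '-version'], capture_output=True,
                     text=True).stdout.splitlines()[0])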

Running Experiments

Each run's config, checkpoints, and visualization are stored in a dedicated directory. Recommended configs can be found under runs/[dataset]/[model].

Training

To train a model on a single GPU, simply run e.g.:

python train.py runs/nmr/srt/config.yaml

To train on multiple GPUs on a single machine, launch multiple processes via torchrun, where $NUM_GPUS is the number of GPUs to use:

torchrun --standalone --nnodes 1 --nproc_per_node $NUM_GPUS train.py runs/nmr/srt/config.yaml

Checkpoints are automatically stored in and (if available) loaded from the run directory. Visualizations and evaluations are produced periodically. Check the args of train.py for additional options. Importantly, to log training progress, use the --wandb flag to enable Weights & Biases.

Resource Requirements

The default training configurations require a significant amount of total VRAM.

  • SRT on NMR (nmr/srt/config.yaml) requires about 130GB, e.g. 4 A100 GPUs with 40GB VRAM each.
  • SRT on MSN (msn/srt/config.yaml) requires about 350GB, e.g. 6 A100 GPUs with 80GB VRAM each.

If you do not have those resources, consider modifying the config files by reducing the batch size (training: batch_size), the number of target points per scene (data: num_points), or both. The model does not appear to be particularly sensitive to either.
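
As a concrete example, the sketch below derives a lower-memory config from an existing one using PyYAML. The key names follow this section; the specific values are illustrative assumptions, not tuned recommendations:

import yaml

# Minimal sketch: shrink batch size and target points in a run config.
with open('runs/nmr/srt/config.yaml') as f:
    cfg = yaml.safe_load(f)

cfg['training']['batch_size'] = 64   # illustrative: smaller batches, less VRAM
cfg['data']['num_points'] = 2048     # illustrative: fewer target points per scene

with open('runs/nmr/srt/config_small.yaml', 'w') as f:
    yaml.safe_dump(cfg, f)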

Rendering videos

Videos may be rendered using render.py, e.g.

python render.py runs/nmr/srt/config.yaml --sceneid 1 --motion rotate_and_closeup --fade

Rendered frames and videos are placed in the run directory. Check the args of render.py for various camera movements, and compile_video.py for different ways of compiling videos.
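
To render several scenes in one go, render.py can be driven from a small script. This sketch reuses the flags from the example above; the scene ids are arbitrary:

import subprocess

# Minimal sketch: batch-render a few scenes with the flags shown above.
for scene_id in [0, 1, 2]:  # illustrative scene ids
    subprocess.run(
        ['python', 'render.py', 'runs/nmr/srt/config.yaml',
         '--sceneid', str(scene_id),
         '--motion', 'rotate_and_closeup', '--fade'],
        check=True)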

Results

The following checkpoints partially reproduce the quantitative results of the paper.

Run      | Training Iterations | Test Set PSNR | Download
nmr/srt  | 3 million           | 27.28         | Link
msn/srt  | 4 million*          | 23.39         | Link
msn/isrt | 2.8 million**       | 24.84         | Link

(*) The SRT MSN run was largely trained with a batch size of 192 due to memory constraints. (**) Similarly, the ISRT MSN run was trained with a batch size of 48 and 4096 target pixels per training scene.

Known Issues

On the NMR dataset, SRT overfits to the 24 camera positions in the training dataset (left). It will generally not produce coherent images when given other cameras at test time (right).

NMR Camera Overfitting

On the MSN dataset, this is not an issue, as cameras are densely sampled in the training data. The model even has some ability to produce closeups which use cameras outside of the training distribution.

MSN Closeup

Citation

@article{srt22,
  title={{Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations}},
  author={Mehdi S. M. Sajjadi and Henning Meyer and Etienne Pot and Urs Bergmann and Klaus Greff and Noha Radwan and Suhani Vora and Mario Lucic and Daniel Duckworth and Alexey Dosovitskiy and Jakob Uszkoreit and Thomas Funkhouser and Andrea Tagliasacchi},
  journal={{CVPR}},
  year={2022},
  url={https://srt-paper.github.io/},
}

srt's Issues

Use sparse view image as input for inference

Thanks for sharing this great work! From the README, it seems that render.py can render a selected scene. I am wondering how to generate and render novel scene views from a sparse set of images, as mentioned in the paper? Thanks.

Extrinsic/Intrinsic Camera Input

Thank you for your implementation. I was looking at the SRTEncoder class and I was wondering how the camera extrinsic/intrinsic matrices are incorporated into the input. I thought typical tensors representing intrinsic camera matrices could be [batch_size, num_images, 3, 3], and extrinsic camera matrices as [batch_size, num_images, 4, 4]. How is this incorporated into the new arguments for the encoder, camera_pos with size [batch_size, num_images, 3] and rays with [batch_size, num_images, height, width, 3]?

Thanks again.
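
For readers with the same question: camera_pos and rays can be derived from standard intrinsics/extrinsics via pinhole unprojection. The sketch below is a generic construction, assuming camera-to-world extrinsics; axis conventions may differ in detail from this repo's data loaders:

import torch

def cameras_to_srt_inputs(K, E, height, width):
    # K: [batch, num_images, 3, 3] intrinsics
    # E: [batch, num_images, 4, 4] camera-to-world extrinsics (assumption)
    camera_pos = E[..., :3, 3]  # translation column -> [batch, num_images, 3]
    # Homogeneous pixel coordinates for every (x, y) in the image plane.
    y, x = torch.meshgrid(torch.arange(height), torch.arange(width), indexing='ij')
    pix = torch.stack([x, y, torch.ones_like(x)], dim=-1).float()  # [H, W, 3]
    # Unproject pixels to camera-space directions, then rotate into world space.
    dirs = torch.einsum('bnij,hwj->bnhwi', torch.inverse(K), pix)
    rays = torch.einsum('bnij,bnhwj->bnhwi', E[..., :3, :3], dirs)
    rays = rays / rays.norm(dim=-1, keepdim=True)  # [B, N, H, W, 3], unit norm
    return camera_pos, rays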

MSN-Hard dataset issue

Thanks for your wonderful work. I cannot find the MSN-Hard dataset used in OSRT; how should I download the MSN-Hard dataset?

"Known Issues" NMR Example Render does not Match Render at the top of Readme

At the beginning of the Readme the rendered rotation video from the NMR dataset is as follows:
NMR Rotation

However, later in the Readme you express that overfitting on the 24 input views is a "known issue". As such, is there a different configuration that you used to fix the overfitting issue and render the video above? Also, given that the authors checked over the codebase, do they know about the existence of this overfitting issue and did they provide any insight on how to mitigate it?

Known Issue video:
NMR Camera Overfitting

Dataset multi_shapenet_frames not found.

Hi,
I have installed sunds 0.4.1; running render.py raises an exception at

    builder = sunds.builder('multi_shapenet', data_dir=self.path)
    self.tf_dataset = builder.as_dataset(

Exception has occurred: DatasetNotFoundError
Dataset multi_shapenet_frames not found.

Any idea on this would be appreciated. Thanks.

MultiShapeNet dataset issue

Hello, loading the MSN data raises "TypeError: type uint8 not recognized". Was the dataset updated since the repository was made, or is there a modification needed?
