
[ICLR 2024 spotlight] InstructScene

InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior

Chenguo Lin, Yadong Mu

arXiv Project page License: MIT

Teaser renderings of synthesized bedroom, dining room and living room scenes.

Pipeline overview of InstructScene.

This repository contains the official implementation of the paper InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior, accepted to ICLR 2024 as a spotlight presentation. InstructScene is a generative framework that synthesizes 3D indoor scenes from instructions; it is composed of a semantic graph prior and a layout decoder.

Feel free to contact me ([email protected]) or open an issue if you have any questions or suggestions.

📢 News

  • 2024-02-28: The pretrained weights of fVQ-VAE are released.
  • 2024-02-28: The source code and preprocessed dataset are released.
  • 2024-02-07: The paper is available on arXiv.
  • 2024-01-16: InstructScene is accepted by ICLR 2024 for spotlight presentation.

📋 TODO

  • Release the training and evaluation code
  • Release the preprocessed dataset and rendered images on HuggingFace
  • Release the pretrained weights of fVQ-VAE to quantize OpenShape features of 3D-FRONT objects
  • Release the dataset preprocessing scripts (ChatGPT API usage, quantization of OpenShape features, etc.)
  • Add an "FAQ" section to provide detailed explanations of the implementation

🔧 Installation

You may need to modify the specific version of torch in settings/setup.sh according to your CUDA version. There are no restrictions on the torch version; feel free to use your preferred one.

git clone https://github.com/chenguolin/InstructScene.git
cd InstructScene
bash settings/setup.sh
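
After the environment is set up, a quick check like the one below (plain PyTorch, nothing specific to this repo) confirms that the installed torch build matches your CUDA driver:

import torch

# Print the installed torch version, the CUDA version it was built against,
# and whether a GPU is actually visible.
print("torch version:", torch.__version__)
print("built for CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())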

Download the Blender software for visualization.

cd blender
wget https://download.blender.org/release/Blender3.3/blender-3.3.1-linux-x64.tar.xz
tar -xvf blender-3.3.1-linux-x64.tar.xz
rm blender-3.3.1-linux-x64.tar.xz

📊 Dataset

The dataset used in InstructScene is based on 3D-FRONT and 3D-FUTURE. Please refer to the instructions on their official websites to download the original data. One can also refer to the dataset preprocessing scripts in ATISS and DiffuScene, which are similar to ours.

We provide the preprocessed instruction-scene paired dataset used in the paper, along with rendered images for evaluation, on HuggingFace.

import os
from huggingface_hub import hf_hub_url
url = hf_hub_url(repo_id="chenguolin/InstructScene_dataset", filename="InstructScene.zip", repo_type="dataset")
os.system(f"wget {url} && unzip InstructScene.zip")
url = hf_hub_url(repo_id="chenguolin/InstructScene_dataset", filename="3D-FRONT.zip", repo_type="dataset")
os.system(f"wget {url} && unzip 3D-FRONT.zip")

Please refer to dataset/README.md for more details.
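
As a quick sanity check after unzipping, you can count the preprocessed scenes per room type. The snippet below assumes the dataset/InstructScene/threed_front_<room_type>/ layout mentioned in the fVQ-VAE section and the three room types used in this project; adjust the paths if your copy is extracted elsewhere.

import os

root = "dataset/InstructScene"  # adjust if the zip was extracted to another location
for room_type in ["bedroom", "diningroom", "livingroom"]:
    path = os.path.join(root, f"threed_front_{room_type}")
    if os.path.isdir(path):
        print(f"{room_type}: {len(os.listdir(path))} scene folders")
    else:
        print(f"{room_type}: not found at {path}")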

👀 Visualization

We provide a helpful script to visualize synthesized scenes with Blender. Please refer to blender/README.md for more details.

We also provide many useful visualization functions in src/utils/visualize.py, including creating appropriate floor plans, drawing scene graphs, adding instructions as titles to the rendered images, making GIFs, etc.
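
The helpers in src/utils/visualize.py have their own interfaces (see the file for details); purely as an independent illustration of the GIF step, a folder of rendered frames can be stitched together with imageio, assuming the illustrative frame path below:

import glob
import imageio.v2 as imageio

# Collect rendered frames (the path is illustrative) and write an animated GIF.
frames = [imageio.imread(p) for p in sorted(glob.glob("out/renders/*.png"))]
imageio.mimsave("out/scene.gif", frames, duration=0.2)  # 0.2 s per frame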

🚀 Usage

Note that:

  • All scripts in this project run on a single GPU. Training the semantic graph prior or the layout decoder takes 1~3 days on a single NVIDIA A40 GPU, depending on the room type.

  • We use TensorBoard to track the training process by executing tensorboard --logdir out/.

  • The training of the layout decoder ("1.") and the semantic graph prior ("2.") is independent, and the two can be trained in parallel, since we use ground-truth semantic graphs to train the layout decoder. During inference, to render synthesized scenes from instruction prompts, one needs both a trained semantic graph prior and a trained layout decoder.

0๏ธ. ๐Ÿ“ฆ fVQ-VAE: quantize OpenShape/CLIP features of objects

Training

We provide the pretrained weights of fVQ-VAE on HuggingFace. Our preprocessed dataset contains the original OpenShape features and the corresponding quantization indices.

import os
from huggingface_hub import hf_hub_url
os.system("mkdir -p out/threedfront_objfeat_vqvae/checkpoints")
url = hf_hub_url(repo_id="chenguolin/InstructScene_dataset", filename="threedfront_objfeat_vqvae_epoch_01999.pth", repo_type="dataset")
os.system(f"wget {url} -O out/threedfront_objfeat_vqvae/checkpoints/epoch_01999.pth")

You can also train the fVQ-VAE from scratch. However, you should update the quantization indices in the dataset (stored in dataset/InstructScene/threed_front_<room_type>/<scene_id>/models_info.pkl) accordingly.

# bash scripts/train_objfeatvqvae.sh <tag> <gpu_id>
bash scripts/train_objfeatvqvae.sh threedfront_objfeat_vqvae 0
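
If you do retrain the fVQ-VAE, the quantization indices that need updating are stored per scene in models_info.pkl, as noted above. A minimal sketch for inspecting one of these files is shown below; the exact field names inside the pickle are not documented here, so check the preprocessing code before overwriting anything.

import pickle

# Replace <scene_id> with a real scene folder from your dataset copy.
pkl_path = "dataset/InstructScene/threed_front_bedroom/<scene_id>/models_info.pkl"
with open(pkl_path, "rb") as f:
    models_info = pickle.load(f)

# Inspect the stored fields to locate the quantization indices that should be
# overwritten with the outputs of your newly trained fVQ-VAE.
print(type(models_info))
print(models_info)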

Inference (only for debugging)

# bash scripts/inference_objfeatvqvae.sh <tag> <gpu_id> <epoch>
bash scripts/inference_objfeatvqvae.sh threedfront_objfeat_vqvae 0 -1
# '-1' means the latest checkpoint

1๏ธ. ๐Ÿฆพ Layout Decoder: embody 3D scenes from semantic graphs

Training

# bash scripts/train_sg2sc_objfeat.sh <room_type> <tag> <gpu_id> <fvqvae_tag>
bash scripts/train_sg2sc_objfeat.sh bedroom bedroom_sg2scdiffusion_objfeat 0 threedfront_objfeat_vqvae

Inference (only for debugging)

# bash scripts/inference_sg2sc_objfeat.sh <room_type> <tag> <gpu_id> <epoch> <fvqvae_tag> <(optional) cfg_scale>
bash scripts/inference_sg2sc_objfeat.sh bedroom bedroom_sg2scdiffusion_objfeat 0 -1 threedfront_objfeat_vqvae 1.0

To visualize synthesized scenes, replace --n_scenes 0 in scripts/inference_sg2sc_objfeat.sh with --n_scenes 5 --visualize --resolution 1024, which visualizes 5 synthesized scenes and saves the rendered images at a resolution of 1024x1024. Otherwise, the script only computes the iRecall score for evaluation.

2๏ธ. ๐Ÿค– Semantic Graph Prior: design semantic graphs from instructions

Training

# bash scripts/train_sg_vq_objfeat.sh <room_type> <tag> <gpu_id>
bash scripts/train_sg_vq_objfeat.sh bedroom bedroom_sgdiffusion_vq_objfeat 0

Inference

# bash scripts/inference_sg_vq_objfeat.sh <room_type> <tag> <gpu_id> <epoch> <fvqvae_tag> <sg2sc_tag> <(optional) cfg_scale> <(optional) sg2sc_cfg_scale>
bash scripts/inference_sg_vq_objfeat.sh bedroom bedroom_sgdiffusion_vq_objfeat 0 -1 threedfront_objfeat_vqvae bedroom_sg2scdiffusion_objfeat 1.0 1.0

To visualize synthesized scenes, replace --n_scenes 0 in scripts/inference_sg_vq_objfeat.sh with --n_scenes 5 --visualize --resolution 1024, which visualizes 5 synthesized scenes and saves the rendered images at a resolution of 1024x1024. Otherwise, the script only computes the iRecall score for evaluation.

Evaluation

Evaluation should be conducted after the inference script is executed with the --visualize flag, which will save the rendered images in the output directory.

FID, CLIP-FID and KID:

python3 src/compute_fid_scores.py configs/bedroom_sgdiffusion_vq_objfeat.yaml --tag bedroom_sgdiffusion_vq_objfeat --checkpoint_epoch -1

SCA (scene classification accuracy):

python3 src/synthetic_vs_real_classifier.py configs/bedroom_sgdiffusion_vq_objfeat.yaml --tag bedroom_sgdiffusion_vq_objfeat --checkpoint_epoch -1

Applications

In scripts/inference_sg_vq_objfeat.sh, replace the Python file name generate_sg.py with stylize_sg.py, rearrange_sg.py or complete_sg.py for the "stylization", "rearrangement" or "completion" downstream tasks, respectively.

Please refer to these Python files for more detailed arguments and usage.

😊 Acknowledgement

We would like to thank the authors of ATISS, DiffuScene, OpenShape, NAP and CLIPLayout for their great work and for generously releasing their source code, which inspired our work and helped a lot with our implementation.

📚 Citation

If you find our work helpful, please consider citing:

@inproceedings{lin2024instructscene,
  title={InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior},
  author={Chenguo Lin and Yadong Mu},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}
