
Generating Images with 3D Annotations Using Diffusion Models

[PDF] [Project Page]

This repository contains the PyTorch implementation for the ICLR 2024 Spotlight Paper "Generating Images with 3D Annotations Using Diffusion Models" by the following authors.
Wufei Ma*, Qihao Liu*, Jiahao Wang*, Angtian Wang, Xiaoding Yuan, Yi Zhang, Zihao Xiao, Beijia Lu, Ruxiao Duan, Yongrui Qi, Adam Kortylewski, Yaoyao Liu✉, Alan Yuille

Overview

We present 3D Diffusion Style Transfer (3D-DST), a simple and effective approach to generate images with 3D annotations using diffusion models. Our method builds on ControlNet, which extends diffusion models by accepting visual prompts in addition to text prompts. We render 3D CAD models from a variety of poses and viewing directions, compute the edge maps of the rendered images, and use these edge maps as visual prompts to generate realistic images. With explicit 3D geometry control, we can easily change the 3D structures of the objects in the generated images and obtain ground-truth 3D annotations automatically. Experiments on image classification, 3D pose estimation, and 3D object detection show that 3D-DST data effectively improves model performance in both in-distribution and out-of-distribution settings.

Besides code to reproduce our data generation pipeline, we also release the following data to support other research projects in the community:

  1. Aligned CAD models for all 1000 classes in ImageNet-1k. See ccvl/3D-DST-models.
  2. LLM-generated captions for all 1000 classes in ImageNet-1k. See ccvl/3D-DST-captions.
  3. 3D-DST data for all 1000 classes in ImageNet-1k. See ccvl/3D-DST-data.
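The edge-map step described in the overview can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: it uses a plain Sobel gradient-magnitude threshold as a simplified stand-in for a real edge detector, and the function name `edge_map`, the toy 8x8 "render", and the threshold value are all assumptions made for illustration.

```python
def edge_map(gray, threshold):
    """Binary edge map from a grayscale image given as a list of rows.

    Simplified stand-in (assumption): Sobel gradient magnitude followed by
    a fixed threshold. Border pixels are left at 0.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal Sobel response (right column minus left column).
            gx = (gray[y - 1][x + 1] + 2 * gray[y][x + 1] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y][x - 1] - gray[y + 1][x - 1])
            # Vertical Sobel response (bottom row minus top row).
            gy = (gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1]
                  - gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 255
    return edges


# Toy "rendered image": a bright 4x4 square on a dark 8x8 background.
render = [[0] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        render[y][x] = 255

edges = edge_map(render, threshold=200)  # threshold is a hypothetical value
for row in edges:
    print("".join("#" if v else "." for v in row))
```

Only the outline of the square survives in `edges`; the flat interior and background are suppressed. An outline of this kind is what serves as the visual prompt, while the pose used for rendering supplies the 3D annotation.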

Installation

Please check INSTALL.md for installation instructions.

Quick Start

  1. Rendering images with Blender.

    python3 scripts/render_synthetic_data.py \
        --data_path DST3D/train \
        --model_path /path/to/all_dst_models \
        --shapenet_path /path/to/ShapeNetCore.v2 \
        --objaverse_path /path/to/objaverse_models \
        --omniobject3d_path /path/to/OpenXD-OmniObject3D-New \
        --synsets n02690373 \
        --workers 48 \
        --num_samples 2500 \
        --disable_random_distance

  2. DST image generation with visual prompts and LLM prompts.

    CUDA_VISIBLE_DEVICES=0 python3 scripts/controllable_generation.py \
        --model_name control_v11p_sd15_canny \
        --data_path DST3D \
        --data_name image_dst \
        --synsets n02690373

  3. Run the K-fold Consistency Filter (KCF) on the generated images. The KCF code trains a ResNet50 pose estimation model and produces a validation loss for each sample. The results are saved as a JSON file in the directory given by --output_dir.

    CUDA_VISIBLE_DEVICES=0 python3 scripts/run_kcf_filter.py \
        --data_path DST3D/train \
        --category n02690373 \
        --output_dir exp/kcf_n02690373
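The idea behind step 3 can be sketched as follows: split the samples into K folds, fit a model on K-1 folds, and record a held-out loss for every sample in the remaining fold. This is a minimal illustration under stated assumptions, not the code in `scripts/run_kcf_filter.py`. The mean-predictor "model", the absolute-error loss, the toy pose labels, and the `max_loss` threshold are hypothetical stand-ins for the ResNet50 pose estimator and whatever selection rule is applied to the saved losses.

```python
import random
from statistics import mean


def kfold_validation_losses(samples, k, fit, loss, seed=0):
    """Return {sample_index: loss}, each computed by a model that never saw that sample."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    losses = {}
    for held_out in folds:
        held = set(held_out)
        train = [samples[i] for i in indices if i not in held]
        model = fit(train)
        for i in held_out:
            losses[i] = loss(model, samples[i])
    return losses


def keep_consistent(losses, max_loss):
    """Keep the indices whose held-out loss is at most max_loss (hypothetical rule)."""
    return sorted(i for i, value in losses.items() if value <= max_loss)


# Toy stand-ins (assumptions): the "model" predicts the mean training label,
# and the loss is the absolute error on a single pose angle in degrees.
def fit_mean(train):
    return mean(label for _, label in train)


def abs_error(model, sample):
    return abs(model - sample[1])


# Nine consistent samples plus one whose image does not match its pose label.
samples = [(f"img_{i}.png", 30.0) for i in range(9)] + [("img_bad.png", 170.0)]
losses = kfold_validation_losses(samples, k=5, fit=fit_mean, loss=abs_error)
kept = keep_consistent(losses, max_loss=40.0)
print(kept)
```

The inconsistent sample receives a large held-out loss because the model trained on the other folds cannot explain it, so it is the only one dropped. In the real pipeline the per-sample losses come from the JSON file written to --output_dir; its exact schema is not described here, so the sketch keeps everything in memory.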

Released 3D-DST Data

We release our generated 3D-DST data for all 1000 classes in ImageNet-1k here. We also provide the DeiT-small models trained on our 3D-DST data.

Image Classification on ImageNet-200.

| model | data | acc@1 | url |
| --- | --- | --- | --- |
| DeiT-small | baseline | 81.5 | checkpoint & log |
| DeiT-small | with 3D-DST | 84.8 | checkpoint & log |

Image Classification on ImageNet-1k. We provide baseline results on ImageNet-1k with 3D-DST pretraining.

| model | data | acc@1 |
| --- | --- | --- |
| DeiT-small | baseline | 80.1 |
| DeiT-small | with 3D-DST | 81.1 |

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

@inproceedings{ma2024generating,
title={Generating Images with 3D Annotations Using Diffusion Models},
author={Wufei Ma and Qihao Liu and Jiahao Wang and Angtian Wang and Xiaoding Yuan and Yi Zhang and Zihao Xiao and Guofeng Zhang and Beijia Lu and Ruxiao Duan and Yongrui Qi and Adam Kortylewski and Yaoyao Liu and Alan Yuille},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=XlkN11Xj6J}
}
