Custom Diffusion 360

Custom Diffusion 360 lets you control the camera viewpoint of a custom object in images generated by text-to-image diffusion models such as Stable Diffusion. Given a 360-degree multi-view dataset (~50 images), we fine-tune FeatureNeRF blocks in the intermediate feature space of the diffusion model to condition generation on a target camera pose.

Customizing Text-to-Image Diffusion with Camera Viewpoint Control


Results

All of our results are based on the SDXL model. We customize the model on various categories of multiview images, e.g., car, teddybear, chair, toy, motorcycle. For more generations and comparisons with baselines, please refer to our webpage.

Comparison to baselines

Generations with different target camera pose

Method Details

Given multi-view images of an object together with their camera poses, our method customizes a text-to-image diffusion model on that concept, with the target camera pose as an additional condition. We make a subset of the transformer layers pose-conditioned by adding a new FeatureNeRF block in the intermediate feature space of each such layer. We fine-tune only the new weights on the multi-view dataset and keep the pre-trained model weights frozen. As in previous model customization methods, we add a new modifier token V* in front of the category name, e.g., V* car.
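The "train only the new weights" rule can be illustrated with a minimal sketch. It partitions parameter names into a trainable and a frozen group. The marker string `feature_nerf` and the example parameter names are hypothetical and chosen only for illustration; the repository's actual module names may differ.

```python
from typing import Dict, Iterable, List

# Hypothetical substring identifying the newly added pose-conditioning blocks.
# The real parameter names in the repository may differ.
NEW_BLOCK_MARKER = "feature_nerf"


def split_parameters(names: Iterable[str],
                     marker: str = NEW_BLOCK_MARKER) -> Dict[str, List[str]]:
    """Partition parameter names into trainable (new) and frozen (pre-trained)."""
    groups: Dict[str, List[str]] = {"trainable": [], "frozen": []}
    for name in names:
        key = "trainable" if marker in name else "frozen"
        groups[key].append(name)
    return groups


# Made-up parameter names, for illustration only.
example_names = [
    "unet.block3.attn.to_q.weight",
    "unet.block3.feature_nerf.mlp.weight",
    "unet.block3.attn.to_k.weight",
]
groups = split_parameters(example_names)
print(groups["trainable"])
print(groups["frozen"])
```

In a PyTorch training loop, the same partition would typically be applied by disabling gradients on the frozen parameters and passing only the trainable ones to the optimizer.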

Getting Started

git clone https://github.com/customdiffusion360/custom-diffusion360.git
cd custom-diffusion360
conda create -n pose python=3.8 
conda activate pose
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
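After installation, a quick check along the following lines may help confirm that the pinned versions were picked up. This is a small sketch using only the standard library; the helper names `installed_version` and `check_environment` are illustrative and not part of the repository.

```python
import sys
from importlib import metadata
from typing import Dict, Optional

# Versions taken from the pip install command above.
EXPECTED = {"torch": "2.1.0", "torchvision": "0.16.0", "torchaudio": "2.1.0"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment(expected: Dict[str, str] = EXPECTED) -> Dict[str, str]:
    """Map each package to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found is None:
            report[package] = "missing"
        elif found.split("+")[0] == wanted:  # ignore local tags such as +cu118
            report[package] = "ok"
        else:
            report[package] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    print("python", ".".join(map(str, sys.version_info[:3])))
    for package, status in check_environment().items():
        print(package, status)
```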

We also use PyTorch3D in our code. Please follow the official PyTorch3D installation instructions, or install it from source with the steps below:

conda install -c conda-forge cudatoolkit-dev -y
export CUDA_HOME=$CONDA_PREFIX/pkgs/cuda-toolkit/
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"

Download the Stable Diffusion XL base model and VAE checkpoints:

mkdir pretrained-models
cd pretrained-models
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
wget https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors
cd ..
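Interrupted downloads are easy to miss, so a short check like the one below can confirm both files are present before training or sampling. This is a sketch, not part of the repository; `missing_checkpoints` is an illustrative name, and `root` is resolved relative to the current working directory.

```python
from pathlib import Path
from typing import List

# File names taken from the wget commands above.
CHECKPOINTS = ["sd_xl_base_1.0.safetensors", "sdxl_vae.safetensors"]


def missing_checkpoints(root: str = "pretrained-models") -> List[str]:
    """Return checkpoint file names that are absent or empty under root."""
    folder = Path(root)
    missing = []
    for name in CHECKPOINTS:
        path = folder / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_checkpoints()
    print("all checkpoints found" if not absent else f"missing: {absent}")
```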

Inference with provided models

Download pretrained models:

gdown 1LM3Yc7gYXuNmFwr0s1Z-fnH0Ik8ttY8k -O pretrained-models/car0.tar
tar -xvf pretrained-models/car0.tar -C pretrained-models/

All of our customized models are available for download.

Sample images:

python sample.py --custom_model_dir pretrained-models/car0 --output_dir outputs --prompt "a <new1> car beside a field of blooming sunflowers." 
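To sample several prompts in one go, the command above can be wrapped in a small helper. The sketch below only uses the three flags shown in the command (`--custom_model_dir`, `--output_dir`, `--prompt`). The helper name `sample_command` and the second prompt are illustrative assumptions.

```python
import shlex
from typing import List


def sample_command(prompt: str,
                   model_dir: str = "pretrained-models/car0",
                   output_dir: str = "outputs") -> List[str]:
    """Build the argument list for one sample.py call."""
    return [
        "python", "sample.py",
        "--custom_model_dir", model_dir,
        "--output_dir", output_dir,
        "--prompt", prompt,
    ]


prompts = [
    "a <new1> car beside a field of blooming sunflowers.",
    "a <new1> car parked on a beach.",  # illustrative prompt, not from the README
]

for prompt in prompts:
    # Print the shell-quoted command; it can be executed from the repository
    # root with subprocess.run(sample_command(prompt), check=True).
    print(shlex.join(sample_command(prompt)))
```

Passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces or angle brackets.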

Training

Dataset:

We share the 14 concepts (part of CO3Dv2 and NAVI) that we used in our paper for easy experimentation. The datasets are redistributed under the same licenses as the original works.

gdown 1GRnkm4xp89bnYAPnp01UMVlCbmdR7SeG
tar -xvzf data.tar.gz

Train:

python main.py --base configs/train_co3d_concept.yaml --name car0 --resume_from_checkpoint_custom pretrained-models/sd_xl_base_1.0.safetensors --no_date --set_from_main --data_category car --data_single_id 0
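When training more than one concept, it can help to generate the command from the category and instance id. The sketch below reproduces the flags of the command above. It assumes the run name is the category followed by the id (as in `car0`). Whether the same config file suits every concept is not confirmed here, and `train_command` is an illustrative helper, not part of the repository.

```python
import shlex
from typing import List


def train_command(category: str,
                  single_id: int,
                  config: str = "configs/train_co3d_concept.yaml",
                  checkpoint: str = "pretrained-models/sd_xl_base_1.0.safetensors") -> List[str]:
    """Build the main.py training command for one concept."""
    return [
        "python", "main.py",
        "--base", config,
        "--name", f"{category}{single_id}",  # assumed naming, mirrors "car0"
        "--resume_from_checkpoint_custom", checkpoint,
        "--no_date",
        "--set_from_main",
        "--data_category", category,
        "--data_single_id", str(single_id),
    ]


if __name__ == "__main__":
    # Prints the same command as the README example for category "car", id 0.
    print(shlex.join(train_command("car", 0)))
```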

Your own multi-view images + COLMAP: to be released soon.

Evaluation: to be released

Referenced GitHub repos

Thanks to the following for releasing their code. Our code builds upon these.

Stable Diffusion-XL, Relpose-plus-plus, GBT
