Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior

If you like our project, please give us a star ⭐ on GitHub to get the latest updates.

Nan Huang, Ting Zhang, Yuhui Yuan, Dong Chen, Shanghang Zhang

arXiv Project page

Pipeline. We propose Customize-It-3D, a two-stage framework for high-quality 3D creation from a reference image with a subject-specific diffusion prior. We first cultivate a subject-specific knowledge prior using multi-modal information to effectively constrain the coherence of the 3D object with respect to a particular identity. At the coarse stage, we optimize a NeRF to reconstruct the geometry of the reference image in a shading-aware manner. We then build point clouds with enhanced texture from the coarse stage, and jointly optimize the texture of invisible points and a learnable deferred renderer to generate realistic and view-consistent textures.

Demo of 360° geometry

News

  • [2023/12/22] Code is available on GitHub!
  • [2023/12/20] Paper is available on arXiv!
  • [2023/12/14] Our code and paper will be released soon.

Install

We have only tested on Ubuntu 22 with torch 2.0.1 and CUDA 11.7 on an A100. Make sure git, wget, and Eigen are installed.

conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
apt update && apt upgrade
apt install git wget libeigen3-dev -y
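
Before moving on, it can help to confirm that the command-line prerequisites are actually on your PATH. The function below is a minimal sketch, not part of the repository: `missing_tools` is a hypothetical helper name, and it only checks for executables (Eigen is a header-only library, so it is not covered).

```shell
# Hypothetical helper (not part of the repository): print each command-line
# tool from the argument list that cannot be found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    local rc=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Check the tools used by the commands above.
missing_tools git wget conda || echo "Install the tools listed above before continuing." >&2
```

Any name printed to standard output is a tool that still needs to be installed.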

Install Environment

Install with pip:

    pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
    pip install git+https://github.com/facebookresearch/pytorch3d.git
    pip install git+https://github.com/S-aiueo32/contextual_loss_pytorch.git@4585061   
    pip install ./raymarching
    pip install git+https://github.com/facebookresearch/segment-anything.git

Other dependencies:

    pip install -r requirements.txt 

Download pre-trained models

  • Zero-1-to-3 for the 3D diffusion prior. We use zero123-xl.ckpt by default. The reimplementation is borrowed from the Stable Diffusion repo and is available in nerf/zero123.py.

    cd pretrained/zero123
    wget https://zero123.cs.columbia.edu/assets/zero123-xl.ckpt
    cd ../../
  • MiDaS for depth estimation. We use dpt_beit_large_512.pt. Put it in the folder pretrained/midas/.

    mkdir -p pretrained/midas
    cd pretrained/midas
    wget https://github.com/isl-org/MiDaS/releases/download/v3_1/dpt_beit_large_512.pt
    cd ../../
  • Omnidata for normal estimation.

    mkdir -p pretrained/omnidata
    cd pretrained/omnidata
    # assume gdown is installed
    gdown '1wNxVO4vVbDEMEpnAi_jwQObf2MFodcBR&confirm=t' 
    cd ../../
  • SAM to segment the foreground mask of an object.

    cd mask
    wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
    cd ..
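
Since these checkpoints are large, a download can fail or be cut short without much notice. The following is a minimal sketch of a sanity check, run from the repository root, that lists expected checkpoint files which are missing or empty. `missing_checkpoints` is a hypothetical helper, the paths are taken from the download commands above, and the Omnidata file is left out because its filename is not stated here.

```shell
# Hypothetical helper (not part of the repository): print every expected
# checkpoint that is missing or empty under the given repository root
# (defaults to the current directory). Returns non-zero if any is missing.
missing_checkpoints() {
    local root="${1:-.}" rc=0 rel
    for rel in \
        pretrained/zero123/zero123-xl.ckpt \
        pretrained/midas/dpt_beit_large_512.pt \
        mask/sam_vit_h_4b8939.pth
    do
        # -s is true only for files that exist and have a size greater than zero
        if [ ! -s "$root/$rel" ]; then
            echo "$rel"
            rc=1
        fi
    done
    return "$rc"
}

missing_checkpoints . || echo "Some checkpoints are missing; re-run the downloads above." >&2
```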

Usage

Preprocess

The ./data directory includes some preprocessed examples whose multi-modal images have already been extracted. To test your own example, follow the preprocessing steps below and mirror the file structure in ./data. Preprocessing takes only seconds.

Step 1: Extract multi-modal images [Optional]

You can preprocess a single image:

python preprocess_image.py --path /path/to/image 

You can also preprocess the images in a list or in a directory:

bash scripts/preprocess_list.sh $GPU_IDX
bash scripts/preprocess_folder.sh $GPU_IDX /path/to/dir

Step 2: Fine-tune multi-modal DreamBooth

Customize-It-3D uses the default DreamBooth from diffusers. To fine-tune multi-modal DreamBooth:

bash dreambooth/dreambooth.sh $GPU_IDX $INSTANCE_DIR $OUTPUT_DIR $CLASS_NAME $CLASS_DIR

$INSTANCE_DIR is the path to the directory containing your own image.

$OUTPUT_DIR is the path where the trained model will be saved.

$CLASS_NAME is the text prompt describing the class of the generated sample images.

$CLASS_DIR is the path to the folder containing the generated class sample images.

For example:

bash dreambooth/dreambooth.sh 0 data/horse out/horse horse images_gen/horse

Keep note of the path to your trained model (under the ./out directory).
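
If you fine-tune several subjects, deriving all four paths from a single name keeps the arguments consistent. The sketch below only prints the command rather than running it. `dreambooth_cmd` is a hypothetical helper, and it assumes the layout of the horse example (data/<name>, out/<name>, images_gen/<name>) with the class name equal to <name>; adjust it if your class prompt differs from the folder name.

```shell
# Hypothetical helper (not part of the repository): print the DreamBooth
# fine-tuning command for one subject, assuming the directory layout of the
# horse example and a class name identical to the subject name.
dreambooth_cmd() {
    local gpu="${1:-}" name="${2:-}"
    if [ -z "$gpu" ] || [ -z "$name" ]; then
        echo "usage: dreambooth_cmd GPU_IDX NAME" >&2
        return 1
    fi
    echo "bash dreambooth/dreambooth.sh $gpu data/$name out/$name $name images_gen/$name"
}

# prints: bash dreambooth/dreambooth.sh 0 data/horse out/horse horse images_gen/horse
dreambooth_cmd 0 horse
```

Pipe the printed line to `bash` (or copy it) once it looks right.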

Run

Run Customize-It-3D for a single example

We use a progressive training strategy to generate full 360° 3D geometry.

bash scripts/run.sh $GPU_IDX $WORK_SPACE $REF_PATH $Enable_First_Stage $Enable_Second_Stage $TRAINED_MODEL_PATH $CLASS_NAME {More_Arguments}

For example, to run Customize-It-3D on the horse example with both stages enabled on GPU 0, using out/horse as the trained multi-modal DreamBooth model and horse as both the workspace and the class name:

bash scripts/run.sh 0 horse data/horse/rgba/rgba.png 1 1 out/horse horse
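
The seven positional arguments are easy to misorder, so a small wrapper that builds the command from a subject name can be handy. This is a sketch under stated assumptions, not the repository's interface: `run_cmd` and the labels `both`, `coarse`, and `refine` are hypothetical, the paths follow the horse example (data/<name>/rgba/rgba.png and out/<name>), and it assumes that passing 0 for a stage flag disables that stage, extrapolated from the `1 1` in the example above.

```shell
# Hypothetical helper (not part of the repository): print the scripts/run.sh
# command for one subject. Assumes 1 enables and 0 disables a stage.
run_cmd() {
    local gpu="${1:-}" name="${2:-}" stages="${3:-both}" first second
    case "$stages" in
        both)   first=1; second=1 ;;  # first stage, then second stage
        coarse) first=1; second=0 ;;  # first (coarse NeRF) stage only
        refine) first=0; second=1 ;;  # second (texture) stage only
        *) echo "unknown stage selection: $stages" >&2; return 1 ;;
    esac
    echo "bash scripts/run.sh $gpu $name data/$name/rgba/rgba.png $first $second out/$name $name"
}

# prints: bash scripts/run.sh 0 horse data/horse/rgba/rgba.png 1 1 out/horse horse
run_cmd 0 horse
```

With these assumptions, `run_cmd 0 horse refine` would print the same command with the stage flags set to `0 1`.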

Run Customize-It-3D for a group of examples

  • To run all examples in a folder, see scripts/run_folder.sh.
  • To run all examples in a given list, see scripts/run_list.sh.

Bibtex

If you find this work useful, please consider citing:


    @misc{huang2023customizeit3d,
      title={Customize-It-3D: High-Quality 3D Creation from A Single Image Using Subject-Specific Knowledge Prior}, 
      author={Nan Huang and Ting Zhang and Yuhui Yuan and Dong Chen and Shanghang Zhang},
      year={2023},
      eprint={2312.11535},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
    }

Acknowledgments

This code borrows heavily from Stable-Dreamfusion. Many thanks to the author.
