Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs


Junhao Chen, Xiang Li, Xiaojun Ye, Chao Li, Zhaoxin Fan, Hao Zhao


✨Introduction

Building on large multimodal models (LMMs), we developed Idea23D, a multimodal iterative self-refinement system that enhances any text-to-image (T2I) model for automatic 3D model design and generation. It enables new image creation functionality and better visual quality while understanding high-level multimodal inputs.
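To make "iterative self-refinement" concrete, here is a minimal sketch of a generic refine loop. It is not the Idea23D implementation: the function `refine_loop`, its callable parameters, and the toy stand-ins are all hypothetical names chosen for illustration, and the real pipeline lives in the notebook.

```python
from typing import Callable, Tuple


def refine_loop(
    idea: str,
    write_prompt: Callable[[str, str], str],            # hypothetical stand-in for the LMM prompt writer
    text_to_image: Callable[[str], str],                # hypothetical stand-in for the T2I model
    image_to_3d: Callable[[str], str],                  # hypothetical stand-in for the I23D model
    critique: Callable[[str, str], Tuple[bool, str]],   # hypothetical stand-in for the LMM critic
    max_rounds: int = 3,
) -> str:
    """Return the last 3D result after at most max_rounds of refinement."""
    feedback = ""
    model_3d = ""
    for _ in range(max_rounds):
        prompt = write_prompt(idea, feedback)   # fold earlier feedback into the prompt
        image = text_to_image(prompt)
        model_3d = image_to_3d(image)
        accepted, feedback = critique(idea, model_3d)
        if accepted:
            break
    return model_3d


# Toy stand-ins (strings instead of images and meshes) so the sketch runs
# without any model weights.
def toy_write_prompt(idea: str, feedback: str) -> str:
    return idea if not feedback else f"{idea}, {feedback}"


def toy_critique(idea: str, model_3d: str) -> Tuple[bool, str]:
    accepted = "more detail" in model_3d
    feedback = "" if accepted else "more detail"
    return accepted, feedback


result = refine_loop(
    "a red chair",
    toy_write_prompt,
    lambda prompt: f"image({prompt})",
    lambda image: f"mesh({image})",
    toy_critique,
)
print(result)
```

With the toy stand-ins, the first round is rejected and the second round, which carries the feedback in its prompt, is accepted.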

📔Prerequisites:

🛠Run

❗If different modules are used, install the corresponding dependency packages.

The code we provide for running locally uses llava-1.6, SD-XL, and TripoSR, so requirements-local.txt lists the dependencies for that setup.

If you test on Colab, the pipeline is driven by GPT4V, SD-XL (via Replicate), and TripoSR, and it uses requirements-colab.txt.

Colab

Open In Colab

Offline

pip install -r requirements-local.txt

Then, in the "Initialize LMM, T2I, I23D" section of the notebook (.ipynb), change the model paths to your local copies of the following checkpoints:

https://huggingface.co/llava-hf/llava-v1.6-34b-hf
https://huggingface.co/stabilityai/TripoSR
https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0
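One possible way to fetch these checkpoints is the `snapshot_download` function from the `huggingface_hub` package. The sketch below assumes that package is installed (it may not be listed in the requirements files) and that you want each model under a single root folder; `MODEL_REPOS`, `local_dir_for`, and `download_all` are hypothetical helper names, not part of Idea23D.

```python
from pathlib import Path

# Repository IDs taken from the URLs listed above.
MODEL_REPOS = [
    "llava-hf/llava-v1.6-34b-hf",
    "stabilityai/TripoSR",
    "stabilityai/stable-diffusion-xl-base-1.0",
    "stabilityai/stable-diffusion-xl-refiner-1.0",
]


def local_dir_for(repo_id: str, root: str) -> Path:
    """Map 'org/name' to '<root>/name', matching the path_to_your/<name> layout."""
    return Path(root) / repo_id.split("/")[-1]


def download_all(root: str) -> None:
    """Download every repository in MODEL_REPOS under root (needs network access)."""
    # Imported here so local_dir_for works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in MODEL_REPOS:
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir_for(repo_id, root)))


# Example call (commented out because the checkpoints are large):
# download_all("path_to_your")
```

The resulting folders, such as `path_to_your/TripoSR`, can then be used as the model paths in the notebook.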

The corresponding section in the notebook:

# init LMM, T2I, I23D
log('loading lmm...')

# Pick one LMM backend; the alternatives are left commented out.
# lmm = lmm_gpt4v('sk-your open ai key')
lmm = lmm_llava_34b(model_path="path_to_your/llava-v1.6-34b-hf", gpuid=5)
# lmm = lmm_llava_7b(model_path="path_to_your/llava-v1.6-mistral-7b-hf", gpuid=2)

log('loading t2i...')
# t2i = text2img_sdxl_replicate(replicate_key='r8_ZCtxxxxxxxxxxxxxx')
t2i = text2img_sdxl(sdxl_base_path='path_to_your/stable-diffusion-xl-base-1.0',
                    sdxl_refiner_path='path_to_your/stable-diffusion-xl-refiner-1.0',
                    gpuid=2)

log('loading i23d...')
i23d = img23d_TripoSR(model_path='path_to_your/TripoSR', gpuid=2)
log('loading finish.')
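Because loading large checkpoints is slow, it can help to confirm that the edited paths exist before running the cell. The following is a minimal sketch of such a check; `missing_model_dirs` is a hypothetical helper, not part of the Idea23D code, and the paths are the placeholders from the section above.

```python
from pathlib import Path
from typing import Dict, List


def missing_model_dirs(paths: Dict[str, str]) -> List[str]:
    """Return the names whose path is not an existing directory."""
    return [name for name, path in paths.items() if not Path(path).is_dir()]


# Placeholder paths copied from the notebook section; replace with your own.
paths = {
    "lmm": "path_to_your/llava-v1.6-34b-hf",
    "sdxl_base": "path_to_your/stable-diffusion-xl-base-1.0",
    "sdxl_refiner": "path_to_your/stable-diffusion-xl-refiner-1.0",
    "i23d": "path_to_your/TripoSR",
}

missing = missing_model_dirs(paths)
if missing:
    print("Fix these paths before loading:", ", ".join(missing))
```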

Open Idea23D/idea23d_pipeline.ipynb and explore freely in the notebook.

🧐Tips

The best results so far come from using GPT4V as the LMM, SD-XL or DALL·E as the T2I model, and TripoSR as the I23D model. The results in the paper were obtained with Zero123, so they are inferior to what TripoSR produces.

If you don't have access to GPT4V, you can use Qwen-VL or LLaVA. If you use LLaVA, we recommend the llava-v1.6-34b model. We also provide a pipeline built with llava-v1.6-mistral-7b, but it works poorly, whereas llava-v1.6-34b can correctly fulfill user commands.

🗓To-Do List

✅1. Release offline version of Idea23D implementation (llava-1.6-34b, SD-XL, TripoSR)

✅2. Release online running version of Idea23D implementation (GPT4-V, SD-XL, TripoSR)

🔘3. Release complete rendering script with 3D model input support. We have run into an issue: we can render OBJ files with texture maps, but not vertex-shaded OBJ files. If you know how to handle this, please contact us. We will release the rendering code once the issue is resolved.

🔘4. Release support for additional components: Qwen-VL, Zero123, DALL-E, Wonder3D, Stable Zero123, Deepfloyd IF. The release of the complete set of components will be delayed due to ongoing follow-up work.

📜Cite

@article{chen2024idea23d,
  title={Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs}, 
  author={Junhao Chen and Xiang Li and Xiaojun Ye and Chao Li and Zhaoxin Fan and Hao Zhao},
  year={2024},
  eprint={2404.04363},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

🧰Acknowledgement

We have borrowed extensively from the code of the following repositories. Many thanks to the authors for sharing their code.

Qwen-VL, LLaVA, TripoSR, Zero123, Stable Zero123, Wonder3D, SD-XL, Deepfloyd IF
