pku-yuangroup / envision3d

Envision3D: One Image to 3D with Anchor Views Interpolation


envision3d's Introduction

Envision3D: One Image to 3D with Anchor Views Interpolation, ArXiv, Project Page, Model Weights

Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E.H. Tay, Li Yuan

TL;DR

We propose a novel cascade diffusion framework that efficiently generates dense (32) multi-view consistent images and extracts high-quality 3D content. Inference with the cascade diffusion framework takes less than 12 GB of VRAM.

Abstract

We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image. Recent methods that extract 3D content from multi-view images generated by diffusion models show great potential. However, it is still challenging for diffusion models to generate dense multi-view consistent images, which is crucial for the quality of 3D content extraction. To address this issue, we propose a novel cascade diffusion framework, which decomposes the challenging dense view generation task into two tractable stages, namely anchor views generation and anchor views interpolation. In the first stage, we train the image diffusion model to generate globally consistent anchor views conditioned on image-normal pairs. Subsequently, leveraging our video diffusion model fine-tuned on consecutive multi-view images, we conduct interpolation on the previous anchor views to generate extra dense views. This framework yields dense, multi-view consistent images, providing comprehensive 3D information. To further enhance the overall generation quality, we introduce a coarse-to-fine sampling strategy for the reconstruction algorithm to robustly extract textured meshes from the generated dense images. Extensive experiments demonstrate that our method is capable of generating high-quality 3D content in terms of texture and geometry, surpassing previous image-to-3D baseline methods.

Set up

pip install -r req.txt
pip install carvekit --no-deps

Inference

1. Download model checkpoints

Download our pre-trained model checkpoints from here.

Download the image normal estimation model omnidata_dpt_normal_v2.ckpt from here.

Place all of them under the pretrained_models directory.
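Before running the later steps, it can help to confirm that the downloaded files are where the scripts expect them. The following is a minimal sketch, not part of the repository: `missing_checkpoints` is a hypothetical helper, and only `omnidata_dpt_normal_v2.ckpt` is a file name taken from the instructions above, so add the names of the other checkpoints you downloaded.

```python
from pathlib import Path


def missing_checkpoints(root, expected):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Only this file name is given in the instructions; extend the list
    # with the other checkpoint files or folders you downloaded.
    expected = ["omnidata_dpt_normal_v2.ckpt"]
    missing = missing_checkpoints("pretrained_models", expected)
    print("missing:", missing if missing else "none")
```

Run it from the repository root so that the relative `pretrained_models` path resolves correctly.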

2. Pre-process input image

Run the following command to resize the input image and predict the normal map.

CUDA_VISIBLE_DEVICES=0 python process_img.py example_imgs/pumpkin.png processed_imgs/ --size 256 --recenter

3. Inference stage I and stage II

Modify the config files in the cfgs directory as needed, then run the following commands to perform inference.

CUDA_VISIBLE_DEVICES=0 python gen_s1.py --config cfgs/s1.yaml  validation_dataset.filepaths=['pumpkin.png'] validation_dataset.crop_size=224
CUDA_VISIBLE_DEVICES=0 python gen_s2.py --config cfgs/s2.yaml  validation_dataset.scene=pumpkin
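The trailing `key.subkey=value` arguments in the commands above override entries of the YAML config from the command line. The sketch below illustrates that dotted-override semantics in plain Python. It is an illustration only: the scripts presumably rely on a config library for this (an assumption), and `parse_value`, `apply_overrides`, and the default values in `config` are hypothetical.

```python
def parse_value(text):
    """Parse an integer, a string, or a bracketed list such as "['a.png']"."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        inner = text[1:-1].strip()
        return [parse_value(item) for item in inner.split(",")] if inner else []
    text = text.strip("'\"")
    try:
        return int(text)
    except ValueError:
        return text


def apply_overrides(config, overrides):
    """Apply 'a.b.c=value' strings to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = parse_value(raw)
    return config


# Placeholder defaults standing in for the contents of cfgs/s1.yaml.
config = {"validation_dataset": {"crop_size": 192, "filepaths": []}}
apply_overrides(
    config,
    ["validation_dataset.filepaths=['pumpkin.png']",
     "validation_dataset.crop_size=224"],
)
print(config)
```

Note that most shells strip the single quotes before the script sees the argument, and some shells treat square brackets as glob patterns, so quoting the whole override (for example `"validation_dataset.filepaths=['pumpkin.png']"`) may be needed.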

4. Textured mesh extraction

After generating the 32 views, first set the correct path to the output images, then run the following commands to extract the 3D content.

cd instant-nsr-pl/
python launch.py --config configs/neuralangelo-pinhole-wmask-opt.yaml --gpu 0 --train dataset.scene=pumpkin

Results

Work in progress ...

Inference code
Checkpoints
Instructions
Training code

Acknowledgements

We thank the authors of Wonder3D, Stable Video Diffusion, omnidata, Diffusers and AnimateAnything for their great work and open-source code.

Citation

@misc{pang2024envision3d,
      title={Envision3D: One Image to 3D with Anchor Views Interpolation}, 
      author={Yatian Pang and Tanghui Jia and Yujun Shi and Zhenyu Tang and Junwu Zhang and Xinhua Cheng and Xing Zhou and Francis E. H. Tay and Li Yuan},
      year={2024},
      eprint={2403.08902},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


envision3d's Issues

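Great work. I have a question.

How are the evaluation metrics computed? For PSNR, SSIM, and so on, how are the predicted images and the ground-truth (GT) images obtained? Are the predicted images rendered from the 3D model at specific viewpoints? Could you share this part of the code? Many thanks.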
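For reference, PSNR between a predicted view and its ground-truth view is commonly defined as 10 · log10(MAX² / MSE), where MSE is the mean squared pixel difference and MAX is the largest possible pixel value. The sketch below is a generic implementation of that formula, not the authors' evaluation code. How the predicted and ground-truth views are rendered is not specified here, and `psnr` is a hypothetical helper operating on flattened pixel values.

```python
import math


def psnr(pred, target, max_value=255.0):
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and have the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    rendered = [100, 120, 140, 160]      # flattened pixels of a rendered view
    ground_truth = [110, 120, 130, 160]  # flattened pixels of the GT view
    print(f"PSNR: {psnr(rendered, ground_truth):.2f} dB")
```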

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:

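Hello, I got the error below when running the stage I code. Is the precision used in the attention part of the code too high, or is my runtime environment different from yours? Does this code need an A100 to run? I would like to know what environment you used when running it.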
Traceback (most recent call last):
File "/home/yw/yw2/LZH/PaperCode/Envision3D/Envision3D-main/gen_s1.py", line 230, in <module>
main(cfg)
File "/home/yw/yw2/LZH/PaperCode/Envision3D/Envision3D-main/gen_s1.py", line 177, in main
pipeline = load_envision3d_pipeline(cfg)
File "/home/yw/yw2/LZH/PaperCode/Envision3D/Envision3D-main/gen_s1.py", line 161, in load_envision3d_pipeline
pipeline.unet.enable_xformers_memory_efficient_attention()
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 299, in enable_xformers_memory_efficient_attention
self.set_use_memory_efficient_attention_xformers(True, attention_op)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 263, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 259, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 259, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 259, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 256, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 263, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 259, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 259, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/modeling_utils.py", line 256, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 268, in set_use_memory_efficient_attention_xformers
raise e
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/diffusers/models/attention_processor.py", line 262, in set_use_memory_efficient_attention_xformers
_ = xformers.ops.memory_efficient_attention(
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 197, in memory_efficient_attention
return _memory_efficient_attention(
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 293, in _memory_efficient_attention
return _memory_efficient_attention_forward(
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 309, in _memory_efficient_attention_forward
op = _dispatch_fw(inp)
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 95, in _dispatch_fw
return _run_priority_list(
File "/home/yw/anaconda3/envs/InstantMesh/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 70, in _run_priority_list
raise NotImplementedError(msg)
NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
query : shape=(1, 2, 1, 40) (torch.float32)
key : shape=(1, 2, 1, 40) (torch.float32)
value : shape=(1, 2, 1, 40) (torch.float32)
attn_bias : <class 'NoneType'>
p : 0.0
flshattF is not supported because:
xFormers wasn't build with CUDA support
dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
tritonflashattF is not supported because:
xFormers wasn't build with CUDA support
dtype=torch.float32 (supported: {torch.float16, torch.bfloat16})
requires A100 GPU
cutlassF is not supported because:
xFormers wasn't build with CUDA support
smallkF is not supported because:
xFormers wasn't build with CUDA support
max(query.shape[-1] != value.shape[-1]) > 32
unsupported embed per head: 40
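Every operator in the log above is rejected with "xFormers wasn't build with CUDA support", which points to the installed xFormers build rather than to the GPU model. One common way to keep a script running in that situation is to treat xFormers as optional. The sketch below shows that pattern under stated assumptions: `try_enable_xformers` is a hypothetical helper, not code from gen_s1.py, and it assumes that the default attention path is acceptable when xFormers cannot be enabled.

```python
def try_enable_xformers(model):
    """Try to enable xFormers attention on `model`; return True on success.

    `model` is any object exposing enable_xformers_memory_efficient_attention(),
    such as the pipeline.unet object that appears in the traceback.
    """
    try:
        model.enable_xformers_memory_efficient_attention()
    except Exception as exc:  # e.g. NotImplementedError raised by xFormers
        print(f"xFormers attention unavailable, keeping default attention: {exc}")
        return False
    return True


if __name__ == "__main__":
    class FakeUNet:
        """Stand-in that fails the same way as the log above."""

        def enable_xformers_memory_efficient_attention(self):
            raise NotImplementedError("No operator found")

    print(try_enable_xformers(FakeUNet()))
```

Falling back this way trades the memory savings of xFormers for compatibility, so peak VRAM use may be higher than the figure quoted in the TL;DR.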

Cannot download carvekit pretrained models

Hi, I can't reach Hugging Face to download the fba and trimap_generator models, and I didn't find a 'model_path' argument in FBAMatting. Is there any way to use a local model path for fba? Here is the detailed error:
ConnectionError: Exception caught when downloading model! Model name: fba_matting.pth. Exception: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /leonelhs/FBA-Matting/resolve/d8a8fd9e7b3fa0d2f1506fe7242966b34381e9c5/FBA.pth (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fbc0c525600>: Failed to establish a new connection: [Errno 101] Network is unreachable')).

Visualizing the result

The experimental results are a collection of images. How can they be visualized in 3D?

TypeError: Cannot handle this data type: (1, 1, 3), <u2

When preprocessing the input image, I get a PIL error saying it cannot handle the data type of the array it is given (if I understand how the program works correctly).
I tried both Windows 10 and WSL 2, but the result is the same. What am I doing wrong?

Full traceback:

python process_img.py example_imgs/bog.png processed_imgs/ --size 256 --recenter
[INFO] loading image example_imgs/bog.png...
example_imgs/bog.png
[INFO] normal estimation...
F:\Programs\Anaconda\envs\envision3d\lib\site-packages\timm\models\_factory.py:117: UserWarning: Mapping deprecated model name vit_base_resnet50_384 to current vit_base_r50_s16_384.orig_in21k_ft_in1k.
model = create_fn(
Traceback (most recent call last):
File "F:\Programs\Anaconda\envs\envision3d\lib\site-packages\PIL\Image.py", line 3130, in fromarray
  mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 3), '<u2')

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "K:\programs\Envision3d\process_img.py", line 273, in <module>
  preprocess_single_image(opt.path, opt)
File "K:\programs\Envision3d\process_img.py", line 161, in preprocess_single_image
  normal = dpt_normal_model(image)[0].transpose(1, 2, 0)
File "F:\Programs\Anaconda\envs\envision3d\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
  return func(*args, **kwargs)
File "K:\programs\Envision3d\process_img.py", line 100, in __call__
  image = Image.fromarray(image)
File "F:\Programs\Anaconda\envs\envision3d\lib\site-packages\PIL\Image.py", line 3134, in fromarray
  raise TypeError(msg) from e
TypeError: Cannot handle this data type: (1, 1, 3), <u2
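In the error above, `<u2` is NumPy's code for little-endian unsigned 16-bit integers, and `Image.fromarray` has no mode for a three-channel array of that type. This suggests, as an assumption, that the input PNG stores 16 bits per channel. A possible workaround is to reduce the array to 8 bits before it reaches PIL. The sketch below shows one way to do that. `to_uint8` is a hypothetical helper, and where to call it inside process_img.py is left open.

```python
import numpy as np


def to_uint8(image):
    """Return `image` as uint8, rescaling 16-bit data from 0..65535 to 0..255."""
    image = np.asarray(image)
    if image.dtype == np.uint8:
        return image
    if image.dtype == np.uint16:
        # Keep the high byte of every 16-bit sample.
        return (image >> 8).astype(np.uint8)
    raise TypeError(f"unsupported dtype: {image.dtype}")


if __name__ == "__main__":
    sixteen_bit = np.array([[[65535, 256, 0]]], dtype=np.uint16)
    eight_bit = to_uint8(sixteen_bit)
    print(eight_bit.dtype, eight_bit.tolist())
```

Alternatively, re-saving the input image as an 8-bit PNG in an image editor avoids touching the code at all.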

Cannot load pretrained model to run the inference s1_pipe

I encountered an issue while loading the pre-trained model for inference. The from_pretrained call fails inside the transformers library with a shape mismatch on the class_embedding tensor: the checkpoint provides a tensor of shape [1280], while the model built by the code expects [1024].

CUDA_VISIBLE_DEVICES=0 python gen_s1.py --config cfgs/s1.yaml  validation_dataset.filepaths=['pumpkin.png'] validation_dataset.crop_size=224

/home/thaoanh/anaconda3/envs/wonder3dd/lib/python3.10/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
{'pretrained_model_name_or_path': '/home/thaoanh/workspace/Envision3D/ckpts/s1_pipe', 'revision': None, 'validation_dataset': {'root_dir': './example_imgs/', 'bg_color': 'white', 'crop_size': 224, 'img_wh': [256, 256], 'filepaths': ['pumpkin.png']}, 'seed': 43, 'validation_batch_size': 1, 'enable_xformers_memory_efficient_attention': True, 'dataloader_num_workers': 64, 'save_dir': './img_outputs/'}
Loading pipeline components...:  40%|████████████████████████████████████████████▊                                                                   | 2/5 [00:00<00:00, 27.49it/s]
Traceback (most recent call last):
  File "/home/thaoanh/workspace/Envision3D/gen_s1.py", line 230, in <module>
    main(cfg)
  File "/home/thaoanh/workspace/Envision3D/gen_s1.py", line 177, in main
    pipeline = load_envision3d_pipeline(cfg)
  File "/home/thaoanh/workspace/Envision3D/gen_s1.py", line 154, in load_envision3d_pipeline
    pipeline = MVDiffusionRefImagePipeline.from_pretrained(
  File "/home/thaoanh/anaconda3/envs/wonder3dd/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 1265, in from_pretrained
    loaded_sub_model = load_sub_model(
  File "/home/thaoanh/anaconda3/envs/wonder3dd/lib/python3.10/site-packages/diffusers/pipelines/pipeline_utils.py", line 520, in load_sub_model
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/home/thaoanh/anaconda3/envs/wonder3dd/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3677, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/thaoanh/anaconda3/envs/wonder3dd/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4104, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/thaoanh/anaconda3/envs/wonder3dd/lib/python3.10/site-packages/transformers/modeling_utils.py", line 886, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/thaoanh/anaconda3/envs/wonder3dd/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 358, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([1280]) in "class_embedding" (which has shape torch.Size([1024])), this look incorrect