
anydoor's Introduction

AnyDoor: Zero-shot Object-level Image Customization

Xi Chen · Lianghua Huang · Yu Liu · Yujun Shen · Deli Zhao · Hengshuang Zhao

Paper PDF · Project Page
The University of Hong Kong | Alibaba Group | Ant Group

News

  • [2023.12.17] Released the training, inference, and demo code, along with the pretrained checkpoint.
  • [2023.12.24] 🔥 Online demos are available on ModelScope and HuggingFace.
  • [Soon] Release the new version of the paper.
  • [On-going] Scaling up the training data and releasing stronger models as the foundation model for downstream region-to-region generation tasks.
  • [On-going] Releasing specifically designed models for downstream tasks like virtual try-on, face swapping, and text and logo transfer.

Installation

Install with conda:

conda env create -f environment.yaml
conda activate anydoor

or pip:

pip install -r requirements.txt

Additionally, for training, you need to install panopticapi, pycocotools, and lvis-api.

pip install git+https://github.com/cocodataset/panopticapi.git

pip install pycocotools -i https://pypi.douban.com/simple

pip install lvis

Download Checkpoints

Download AnyDoor checkpoint:

Note: The checkpoint includes all of Adam's optimizer states, so it is large. You can keep only the "state_dict" to make it much smaller.
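A minimal sketch (not an official script) of pruning the checkpoint this way; the file names below are only examples:

# Keep only the model weights ("state_dict") and drop the Adam optimizer states,
# which makes the released checkpoint much smaller on disk.
import torch

ckpt = torch.load("epoch=1-step=8687.ckpt", map_location="cpu")   # example input name
torch.save({"state_dict": ckpt["state_dict"]}, "epoch=1-step=8687-pruned.ckpt")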

Download the DINOv2 checkpoint and set its path in /configs/anydoor.yaml (line 83).

Download Stable Diffusion V2.1 if you want to train from scratch.

Inference

We provide inference code in run_inference.py (from Line 222 onward) for both single-image inference and dataset inference (VITON-HD test set). Modify the data paths and run the following command. The generated results are saved in examples/TestDreamBooth/GEN for single images, and in VITONGEN for the VITON-HD test set.

python run_inference.py

The inference results on the VITON-HD test set are arranged as [garment, ground truth, generation].

Note that AnyDoor does not contain any design or tuning specific to try-on; we think adding skeleton information or the warped garment, and fine-tuning on try-on data, would make it better :)

Our evaluation data for DreamBooth and COCOEE could be downloaded at Google Drive:

  • URL: [to be released]

Gradio demo

Currently, we support a local Gradio demo. To launch it, first modify /configs/demo.yaml with the path to the pretrained model, and /configs/anydoor.yaml with the path to DINOv2 (line 83).

Afterwards, run the script:

python run_gradio_demo.py

The gradio demo would look like the UI shown below:

  • 📢 This version requires users to annotate the mask of the target object; an overly coarse mask will degrade the generation quality. We plan to add a mask-refinement module or an interactive segmentation module to the demo.

  • 📢 We provide a segmentation module to refine the user-annotated reference mask. You can disable it by setting use_interactive_seg: False in /configs/demo.yaml.

Train

Prepare datasets

  • Download the datasets listed in /configs/datasets.yaml and modify the corresponding paths.
  • You could prepare your own datasets according to the formats of the files in ./datasets.
  • If you use the UVO dataset, you need to process its JSON files following ./datasets/Preprocess/uvo_process.py.
  • You could refer to run_dataset_debug.py to verify that your data is correct.

Prepare initial weight

  • If you would like to train from scratch, convert the downloaded SD weights into a control copy by running:
sh ./scripts/convert_weight.sh  
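A hedged sketch (not ./scripts/convert_weight.sh itself) of what a ControlNet-style "control copy" conversion typically does: weights that exist in the SD checkpoint are copied over, and control-branch weights without an SD counterpart keep their fresh initialization. The key-name mapping and file names below follow the usual ControlNet convention and may differ from the actual script.

# Build the AnyDoor model and initialize it from Stable Diffusion 2.1 weights.
import torch
from cldm.model import create_model  # same helper used by run_inference.py

model = create_model("configs/anydoor.yaml")
sd = torch.load("v2-1_512-ema-pruned.ckpt", map_location="cpu")["state_dict"]

target = model.state_dict()
for name in target:
    # Control-branch keys are typically copies of the corresponding UNet keys.
    src = name.replace("control_model.", "model.diffusion_model.")
    if src in sd and sd[src].shape == target[name].shape:
        target[name] = sd[src].clone()

model.load_state_dict(target)
torch.save({"state_dict": model.state_dict()}, "control_sd21_ini.ckpt")  # example output name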

Start training

  • Modify the training hyper-parameters in run_train_anydoor.py (Lines 26-34) according to your training resources. We verified that 2 A100 GPUs with batch accumulation = 1 give satisfactory results after 300,000 iterations. (A hedged Trainer sketch is shown after these steps.)

  • Start training by executing:

sh ./scripts/train.sh  
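For orientation, a hedged sketch of the kind of PyTorch Lightning Trainer configuration those hyper-parameters control; the argument names are illustrative and may differ from run_train_anydoor.py or from your Lightning version:

# Rough shape of a 2-GPU training run with batch accumulation = 1.
import pytorch_lightning as pl

trainer = pl.Trainer(
    gpus=2,                     # 2 A100 GPUs, as noted above (newer Lightning: accelerator="gpu", devices=2)
    strategy="ddp",             # data-parallel across the two GPUs
    precision=16,               # assumption: mixed precision to fit the model in memory
    accumulate_grad_batches=1,  # batch accumulation = 1
    max_steps=300_000,          # roughly the iteration count reported above
)
# trainer.fit(model, dataloader)  # model and dataloader built as in run_train_anydoor.py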

🔥 Community Contributions

@bdsqlsz

Acknowledgements

This project is developed on the codebase of ControlNet. We appreciate this great work!

Citation

If you find this codebase useful for your research, please use the following entry.

@article{chen2023anydoor,
  title={Anydoor: Zero-shot object-level image customization},
  author={Chen, Xi and Huang, Lianghua and Liu, Yu and Shen, Yujun and Zhao, Deli and Zhao, Hengshuang},
  journal={arXiv preprint arXiv:2307.09481},
  year={2023}
}

anydoor's People

Contributors

axwaizee · lucataco · xavierchen34


anydoor's Issues

A small question about the requirement for Ram/VRam

Hello, Xavier, this is a brilliant contribution, thank you very much.

I encountered a small issue:
When running run_gradio_demo.py,
my process kept being killed without an explicit error message (as below).

such as:

...
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 16, 16) = 1024 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
^C

Is it potentially because my system RAM is too low (12 GB)?
When monitoring the process, I noticed that during the cross-attention phase my CPU RAM hits its 12 GB limit, while GPU VRAM usage is almost nil.

Could you provide some suggestions for potential causes of such an issue? ❤

Question about details map and unet decoder

In the paper, "For the detail maps, we concatenate them with UNet decoder features at each resolution", but usually the maps from a ControlNet are added, and the whole UNet is frozen.
Why do you use "concatenate", and how do you initialize the weights?

How to get the high-frequency map?

I want to know how to get the high-frequency map. I have tried some methods but couldn't reproduce the effect shown in Figure 3. Can you release this part of the code?
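For reference, a hedged sketch of one common way to build a high-frequency color map with Sobel filters; this is an illustration only, not necessarily the exact filter used in the paper.

# Extract edges with Sobel gradients and keep the original colors at high-gradient pixels.
import cv2
import numpy as np

img = cv2.imread("reference.jpg")                               # any reference image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
mag = cv2.magnitude(gx, gy)
mask = (mag / (mag.max() + 1e-6) > 0.2).astype(np.uint8) * 255  # threshold is an example
high_freq = cv2.bitwise_and(img, img, mask=mask)
cv2.imwrite("high_freq_map.png", high_freq)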

How many GPUs for the test demo?

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 58.00 MiB. GPU 0 has a total capacty of 15.70 GiB of which 53.31 MiB is free. Including non-PyTorch memory, this process has 15.07 GiB memory in use. Of the allocated memory 14.65 GiB is allocated by PyTorch, and 196.92 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
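As the error message itself suggests, one hedged mitigation is to set PYTORCH_CUDA_ALLOC_CONF before CUDA is initialized (the value below is only an example, and it will not help if the model simply does not fit in 16 GB):

# Must run before the first CUDA allocation, e.g. at the top of run_gradio_demo.py,
# or be exported in the shell before launching the demo.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"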

Where should I place the downloaded checkpoints?

Enhancement of Object Segmentation Precision for Improved Image Customisation

Dear AnyDoor Contributors,

I hope this message finds you well. I am writing to address a potential area for enhancement within the AnyDoor project, specifically pertaining to the precision of object segmentation during the image customisation process.

Upon utilising the AnyDoor model for various image editing tasks, I have observed that while the overall performance is commendable, there are instances where the object segmentation module could benefit from increased accuracy. This is particularly evident in images with intricate backgrounds or when objects have complex edges.

The current segmentation approach, although effective for a broad range of scenarios, occasionally struggles with fine details, leading to a less than optimal customisation outcome. This is especially true for tasks that demand high fidelity, such as virtual try-on applications where precise garment edges are crucial.

To illustrate, I have attached a series of images (see attachments) that demonstrate the challenges faced with the current segmentation algorithm. In these examples, you will notice that the segmentation mask does not fully capture the nuanced contours of the objects, resulting in a slight misalignment in the customised images.

I propose the exploration of advanced segmentation techniques, such as those employing deep learning architectures specifically designed for edge detection, or the integration of interactive segmentation tools that allow for user refinement. The latter could be particularly beneficial in providing end-users with the ability to make fine-grained adjustments, thereby enhancing the overall quality of the customisation.

I believe that addressing this aspect could significantly elevate the user experience and expand the practical applications of the AnyDoor model. I would be keen to hear your thoughts on this matter and discuss potential collaborative efforts to develop and integrate such improvements.

Thank you for your time and consideration. I look forward to your response and am excited about the prospect of contributing to the advancement of the AnyDoor project.

Best regards,
yihong1120

activity

{
  "Name": "activity2",
  "IsCustom": true,
  "Description": "Can read and create storage accounts",
  "Actions": [
    "*/read",
    "Microsoft.Storage/storageAccounts/write",
    "Microsoft.Resources/deployments/*"
  ],
  "NotActions": [],
  "AssignableScopes": [
    "/subscriptions/{your-subscription-id}"
  ]
}

Details about virtual try-on

Hi, dear author, thanks for your great work. I have some questions about virtual try-on. Could you share more information about the data preparation? How do you get the person box during training and testing? What is the input of the network (besides the noise)? Is it the portrait itself, or the portrait with a box mask filled with gray pixels? Do you train on the two datasets together and then test on the VITON-HD test set, or train and test on VITON-HD only? For virtual try-on, do you train without other datasets? Thanks very much.

path/TryOn/VitonHD/test/cloth

DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is None and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 640, context_dim is 1024 and using 10 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is None and using 20 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 1280, context_dim is 1024 and using 20 heads.
Loaded model config from [configs/anydoor.yaml]
Loaded state_dict from [path/epoch=1-step=8687-pruned.ckpt]
Traceback (most recent call last):
File "F:\AI\AnyDoor\run_inference.py", line 266, in
image_names = os.listdir(test_dir)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'path/TryOn/VitonHD/test/cloth'

Ignored x_noisy in ControlNet?

I noticed here that the noisy sample is effectively ignored in the control branch. What is the idea behind dropping that condition (instead of, say, adding or concatenating it to the hint)?

Error while running the inference.py file

E:\AnyDoor>python run_inference.py
Traceback (most recent call last):
File "E:\AnyDoor\run_inference.py", line 4, in
import torch
ModuleNotFoundError: No module named 'torch'

E:\AnyDoor>pip install torch
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)
ERROR: No matching distribution found for torch

[notice] A new release of pip is available: 23.2.1 -> 23.3.2
[notice] To update, run: python.exe -m pip install --upgrade pip

This is what I encountered even after trying multiple times. Please help me solve this issue.

Simplify the links in Download Checkpoints in readme.md

There are a lot of links to various sites in this section. It could be just me, but I think it would be nicer if the links were attached to the bullet points instead of being shown as raw URLs.

This point could also be written as a note rather than a bullet point:
We include all the optimizer params for Adam, so the checkpoint is big. You could only keep the "state_dict" to make it much smaller.


which python should be used?

numpy==1.23.1

numpy/core/src/multiarray/scalartypes.c.src:2967:12: error: too few arguments to function ‘_Py_HashDouble’
2967 | return _Py_HashDouble((double) PyArrayScalar_VAL(obj, @name@));
......

Failed to build numpy.
This may be related to the Python version, according to numpy/numpy#22520 (comment).

opencv_contrib_python

"ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11
ERROR: Could not find a version that satisfies the requirement opencv_contrib_python==4.3.0.36 (from versions: 3.4.11.45, 3.4.13.47, 3.4.14.51, 3.4.15.55,
3.4.16.59, 3.4.17.61, 3.4.17.63, 3.4.18.65, 4.4.0.46, 4.5.1.48, 4.5.2.52, 4.5.3.56, 4.5.4.58, 4.5.4.60, 4.5.5.62, 4.5.5.64, 4.6.0.66, 4.7.0.68, 4.7.0.72, 4.8.0.74, 4.8.0.76, 4.8.1.78)
ERROR: No matching distribution found for opencv_contrib_python==4.3.0.36"

What version should I change to?

Website Demo

I saw that your website says a demo is coming soon, and it would be cool to have a way to use this model without any setup. I propose an implementation like the one I proposed in issue #26, but in the browser, so that this implementation of the model can be used directly on the AnyDoor website. A possible approach would be a JS-driven webpage that talks to a Python API (or communicates with Python in some other way), or a Python web framework. What do you think? I would be interested in collaborating on this issue: I can work on the interface and leave the Python connection up to you, or help with implementing a Python webpage for it. Thoughts? ꒰ · ◡ · ꒱

Did you call validation_step during training?

Hi, thanks for your great work! I plan to train your model on a custom dataset. However, when I trained it on a subset of your specified datasets, I did not see a validation procedure. Will the function validation_step be called? Moreover, it seems that there is no separate valid_dataloader, because only one dataloader is passed into trainer.fit.

trainer.fit(model, dataloader)

Then how do you save the model parameters that give the lowest loss during validation? I ask because I want to save the best model on a validation dataset but have not managed it yet :( I am not so familiar with PyTorch Lightning or with the way ControlNet should be trained, so correct me if I said something wrong :)

Btw, why did you repeat some datasets in the ConcatDataset
(https://github.com/ali-vilab/AnyDoor/blob/ddcfbafb8fa4f27a2da705a3bcf5bfd2de4fbf98/run_train_anydoor.py#L64C3-L64C3)
but not repeat image_data?
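For the earlier question about saving the best validation checkpoint, here is a hedged PyTorch Lightning sketch (not the repo's code); it assumes a hypothetical validation_step that logs "val/loss" and a separate validation dataloader:

# Keep only the checkpoint with the lowest validation loss.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_cb = ModelCheckpoint(
    monitor="val/loss",   # assumes validation_step calls self.log("val/loss", ...)
    mode="min",
    save_top_k=1,         # keep only the best checkpoint
)
trainer = pl.Trainer(gpus=2, callbacks=[checkpoint_cb])
# trainer.fit(model, dataloader, val_dataloader)  # pass the validation loader explicitly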

Google Colab

Has anyone managed to get this project running on Colab?
Here is my code for now:

!git clone https://github.com/ali-vilab/AnyDoor.git

%cd AnyDoor/

!pip install -r requirements.txt

I changed the opencv-contrib-python version to 4.6.0.66 as suggested here #31

But still have errors during installation.

Click to expand error log
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.
ipython 7.34.0 requires jedi>=0.16, which is not installed.
lida 0.0.10 requires kaleido, which is not installed.
tensorflow 2.15.0 requires numpy<2.0.0,>=1.23.5, but you have numpy 1.23.1 which is incompatible.
tensorflow-probability 0.22.0 requires typing-extensions<4.6.0, but you have typing-extensions 4.9.0 
which is incompatible.
torchaudio 2.1.0+cu121 requires torch==2.1.0, but you have torch 2.0.0 which is incompatible.
torchdata 0.7.0 requires torch==2.1.0, but you have torch 2.0.0 which is incompatible.
torchtext 0.16.0 requires torch==2.1.0, but you have torch 2.0.0 which is incompatible.

Decrease inference time

Currently I'm testing your project on the try-on data, and the inference_single_image() function takes 20-40 seconds depending on the GPU I use.
What are the ways I can decrease the inference time?

How many GPUs were used?

The results look great. How many GPUs were used to train the model, and how long did the training take?

Virtual TryOn related questions

Hello,

Thanks for this amazing project. I've been trying Virtual Try-On. Few questions:

  1. What is the recommended size of the input images (reference image/mask and target image/mask) for the best results?
  2. I notice that the face of the subject sometimes gets slightly altered, even though it's not part of the target masked region. Why would that happen?

Any other tips to get the best results?

Thanks

about pose control

Hi, I saw that you posted results about pose control on the GitHub home page, but I didn't see this result in the paper. I would like to ask how you implemented this part specifically, and how I should test it.
Thank you very much !

setuptools - raise ValueError("path '%s' cannot be absolute" % pathname)

I've encountered an installation problem with the share==1.0.4 module, using the suggested environment config with setuptools==66.0.0 and wheel==0.41.2 installed.
I don't want to alter any setuptools code, but if that is the only way, please suggest how. Thanks.

  Building wheel for share (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [40 lines of output]
      C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
        warnings.warn(
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\<my_user_name>\AppData\Local\Temp\pip-install-xyoqj7k8\share_84a33f02528d4e36a4ca9998fc05ed01\setup.py", line 11, in <module>
          setup(
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
          return run_commands(dist)
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
          dist.run_commands()
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\dist.py", line 1208, in run_command
          super().run_command(command)
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
          cmd_obj.run()
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\wheel\bdist_wheel.py", line 399, in run
          self.run_command("install")
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\dist.py", line 1208, in run_command
          super().run_command(command)
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
          cmd_obj.run()
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\command\install.py", line 68, in run
          return orig.install.run(self)
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\command\install.py", line 709, in run
          self.run_command(cmd_name)
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\dist.py", line 1208, in run_command
          super().run_command(command)
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
          cmd_obj.run()
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\command\install_data.py", line 61, in run
          dir = convert_path(f[0])
        File "C:\Users\<my_user_name>\.conda\envs\anydoor\lib\site-packages\setuptools\_distutils\util.py", line 139, in convert_path
          raise ValueError("path '%s' cannot be absolute" % pathname)
      ValueError: path '/etc' cannot be absolute
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for share
  Running setup.py clean for share
Failed to build share
ERROR: Could not build wheels for share, which is required to install pyproject.toml-based projects

(anydoor) C:\Users\<my_user_name>\...\0001_Github_Downloads\AnyDoor - Image Customization, virtual try-on>pip list
Package       Version
------------- -------
cmake         3.28.1
ffmpeg-python 0.2.0
pip           23.3.1
setuptools    66.0.0
wheel         0.41.2

How does DINOv2 encode an image as a 1×1536 global token and 256×1536 patch tokens?

Hello! I think this paper is an amazing work, but I have a question about Section 3.1, Identity Feature Extraction.
In the paper, you said that

We choose the currently strongest self-supervised model DINO-V2 [37] as the backbone of our ID extractor, which encodes the image as a global token $T_{g}^{1\times1536}$ and patch tokens $T_{p}^{256\times1536}$.

But in the DINOv2 paper, under "Fast and memory-efficient attention", they said that

As a consequence, our ViT-g architecture slightly differs from the architecture proposed by Zhai et al. (2022) in order to maximize compute efficiency, and we use an embedding dimension of 1536 with 24 heads (64 dim/head), rather than 1408 with 16 heads (88 dim/head).

I am honestly confused 🤣 because I can't understand how the patch tokens are obtained. I thought DINO only encodes a single token, so why can an image be encoded as so many tokens? This question may be a bit naive; I hope you can help me answer it.
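For reference, a hedged sketch of how the official DINOv2 ViT-g/14 backbone (embedding dimension 1536) returns both a global class token and 256 patch tokens for a 224×224 input; the torch.hub entry point and dictionary keys follow the public DINOv2 repository:

# One forward pass yields a 1x1536 global token and 256x1536 patch tokens.
import torch

model = torch.hub.load("facebookresearch/dinov2", "dinov2_vitg14").eval()
x = torch.randn(1, 3, 224, 224)              # 224 / 14 = 16 patches per side -> 256 patches
with torch.no_grad():
    feats = model.forward_features(x)
print(feats["x_norm_clstoken"].shape)        # torch.Size([1, 1536])      -> global token
print(feats["x_norm_patchtokens"].shape)     # torch.Size([1, 256, 1536]) -> patch tokens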

How to run inference with multiple GPU's

How would I go about running the inference script with multiple GPUs? Currently it maxes out my single A10, but I have four, so I would like to run it in parallel but am unsure how :-)
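One hedged approach (not a feature of the repo) is to shard the test images across GPUs and launch one run_inference.py worker per GPU; the --start/--end index arguments below are hypothetical and would need to be added to the script first:

# Launch one inference worker per GPU, each pinned to its own device and image slice.
import os
import subprocess

num_gpus = 4
image_dir = "path/TryOn/VitonHD/test/cloth"          # adjust to your data path
num_images = len(os.listdir(image_dir))
chunk = (num_images + num_gpus - 1) // num_gpus

procs = []
for gpu in range(num_gpus):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))  # pin one GPU per worker
    start, end = gpu * chunk, min((gpu + 1) * chunk, num_images)
    procs.append(subprocess.Popen(
        ["python", "run_inference.py", "--start", str(start), "--end", str(end)],
        env=env,
    ))
for p in procs:
    p.wait()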

Conda ResolvePackageNotFound:

Hello there, I can't find a way to install all these libraries.

If I use conda:


**conda env create -f environment.yaml**
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - tk==8.6.12=h1ccaba5_0
  - wheel==0.41.2=py38h06a4308_0
  - libffi==3.3=he6710b0_2
  - ld_impl_linux-64==2.38=h1181459_1
  - ca-certificates==2023.08.22=h06a4308_0
  - python==3.8.5=h7579374_1
  - libgomp==11.2.0=h1234567_1
  - libgcc-ng==11.2.0=h1234567_1
  - zlib==1.2.13=h5eee18b_0
  - sqlite==3.41.2=h5eee18b_0
  - xz==5.4.5=h5eee18b_0
  - openssl==1.1.1w=h7f8727e_0
  - libstdcxx-ng==11.2.0=h1234567_1
  - pip==23.3.1=py38h06a4308_0
  - _openmp_mutex==5.1=1_gnu
  - readline==8.2=h5eee18b_0
  - ncurses==6.4=h6a678d5_0

With pip things also start looking very bad :(
Do you have any suggestion for me?

Thanks,

Andrea

Implementation for object moving/swapping and multi-subject composition

The paper references additional features of the model, such as object moving/swapping and multi-subject composition.
However, only the task of teleporting a single object to a specific location is demonstrated in the pipeline.
I am curious how the model implements the other tasks.

Easy to use out of the box functionality

It would be cool if, after cloning the project and installing the dependencies, there were a Python file you could run that opens a GUI: it would ask for an input image and an output image, let you highlight (by dragging the mouse) the area in the output image where the main object of the input image should be placed, then automatically run the model and show the result with the option of saving it. This would be a nice feature for people wanting to try the app out, and for general use.

Details around the detail extractor

Could you please give a bit more detail on the architecture of the detail extractor? What does "a ControlNet-style [60] UNet encoder" mean? What are the layers / feature sizes / dimensions?

Thank you!

invalid load key, 'v' when running run_inference.py

Traceback (most recent call last):
File "run_inference.py", line 28, in
model = create_model(model_config ).cpu()
File "/github/AnyDoor/cldm/model.py", line 26, in create_model
model = instantiate_from_config(config.model).cpu()
File "/github/AnyDoor/ldm/util.py", line 81, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "/github/AnyDoor/cldm/cldm.py", line 310, in init
super().init(*args, **kwargs)
File "/github/AnyDoor/ldm/models/diffusion/ddpm.py", line 565, in init
self.instantiate_cond_stage(cond_stage_config)
File "/github/AnyDoor/ldm/models/diffusion/ddpm.py", line 632, in instantiate_cond_stage
model = instantiate_from_config(config)
File "/github/AnyDoor/ldm/util.py", line 81, in instantiate_from_config
return get_obj_from_str(config["target"])(**config.get("params", dict()))
File "/github/AnyDoor/ldm/modules/encoders/modules.py", line 286, in init
state_dict = torch.load(DINOv2_weight_path)
File "/.conda/envs/anydoor/lib/python3.8/site-packages/torch/serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/.conda/envs/anydoor/lib/python3.8/site-packages/torch/serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

Can anyone help me to solve the problem?

source

logging improved.
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 865.91 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
xFormers not available
xFormers not available
Loaded model config from [configs/anydoor.yaml]
Loaded state_dict from [E:\webui\换衣\AnyDoor\moxing\1.ckpt]
Traceback (most recent call last):
File "E:\webui\换衣\AnyDoor\run_gradio_demo.py", line 257, in
base = gr.Image(label="Background", source="upload", tool="sketch", type="pil", height=512, brush_color='#FFFFFF', mask_opacity=0.5)
File "E:\webui\换衣\AnyDoor\venv\lib\site-packages\gradio\component_meta.py", line 155, in wrapper
return fn(self, **kwargs)
TypeError: Image.init() got an unexpected keyword argument 'source'

Which is the correct DINOv2 checkpoint?

Hello,

This looks like an excellent piece of work - thank you for sharing.

May I clarify exactly which DINOv2 checkpoint to download? I thought it was the distilled ViT-L/14, but that throws an error about missing keys, so it must not be the intended target.

Any and all assistance is appreciated.

All the best
