
collaborative-diffusion's People

Contributors

ziqihuangg


collaborative-diffusion's Issues

env error

I followed the tutorial to create the environment, but when I run the code the following error occurs:
Traceback (most recent call last):
  File "main.py", line 6, in <module>
    import pytorch_lightning as pl
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 16, in <module>
    from torchmetrics import Accuracy as _Accuracy
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/torchmetrics/__init__.py", line 14, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/torchmetrics/functional/__init__.py", line 68, in <module>
    from torchmetrics.functional.text.bert import bert_score
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/torchmetrics/functional/text/bert.py", line 28, in <module>
    from transformers import AutoModel, AutoTokenizer
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/transformers/__init__.py", line 43, in <module>
    from . import dependency_versions_check
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/transformers/dependency_versions_check.py", line 41, in <module>
    require_version_core(deps[pkg])
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/transformers/utils/versions.py", line 94, in require_version_core
    return require_version(requirement, hint)
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/transformers/utils/versions.py", line 85, in require_version
    if want_ver is not None and not ops[op](version.parse(got_ver), version.parse(want_ver)):
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/packaging/version.py", line 54, in parse
    return Version(version)
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/packaging/version.py", line 200, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '0.10.1,<0.11'
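
Judging from the final frames, a likely cause (an inference, not something confirmed in this thread) is that this transformers version splits a multi-clause requirement such as tokenizers>=0.10.1,<0.11 only at the first operator, so packaging.version.parse receives the leftover string '0.10.1,<0.11'; packaging>=22 dropped the LegacyVersion fallback and now raises on such strings. A minimal sketch reproducing just the final frame, assuming packaging>=22 is installed:

    from packaging import version

    # packaging>=22 enforces PEP 440 strictly, so this multi-clause leftover
    # is rejected with InvalidVersion, matching the traceback above
    version.parse("0.10.1,<0.11")

Pinning packaging below 22 in the codiff environment, or upgrading transformers, are common workarounds.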

FID

Hi, I want to ask how you calculate the FID. Do you generate 3000 images for the 3000 test samples and compute the FID between these 3000 images and the training set, or against the whole dataset?
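
For reference (not necessarily the protocol used in the paper), a common way to compute FID between a folder of generated images and a folder of reference images is the pytorch-fid package; which reference folder to use is exactly the question above:

    python -m pytorch_fid path/to/generated path/to/reference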

Which image should I use in image editing

[image attached]
Hello Ziqi Huang, I am very interested in your work; thank you for sharing the code of Collaborative Diffusion.
I am a little confused about the image editing results. I think 0C_interText_optDM_?_alpha is the generated result, but there are results for different alpha values. In this case, I think the image named "0C_interText_optDM_0_alpha=-1.0" should be the best one. Do you automatically select the image whose alpha is likely the best?

Text-to-Face.

Hi, great work. We find this code seems to work only on your given text "This man has beard of medium length. He is in his thirties.". When I use "He doesn't have any bangs and has an extremely mild smile, no beard at all, and no glasses. this person is in the thirties.", the results are far from the given text; almost all of them are female. So is there a bug here?

Collaborative_edit

Excellent work! I am very interested in Multi-Modal Collaborative Editing. I have a question: why do the results of Mask_edit and Text_edit show a significant difference in skin tone compared to the input image, while the result of Collaborative_edit has a skin tone very similar to the input image? I look forward to your response, thank you!
[image attached]

Inquiry about Training Time and RuntimeError in Diffuser Code

Hello,

Thank you for your nice work. I recently encountered an issue while running the training code for the Diffuser on GitHub, and I would appreciate your guidance.

During training, I encountered the following error:

  File "Diffusion/ldm/models/diffusion/ddpm_compose.py", line 1237, in p_losses
    logvar_t = self.logvar[t].to(self.device)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (CPU)

I managed to resolve the issue by moving 't' to the CPU. However, I noticed that the training time for a single epoch is quite long, nearly an hour. I am unsure if this training time is normal or if my actions, such as training on the CPU, are causing the slowdown.
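
For reference, a one-line sketch of the workaround the poster describes: self.logvar lives on the CPU while t is a CUDA tensor, so the indexing fails; moving only the index t to the CPU fixes the lookup without moving training itself to the CPU.

    # in p_losses (ddpm_compose.py): sketch of the described workaround
    logvar_t = self.logvar[t.cpu()].to(self.device)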

Could you please share your typical training time for a single epoch, so I can better understand if my situation is unusual? Additionally, if you suspect that there may be issues with my setup, I would greatly appreciate any suggestions or solutions you can offer.

Thank you very much for your assistance.

condition resolution

Nice work! It seems the resolution of the condition input is just 32, i.e., [19, 1024] for the mask and [1, 1024] for the sketch? Is that right?

About the training epoch of VAE model and uni-model for text to face

Hello! Following the instructions you provided, I am trying to retrain the VAE model and the uni-modal model for text-to-face on an RTX 3090. May I ask how many epochs you trained these two models for, respectively? Or do you decide when to stop training based on the visualizations in reconstructions_gs-xxxxxx_e-xxxxxx_b-xxxxxxx.png and samples_gs-xxxxxx_e-xxxxxx_b-xxxxxxx.png?
Looking forward to your answer.

Image editing

Could you release the instructions for image editing?

about Training Time

It seems that max_epochs defaults to 1000 when training a single modality, and one epoch takes about 25 minutes on 8 GPUs. Does that mean training the text2img model takes about 1000 * 25 / 60 / 24 ≈ 17 days?

about segmentation masks

Hello, I think your work is excellent and inspiring. May I ask if you have the code to convert facial images into segmentation masks? Thank you very much!

About training time

Hi,

Thank you for your nice work.

I would like to know the time required to train each model, including the VAE, the uni-modal models, and the dynamic diffuser.

I trained the VAE model for about 3 hours using 4 GPUs, but I still find the sampled images are poor.

[image attached]

The reconstructed image looks OK.

[image attached]

Empty condition for training uni-modal image generation

I am very interested in your work, thank you for sharing the codes of Collaborative Diffusion.
I have two questions. What is the empty condition when you train text/mask-to-image generation? If I understand correctly, for text-to-image generation the empty condition is [""], as your code shows: "uc = model.get_learned_conditioning(n_samples * [""])". However, what is the empty condition for training mask-to-image generation? Is it a zero mask? Thank you very much for your attention.
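
For context, a minimal sketch of how such an unconditional embedding is typically consumed at sampling time in LDM-style codebases (classifier-free guidance; apply_model and get_learned_conditioning follow LDM naming, while prompts, x_t, t, and scale are placeholder variables). The mask-branch analogue of uc is exactly the open question above:

    uc = model.get_learned_conditioning(n_samples * [""])  # unconditional ("empty") embedding
    c = model.get_learned_conditioning(prompts)            # conditional embedding
    e_t_uncond = model.apply_model(x_t, t, uc)             # noise prediction without the condition
    e_t_cond = model.apply_model(x_t, t, c)                # noise prediction with the condition
    e_t = e_t_uncond + scale * (e_t_cond - e_t_uncond)     # guided noise prediction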

about the cross attention in Dynamic Diffuser

Hi, you have done nice work. I'm interested in your work, but I wonder why we can do cross-attention between the features of x_t and the context.

context: mask -(resize)-> [bz, 32, 32] -(one-hot)-> [bz, 19, 32, 32] -(flatten)-> [bz, 19, 1024] -(linear layer)-> [bz, 19, 640]

def forward(self, x, context=None, mask=None):
        h = self.heads   # x.shape: [bz, 1024, 384], context.shape: [4, 19, 640]

        q = self.to_q(x)         # q.shape: [bz, 1024, 384]
        context = default(context, x)
        k = self.to_k(context)   # k.shape: [bz, 19, 384]
        v = self.to_v(context)   # v.shape: [bz, 19, 384]

        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v)) # q.shape: [128, 1024, 12], k: [128, 19, 12], v: [128, 19, 12]  (b*h = 4*32)

        sim = einsum('b i d, b j d -> b i j', q, k) * self.scale    # sim: [128, 1024, 19]

        if exists(mask):
            mask = rearrange(mask, 'b ... -> b (...)')
            max_neg_value = -torch.finfo(sim.dtype).max
            mask = repeat(mask, 'b j -> (b h) () j', h=h)
            sim.masked_fill_(~mask, max_neg_value)

        # attention, what we cannot get enough of
        attn = sim.softmax(dim=-1)         # attn: [128, 1024, 19]

        out = einsum('b i j, b j d -> b i d', attn, v)   # out.shape: [128, 1024, 12]
        out = rearrange(out, '(b h) n d -> b n (h d)', h=h)   # out.shape: [4, 1024, 384]
        return self.to_out(out)

So why can we do cross-attention between x_t (more precisely, z_t) and the features of the mask? What does it mean?
Thank you very much if you could help me with this.

Pre-training model download from Google drive always fail

Fantastic work; I am very interested in it.

The pre-trained models shared via Google Drive are too unstable to download from China. I have been downloading all day, and the download keeps getting interrupted and failing. Could you share them on another network disk, such as Baidu Netdisk?

how to get the .pt files and what do they mean

Hello, I would like to ask where you obtained the .pt files in the mask and sketch folders, or how you generated them by processing other data, and what they represent respectively. I cannot seem to find this data from other sources.

preprocessing code for sketch and mask

I am very interested in your work, thank you for sharing the codes of Collaborative Diffusion.

I want to generate images from my own sketch and mask. Could you share the preprocessing code for sketch and mask?

I noticed that the sketches and masks you used are 19*1024 tensors, but the raw sketches and masks are 512x512 px images.

How can I convert my own mask or sketch from 512x512 px images to 19*1024 tensors? Can you give me some advice?

Thank you very much!
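
For reference, a minimal sketch of one plausible conversion, pieced together from the shapes discussed across these issues (nearest-neighbour resize of the 512x512 label map to 32x32, 19-class one-hot, then flatten). This is an assumption about the preprocessing, not the authors' released code, and mask_to_tensor is a hypothetical helper:

    import numpy as np
    import torch
    import torch.nn.functional as F
    from PIL import Image

    def mask_to_tensor(path, num_classes=19, size=32):
        """512x512 segmentation label map -> [num_classes, size*size] tensor."""
        mask = Image.open(path).resize((size, size), Image.NEAREST)  # NEAREST keeps labels integral
        labels = torch.from_numpy(np.array(mask)).long()             # [32, 32] class indices
        onehot = F.one_hot(labels, num_classes=num_classes)          # [32, 32, 19]
        return onehot.permute(2, 0, 1).reshape(num_classes, -1).float()  # [19, 1024]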

NameError: name 'trainer' is not defined

(codiff) ubuntu@ubun:~/lixiaoyi/Collaborative-Diffusion$ python main.py --logdir 'outputs/512_vae' --base 'configs/512_vae.yaml' -t --gpus 0,1,2,3,
2023-06-04 20:28:56.401572: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-04 20:28:56.460492: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-06-04 20:28:56.711893: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/plumed-2.8.2:/usr/local/cuda-11.3/lib64:
2023-06-04 20:28:56.711925: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/plumed-2.8.2:/usr/local/cuda-11.3/lib64:
2023-06-04 20:28:56.711928: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Global seed set to 23
Running on GPUs 0,1,2,3,
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 3, 64, 64) = 12288 dimensions.
making attention of type 'vanilla' with 512 in_channels
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
Monitoring val/rec_loss as checkpoint metric.
Merged modelckpt-cfg:
{'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'outputs/512_vae/2023-06-04T20-28-57_512_vae/checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': 'val/rec_loss', 'save_top_k': 10}}
Traceback (most recent call last):
  File "main.py", line 672, in <module>
    trainer = Trainer.from_argparse_args(trainer_opt, **trainer_kwargs)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/trainer/properties.py", line 421, in from_argparse_args
    return from_argparse_args(cls, args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/utilities/argparse.py", line 52, in from_argparse_args
    return cls(**trainer_kwargs)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 40, in insert_env_defaults
    return fn(self, **kwargs)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 346, in __init__
    gpu_ids, tpu_cores = self._parse_devices(gpus, auto_select_gpus, tpu_cores)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1251, in _parse_devices
    gpu_ids = device_parser.parse_gpu_ids(gpus)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/utilities/device_parser.py", line 91, in parse_gpu_ids
    return _sanitize_gpu_ids(gpus)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/utilities/device_parser.py", line 163, in _sanitize_gpu_ids
    raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: You requested GPUs: [0, 1, 2, 3]
But your machine only has: [0]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 753, in <module>
    if trainer.global_rank == 0:
NameError: name 'trainer' is not defined
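
The first traceback already pinpoints the cause: four GPUs were requested (--gpus 0,1,2,3,) on a machine that exposes only GPU 0, and the subsequent NameError is just main.py's cleanup branch referencing trainer before it was ever assigned. A sketch of the corrected invocation for a single-GPU machine (the trailing comma makes pytorch-lightning parse the argument as the ID list [0] rather than as a GPU count):

    python main.py --logdir 'outputs/512_vae' --base 'configs/512_vae.yaml' -t --gpus 0,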

About GPU

Hi, how much GPU memory is required? Can I run it on an RTX 3090?

Questions about differences between Multi-ControlNet

Hi, thanks for your excellent work!
I understand that ControlNet was released after the CVPR 2023 deadline, but I'm curious about the differences between your work and Multi-ControlNet, and about any additional advantages of your approach. It appears that Multi-ControlNet can also handle multi-modal generation.

pre-processed data

Hi,
could you provide the code for producing the pre-processed data, i.e., the mask (0.pt, 1.pt, ...) files?
Thank you very much!
