
collaborative-diffusion's People

Contributors

ziqihuangg


collaborative-diffusion's Issues

env error

I followed the tutorial to create the environment, but when I run the code the following error occurs:
Traceback (most recent call last):
  File "main.py", line 6, in <module>
    import pytorch_lightning as pl
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 16, in <module>
    from torchmetrics import Accuracy as _Accuracy
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/torchmetrics/__init__.py", line 14, in <module>
    from torchmetrics import functional  # noqa: E402
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/torchmetrics/functional/__init__.py", line 68, in <module>
    from torchmetrics.functional.text.bert import bert_score
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/torchmetrics/functional/text/bert.py", line 28, in <module>
    from transformers import AutoModel, AutoTokenizer
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/transformers/__init__.py", line 43, in <module>
    from . import dependency_versions_check
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/transformers/dependency_versions_check.py", line 41, in <module>
    require_version_core(deps[pkg])
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/transformers/utils/versions.py", line 94, in require_version_core
    return require_version(requirement, hint)
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/transformers/utils/versions.py", line 85, in require_version
    if want_ver is not None and not ops[op](version.parse(got_ver), version.parse(want_ver)):
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/packaging/version.py", line 54, in parse
    return Version(version)
  File "/home/wpx/miniconda3/envs/codiff/lib/python3.8/site-packages/packaging/version.py", line 200, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '0.10.1,<0.11'
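
Judging from the final frames, a likely cause (an inference, not something confirmed in this thread) is that this transformers version splits a multi-clause requirement such as tokenizers>=0.10.1,<0.11 only at the first operator, so packaging.version.parse receives the leftover string '0.10.1,<0.11'; packaging>=22 dropped the LegacyVersion fallback and now raises on such strings. A minimal sketch reproducing just the final frame, assuming packaging>=22 is installed:

    from packaging import version

    # packaging>=22 enforces PEP 440 strictly, so this multi-clause leftover
    # is rejected with InvalidVersion, matching the traceback above
    version.parse("0.10.1,<0.11")

Pinning packaging below 22 in the codiff environment, or upgrading transformers, are common workarounds.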

FID

Hi, I want to ask how you calculate the FID. Do you generate 3000 images for the 3000 test samples and compute the FID between these 3000 images and the training set, or against the whole dataset?
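
For reference (not necessarily the protocol used in the paper), a common way to compute FID between a folder of generated images and a folder of reference images is the pytorch-fid package; which reference folder to use is exactly the question above:

    python -m pytorch_fid path/to/generated path/to/reference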

Which image should I use in image editing

[image attached]
Hello Ziqi Huang, I am very interested in your work; thank you for sharing the code of Collaborative Diffusion.
I am a little confused about the image editing results. I think 0C_interText_optDM_?_alpha is the generated result, but there are results for different alpha values. In this case, I think the image named "0C_interText_optDM_0_alpha=-1.0" should be the best one. Do you automatically select the image whose alpha is likely the best?

Text-to-Face.

Hi, great work. We find this code seems to work only on your given text "This man has beard of medium length. He is in his thirties.". When I use "He doesn't have any bangs and has an extremely mild smile, no beard at all, and no glasses. this person is in the thirties.", the results are far from the given text; almost all of them are female. So is there a bug here?

Collaborative_edit

Excellent work! I am very interested in Multi-Modal Collaborative Editing. I have a question: why do the results of Mask_edit and Text_edit show a significant difference in skin tone compared to the input image, while the result of Collaborative_edit has a skin tone very similar to the input image? I look forward to your response, thank you!
[image attached]

Inquiry about Training Time and RuntimeError in Diffuser Code

Hello,

Thank you for your nice work. I recently encountered an issue while running the training code for the Diffuser on GitHub, and I would appreciate your guidance.

During training, I encountered the following error:

  File "Diffusion/ldm/models/diffusion/ddpm_compose.py", line 1237, in p_losses
    logvar_t = self.logvar[t].to(self.device)
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (CPU)

I managed to resolve the issue by moving 't' to the CPU. However, I noticed that the training time for a single epoch is quite long, nearly an hour. I am unsure if this training time is normal or if my actions, such as training on the CPU, are causing the slowdown.
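
For reference, a one-line sketch of the workaround the poster describes: self.logvar lives on the CPU while t is a CUDA tensor, so the indexing fails; moving only the index t to the CPU fixes the lookup without moving training itself to the CPU.

    # in p_losses (ddpm_compose.py): sketch of the described workaround
    logvar_t = self.logvar[t.cpu()].to(self.device)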

Could you please share your typical training time for a single epoch, so I can better understand if my situation is unusual? Additionally, if you suspect that there may be issues with my setup, I would greatly appreciate any suggestions or solutions you can offer.

Thank you very much for your assistance.

condition resolution

Nice work! It seems the resolution of the condition input is just 32, i.e., [19, 1024] for the mask and [1, 1024] for the sketch? Is that right?

About the training epoch of VAE model and uni-model for text to face

Hello! Following the instructions you provided, I am trying to retrain the VAE model and the uni-modal model for text-to-face on an RTX 3090. May I ask how many epochs you trained these two models for, respectively? Or do you decide when to stop training based on the visualizations in reconstructions_gs-xxxxxx_e-xxxxxx_b-xxxxxxx.png and samples_gs-xxxxxx_e-xxxxxx_b-xxxxxxx.png?
Looking forward to your answer.

Image editing

Could you release the instructions for image editing?

about Training Time

It seems that max_epochs defaults to 1000 when training a single modality, and one epoch takes about 25 minutes on 8 GPUs. Does that mean training the text2img model takes about 1000 * 25 / 60 / 24 ≈ 17 days?

about segmentation masks

Hello, I think your work is excellent and inspiring. May I ask if you have the code to convert facial images into segmentation masks? Thank you very much!

About training time

Hi,

Thank you for your nice work.

I would like to know the time required to train each model, including the VAE, the uni-modal models, and the dynamic diffuser.

I trained the VAE model for about 3 hours using 4 GPUs, but I still find the sampled images are poor.

[image attached]

The reconstructed image looks OK.

[image attached]

Empty condition for training uni-modal image generation

I am very interested in your work, thank you for sharing the codes of Collaborative Diffusion.
I have two questions. What is the empty condition when you train text/mask-to-image generation? If I understand correctly, for text-to-image generation the empty condition is [""], as your code shows: "uc = model.get_learned_conditioning(n_samples * [""])". However, what is the empty condition for training mask-to-image generation? Is it a zero mask? Thank you very much for your attention.
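
For context, a minimal sketch of how such an unconditional embedding is typically consumed at sampling time in LDM-style codebases (classifier-free guidance; apply_model and get_learned_conditioning follow LDM naming, while prompts, x_t, t, and scale are placeholder variables). The mask-branch analogue of uc is exactly the open question above:

    uc = model.get_learned_conditioning(n_samples * [""])  # unconditional ("empty") embedding
    c = model.get_learned_conditioning(prompts)            # conditional embedding
    e_t_uncond = model.apply_model(x_t, t, uc)             # noise prediction without the condition
    e_t_cond = model.apply_model(x_t, t, c)                # noise prediction with the condition
    e_t = e_t_uncond + scale * (e_t_cond - e_t_uncond)     # guided noise prediction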

about the cross attention in Dynamic Diffuser

Hi, you have done nice work. I'm interested in your work, but I wonder why we can do cross-attention between the features of x_t and the context.

context: mask -(resize)-> [bz, 32, 32] -(one-hot)-> [bz, 19, 32, 32] -(flatten)-> [bz, 19, 1024] -(linear layer)-> [bz, 19, 640]

def forward(self, x, context=None, mask=None):
        h = self.heads   # x.shape: [bz, 1024, 384], context.shape: [4, 19, 640]

        q = self.to_q(x)         # q.shape: [bz, 1024, 384]
        context = default(context, x)
        k = self.to_k(context)   # k.shape: [bz, 19, 384]
        v = self.to_v(context)   # v.shape: [bz, 19, 384]

        q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h=h), (q, k, v)) # q.shape: [128, 1024, 12], k: [128, 19, 12], v: [128, 19, 12]  (b*h = 4*32)

        sim = einsum('b i d, b j d -> b i j', q, k) * self.scale    # sim: [128, 1024, 19]

        if exists(mask):
            mask = rearrange(mask, 'b ... -> b (...)')
            max_neg_value = -torch.finfo(sim.dtype).max
            mask = repeat(mask, 'b j -> (b h) () j', h=h)
            sim.masked_fill_(~mask, max_neg_value)

        # attention, what we cannot get enough of
        attn = sim.softmax(dim=-1)         # attn: [128, 1024, 19]

        out = einsum('b i j, b j d -> b i d', attn, v)   # out.shape: [128, 1024, 12]
        out = rearrange(out, '(b h) n d -> b n (h d)', h=h)   # out.shape: [4, 1024, 384]
        return self.to_out(out)

So why can we do cross-attention between x_t (more precisely, z_t) and the features of the mask? What does it mean?
Thank you very much if you could help me with this.

Pre-training model download from Google drive always fail

Fantastic work; I am very interested in it.

The pre-trained models shared via Google Drive are too unstable to download from China. I have been downloading all day, and the download keeps getting interrupted and failing. Could you share them on another network disk, such as Baidu Netdisk?

how to get the .pt files and what do they mean

Hello, I would like to ask where you obtained the .pt files in the mask and sketch folders, or how you generated them by processing other data, and what they represent respectively. I cannot seem to find this data from other sources.

preprocessing code for sketch and mask

I am very interested in your work, thank you for sharing the codes of Collaborative Diffusion.

I want to generate images from my own sketch and mask. Could you share the preprocessing code for sketch and mask?

I noticed that the sketches and masks you used are 19*1024 tensors, but the raw sketches and masks are 512x512 px images.

How can I convert my own mask or sketch from 512x512 px images to 19*1024 tensors? Can you give me some advice?

Thank you very much!
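
For reference, a minimal sketch of one plausible conversion, pieced together from the shapes discussed across these issues (nearest-neighbour resize of the 512x512 label map to 32x32, 19-class one-hot, then flatten). This is an assumption about the preprocessing, not the authors' released code, and mask_to_tensor is a hypothetical helper:

    import numpy as np
    import torch
    import torch.nn.functional as F
    from PIL import Image

    def mask_to_tensor(path, num_classes=19, size=32):
        """512x512 segmentation label map -> [num_classes, size*size] tensor."""
        mask = Image.open(path).resize((size, size), Image.NEAREST)  # NEAREST keeps labels integral
        labels = torch.from_numpy(np.array(mask)).long()             # [32, 32] class indices
        onehot = F.one_hot(labels, num_classes=num_classes)          # [32, 32, 19]
        return onehot.permute(2, 0, 1).reshape(num_classes, -1).float()  # [19, 1024]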

NameError: name 'trainer' is not defined

(codiff) ubuntu@ubun:~/lixiaoyi/Collaborative-Diffusion$ python main.py --logdir 'outputs/512_vae' --base 'configs/512_vae.yaml' -t --gpus 0,1,2,3,
2023-06-04 20:28:56.401572: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-04 20:28:56.460492: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable TF_ENABLE_ONEDNN_OPTS=0.
2023-06-04 20:28:56.711893: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/plumed-2.8.2:/usr/local/cuda-11.3/lib64:
2023-06-04 20:28:56.711925: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /home/ubuntu/plumed-2.8.2:/usr/local/cuda-11.3/lib64:
2023-06-04 20:28:56.711928: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Global seed set to 23
Running on GPUs 0,1,2,3,
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 3, 64, 64) = 12288 dimensions.
making attention of type 'vanilla' with 512 in_channels
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
Monitoring val/rec_loss as checkpoint metric.
Merged modelckpt-cfg:
{'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'outputs/512_vae/2023-06-04T20-28-57_512_vae/checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': 'val/rec_loss', 'save_top_k': 10}}
Traceback (most recent call last):
  File "main.py", line 672, in <module>
    trainer = Trainer.from_argparse_args(trainer_opt, **trainer_kwargs)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/trainer/properties.py", line 421, in from_argparse_args
    return from_argparse_args(cls, args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/utilities/argparse.py", line 52, in from_argparse_args
    return cls(**trainer_kwargs)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/env_vars_connector.py", line 40, in insert_env_defaults
    return fn(self, **kwargs)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 346, in __init__
    gpu_ids, tpu_cores = self._parse_devices(gpus, auto_select_gpus, tpu_cores)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1251, in _parse_devices
    gpu_ids = device_parser.parse_gpu_ids(gpus)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/utilities/device_parser.py", line 91, in parse_gpu_ids
    return _sanitize_gpu_ids(gpus)
  File "/home/ubuntu/anaconda3/envs/codiff/lib/python3.8/site-packages/pytorch_lightning/utilities/device_parser.py", line 163, in _sanitize_gpu_ids
    raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: You requested GPUs: [0, 1, 2, 3]
But your machine only has: [0]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 753, in <module>
    if trainer.global_rank == 0:
NameError: name 'trainer' is not defined
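
The first traceback already pinpoints the cause: four GPUs were requested (--gpus 0,1,2,3,) on a machine that exposes only GPU 0, and the subsequent NameError is just main.py's cleanup branch referencing trainer before it was ever assigned. A sketch of the corrected invocation for a single-GPU machine (the trailing comma makes pytorch-lightning parse the argument as the ID list [0] rather than as a GPU count):

    python main.py --logdir 'outputs/512_vae' --base 'configs/512_vae.yaml' -t --gpus 0,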

About GPU

Hi, how much GPU memory is required? Can I run it on an RTX 3090?

Questions about differences between Multi-ControlNet

Hi, thanks for your excellent work!
I understand that ControlNet was released after the CVPR 2023 deadline, but I'm curious about the differences between your work and Multi-ControlNet, and about any additional advantages of your approach. It appears that Multi-ControlNet can also handle multi-modal generation.

pre-processed data

Hi,
could you provide the code for producing the pre-processed data, i.e., the mask (0.pt, 1.pt, ...) files?
Thank you very much!
