
MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation (ICML 2023)


MultiDiffusion is a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning, as described in the paper.

Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image and fast adaptation to new tasks still remain an open challenge, currently addressed mostly by costly and lengthy re-training and fine-tuning, or by ad-hoc adaptations to specific image generation tasks. In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning. At the center of our approach is a new generation process, based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints. We show that MultiDiffusion can be readily applied to generate high-quality and diverse images that adhere to user-provided controls, such as a desired aspect ratio (e.g., panorama) and spatial guiding signals, ranging from tight segmentation masks to bounding boxes.
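
At its heart is a simple procedure: at every denoising step, the large latent is cropped into overlapping windows, each window is denoised by the pre-trained model, and the per-pixel results are fused by averaging. A minimal sketch of this fusion step (the function and variable names, the fixed square window, and the stride are illustrative, not the repository's exact code):

import torch

def multidiffusion_step(latent, denoise_view, t, window=64, stride=32):
    # latent:       [1, C, H, W] noisy latent at timestep t
    # denoise_view: runs one diffusion denoising step on a window-sized crop
    value = torch.zeros_like(latent)  # accumulated denoised values
    count = torch.zeros_like(latent)  # number of windows covering each pixel
    _, _, H, W = latent.shape
    for h in range(0, H - window + 1, stride):
        for w in range(0, W - window + 1, stride):
            view = latent[:, :, h:h + window, w:w + window]
            value[:, :, h:h + window, w:w + window] += denoise_view(view, t)
            count[:, :, h:h + window, w:w + window] += 1
    # The least-squares fusion has a closed form: the per-pixel average.
    return torch.where(count > 0, value / count, value)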

For more details, see the project webpage.

Diffusers Integration

MultiDiffusion Text2Panorama is integrated into diffusers and can be run as follows:

import torch
from diffusers import StableDiffusionPanoramaPipeline, DDIMScheduler

# Load the DDIM scheduler and the panorama pipeline from the SD 2 base weights.
model_ckpt = "stabilityai/stable-diffusion-2-base"
scheduler = DDIMScheduler.from_pretrained(model_ckpt, subfolder="scheduler")
pipe = StableDiffusionPanoramaPipeline.from_pretrained(
    model_ckpt, scheduler=scheduler, torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

# Generate a panorama from a single text prompt.
prompt = "a photo of the dolomites"
image = pipe(prompt).images[0]
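
By default the pipeline produces a 512x2048 panorama; other sizes can be requested through the height and width call arguments. For example (the filename is arbitrary):

image = pipe(prompt, height=512, width=4096).images[0]
image.save("dolomites_panorama.png")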

Gradio Demo

We provide a Gradio UI for our method. Running the following command in a terminal will launch the demo:

python app_gradio.py

This demo is also hosted on Hugging Face here.

Spatial Controls

A web demo for the spatial controls is hosted on Hugging Face here.

Citation

@article{bar2023multidiffusion,
  title={MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation},
  author={Bar-Tal, Omer and Yariv, Lior and Lipman, Yaron and Dekel, Tali},
  journal={arXiv preprint arXiv:2302.08113},
  year={2023}
}


Issues

My custom implementation in Automatic1111's WebUI

Dear authors,

I have implemented your algorithm in Automatic1111's WebUI with the following optimizations:

  • Cropping views in a more symmetric way to get better results.
  • Pre-calculating weights to save time, as the weights won't change once the views are determined (see the sketch after these lists).
  • Batched latent view processing for acceleration.

Some WebUI-related features:

  • Compatibility with all samplers.
  • Compatibility with ControlNet.
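
To make the pre-calculated weights and batched views concrete, here is a rough sketch of the idea (illustrative code, not the extension's actual implementation): the coverage counts are computed once before sampling, and all view crops go through the model as a single batch.

import torch

def precompute_count(shape, views):
    # Coverage is fixed once the views are chosen, so compute it one time.
    count = torch.zeros(shape)
    for h0, h1, w0, w1 in views:
        count[:, :, h0:h1, w0:w1] += 1
    return count

def fused_batched_step(latent, views, count, denoise_batch, t):
    # One batched model call for all equally-sized views.
    crops = torch.cat([latent[:, :, h0:h1, w0:w1] for h0, h1, w0, w1 in views])
    denoised = denoise_batch(crops, t)
    value = torch.zeros_like(latent)
    for i, (h0, h1, w0, w1) in enumerate(views):
        value[:, :, h0:h1, w0:w1] += denoised[i:i + 1]
    return torch.where(count > 0, value / count, value)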

Here is the link:

Many thanks for your fantastic work, especially on img2img and panorama generation! We are working on text prompt support now.

However, uncontrolled large-image generation is far from ideal: repeated patterns always appear, and the image is mostly unusable.

Could you please share some insights on whether we can generate large images without a user-specified prompt mask?

For example, I have an idea (without proof): we could generate a small reference image first, obtain the prompt attention map, scale it to a larger resolution, and finally route each prompt automatically to its correct views during MultiDiffusion.

Thank you very much!

Color coded?

Does this allow color-coding the text to the corresponding mask? The example images use differently colored text that apparently corresponds to the colored masks.

Equation 4 interpretation for the panorama use case

Thanks for the awesome paper and very clear code!

For the panorama use case, can the method be reduced to the following implementation:
at each denoising step, take the average pixel value of the overlapping regions:

latent = torch.where(count > 0, value / count, value)

If yes, how does the least squares formulation of the paper align with it?
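
As a quick check on that reduction: for a per-pixel quadratic objective, the least-squares minimizer is the mean. If the $n$ overlapping windows assign values $a_1, \dots, a_n$ to a pixel, then

$$\arg\min_{x} \sum_{i=1}^{n} (x - a_i)^2 = \frac{1}{n} \sum_{i=1}^{n} a_i,$$

which is exactly the count-normalized average computed by the line above.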

And again, thanks!

Problem with Formula Derivation

Hello! Thank you for your great work! But I have a problem with the formula derivation.
I ran into difficulties when trying to derive equation 5 from equation 3.
Could you please help me and show the process?
Thanks a lot!

Ablation on Scheduler/Guidance Scale?

Since MultiDiffusion is just a diffusion process, did you ever compare it with different choices of schedulers and guidance scales? If you increase the guidance scale, can you get an image with very little content variation that still interpolates smoothly between different regions?
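
For anyone wanting to run such an ablation with the diffusers integration above, the guidance scale and step count can be varied per call; a minimal sketch (filenames are arbitrary):

# Sweep guidance scales on the panorama pipeline set up earlier.
for gs in (5.0, 7.5, 12.0):
    image = pipe(prompt, guidance_scale=gs, num_inference_steps=50).images[0]
    image.save(f"panorama_gs{gs}.png")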

A small problem in diffusers

Hello author, I found a small problem at line 466 of pipeline_stable_diffusion_panorama.py when using the StableDiffusionPanoramaPipeline from diffusers. Should the panorama_height in the conditional statement be changed to panorama_width?

Question About Blurring of Overlapping Masks

I noticed in the paper that there are at most 3 overlapping masks.
Have you tried overlapping many masks on top of each other?

I'm attempting to extend MultiDiffusion to other applications, and have noticed significant blurring as more masks are stacked on top of each other.

I notice this also wasn't mentioned in the limitations, so I was wondering whether you have ever tried it.

Resolution or Projection Question about MultiDiffusion

Panoramas represented in equirectangular projection are usually generated at an Hx2H resolution, such as 512x1024.
But MultiDiffusion uses a resolution of 512x2048. I'm very curious which projection MultiDiffusion uses, and how I can map the generated image onto a sphere.

Region-based generation not working for multiple prompts

Hello. I ran into a problem; can anyone help me with this?
Here's the code I run:

import torch
from region_based import MultiDiffusion  # region_based.py from this repo

device = torch.device('cuda')
sd = MultiDiffusion(device)

# Two foreground masks: top half for 'dog', bottom half for 'cat'.
mask = torch.zeros(2, 1, 512, 512).cuda()
mask[0, :, :256] = 1
mask[1, :, 256:] = 1

fg_masks = mask
bg_mask = 1 - torch.sum(fg_masks, dim=0, keepdim=True)
bg_mask[bg_mask < 0] = 0
masks = torch.cat([bg_mask, fg_masks])  # background mask + 2 foreground masks

prompts = ['dog', 'cat']
#neg_prompts = [opt.bg_negative] + opt.fg_negative
print(masks.shape, len(prompts))
img = sd.generate(masks, prompts, '', width=512)

It gave the following error.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], line 17
     15 #neg_prompts = [opt.bg_negative] + opt.fg_negative
     16 print(masks.shape , len(prompts))
---> 17 img = sd.generate(masks, prompts , '' , width = 512 )

File ~/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/Desktop/project/MultiDiffusion/region_based.py:142, in MultiDiffusion.generate(self, masks, prompts, negative_prompts, height, width, num_inference_steps, guidance_scale, bootstrapping)
    139     bg = self.scheduler.add_noise(bg, noise[:, :, h_start:h_end, w_start:w_end], t)
    140     #print(latent.shape , 'latent')
    141     #print(latent_view.shape ,bg.shape,masks_view.shape)
--> 142     latent_view[1:] = latent_view[1:] * masks_view[1:] + bg * (1 - masks_view[1:])
    144 # expand the latents if we are doing classifier-free guidance to avoid doing two forward passes.
    145 latent_model_input = torch.cat([latent_view] * 2)

RuntimeError: The expanded size of the tensor (1) must match the existing size (2) at non-singleton dimension 0.  Target sizes: [1, 4, 64, 64].  Tensor sizes: [2, 4, 64, 64]

Thank you.

What does FTD stand for?

In the paper, the abbreviation FTD is used for the loss measuring the deviation of MultiDiffusion from standard diffusion. It's never explicitly stated what the abbreviation stands for, so I was hoping it could be clarified.

Bug in vae_optimize.py, vae_tile_forward uses 'result' when result is None

(Updated)
The last line of vae_tile_forward (vae_optimize.py line 650) is

return result if result is not None else result_approx.to(device)

When you interrupt the generation using HiRes fix (click Interrupt in the WebUI), an exception is thrown from this line:

     File "stable-diffusion-webui/extensions/multidiffusion-upscaler-for-automatic1111/scripts/vae_optimize.py", line 650, in vae_tile_forward
        return result if result is not None else result_approx.to(device)
    AttributeError: 'NoneType' object has no attribute 'to'
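
One defensive rewrite that would avoid the crash (a sketch only, not necessarily the extension's actual fix) guards both values and makes the interrupted case explicit:

# Guard both fallbacks; if generation was interrupted before any tile
# finished, return None and let the caller handle it.
if result is not None:
    return result
if result_approx is not None:
    return result_approx.to(device)
return None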
