
---
title: Stable Diffusion XL 1.0
emoji: 🔥
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 3.11.0
app_file: app.py
pinned: true
license: mit
---

StableDiffusion XL Gradio Demo WebUI

This is a Gradio demo with a web UI supporting Stable Diffusion XL 1.0. The demo loads both the base and the refiner model.

This is forked from StableDiffusion v2.1 Demo WebUI. Refer to the git commits to see the changes.

Update 🔥🔥🔥: Latent consistency model (LCM) LoRA is supported and enabled by default (controlled by ENABLE_LCM)! Turn on USE_SSD to use SSD-1B for even faster generation (4.9 sec/image on a free Colab T4 without additional optimizations)! The Colab has been updated to use this by default. Open In Colab
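For reference, here is a minimal sketch of how LCM LoRA is typically wired up with diffusers; the exact code in app.py may differ, but the LoRA repo ID and the scheduler swap follow the diffusers documentation:

import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load the SDXL base pipeline in half precision.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Swap in the LCM scheduler and attach the LCM LoRA weights.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# LCM needs only a few steps and a low guidance scale.
image = pipe("a photo of an astronaut", num_inference_steps=4, guidance_scale=1.0).images[0]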

Update 🔥🔥🔥: Check out our work LLM-grounded Diffusion (LMD), which introduces LLMs into the diffusion world and achieves much better prompt understanding than standard Stable Diffusion without any fine-tuning! LMD with SDXL is supported in our GitHub repo, and a demo with SD is available.

Update: SDXL 1.0 is released and our web UI demo supports it! No application is needed to get the weights! Launch the Colab to get started; you can run this demo for free, even on a T4. Open In Colab

Update: Multiple GPUs are supported. You can easily spread the workload across different GPUs by setting MULTI_GPU=True. This uses data parallelism to split the workload across GPUs.
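For example, on a machine with several GPUs:

MULTI_GPU=true python app.py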

SDXL with SSD-1B, LCM LoRA

Examples

Update: See a more comprehensive comparison with 1200+ images here. Both SD XL and SD v2.1 are benchmarked on prompts from StableStudio.

Left: SDXL. Right: SD v2.1.

Without any tuning, SDXL generates much better images compared to SD v2.1!

Example 1

Example 2

Example 3

Example 4

Example 5

Installation

With torch 2.0.1 installed, we also need to install:

pip install accelerate transformers invisible-watermark "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25" safetensors "gradio==3.11.0"
pip install git+https://github.com/huggingface/diffusers.git
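If torch 2.0.1 is not installed yet, it can be installed first (pick the wheel matching your CUDA version if needed):

pip install "torch==2.0.1"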

Launching

The weights are free and no application form is needed now. Leaked weights seem to be available on Reddit, but I have not used or tested them.

There are two ways to load the weights. Option 1 works out of the box (no need for manual download). If you prefer loading from local repo, you can use Option 2.

Option 1

Run the command to automatically set up the weights:

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python app.py

Option 2

If you have cloned both repos (base, refiner) locally (please change /path_to_sdxl accordingly):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 SDXL_MODEL_DIR=/path_to_sdxl python app.py

Note that stable-diffusion-xl-base-1.0 and stable-diffusion-xl-refiner-1.0 should be placed in the same directory, and the path of that directory should replace /path_to_sdxl.
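One way to prepare that directory, assuming git-lfs is installed (the Hugging Face repo names below are the official ones; replace /path_to_sdxl as above):

cd /path_to_sdxl
git lfs install
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0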

torch.compile support

Turning on torch.compile makes overall inference faster. However, it adds some overhead to the first run (i.e., you have to wait for compilation during the first run).
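A minimal sketch of what enabling it usually looks like with diffusers (the exact call in app.py may differ):

# Compile the UNet, which dominates inference time; the first call triggers compilation.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)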

To save memory

  1. Turn on pipe.enable_model_cpu_offload() and turn off pipe.to("cuda") in app.py (see the sketch after this list).
  2. Turn off the refiner by setting enable_refiner to False.
  3. More ways to save memory and make things faster.
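A minimal sketch of the change in item 1, assuming the pipeline variable is named pipe as in app.py:

# Instead of keeping the whole pipeline on the GPU:
# pipe.to("cuda")
# ...offload submodules to the CPU and move each one to the GPU only while it runs:
pipe.enable_model_cpu_offload()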

Several options through environment variables

  • USE_SSD: use segmind/SSD-1B, a distilled SDXL model that is faster. Disabled by default.
  • ENABLE_LCM: use LCM LoRA. Enabled by default.
  • SDXL_MODEL_DIR: load SDXL from a local directory.
  • ENABLE_REFINER=true/false: turn the refiner on or off (the refiner refines the generation). The refiner is disabled by default if LCM LoRA or the SSD model is enabled.
  • OFFLOAD_BASE and OFFLOAD_REFINER can be set to true/false to enable/disable model offloading (offloading saves memory at the cost of slower generation).
  • OUTPUT_IMAGES_BEFORE_REFINER=true/false: useful if the refiner is enabled; outputs images both before and after the refiner stage.
  • SHARE=true/false: creates a public link (useful for sharing, e.g., on Colab).
  • MULTI_GPU=true/false: enables data parallelism across multiple GPUs.
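These can be combined on the command line. For example, to run the plain base+refiner pipeline with a public link (a hypothetical combination; mix the variables above as needed):

ENABLE_LCM=false ENABLE_REFINER=true SHARE=true python app.py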

If you enjoy this demo, please give this repo a star ⭐.


stable-diffusion-xl-demo's Issues

RuntimeError: Input type (c10::Half) and bias type (float) should be the same

It works fine when we generate images for the first time. However, it raises the following error when we generate images a second time.

Traceback (most recent call last):
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/gradio/routes.py", line 321, in run_predict
    output = await app.blocks.process_api(
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/gradio/blocks.py", line 1006, in process_api
    result = await self.call_function(fn_index, inputs, iterator, request)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/gradio/blocks.py", line 847, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "app.py", line 123, in infer
    images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=images, num_inference_steps=steps, strength=refiner_strength, generator=g).images
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py", line 998, in __call__
    image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/diffusers/models/autoencoder_kl.py", line 270, in decode
    decoded = self._decode(z).sample
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/diffusers/models/autoencoder_kl.py", line 256, in _decode
    z = self.post_quant_conv(z)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->gradio==3.14.0->-r requirements.txt (line 11)) (1.16.0)
Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx->gradio==3.14.0->-r requirements.txt (line 11)) (1.2.0)
Installing collected packages: triton, torch
Attempting uninstall: triton
Found existing installation: triton 2.1.0
Uninstalling triton-2.1.0:
Successfully uninstalled triton-2.1.0
Attempting uninstall: torch
Found existing installation: torch 2.1.0
Uninstalling torch-2.1.0:
Successfully uninstalled torch-2.1.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.1.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
torchdata 0.7.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
torchtext 0.16.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
torchvision 0.16.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
Successfully installed torch-2.0.1 triton-2.0.0

Google Colab only local URL

Hi,
I'm not a very experienced developer, so I can't seem to figure this out.
When I run your code in Google Colab I get only a local URL:

Running on local URL:  http://127.0.0.1:7860/
*** Failed to connect to ec2.gradio.app:22: [Errno 110] Connection timed out

I tried to make a fork in which I forced the share property to be true:

block.queue().launch(share="true")

But the problem persists. Any idea if I'm doing something wrong?

Thank you,
Erwin Velthuis

CUDA Usage is 0%

[screenshot: GPU utilization shows 0%]

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 SDXL_MODEL_DIR=pretrained_models OFFLOAD_BASE=false OFFLOAD_REFINER=false python app.py

Question about the refiner step

I am reading the SDXL paper and found that the refiner is applied to the latent image:

[figure from the SDXL paper]

but in your code,

images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=images, num_inference_steps=steps, strength=refiner_strength).images

the input is the images instead of the latents. Are they the same?

thanks
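For reference, diffusers also supports handing the refiner the base model's latents directly; a sketch under that assumption (variable names follow app.py, but this is not the demo's exact code):

# Ask the base pipeline for latents instead of decoded images.
latents = pipe(prompt=prompt, negative_prompt=negative, num_inference_steps=steps, generator=g, output_type="latent").images
# The refiner accepts either latents or decoded images; given images, it first
# re-encodes them through the VAE, so the two paths are similar but not identical.
images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=latents, num_inference_steps=steps, strength=refiner_strength, generator=g).images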

[suggestion] add aspect ratio options

Colab dropdown; just turn it into HTML:

 # @param ["1:1", "4:1", "16:9", "5:2", "2:1", "7:4", "3:2", "8:7", "9:8", "8:9", "7:8", "2:3", "4:7", "1:2", "2:5", "1:3", "9:16"] {allow-input: true}

Python code:

import math

aspect_ratio = "9:16"
max_pixel = 1024 * 1024  # keep the total pixel count near 1024x1024
w, h = (int(e) for e in aspect_ratio.split(':'))
# Solve for the requested ratio at ~max_pixel area, snapping each side to a multiple of 8.
width, height = (round(math.sqrt(max_pixel * x / y) / 8) * 8 for x, y in ((w, h), (h, w)))

width and height can then be passed as parameters to the model (for '9:16' this yields width=768, height=1368).

Suggesting features to advanced settings in demo colab

I am quite surprised by the image quality of the SDXL 1.0 model, especially in combination with the Refiner model, which according to my tests really improves the overall image quality.

I noticed that there seems to be a little GPU memory to spare, enough to generate images at a somewhat higher resolution than 1024px. Could you please add an option to select the aspect ratio and resolution in the Google Colab demo?

Below are my examples with and without the Refiner. All generation parameters are completely identical; the runs differ only in whether the Refiner is on or off.

with Refiner: [image]

without Refiner: [image]

As we can see, this example alone is enough to show that the Refiner really improves the image. The Refiner is not bad, but it requires a lot of memory.


Thanks for your attention, and for your work!

Enabling Multi-GPU Support for SDXL for WebUI


Dear developer,

I am currently using SDXL for my project, and I am encountering some difficulties enabling multi-GPU support. I have four Nvidia 3090 GPUs at my disposal, but so far I have only been able to run the software on one of them; running such a large model on a single 3090 is very slow and consumes a lot of RAM.

My attempts to distribute the workload among all GPUs have been unsuccessful. These have included:

  • Attempt: using accelerate to configure the multiple GPUs and run.

Unfortunately, these methods have not resulted in successful multi-GPU utilization.

For your reference, here is some information about my setup:

  • Operating System: Ubuntu
  • Python Version: 3.10.12
  • CUDA Version: 12.2
  • Deep Learning Framework: PyTorch 2.0.1

I'm not sure if I'm missing something or if there's an issue with SDXL itself. Therefore, I'm writing to ask if you could provide some guidance on this matter. Is there a built-in way to enable multi-GPU support, or are there any additional steps that I might have overlooked? Thank you!
