
---
title: Stable Diffusion XL 1.0
emoji: 🔥
colorFrom: yellow
colorTo: gray
sdk: gradio
sdk_version: 3.11.0
app_file: app.py
pinned: true
license: mit
---

StableDiffusion XL Gradio Demo WebUI

This is a Gradio demo with a web UI supporting Stable Diffusion XL 1.0. The demo loads both the base and the refiner model.

This is forked from StableDiffusion v2.1 Demo WebUI. Refer to the git commits to see the changes.

Update 🔥🔥🔥: Latent consistency model (LCM) LoRA is supported and enabled by default (controlled by ENABLE_LCM)! Turn on USE_SSD to use SSD-1B for even faster generation (4.9 sec/image on a free Colab T4 without additional optimizations)! The Colab has been updated to use this by default. Open In Colab
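For reference, here is a minimal sketch of how LCM LoRA is typically wired up with diffusers; the exact code in app.py may differ, but the LoRA repo ID and the scheduler swap follow the diffusers documentation:

import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Load the SDXL base pipeline in half precision.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16, variant="fp16",
).to("cuda")

# Swap in the LCM scheduler and attach the LCM LoRA weights.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# LCM needs only a few steps and a low guidance scale.
image = pipe("a photo of an astronaut", num_inference_steps=4, guidance_scale=1.0).images[0]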

Update 🔥🔥🔥: Check out our work LLM-grounded Diffusion (LMD), which introduces LLMs into the diffusion world and achieves much better prompt understanding than standard Stable Diffusion without any fine-tuning! LMD with SDXL is supported in our GitHub repo, and a demo with SD is available.

Update: SDXL 1.0 is released and our web UI demo supports it! No application is needed to get the weights! Launch the Colab to get started; you can run this demo for free, even on a T4. Open In Colab

Update: Multiple GPUs are supported. You can easily spread the workload across different GPUs by setting MULTI_GPU=True. This uses data parallelism to split the workload across GPUs.
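For example, on a machine with several GPUs:

MULTI_GPU=true python app.py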

SDXL with SSD-1B, LCM LoRA

Examples

Update: See a more comprehensive comparison with 1200+ images here. Both SD XL and SD v2.1 are benchmarked on prompts from StableStudio.

Left: SDXL. Right: SD v2.1.

Without any tuning, SDXL generates much better images compared to SD v2.1!

Example 1

Example 2

Example 3

Example 4

Example 5

Installation

With torch 2.0.1 installed, we also need to install:

pip install accelerate transformers invisible-watermark "numpy>=1.17" "PyWavelets>=1.1.1" "opencv-python>=4.1.0.25" safetensors "gradio==3.11.0"
pip install git+https://github.com/huggingface/diffusers.git
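If torch 2.0.1 is not installed yet, it can be installed first (pick the wheel matching your CUDA version if needed):

pip install "torch==2.0.1"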

Launching

The weights are free and no application form is needed now. Leaked weights seem to be available on Reddit, but I have not used or tested them.

There are two ways to load the weights. Option 1 works out of the box (no need for manual download). If you prefer loading from local repo, you can use Option 2.

Option 1

Run the command to automatically set up the weights:

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python app.py

Option 2

If you have cloned both repos (base, refiner) locally (please change /path_to_sdxl accordingly):

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 SDXL_MODEL_DIR=/path_to_sdxl python app.py

Note that stable-diffusion-xl-base-1.0 and stable-diffusion-xl-refiner-1.0 should be placed in the same directory, and the path of that directory should replace /path_to_sdxl.
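One way to prepare that directory, assuming git-lfs is installed (the Hugging Face repo names below are the official ones; replace /path_to_sdxl as above):

cd /path_to_sdxl
git lfs install
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0
git clone https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0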

torch.compile support

Turning on torch.compile makes overall inference faster. However, it adds some overhead to the first run (i.e., you have to wait for compilation during the first run).
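A minimal sketch of what enabling it usually looks like with diffusers (the exact call in app.py may differ):

# Compile the UNet, which dominates inference time; the first call triggers compilation.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)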

To save memory

  1. Turn on pipe.enable_model_cpu_offload() and turn off pipe.to("cuda") in app.py (see the sketch after this list).
  2. Turn off the refiner by setting enable_refiner to False.
  3. More ways to save memory and make things faster.
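A minimal sketch of the change in item 1, assuming the pipeline variable is named pipe as in app.py:

# Instead of keeping the whole pipeline on the GPU:
# pipe.to("cuda")
# ...offload submodules to the CPU and move each one to the GPU only while it runs:
pipe.enable_model_cpu_offload()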

Several options through environment variables

  • USE_SSD: use segmind/SSD-1B, a distilled SDXL model that is faster. Disabled by default.
  • ENABLE_LCM: use LCM LoRA. Enabled by default.
  • SDXL_MODEL_DIR: load SDXL from a local directory.
  • ENABLE_REFINER=true/false: turn the refiner on or off (the refiner refines the generation). The refiner is disabled by default if LCM LoRA or the SSD model is enabled.
  • OFFLOAD_BASE and OFFLOAD_REFINER can be set to true/false to enable/disable model offloading (offloading saves memory at the cost of slower generation).
  • OUTPUT_IMAGES_BEFORE_REFINER=true/false: useful if the refiner is enabled; outputs images both before and after the refiner stage.
  • SHARE=true/false: creates a public link (useful for sharing, e.g., on Colab).
  • MULTI_GPU=true/false: enables data parallelism across multiple GPUs.
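These can be combined on the command line. For example, to run the plain base+refiner pipeline with a public link (a hypothetical combination; mix the variables above as needed):

ENABLE_LCM=false ENABLE_REFINER=true SHARE=true python app.py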

If you enjoy this demo, please give this repo a star ⭐.


stable-diffusion-xl-demo's Issues

RuntimeError: Input type (c10::Half) and bias type (float) should be the same

It works fine when we generate images for the first time. However, it raises the following error when we generate images a second time.

Traceback (most recent call last):
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/gradio/routes.py", line 321, in run_predict
    output = await app.blocks.process_api(
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/gradio/blocks.py", line 1006, in process_api
    result = await self.call_function(fn_index, inputs, iterator, request)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/gradio/blocks.py", line 847, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "app.py", line 123, in infer
    images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=images, num_inference_steps=steps, strength=refiner_strength, generator=g).images
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py", line 998, in __call__
    image = self.vae.decode(latents / self.vae.config.scaling_factor, return_dict=False)[0]
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/diffusers/models/autoencoder_kl.py", line 270, in decode
    decoded = self._decode(z).sample
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/diffusers/models/autoencoder_kl.py", line 256, in _decode
    z = self.post_quant_conv(z)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/docker/software/anaconda3/envs/r3d/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (c10::Half) and bias type (float) should be the same

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.

Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->gradio==3.14.0->-r requirements.txt (line 11)) (1.16.0)
Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx->gradio==3.14.0->-r requirements.txt (line 11)) (1.2.0)
Installing collected packages: triton, torch
Attempting uninstall: triton
Found existing installation: triton 2.1.0
Uninstalling triton-2.1.0:
Successfully uninstalled triton-2.1.0
Attempting uninstall: torch
Found existing installation: torch 2.1.0
Uninstalling torch-2.1.0:
Successfully uninstalled torch-2.1.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 2.1.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
torchdata 0.7.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
torchtext 0.16.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
torchvision 0.16.0 requires torch==2.1.0, but you have torch 2.0.1 which is incompatible.
Successfully installed torch-2.0.1 triton-2.0.0

Google Colab only local URL

Hi,
I'm not a very experienced developer, so I can't seem to figure this out.
When I run your code in Google Colab I get only a local URL:

Running on local URL:  http://127.0.0.1:7860/
*** Failed to connect to ec2.gradio.app:22: [Errno 110] Connection timed out

I tried to make a fork in which I forced the share property to be true:

block.queue().launch(share="true")

But the problem persists. Any idea if I'm doing something wrong?

Thank you,
Erwin Velthuis

CUDA Usage is 0%

[screenshot: GPU utilization shows 0%]

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 SDXL_MODEL_DIR=pretrained_models OFFLOAD_BASE=false OFFLOAD_REFINER=false python app.py

Question about the refiner step

I am reading the SDXL paper and found that the refiner is applied to the latent image:

[figure from the SDXL paper]

but in your code,

images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=images, num_inference_steps=steps, strength=refiner_strength).images

the input is the images instead of the latents. Are they the same?

thanks
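For reference, diffusers also supports handing the refiner the base model's latents directly; a sketch under that assumption (variable names follow app.py, but this is not the demo's exact code):

# Ask the base pipeline for latents instead of decoded images.
latents = pipe(prompt=prompt, negative_prompt=negative, num_inference_steps=steps, generator=g, output_type="latent").images
# The refiner accepts either latents or decoded images; given images, it first
# re-encodes them through the VAE, so the two paths are similar but not identical.
images = pipe_refiner(prompt=prompt, negative_prompt=negative, image=latents, num_inference_steps=steps, strength=refiner_strength, generator=g).images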

[suggestion] add aspect ratio options

Colab dropdown; just turn it into HTML:

 # @param ["1:1", "4:1", "16:9", "5:2", "2:1", "7:4", "3:2", "8:7", "9:8", "8:9", "7:8", "2:3", "4:7", "1:2", "2:5", "1:3", "9:16"] {allow-input: true}

Python code:

import math

aspect_ratio = "9:16"
max_pixel = 1024 * 1024  # keep the total pixel count near 1024x1024
w, h = (int(e) for e in aspect_ratio.split(':'))
# Solve for the requested ratio at ~max_pixel area, snapping each side to a multiple of 8.
width, height = (round(math.sqrt(max_pixel * x / y) / 8) * 8 for x, y in ((w, h), (h, w)))

width and height can then be passed as parameters to the model (for '9:16' this yields width=768, height=1368).

Suggesting features to advanced settings in demo colab

I am quite surprised by the image quality of the SDXL 1.0 model, especially in combination with the Refiner model, which according to my tests really improves the overall image quality.

I noticed that there seems to be a little GPU memory to spare, enough to generate images at a somewhat higher resolution than 1024px. Could you please add an option to select the aspect ratio and resolution in the Google Colab demo?

Below are my examples with and without the Refiner. All generation parameters are completely identical; the runs differ only in whether the Refiner is on or off.

with Refiner: [image]

without Refiner: [image]

As we can see, this example alone is enough to show that the Refiner really improves the image. The Refiner is not bad, but it requires a lot of memory.


Thanks for your attention, and for your work!

Enabling Multi-GPU Support for SDXL for WebUI


Dear developer,

I am currently using SDXL for my project, and I am encountering some difficulties enabling multi-GPU support. I have four Nvidia 3090 GPUs at my disposal, but so far I have only been able to run the software on one of them; running such a large model on a single 3090 is very slow and consumes a lot of RAM.

My attempts to distribute the workload among all GPUs have been unsuccessful. These have included:

  • Attempt: using accelerate to configure the multiple GPUs and run.

Unfortunately, these methods have not resulted in successful multi-GPU utilization.

For your reference, here is some information about my setup:

  • Operating System: Ubuntu
  • Python Version: 3.10.12
  • CUDA Version: 12.2
  • Deep Learning Framework: PyTorch 2.0.1

I'm not sure if I'm missing something or if there's an issue with SDXL itself. Therefore, I'm writing to ask if you could provide some guidance on this matter. Is there a built-in way to enable multi-GPU support, or are there any additional steps that I might have overlooked? Thank you!
