stable-diffusion-pytorch's Introduction

stable-diffusion-pytorch

Open in Colab

Yet another PyTorch implementation of Stable Diffusion.

I tried my best to make the codebase minimal, self-contained, consistent, hackable, and easy to read. Features are pruned if not needed in Stable Diffusion (e.g., the attention mask in the CLIP tokenizer/encoder). Configs are hard-coded (based on Stable Diffusion v1.x). Loops are unrolled when that shape makes more sense.

Despite my efforts, I feel like I cooked another plate of spaghetti. Well, help yourself!

I referred heavily to the following repositories. Big kudos to them!

Dependencies

  • PyTorch
  • Numpy
  • Pillow
  • regex
  • tqdm

How to Install

  1. Clone or download this repository.
  2. Install dependencies: run pip install torch numpy Pillow regex tqdm or pip install -r requirements.txt.
  3. Download data.v20221029.tar from here and unpack it into the parent folder of stable_diffusion_pytorch. Your folders should look like this:
stable-diffusion-pytorch(-main)/
├─ data/
│  ├─ ckpt/
│  └─ ...
└─ stable_diffusion_pytorch/
   ├─ samplers/
   └─ ...

Note that the checkpoint files included in the data archive are distributed under a different license -- you should agree to that license before using them.

How to Use

Import stable_diffusion_pytorch as a submodule.

Here are some example scripts. You can also read the docstring of stable_diffusion_pytorch.pipeline.generate.
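
For instance, you can print that docstring from a Python session (plain built-in help(); nothing repo-specific assumed):

from stable_diffusion_pytorch import pipeline

# Print the full parameter documentation of pipeline.generate.
help(pipeline.generate)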

Text-to-image generation:

from stable_diffusion_pytorch import pipeline

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts)
images[0].save('output.jpg')

...with multiple prompts:

prompts = [
    "a photograph of an astronaut riding a horse",
    ""]
images = pipeline.generate(prompts)

...with unconditional (negative) prompts:

prompts = ["a photograph of an astronaut riding a horse"]
uncond_prompts = ["low quality"]
images = pipeline.generate(prompts, uncond_prompts)

...with seed:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, uncond_prompts, seed=42)

Preload models (you will need enough VRAM):

from stable_diffusion_pytorch import model_loader
models = model_loader.preload_models('cuda')

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, models=models)

If you get an out-of-memory error with the code above but have enough RAM (not VRAM), you can keep the models in CPU memory and move each one to the GPU only while it is needed:

from stable_diffusion_pytorch import model_loader
models = model_loader.preload_models('cpu')

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, models=models, device='cuda', idle_device='cpu')

Image-to-image generation:

from PIL import Image

prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('space.jpg')]
images = pipeline.generate(prompts, input_images=input_images)

...with custom strength:

prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('space.jpg')]
images = pipeline.generate(prompts, input_images=input_images, strength=0.6)

Change the classifier-free guidance scale:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, cfg_scale=11)

...or disable classifier-free guidance:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, do_cfg=False)
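
For background (standard classifier-free guidance, not this repo's exact code): the model is run twice per step, once on the prompt and once on the unconditional prompt, and the two noise predictions are combined before sampling:

# Standard CFG combination (background sketch; cond and uncond are the two U-Net outputs).
pred = uncond + cfg_scale * (cond - uncond)

Disabling CFG skips the unconditional pass, which is faster but follows the prompt less strictly.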

Reduce steps (faster generation, lower quality):

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, n_inference_steps=28)

Use a different sampler:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, sampler="k_euler")
# "k_lms" (default), "k_euler", or "k_euler_ancestral" is available

Generate images with a custom size:

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, height=512, width=768)

LICENSE

All code in this repository is licensed under the MIT License. Please see the LICENSE file.

Note that the Stable Diffusion checkpoint files are licensed under the CreativeML Open RAIL-M License. It has a use-based restriction clause, so you should read it.

stable-diffusion-pytorch's People

Contributors

  • aaavvv
  • kjsman
  • mspronesti

stable-diffusion-pytorch's Issues

Is 4 GB of VRAM too small for this program?

Thanks for the implementation!
I got
OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 4.00 GiB total capacity; 3.31 GiB already allocated; 0 bytes free; 3.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
when running demo.ipynb. Is there a solution?
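
One workaround worth trying here is the idle_device option documented in the README above (a sketch; even this may not fit a 4 GB card):

from stable_diffusion_pytorch import model_loader, pipeline

# Keep the weights in CPU RAM and move each model to the GPU only while it runs.
models = model_loader.preload_models('cpu')

prompts = ["a photograph of an astronaut riding a horse"]
images = pipeline.generate(prompts, models=models,
                           device='cuda', idle_device='cpu')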

How are the models in data.zip made?

Thank you for making this repo; it's very educational. This minimal implementation is brilliant. The bigger SD repos are very hard to understand.

Did you have a script to convert official models like this one: https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.ckpt to the format you use in this repo?

Or are you using a model from some other source?

Are you using the SD 1.5 model?

How hard would it be to make this repo use models trained by others, like Inkpunk for example? https://huggingface.co/Envvi/Inkpunk-Diffusion/blob/main/Inkpunk-Diffusion-v2.ckpt

Should I scale the input image?

Hello, @kjsman, thanks for this easily readable implementation. I have a question: am I correct that I should scale the input with a sampler before passing the image to the U-Net model during the training process?

query

I hope this message finds you well. I recently came across your repository for Stable Diffusion in PyTorch, and I must say, your effort in making the codebase minimal and easy to read is commendable. I am new to generative models, and your implementation has piqued my interest.
I was wondering if you could provide some insights into the training process of your Stable Diffusion model. Specifically, I am curious about the following:

  1. Training Data: Could you please let me know which dataset you trained your model on? Understanding the dataset used would help me get a better sense of the model's capabilities and limitations.

  2. Training Time: I am also interested in how long it took for your model to train. This information will help me gauge the computational requirements and plan accordingly for any experiments or projects involving Stable Diffusion.

Moreover, I would like to know more about your approach to writing this code. Did you primarily refer to research papers, or did you take inspiration from other implementations? For instance, you mentioned using Andrej Karpathy's minGPT. Could you share your thought process behind choosing this reference, or any other methods you considered during your implementation?
I greatly appreciate your assistance and expertise in this matter. Thank you for your time and for sharing your work with the community. I look forward to your response.

Image in latent space gets shifted during encoding.

I am using a simple red image as input:

[Image: the red input image]

from stable_diffusion_pytorch import pipeline
from PIL import Image

prompts = ["a photograph of an astronaut riding a horse"]
input_images = [Image.open('red.png')]
images = pipeline.generate(prompts, input_images=input_images)
images[0].save('output.png')

But I am getting the input image shifted by 8px down and 8px right, and it generates an ugly brown border:

[Image: shifted output with a brown border]

I am pretty sure it happens during the encode pass, as it is already shifted in latent space. Here is a custom dump of the latent space to an image:

[Image: latent-space dump after encode/decode]

Something in the encode pass is shifting it by a pixel in the latent space, and I can't figure out what.
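
One possible way to isolate the round trip (a hypothetical diagnostic; it assumes strength accepts values near zero) is to run image-to-image with a very low strength, so the output is dominated by the VAE encode/decode path rather than by the diffusion steps:

from stable_diffusion_pytorch import pipeline
from PIL import Image

# With strength near 0, almost no noise is added, so the result approximates
# a pure encode/decode of the input and should expose the pixel shift.
input_images = [Image.open('red.png')]
images = pipeline.generate(["a red image"], input_images=input_images,
                           strength=0.05)
images[0].save('roundtrip.png')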

[Enhancement] automate weights download without user action

Hello @kjsman,
this is more a feature proposal than an actual issue. Instead of requiring the user to download and unpack the tar file containing the weights and the vocabulary from your Hugging Face Hub repository, one can make model_loader and the Tokenizer download and cache them directly.

For the first part, it only requires replacing torch.load(...) here (and in the other 3 functions in the same file) with:

torch.hub.load_state_dict_from_url(weights_url, check_hash=True)

All it takes on your side is to upload the 4 .pt files (not in a zipped file) to the Hugging Face Hub, and that's it.
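
A minimal sketch of what one loader could look like after the change (the URL and the CLIP import path below are assumptions; adapt them to the repository's actual layout):

import torch
from stable_diffusion_pytorch.clip import CLIP  # assumed module path

# Hypothetical URL; the four .pt files would need to be uploaded individually.
CLIP_URL = "https://huggingface.co/kjsman/stable-diffusion-pytorch/resolve/main/clip.pt"

def load_clip(device):
    model = CLIP().to(device)
    # Downloads once, caches under ~/.cache/torch/hub/checkpoints, and
    # verifies the hash when the file name embeds one.
    state_dict = torch.hub.load_state_dict_from_url(
        CLIP_URL, map_location=device, check_hash=True)
    model.load_state_dict(state_dict)
    return model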

As for the tokenizer, it just takes adding a default_bpe() method/function:

import os
from functools import lru_cache
from urllib.request import urlretrieve


@lru_cache()
def default_bpe():
    # Use the vocab file shipped next to this module if it is present.
    p = os.path.join(
        os.path.dirname(os.path.abspath(__file__)), "bpe_simple_vocab_16e6.txt.gz"
    )
    if os.path.exists(p):
        return p
    # Otherwise download it from the OpenAI CLIP repository.
    # urlretrieve returns (filename, headers); keep only the local path.
    filename, _headers = urlretrieve(
        "https://github.com/openai/CLIP/blob/main/clip/bpe_simple_vocab_16e6.txt.gz?raw=true",
        "bpe_simple_vocab_16e6.txt.gz",
    )
    return filename

Another option, if you prefer to keep your vocab.json and merges.txt, is to upload them as well to the Hugging Face Hub (not in a tar file), or directly to GitHub like the original repository does with its vocab.

If you like the idea, I will open a new PR; otherwise, please let me know if you have a better idea, or close this issue if you are not interested in this feature 😄
