dome272 / wuerstchen Goto Github PK

Official implementation of Würstchen: Efficient Pretraining of Text-to-Image Models

Home Page: https://arxiv.org/abs/2306.00637

License: MIT License

Python 0.27% Shell 0.01% Jupyter Notebook 99.73%

diffusion-models efficiency machine-learning stable-diffusion

wuerstchen's Issues

Please create a stable diffusion web ui fork

How much is VRAM for training compared to Stable Diffusion?

One-Step Stable Diffusion

Hello,

Is your model able to do one-step generation?

ezgif-3-e0f7c625a7.mp4

Hey @dome272,
Amazing work on the V2.
Looking at the code, I see that Stage C is not diffusing in the latent space of EffNet, since its shape is Bx16x24x24 and not 16x12x12 as stated in the paper. However, I see that the Stage B uncond shape is still 16x12x12, so I'm a bit confused about what is happening there.
Also, if I understand correctly, Stage B is not Paella-like anymore?

Will there be a V2 of the paper as well with all the changes?
Thanks!

Checkpoints missing

Hi,
The only checkpoint available on huggingface is stage c.
Where can we find the other stages?

Thanks

Question about diagram in paper

I noticed a small oddity in one of the diagrams explaining the model architecture in the paper. In the following image, the text-conditioned diffusion model (Stage C) is placed under Stage A, while the VQGAN (Stage A) is placed under Stage C. The paper is rather clear that the diffusion model is Stage C and that inference starts with Stage C, then B->A, so the diagram is slightly confusing.

Question about stage B training instabilities

I have a question about the Discussion section of the paper. It mentions two issues: varying image size and training instabilities. I'd like to know more about the latter (and potentially fix the former). Can you provide more information on the training instabilities you encountered while training Stage B? If you still have the loss chart saved, and it relates to the instability you encountered, can you share it?

Add ControlNet for Würstchen

Please implement a ControlNet pipeline and training script for Würstchen.
If Würstchen needs only a 12x12 latent space instead of Stable Diffusion's 64x64, and this also means a roughly 28x (4096 / 144) speed-up in training, this would be awesome!

Less thinking, more tinkering!

crosspost: huggingface/diffusers#5071
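
The 4096 / 144 figure in the request above can be checked with a few lines of Python. This is only a sketch of the arithmetic: it compares the number of spatial latent positions (64x64 versus 12x12, the sizes quoted in the issue) and assumes nothing about actual training speed, which also depends on model size, channel count, and the extra stages. The function name `spatial_ratio` is made up for illustration.

```python
def spatial_ratio(side_a: int, side_b: int) -> float:
    """Ratio of spatial positions between two square latent grids."""
    return (side_a * side_a) / (side_b * side_b)


# 64x64 latent (as quoted for Stable Diffusion) vs. 12x12 latent (as quoted for Würstchen)
ratio = spatial_ratio(64, 12)
print(f"{64 * 64} / {12 * 12} = {ratio:.2f}")  # 4096 / 144 = 28.44
```

So "28x" is the ratio of latent positions, rounded down; whether training is actually that much faster is the poster's assumption, not something this calculation shows.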

Query About Integrating ControlNet into Wurstchen

Hello Wurstchen Team,

I am intrigued by Wurstchen's capabilities and wonder if there are any plans to integrate ControlNet into the project. ControlNet's advanced image direction features could greatly enhance Wurstchen’s functionality, particularly in controlled image generation.

Could you share any insights on the potential incorporation of ControlNet, or if such an integration is under consideration?

Thank you for your innovative work.

Best regards,

Lora training ?

Is it possible to do lora training on these models ?
It would give a big boost to this project. Also, fine-tuning on 24 images doesn't give any result at all. Why do the training stages have no epochs?

OSError: Hugging Face config file not available

Here is the error:
OSError: laion/CLIP-ViT-H-14-laion2B-s32B-b79K does not appear to have a file named config.json. Checkout 'https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/main' for available files.

Artifacts (possibly related to MPS)

Hey! Your model looks really cool. I'm just wondering if you can point me in the right direction on how to solve this issue. I'm using this code:

import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS
from pathlib import Path
import time

DIR_NAME="./images/"
dirpath = Path(DIR_NAME)
# create parent dir if doesn't exist
dirpath.mkdir(parents=True, exist_ok=True)

pipe = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16).to("mps")

caption = "A grim woman wearing rusty atompunk power-armor, holding a massive gauss rifle, standing on a cliff overlooking a vast desert, 70mm film still"
negative = "3d, cartoon, doll, lowres"
images = pipe(
    prompt=caption, 
    negative_prompt=negative,
    width=1280,
    height=1024,
    prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
    prior_guidance_scale=4.0,
    num_images_per_prompt=1,
).images

for idx, image in enumerate(images):
    image_name = f'{time.time()}.png'
    image_path = dirpath / image_name
    image.save(image_path)

This is the console output I get:

Loading pipeline components...: 100%|█████████████| 5/5 [00:07<00:00,  1.45s/it]
Loading pipeline components...: 100%|█████████████| 4/4 [00:11<00:00,  2.75s/it]
100%|███████████████████████████████████████████| 29/29 [00:40<00:00,  1.41s/it]
  0%|                                                    | 0/12 [00:00<?, ?it/s]/Users/jackwooldridge/StableDiffusion/diffusers/venv/lib/python3.9/site-packages/torch/nn/functional.py:4027: UserWarning: The operator 'aten::_upsample_bicubic2d_aa.out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
  return torch._C._nn._upsample_bicubic2d_aa(input, output_size, align_corners, scale_factors)
100%|███████████████████████████████████████████| 12/12 [00:30<00:00,  2.54s/it]

And here's the image that gets output at the end:

I've tried with no negative prompts and different output sizes. The output always seems to have this distortion.

Image embeddings

Is it possible to use Wuerstchen for img2img generation? Like skipping the prior step and using image embedding directly?

About proper fine-tuning approach

First of all, thanks for sharing the wonderful work of the Wuerstchen model.
I would like to know what is the proper way to fine-tune Wuerstchen V2 pretrained weights with a small data set.
Should we fine-tune only stage-C or both stage-C and stage-B?
Any advice?

How To Access Model On Diffusers App?

Hello, the README and the Diffusers page on HF both state that the model is available in diffusers format. However, when using the App Store app for HF Diffusers, I cannot see an option to download the model. I tried to search for a diffusers version that I could clone manually, but the only model links I found were .pt files. Would anyone be able to point me in the right direction to grab a version of this that I can run in the Diffusers app on macOS?

Thank you.

How to switch to the interpolated/finetuned pipeline when using diffusers?

This is probably just me being a beginner when it comes to using diffusers, but I see https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated and https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated. How can I use those instead of the default prior model from https://huggingface.co/warp-ai/wuerstchen-prior? The images I'm getting don't look anywhere near as good as the examples, so I assume I'm using the base prior model.

code does not work with older graphics cards

The code uses dtype=torch.bfloat16 and torch.backends.cuda.matmul.allow_tf32 = True, which are only available on the latest graphics cards.
The error is easy to reproduce when loading in free Colab, which typically gives you a Tesla-generation GPU.

Files affected:

train_stage_B.py
train_stage_C.py
würstchen-stage-B.ipynb
würstchen-stage-C.ipynb

as well as the file linked with "Open in Colab", which is not one of the notebooks above.

Another problem with "Open in Colab" is that the free Colab graphics cards are typically smaller.
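
One way to work around the hardware dependency described above is to choose the precision from the GPU's compute capability instead of hard-coding it. The sketch below is not taken from the repository. It assumes that bfloat16 and TF32 need compute capability 8.0 (Ampere) or newer, and the helper name `pick_precision` and the float16 fallback are illustrative choices. With PyTorch installed, the capability tuple would come from `torch.cuda.get_device_capability()`.

```python
from typing import Dict, Tuple, Union

# Assumption: bfloat16 and TF32 require compute capability >= 8.0 (Ampere or newer).
AMPERE_MAJOR = 8


def pick_precision(capability: Tuple[int, int]) -> Dict[str, Union[str, bool]]:
    """Pick a dtype name and TF32 flag for a (major, minor) compute capability."""
    major, _minor = capability
    modern = major >= AMPERE_MAJOR
    return {
        # float16 is an illustrative fallback; float32 may be safer if training is unstable
        "dtype": "bfloat16" if modern else "float16",
        "allow_tf32": modern,
    }


# A Tesla T4 (common in free Colab) reports (7, 5); an A100 reports (8, 0).
print(pick_precision((7, 5)))
print(pick_precision((8, 0)))
```

In a training script, the returned values could then replace the hard-coded `torch.bfloat16` and `allow_tf32 = True` settings.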

Installation error: inconsistent name: expected 'pytorch-tools', but metadata has 'torchtools'

(wuerstchen) ~/Developer/Wuerstchen (main u=) $ pip install -r requirements.txt
Collecting pytorch-tools@ git+https://github.com/pabloppp/pytorch-tools@master (from -r requirements.txt (line 4))
  Cloning https://github.com/pabloppp/pytorch-tools (to revision master) to /tmp/pip-install-bgmuvmb4/pytorch-tools_c07c3a635b684eafb1465faf6168efa1
  Running command git clone --filter=blob:none --quiet https://github.com/pabloppp/pytorch-tools /tmp/pip-install-bgmuvmb4/pytorch-tools_c07c3a635b684eafb1465faf6168efa1
  Resolved https://github.com/pabloppp/pytorch-tools to commit 610158d5016d6418aee27f956e7afd17ff35ba04
  Preparing metadata (setup.py) ... done
  WARNING: Generating metadata for package pytorch-tools produced metadata for project name torchtools. Fix your #egg=pytorch-tools fragments.
Discarding git+https://github.com/pabloppp/pytorch-tools@master: Requested torchtools from git+https://github.com/pabloppp/pytorch-tools@master (from -r requirements.txt (line 4)) has inconsistent name: expected 'pytorch-tools', but metadata has 'torchtools'
Collecting webdataset (from -r requirements.txt (line 1))
  Downloading webdataset-0.2.48-py3-none-any.whl (51 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51.9/51.9 kB 7.4 MB/s eta 0:00:00
Collecting transformers (from -r requirements.txt (line 2))
  Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.2/7.2 MB 49.9 MB/s eta 0:00:00
Collecting warmup_scheduler (from -r requirements.txt (line 3))
  Downloading warmup_scheduler-0.3.tar.gz (2.1 kB)
  Preparing metadata (setup.py) ... done
ERROR: Could not find a version that satisfies the requirement pytorch-tools (unavailable) (from versions: 0.1.4, 0.1.5, 0.1.7, 0.1.8, 0.1.9)
ERROR: No matching distribution found for pytorch-tools (unavailable)

I was able to get it running with the following change:

(wuerstchen) ~/Developer/Wuerstchen (main * u=) $ git diff requirements.txt
diff --git a/requirements.txt b/requirements.txt
index fb0ee12..b86346a 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,4 @@
 webdataset
 transformers
 warmup_scheduler
-pytorch-tools @ git+https://github.com/pabloppp/pytorch-tools@master
+torchtools @ git+https://github.com/pabloppp/pytorch-tools@master

More than 12 GB VRAM for training (expected)!!

I managed to run training on a single GPU, even though the training code was not originally written for that. Full fine-tuning needs more than 12 GB of VRAM. That may be expected, but it is definitely a big drawback for most users with consumer GPUs: if 12 GB is not enough, not many cards can benefit from the advertised fast training and inference speed.

  File "Wuerstchen\train_stage_B.py", line 379, in <module>
    train(0, 1, 1)
  File "Wuerstchen\train_stage_B.py", line 252, in train
    loss = criterion(pred, latents)
  File "v2\train\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "v2\train\lib\site-packages\torch\nn\modules\loss.py", line 1174, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "v2\train\lib\site-packages\torch\nn\functional.py", line 3029, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 11.99 GiB total capacity; 10.84 GiB already allocated; 0 bytes free; 10.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
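
The numbers in the out-of-memory message above can be read more closely with a short script. This is a rough sketch: it parses the figures from the error text quoted in this issue and compares reserved with allocated memory, which is the condition the message itself gives for trying `max_split_size_mb`. The helper name `parse_gib` is made up for illustration.

```python
import re

# Error text copied from the traceback in this issue (trimmed to the relevant part).
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 256.00 MiB "
    "(GPU 0; 11.99 GiB total capacity; 10.84 GiB already allocated; "
    "0 bytes free; 10.98 GiB reserved in total by PyTorch)"
)


def parse_gib(message: str, label: str) -> float:
    """Return the GiB figure that precedes `label` in the error message."""
    match = re.search(r"([\d.]+) GiB " + re.escape(label), message)
    if match is None:
        raise ValueError(f"no GiB value found for {label!r}")
    return float(match.group(1))


total = parse_gib(OOM_MESSAGE, "total capacity")
allocated = parse_gib(OOM_MESSAGE, "already allocated")
reserved = parse_gib(OOM_MESSAGE, "reserved in total")

# Memory PyTorch holds in its cache but has not handed out to tensors.
cached_but_unused = reserved - allocated
print(f"reserved - allocated = {cached_but_unused:.2f} GiB")
print(f"allocated / total    = {allocated / total:.0%}")
```

Reserved memory is only about 0.14 GiB above allocated memory, so the fragmentation hint in the message probably does not apply here; the card is simply full. Reducing the memory footprint of the run (for example, a smaller batch size) looks more promising than tuning the allocator, though that is an inference from these numbers and not something confirmed in the issue.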
