dome272 / wuerstchen
Official implementation of Würstchen: Efficient Pretraining of Text-to-Image Models
Home Page: https://arxiv.org/abs/2306.00637
License: MIT License
Hello,
Is your model able to do one-step generation?
Hey @dome272,
Amazing work on the V2.
Looking at the code, I see that Stage C is not diffusing in the latent space of EffNet, since its shape is Bx16x24x24 and not 16x12x12 as stated in the paper. However, I see that the Stage B uncond shape is still 16x12x12, so I'm a bit confused about what is happening there.
Also, if I understand correctly, Stage B is no longer Paella-like?
Will there be a V2 of the paper as well with all the changes?
Thanks!
Hi,
The only checkpoint available on Hugging Face is Stage C.
Where can we find the other stages?
Thanks
I noticed a small oddity with one of the diagrams explaining the model architecture in the paper. In the following image, the text-conditioned diffusion model (Stage C) is placed under Stage A, while the VQGAN (Stage A) is put under Stage C. The paper is rather clear that the diffusion model is Stage C and that inference starts with Stage C, then goes B->A, so the diagram is slightly confusing.
I have a question about the Discussion section of the paper. It mentions two issues: varying image size and training instabilities. I'd like to know more about the latter (and potentially fix the former). Can you provide more information on the training instabilities you encountered while training Stage B? If you still have the loss chart saved, can you share it (if it is related to the instability)?
Please implement ControlNet pipeline and training script for Würstchen.
If Würstchen requires only a 12x12 latent space instead of Stable Diffusion's 64x64, and this also means a roughly 28x (4096 / 144) speed-up in training, this would be awesome!
Less thinking, more tinkering!
crosspost: huggingface/diffusers#5071
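The arithmetic behind the 28x figure can be checked with a short sketch. It only counts spatial latent positions per image, using the 64x64 and 12x12 grid sizes quoted above. Treating training cost as proportional to that count is an assumption; real speed-ups also depend on channel count, model size, and attention cost.

```python
# Spatial latent positions per image, using the grid sizes quoted in the issue.
sd_side = 64          # Stable Diffusion latent side length (as quoted above)
wuerstchen_side = 12  # Würstchen Stage C latent side length (as quoted above)

sd_positions = sd_side * sd_side                          # 4096
wuerstchen_positions = wuerstchen_side * wuerstchen_side  # 144

# Assumption: cost scales linearly with the number of latent positions.
ratio = sd_positions / wuerstchen_positions
print(f"{sd_positions} / {wuerstchen_positions} = {ratio:.1f}x")
```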
Hello Wurstchen Team,
I am intrigued by Wurstchen's capabilities and wonder if there are any plans to integrate ControlNet into the project. ControlNet's advanced image direction features could greatly enhance Wurstchen’s functionality, particularly in controlled image generation.
Could you share any insights on the potential incorporation of ControlNet, or if such an integration is under consideration?
Thank you for your innovative work.
Best regards,
Is it possible to do LoRA training on these models?
It would give a big boost to this project. Also, fine-tuning on 24 images doesn't give any result at all. Why do the training stages have no epochs?
Here is the error:
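Step-based training can still be related to epochs with a little arithmetic. The sketch below is a hypothetical helper, not part of the Würstchen code. The batch size and step count are made-up values chosen only for illustration, the 24-image dataset size is taken from the question, and one optimizer update per batch (no gradient accumulation) is assumed.

```python
def steps_to_epochs(steps: int, batch_size: int, dataset_size: int) -> float:
    """Number of passes over the dataset after `steps` optimizer updates."""
    if dataset_size <= 0 or batch_size <= 0:
        raise ValueError("batch_size and dataset_size must be positive")
    return steps * batch_size / dataset_size


def epochs_to_steps(epochs: int, batch_size: int, dataset_size: int) -> int:
    """Updates needed to see the dataset `epochs` times (last batch may be partial)."""
    steps_per_epoch = -(-dataset_size // batch_size)  # ceiling division
    return epochs * steps_per_epoch


# Hypothetical numbers: 24 images (from the question), batch size 4.
print(steps_to_epochs(600, 4, 24))  # 100.0
print(epochs_to_steps(10, 4, 24))   # 60
```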
OSError: laion/CLIP-ViT-H-14-laion2B-s32B-b79K does not appear to have a file named config.json. Checkout 'https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/main' for available files.
Hey! Your model looks really cool. I'm just wondering if you can point me in the right direction as to how to solve this issue. I'm using this code:
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS
from pathlib import Path
import time

DIR_NAME = "./images/"
dirpath = Path(DIR_NAME)
# create parent dir if doesn't exist
dirpath.mkdir(parents=True, exist_ok=True)

pipe = AutoPipelineForText2Image.from_pretrained("warp-ai/wuerstchen", torch_dtype=torch.float16).to("mps")
caption = "A grim woman wearing rusty atompunk power-armor, holding a massive gauss rifle, standing on a cliff overlooking a vast desert, 70mm film still"
negative = "3d, cartoon, doll, lowres"
images = pipe(
    prompt=caption,
    negative_prompt=negative,
    width=1280,
    height=1024,
    prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,
    prior_guidance_scale=4.0,
    num_images_per_prompt=1,
).images

# save each generated image under a timestamp-based file name
for idx, image in enumerate(images):
    image_name = f'{time.time()}.png'
    image_path = dirpath / image_name
    image.save(image_path)
This is the console output I get:
Loading pipeline components...: 100%|█████████████| 5/5 [00:07<00:00, 1.45s/it]
Loading pipeline components...: 100%|█████████████| 4/4 [00:11<00:00, 2.75s/it]
100%|███████████████████████████████████████████| 29/29 [00:40<00:00, 1.41s/it]
0%| | 0/12 [00:00<?, ?it/s]/Users/jackwooldridge/StableDiffusion/diffusers/venv/lib/python3.9/site-packages/torch/nn/functional.py:4027: UserWarning: The operator 'aten::_upsample_bicubic2d_aa.out' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:13.)
return torch._C._nn._upsample_bicubic2d_aa(input, output_size, align_corners, scale_factors)
100%|███████████████████████████████████████████| 12/12 [00:30<00:00, 2.54s/it]
And here's the image that gets output at the end:
I've tried with no negative prompts and different output sizes. The output always seems to have this distortion.
Is it possible to use Wuerstchen for img2img generation? Like skipping the prior step and using image embedding directly?
First of all, thanks for sharing the wonderful work of the Wuerstchen model.
I would like to know what is the proper way to fine-tune Wuerstchen V2 pretrained weights with a small data set.
Should we fine-tune only stage-C or both stage-C and stage-B?
Any advice?
Hello, the README and the Diffusers page on HF both state that the model is available in diffusers format. However, when using the App Store app for HF diffusers, I cannot see an option to download the model. I tried to search for a diffusers version that I could clone manually, but the only model links I found were .pt files. Would anyone be able to point me in the right direction to grab a version that I can run in the diffusers app on macOS?
Thank you.
This is probably just me being a beginner when it comes to using diffusers, but I see https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated and https://huggingface.co/warp-ai/wuerstchen-prior-model-interpolated. How can I use those instead of the default prior model from https://huggingface.co/warp-ai/wuerstchen-prior? The images I'm getting don't look anywhere near as good as the examples, so I assume I'm using the base prior model.
The code uses dtype=torch.bfloat16 and torch.backends.cuda.matmul.allow_tf32 = True, which are only available on the latest graphics cards.
The error is easy to reproduce when loading in free Colab, which typically gives you a Tesla-generation GPU.
Files affected:
train_stage_B.py
train_stage_C.py
würstchen-stage-B.ipynb
würstchen-stage-C.ipynb
as well as the file linked with 'open in colab', which is not one of the ipynbs.
Another problem with the "open in colab" link is that the free Colab graphics cards are typically smaller.
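One way to avoid the hard failure is to pick the precision from the GPU's compute capability. The sketch below is a hypothetical helper, not something the repository provides. It assumes that bfloat16 and TF32 matmuls need compute capability 8.0 (Ampere) or newer and that float16 is an acceptable fallback on older cards. Whether the training scripts stay numerically stable in float16 is not confirmed here.

```python
from typing import NamedTuple


class Precision(NamedTuple):
    dtype_name: str   # name of the torch dtype to use, e.g. "bfloat16"
    allow_tf32: bool  # value for torch.backends.cuda.matmul.allow_tf32


def choose_precision(cuda_available: bool, capability_major: int = 0) -> Precision:
    """Pick a dtype and TF32 setting from the GPU compute capability."""
    if not cuda_available:
        return Precision("float32", False)
    if capability_major >= 8:  # Ampere or newer (assumed threshold)
        return Precision("bfloat16", True)
    return Precision("float16", False)  # assumed fallback for older GPUs


print(choose_precision(True, 7))  # e.g. Tesla T4 (compute capability 7.5)
print(choose_precision(True, 8))  # e.g. A100 (compute capability 8.0)
```

With PyTorch installed, the inputs would come from `torch.cuda.is_available()` and `torch.cuda.get_device_capability()`, and the result `p` would be applied via `getattr(torch, p.dtype_name)` and `torch.backends.cuda.matmul.allow_tf32 = p.allow_tf32`.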
(wuerstchen) ~/Developer/Wuerstchen (main u=) $ pip install -r requirements.txt
Collecting pytorch-tools@ git+https://github.com/pabloppp/pytorch-tools@master (from -r requirements.txt (line 4))
Cloning https://github.com/pabloppp/pytorch-tools (to revision master) to /tmp/pip-install-bgmuvmb4/pytorch-tools_c07c3a635b684eafb1465faf6168efa1
Running command git clone --filter=blob:none --quiet https://github.com/pabloppp/pytorch-tools /tmp/pip-install-bgmuvmb4/pytorch-tools_c07c3a635b684eafb1465faf6168efa1
Resolved https://github.com/pabloppp/pytorch-tools to commit 610158d5016d6418aee27f956e7afd17ff35ba04
Preparing metadata (setup.py) ... done
WARNING: Generating metadata for package pytorch-tools produced metadata for project name torchtools. Fix your #egg=pytorch-tools fragments.
Discarding git+https://github.com/pabloppp/pytorch-tools@master: Requested torchtools from git+https://github.com/pabloppp/pytorch-tools@master (from -r requirements.txt (line 4)) has inconsistent name: expected 'pytorch-tools', but metadata has 'torchtools'
Collecting webdataset (from -r requirements.txt (line 1))
Downloading webdataset-0.2.48-py3-none-any.whl (51 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51.9/51.9 kB 7.4 MB/s eta 0:00:00
Collecting transformers (from -r requirements.txt (line 2))
Downloading transformers-4.30.2-py3-none-any.whl (7.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.2/7.2 MB 49.9 MB/s eta 0:00:00
Collecting warmup_scheduler (from -r requirements.txt (line 3))
Downloading warmup_scheduler-0.3.tar.gz (2.1 kB)
Preparing metadata (setup.py) ... done
ERROR: Could not find a version that satisfies the requirement pytorch-tools (unavailable) (from versions: 0.1.4, 0.1.5, 0.1.7, 0.1.8, 0.1.9)
ERROR: No matching distribution found for pytorch-tools (unavailable)
I was able to get it running with the following change:
(wuerstchen) ~/Developer/Wuerstchen (main * u=) $ git diff requirements.txt
diff --git a/requirements.txt b/requirements.txt
index fb0ee12..b86346a 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,4 @@
webdataset
transformers
warmup_scheduler
-pytorch-tools @ git+https://github.com/pabloppp/pytorch-tools@master
+torchtools @ git+https://github.com/pabloppp/pytorch-tools@master
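pip's complaint comes from comparing the name written before the `@` in the requirement with the name the package declares in its own metadata. The sketch below imitates that check in plain Python for illustration. It is a simplified model, not pip's actual implementation, and `declared_name` is passed in by hand rather than read from the package.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(line: str) -> str:
    """Extract the project name from a 'name @ url' requirement line."""
    return line.split("@", 1)[0].strip()


def names_match(line: str, declared_name: str) -> bool:
    """True if the requirement's name agrees with the package's declared name."""
    return normalize(requirement_name(line)) == normalize(declared_name)


url = "git+https://github.com/pabloppp/pytorch-tools@master"
print(names_match(f"pytorch-tools @ {url}", "torchtools"))  # False
print(names_match(f"torchtools @ {url}", "torchtools"))     # True
```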
I managed to get training running on one GPU, even though the training code was not originally made for that. Full fine-tuning needs more than 12 GB of VRAM, which could be expected but is definitely a big drawback for most users with consumer GPUs. If 12 GB is not enough, then not many cards can benefit from the advertised fast training and inference speed.
File "Wuerstchen\train_stage_B.py", line 379, in <module>
train(0, 1, 1)
File "Wuerstchen\train_stage_B.py", line 252, in train
loss = criterion(pred, latents)
File "v2\train\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "v2\train\lib\site-packages\torch\nn\modules\loss.py", line 1174, in forward
return F.cross_entropy(input, target, weight=self.weight,
File "v2\train\lib\site-packages\torch\nn\functional.py", line 3029, in cross_entropy
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 11.99 GiB total capacity; 10.84 GiB already allocated; 0 bytes free; 10.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
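The numbers in the out-of-memory message can be read off directly. The sketch below uses only figures copied from the log above. The conclusion it points to (that fragmentation is probably not the main problem) is an interpretation, not something the log states.

```python
GIB_TO_MIB = 1024

allocated_gib = 10.84  # "10.84 GiB already allocated"
reserved_gib = 10.98   # "10.98 GiB reserved in total by PyTorch"
requested_mib = 256.0  # "Tried to allocate 256.00 MiB"

# Memory PyTorch holds in its cache but is not currently using.
cached_unused_mib = (reserved_gib - allocated_gib) * GIB_TO_MIB

print(f"cached but unused: {cached_unused_mib:.0f} MiB")
print(f"request fits in cache: {cached_unused_mib >= requested_mib}")
```

The hint about `max_split_size_mb` applies when reserved memory is much larger than allocated memory. Here the gap is smaller than the failed request, so reducing the batch size or the memory footprint of the run is the more likely remedy.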