jd-p / cloob-latent-diffusion
CLOOB Conditioned Latent Diffusion training and inference code
License: MIT License
Hi there, this is great work and I can't wait to get this running and test it!
I have tried to create a Colab notebook and believe I have downloaded the necessary models and requirements, but unfortunately I'm getting an error. When I run
!./cfg_sample.py prompts "A photorealist detailed snarling goblin" --autoencoder kl_f8 --method "plms" --checkpoint yfcc-latent-diffusion-f8-e2-s250k.ckpt --seed 4485 --steps 50 && v-diffusion-pytorch/make_grid.py out_*.png
I get an error like:
/bin/bash: line 1: 969 Killed
My full code is simply:
!git clone --recursive https://github.com/JD-P/cloob-latent-diffusion
!pip install omegaconf
!pip install pytorch-lightning
!pip3 install pillow einops wandb ftfy regex pycocotools
!pip3 install -r /content/cloob-latent-diffusion/CLIP/requirements.txt
%cd cloob-latent-diffusion
#Get models
!wget https://the-eye.eu/public/AI/models/cloob/cloob_laion_400m_vit_b_16_16_epochs-405a3c31572e0a38f8632fa0db704d0e4521ad663555479f86babd3d178b1892.pkl #Cloob Checkpoint
!wget https://ommer-lab.com/files/latent-diffusion/kl-f8.zip #Autoencoder
!wget https://raw.githubusercontent.com/CompVis/latent-diffusion/main/configs/autoencoder/autoencoder_kl_32x32x4.yaml #Autoencoder config
!wget https://the-eye.eu/public/AI/models/yfcc-latent-diffusion-f8-e2-s250k.ckpt
!unzip /content/cloob-latent-diffusion/kl-f8.zip
%cd /content/cloob-latent-diffusion
import os
import sys
sys.path.append("/content/cloob-latent-diffusion")
os.rename("/content/cloob-latent-diffusion/autoencoder_kl_32x32x4.yaml","/content/cloob-latent-diffusion/kl_f8.yaml")
os.rename("model.ckpt","kl_f8.ckpt")
os.rename("cloob_laion_400m_vit_b_16_16_epochs-405a3c31572e0a38f8632fa0db704d0e4521ad663555479f86babd3d178b1892.pkl","cloob_laion_400m_vit_b_16_16_epochs.pkl")
Hoping you can help, thanks!
If you try training with a number of prompts other than 16, you'll get a runtime error like RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 50 but got size 32 for tensor number 1 in the list.
(caused by trying 25 prompts)
It might be worth making clear in the README that the list of demo prompts MUST be 16 lines long, and maybe checking that in def on_batch_end(self, trainer, module):
(train_latent_diffusion.py, line 435ish) and throwing a more descriptive error if there's a shape mismatch.
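A descriptive check along the lines suggested above could look roughly like the sketch below. The helper name `check_demo_prompts` and its signature are hypothetical and do not come from the repository; the expected count of 16 is taken from this report, not read from the training script.

```python
EXPECTED_DEMO_PROMPTS = 16  # taken from the report above, not from the repo


def check_demo_prompts(prompts, expected=EXPECTED_DEMO_PROMPTS):
    """Raise a descriptive error if the demo prompt list has the wrong length.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Ignore blank lines so a trailing newline in the prompt file is not counted.
    cleaned = [p.strip() for p in prompts if p.strip()]
    if len(cleaned) != expected:
        raise ValueError(
            f"Expected exactly {expected} demo prompts, got {len(cleaned)}; "
            "a different count leads to a tensor size mismatch during demo sampling."
        )
    return cleaned
```

Calling something like this once when the prompt file is read would surface the problem before training starts, rather than at the first demo step.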
I also had trouble loading the pretrained model linked in the README by just passing in the ckpt file as the --resume-from
argument. Instead, I had to modify the training script to do self.model.load_state_dict(torch.load(path_to_ckpt))
in the init function of class LightningDiffusion. I'd guess this is because the shared checkpoint is just for the model (not the full bundle with ema_model, cloob and the autoencoder that would be saved if someone trained from scratch themselves). It's not a biggie, but for people wanting to fine-tune from your shared checkpoint it currently takes a bit of figuring out, and I wanted to share in case it's an easy fix.
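One way to handle both kinds of checkpoint is sketched below, working on plain dictionaries so it runs without PyTorch. It assumes that a full Lightning checkpoint nests its weights under a "state_dict" key and that the inner network's keys carry a "model." prefix. The prefix and the helper name `select_model_weights` are assumptions for illustration; the actual layout of the shared checkpoint has not been verified here.

```python
def select_model_weights(checkpoint, prefix="model."):
    """Return a flat weight dict for the inner model (hypothetical helper).

    Accepts either a bare weight dict or a Lightning-style checkpoint that
    nests weights under "state_dict".
    """
    # Lightning-style checkpoints keep weights under "state_dict";
    # a bare weight dict is used as-is.
    state = checkpoint.get("state_dict", checkpoint)
    # Keep only keys for the inner model and strip the assumed prefix.
    stripped = {
        key[len(prefix):]: value
        for key, value in state.items()
        if key.startswith(prefix)
    }
    # If nothing carried the prefix, assume the dict was already model-only.
    return stripped or dict(state)
```

The returned dictionary could then be passed to self.model.load_state_dict(...) in the same place as the workaround described above.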
Thanks for all the work you've done on this!
Hi, I am about to train my own cloob latent diffusion and would like to confirm this is right.
The autoencoder_scale in your example was about 100, but I got something like 6.85.
I know it depends on the training dataset, but I saw a different line in your previous log for computing the final scale.
So I would just like to double-check whether the code on master is doing the right thing.
Thank you!
autoencoder_scale = torch.tensor(var_accum ** 0.5)
in the master branch
autoencoder_scale = torch.tensor((var_accum / 32) ** 0.5)
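For comparison, the two expressions differ only by a constant factor. The sketch below uses plain floats instead of tensors and the hypothetical names `scale_plain` and `scale_div32`; it makes no claim about which variant is the intended one.

```python
import math


def scale_plain(var_accum):
    # Mirrors var_accum ** 0.5
    return math.sqrt(var_accum)


def scale_div32(var_accum):
    # Mirrors (var_accum / 32) ** 0.5
    return math.sqrt(var_accum / 32)


# Illustrative value chosen so that scale_plain gives the reported 6.85.
var_accum = 6.85 ** 2
ratio = scale_plain(var_accum) / scale_div32(var_accum)
print(round(ratio, 3))  # 5.657, i.e. sqrt(32)
```

Since the factor is only about 5.66, switching between the two lines would move 6.85 to roughly 38.7 or 1.2, so the choice of formula alone would not account for a value near 100.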
I'm getting an error when I try to run the danbooru command line:
$ python cfg_sample.py "anime portrait of a man in a flight jacket leaning against a biplane" --autoencoder danbooru-kl-f8 --checkpoint danbooru-latent-diffusion-e88.ckpt --cloob-checkpoint cloob_laion_400m_vit_b_16_32_epochs --base-channels 128 --channel-multipliers 4,4,8,8 -n 16 --seed 4485 && v-diffusion-pytorch/make_grid.py out_*.png
Using device: cuda:0
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
Restored from danbooru-kl-f8.ckpt
{'url': 'https://the-eye.eu/public/AI/models/cloob/cloob_laion_400m_vit_b_16_32_epochs-646f61628eb4bc03a01ce5c23b727a348105f0405b6037a329da062739a06441.pkl', 'd_embed': 512, 'inv_tau': 30.0, 'scale_hopfield': 15.0, 'image_encoder': {'type': 'ViT', 'image_size': 224, 'input_channels': 3, 'normalize': {'mean': [0.48145466, 0.4578275, 0.40821073], 'std': [0.26862954, 0.26130258, 0.27577711]}, 'patch_size': 16, 'n_layers': 12, 'd_model': 768, 'n_heads': 12}, 'text_encoder': {'type': 'transformer', 'tokenizer': 'clip', 'text_size': 77, 'vocab_size': 49408, 'n_layers': 12, 'd_model': 512, 'n_heads': 8}}
Traceback (most recent call last):
  File "cfg_sample.py", line 208, in <module>
    main()
  File "cfg_sample.py", line 144, in main
    cloob.text_encoder(cloob.tokenize(txt).to(device)).float())
  File "C:\Users\Bart\anaconda3\envs\cloob\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\ai\cloob-latent-diffusion./cloob-training\cloob_training\model_pt.py", line 105, in forward
    padding_mask = torch.cumsum(eot_mask, dim=-1) == 0 | eot_mask
TypeError: unsupported operand type(s) for |: 'int' and 'Tensor'
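The TypeError is consistent with Python's operator precedence: | binds more tightly than ==, so the failing line evaluates 0 | eot_mask before the comparison. A minimal sketch using only the standard library ast module shows how the two spellings parse. Here cumsum is a stand-in name for torch.cumsum(eot_mask, dim=-1), and whether the parenthesized form matches the intended behavior of model_pt.py is an assumption.

```python
import ast


def top_level_op(expr):
    """Return the name of the outermost AST node Python parses from expr."""
    return type(ast.parse(expr, mode="eval").body).__name__


# As written in the traceback: parsed as cumsum == (0 | eot_mask),
# so the int 0 is combined with the tensor before any comparison.
original = "cumsum == 0 | eot_mask"

# With explicit parentheses the comparison runs first and | is outermost.
parenthesized = "(cumsum == 0) | eot_mask"

print(top_level_op(original))       # Compare
print(top_level_op(parenthesized))  # BinOp
```

If the parenthesized reading is what was meant, the corresponding line would be padding_mask = (torch.cumsum(eot_mask, dim=-1) == 0) | eot_mask.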