nerdyrodent / VQGAN-CLIP
Just playing with getting VQGAN+CLIP running locally, rather than having to use Colab.
License: Other
https://twitter.com/e08477/status/1418440857578098691?s=21
Would love to be able to recreate this. We need to build out a taxonomy of styles.
I mentioned in a comment on YouTube that you could take a video, split it into frames with ffmpeg, feed each frame in, and watch it reimagine the video. Can you try this with something and post the result?
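For what it's worth, a rough sketch of the frame-by-frame idea in Python, assuming ffmpeg is installed and assuming generate.py's -ii (init image), -i (iterations) and -o (output) flags; the bundled video_styler.sh looks like it does something along these lines:

import glob
import os
import subprocess

os.makedirs('in_frames', exist_ok=True)
os.makedirs('out_frames', exist_ok=True)

# Split the source video into frames, then restyle each frame as an init image.
subprocess.run(['ffmpeg', '-i', 'input.mp4', '-r', '12', 'in_frames/frame_%04d.png'], check=True)
for frame in sorted(glob.glob('in_frames/frame_*.png')):
    out = frame.replace('in_frames', 'out_frames')
    subprocess.run(['python', 'generate.py', '-p', 'a watercolour painting',
                    '-ii', frame, '-i', '50', '-o', out], check=True)

# Reassemble the restyled frames into a video.
subprocess.run(['ffmpeg', '-y', '-framerate', '12', '-i', 'out_frames/frame_%04d.png',
                '-c:v', 'libx264', '-pix_fmt', 'yuv420p', 'restyled.mp4'], check=True)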
I pulled the new code and found that many new parameters have appeared. Could you explain what these parameters do? I look forward to your reply.
This error occurred after I adjusted the code so that the output path could be something other than output.png, and I could not solve it. The same error also occurred with a freshly downloaded copy of the source code, so I don't know where I went wrong.
Could not make this run on multiple GPUs. Would love some help!
Are there any good resources of key words that work well with VQGAN+CLIP?
I've compiled some that I've heard of so far:
Since this isn't really an issue, perhaps opening up Github discussions in this repo would be better for these kinds of topics.
Hey!
Thanks for this, I am so ready to create bizarreness.
Hardware:
Ryzen 7 3700X
32GB RAM
RTX 2070 Super
OS: Windows 10 Pro
I'm getting the below error when running generate.py:
python generate.py -p "Yee"
Output:
(vqgan) PS C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP> python generate.py -p "Yee"
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torchvision\transforms\transforms.py:280: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
warnings.warn(
Using device: cuda:0
Optimising using: Adam
Using text prompts: ['Yee']
Using seed: 329366907029900
0it [00:01, ?it/s]
Traceback (most recent call last):
File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 461, in <module>
train(i)
File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 444, in train
lossAll = ascend_txt()
File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 423, in ascend_txt
iii = perceptor.encode_image(normalize(make_cutouts(out))).float()
File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\andre\anaconda3\envs\vqgan\VQGAN-CLIP\generate.py", line 241, in forward
batch = self.augs(torch.cat(cutouts, dim=0))
File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\container.py", line 139, in forward
input = module(input)
File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\augmentation\base.py", line 245, in forward
output = self.apply_func(in_tensor, in_transform, self._params, return_transform)
File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\augmentation\base.py", line 210, in apply_func
output[to_apply] = self.apply_transform(in_tensor[to_apply], params, trans_matrix[to_apply])
File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\augmentation\augmentation.py", line 684, in apply_transform
return warp_affine(
File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\geometry\transform\imgwarp.py", line 192, in warp_affine
dst_norm_trans_src_norm: torch.Tensor = normalize_homography(M_3x3, (H, W), dsize)
File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\geometry\transform\homography_warper.py", line 380, in normalize_homography
src_pix_trans_src_norm = _torch_inverse_cast(src_norm_trans_src_pix)
File "C:\Users\andre\anaconda3\envs\vqgan\lib\site-packages\kornia\utils\helpers.py", line 48, in _torch_inverse_cast
return torch.inverse(input.to(dtype)).to(input.dtype)
RuntimeError: cusolver error: CUSOLVER_STATUS_INTERNAL_ERROR, when calling cusolverDnCreate(handle)
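Since that trace bottoms out in torch.inverse on the GPU (inside kornia's warp_affine), a minimal repro outside generate.py can show whether the CUDA/cuSOLVER setup itself is broken; just a diagnostic sketch:

import torch

print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))
# The failing call above is a GPU matrix inverse; this reproduces it in isolation.
print(torch.inverse(torch.eye(3, device='cuda')))

If this tiny inverse also raises CUSOLVER_STATUS_INTERNAL_ERROR, the problem is in the torch/CUDA install rather than anything in this repo.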
(vqgan) D:\art\VQGAN-CLIP>python generate.py -p "A painting of an apple in a fruit bowl"
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Traceback (most recent call last):
File "D:\art\VQGAN-CLIP\generate.py", line 546, in
model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
File "D:\art\VQGAN-CLIP\generate.py", line 520, in load_vqgan_model
model.init_from_ckpt(checkpoint_path)
File "D:\art\VQGAN-CLIP\taming-transformers\taming\models\vqgan.py", line 52, in init_from_ckpt
self.load_state_dict(sd, strict=False)
File "D:\ana3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VQModel:
size mismatch for loss.discriminator.main.8.weight: copying a param with shape torch.Size([1, 256, 4, 4]) from checkpoint, the shape in current model is torch.Size([512, 256, 4, 4]).
size mismatch for quantize.embedding.weight: copying a param with shape torch.Size([16384, 256]) from checkpoint, the shape in current model is torch.Size([1024, 256]).
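The two shapes in that message look like a codebook-size mismatch: a 16384-entry checkpoint loaded against a 1024-entry config, e.g. a mixed-up .yaml/.ckpt pair. A quick way to see what a checkpoint actually contains, as a sketch in plain PyTorch:

import torch

sd = torch.load('checkpoints/vqgan_imagenet_f16_16384.ckpt', map_location='cpu')['state_dict']
# The first dimension is the codebook size; it should match the n_embed value in the paired .yaml.
print(sd['quantize.embedding.weight'].shape)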
Please consider adding to your setup instructions:
Was receiving an error:
File "generate.py", line 18, in
from taming.models import cond_transformer, vqgan
ModuleNotFoundError: No module named 'taming'
Solution:
pip3 install taming-transformers
Is there any way to make the program save each final product as prompt.jpg instead of overwriting output.jpg without manually telling it to use a different name at the start?
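Not sure this is built in, but here is a small sketch of deriving the filename from the prompt (slugged, since prompts often contain characters that filenames reject); the prompt_filename helper is hypothetical, not part of generate.py:

import re

def prompt_filename(prompt: str, ext: str = 'png') -> str:
    # Keep letters and digits, collapse everything else to underscores.
    slug = re.sub(r'[^A-Za-z0-9]+', '_', prompt).strip('_')[:100]
    return f'{slug}.{ext}'

print(prompt_filename('A painting of an apple in a fruit bowl'))
# -> A_painting_of_an_apple_in_a_fruit_bowl.png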
Excuse my tech illiteracy if this is obvious. I'm on Windows. I'm trying to combine a zoom with the "storyboard" mode, with multiple sequential text inputs. Currently I only know how do this with the simple built-in zoom option (as opposed to zoom.sh), but it zooms into the bottom-right corner by, apparently, displacing the entire image up and left 5 pixels per iteration. Is there a solution to this?
Most likely unimportant, but here's what I use:
python generate.py -p "Roses|photo:-1 ^ Sunflowers ^ Daisies ^ Daffodils" -cpe 1500 -zvid -i 6000 -zse 10 -vl 20 -zsc 1.005 -opt Adagrad -lr 0.15 -se 6000 -s 250 250
I'm getting UserWarning: CUDA initialization: CUDA driver initialization failed, you might not have a CUDA gpu. (Triggered internally at /pytorch/c10/cuda/CUDAFunctions.cpp:115.)
I do have a GPU, but I'm using CUDA version 8 (it's a shared lab machine).
Is the old CUDA version why I get the above error? Any way to fix this, apart from setting up a brand new system?
I found a magical implementation that takes less than a minute:
https://huggingface.co/spaces/akhaliq/VQGAN_CLIP
In your YouTube videos you have an Ubuntu desktop showing some heads-up display.
Does it show the GPU? What's it called?
Temporary solution is to pass in a 'cpe' argument that is any value greater than the 'i' argument. As in:
python generate.py -i 2000 -p "vase" -cpe 3000
The error message points to generate.py line 620.
The code tries to advance to the next story step even when the prompt is a single sentence.
After following your video (the conda approach: making the environment, updating it with the .yml, and getting torch==1.9.0), I am getting the following error from generate.py:
ModuleNotFoundError: No module named 'CLIP'
I even tried installing the CLIP repo via pip before re-installing torch and everything else, but it didn't work...
I am sure this is a silly issue
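In case it's the layout issue others have hit: generate.py imports CLIP from a checkout sitting next to it (per the README's git clone step), not from a pip-installed package. A quick sanity check, just a sketch:

import pathlib

# Both repos are expected as subfolders of the VQGAN-CLIP directory.
print(pathlib.Path('CLIP/clip').is_dir())            # False -> git clone https://github.com/openai/CLIP
print(pathlib.Path('taming-transformers').is_dir())  # False -> git clone https://github.com/CompVis/taming-transformers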
I get this error when using -vid
Traceback (most recent call last):
File "C:\Users\vanceagher\vqgan\generate.py", line 669, in <module>
p = Popen(['ffmpeg',
File "C:\Users\vanceagher\anaconda3\envs\vqgan\lib\subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\vanceagher\anaconda3\envs\vqgan\lib\subprocess.py", line 1420, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
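WinError 2 from Popen(['ffmpeg', ...]) generally means Windows cannot find an ffmpeg executable on PATH. A one-line check from the same conda environment (sketch):

import shutil

print(shutil.which('ffmpeg'))  # None -> ffmpeg isn't visible to this shell/environment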
I got an amazing output on the Google Colab notebook and am trying to replicate it at a larger scale, running locally on my 3090. For some reason the outputs appear to have a different style than those from Colab (I'm using the same model, prompt, seed and save interval).
Has something been altered? Is it to do with the optimizer or learning rate? (These can't be specified in the Colab notebook.)
Thanks a lot for this bit of software; it has given me hours of experimenting and fun!
Is there a way I can save each image along the way rather than just the final output, and then use ffmpeg to combine the images into an animation? I've got it working on my PC! Just interested in that feature, as I can't get 900x900 on Colab. Thanks!
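On the second part, stitching saved frames together is straightforward with ffmpeg from Python, much as generate.py itself does; a sketch assuming the intermediate images were written out as steps/frame_0001.png, steps/frame_0002.png, and so on (that naming is an assumption, not what the script currently produces):

import subprocess

subprocess.run(['ffmpeg', '-y', '-framerate', '30', '-i', 'steps/frame_%04d.png',
                '-c:v', 'libx264', '-pix_fmt', 'yuv420p', 'animation.mp4'], check=True)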
Hi there! I'm completely new to this.
Where are the images saved after generation? Sorry if this question is stupid.
Error message about conda activate:
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run
$ conda init <SHELL_NAME>
Currently supported shells are:
- bash
But if I try to run conda init I get this:
$ conda init
no change /root/anaconda3/condabin/conda
no change /root/anaconda3/bin/conda
no change /root/anaconda3/bin/conda-env
no change /root/anaconda3/bin/activate
no change /root/anaconda3/bin/deactivate
no change /root/anaconda3/etc/profile.d/conda.sh
no change /root/anaconda3/etc/fish/conf.d/conda.fish
no change /root/anaconda3/shell/condabin/Conda.psm1
no change /root/anaconda3/shell/condabin/conda-hook.ps1
no change /root/anaconda3/lib/python3.8/site-packages/xontrib/conda.xsh
no change /root/anaconda3/etc/profile.d/conda.csh
no change /root/.bashrc
No action taken.
And conda activate vqgan still will not work.
Hi there, just trying to generate video using the -vid argument, but getting the following error.
I installed it yesterday on my work machine and it worked just as it should.
Today I tried to install it on my home machine, but I get the following error:
vstil@DESKTOP-R251CM7 MINGW64 ~/VQGAN-CLIP (main)
$ python generate.py -se 1 -p "a cat"
C:\Users\vstil\AppData\Local\Programs\Python\Python39\lib\site-packages\setuptools\distutils_patch.py:25: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
warnings.warn(
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Traceback (most recent call last):
File "C:\Users\vstil\VQGAN-CLIP\generate.py", line 546, in <module>
model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
File "C:\Users\vstil\VQGAN-CLIP\generate.py", line 520, in load_vqgan_model
model.init_from_ckpt(checkpoint_path)
File "taming-transformers\taming\models\vqgan.py", line 45, in init_from_ckpt
sd = torch.load(path, map_location="cpu")["state_dict"]
File "C:\Users\vstil\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\serialization.py", line 608, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "C:\Users\vstil\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\serialization.py", line 777, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'm'.
I already tried to redownload the VQGAN-CLIP repo to no avail...
Any help would be greatly appreciated!
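An "invalid load key" usually means the .ckpt on disk isn't a real checkpoint at all, e.g. a half-finished download or an error page saved under the right name. A quick peek at what the file actually contains (sketch):

import os

path = 'checkpoints/vqgan_imagenet_f16_16384.ckpt'
print(os.path.getsize(path))  # a suspiciously small (KB-sized) file points to a failed download
with open(path, 'rb') as f:
    print(f.read(32))         # readable HTML/text here also points to a bad download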
When I run the generate.py script, I get the following error when the video is being made.
Generating video...
Traceback (most recent call last):
File "/home/ubuntu/VQGAN-CLIP/generate.py", line 581, in <module>
p = Popen(['ffmpeg',
File "/home/ubuntu/anaconda3/envs/vqgan/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/ubuntu/anaconda3/envs/vqgan/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'ffmpeg'
I've been trying to generate larger-resolution images, but no matter what size GPU I use, I get a message like the one below, where it seems PyTorch is using a massive amount of the available memory. Any advice on how to go about creating larger images?
GPU 0; 31.75 GiB total capacity; 29.72 GiB already allocated; 381.00 MiB free; 29.94 GiB reserved in total by PyTorch
When I try running the example line
python generate.py -p "The inside of a sphere" -zvid -i 4500 -zse 20 -vl 10 -zsc 0.97 -opt Adagrad -lr 0.15 -se 4500
I receive an error that states:
File "C:\Users\user\VQGAN-CLIP\generate.py", line 808, in <module> p = Popen(['ffmpeg', File "C:\Users\user\anaconda3\envs\vqgan\lib\subprocess.py", line 951, in __init__ self._execute_child(args, executable, preexec_fn, close_fds, File "C:\Users\user\anaconda3\envs\vqgan\lib\subprocess.py", line 1420, in _execute_child hp, ht, pid, tid = _winapi.CreateProcess(executable, args, FileNotFoundError: [WinError 2] The system cannot find the file specified
I'm not really sure where to go with this error, and I don't know if this is a problem on my side or something with the program.
I get RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half'
when running this against my CPU.
$ python generate.py -p "A painting of an apple in a fruit bowl" -cd cpu
Gives
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
Traceback (most recent call last):
File "/home/daniel/repos/vqgan-clip/generate.py", line 633, in <module>
embed = perceptor.encode_text(clip.tokenize(txt).to(device)).float()
File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 344, in encode_text
x = self.transformer(x)
File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 199, in forward
return self.resblocks(x)
File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 186, in forward
x = x + self.attention(self.ln_1(x))
File "/home/daniel/repos/vqgan-clip/CLIP/clip/model.py", line 183, in attention
return self.attn(x, x, x, need_weights=False, attn_mask=self.attn_mask)[0]
File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/modules/activation.py", line 1031, in forward
attn_output, attn_output_weights = F.multi_head_attention_forward(
File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 5082, in multi_head_attention_forward
attn_output, attn_output_weights = _scaled_dot_product_attention(q, k, v, attn_mask, dropout_p)
File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 4828, in _scaled_dot_product_attention
attn = softmax(attn, dim=-1)
File "/home/daniel/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/nn/functional.py", line 1679, in softmax
ret = input.softmax(dim)
RuntimeError: "softmax_lastdim_kernel_impl" not implemented for 'Half'
Expected: no error; an output image is generated.
-cd cpu parameter
Collecting environment information...
PyTorch version: 1.9.0+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.9 (64-bit runtime)
Python platform: Linux-5.4.0-88-generic-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.4.120
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Ti
Nvidia driver version: 470.57.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] pytorch-lightning==1.4.9
[pip3] pytorch-ranger==0.1.1
[pip3] torch==1.9.0+cu111
[pip3] torch-optimizer==0.1.0
[pip3] torchaudio==0.9.0
[pip3] torchmetrics==0.5.1
[pip3] torchvision==0.10.0+cu111
[conda] numpy 1.21.2 pypi_0 pypi
[conda] pytorch-lightning 1.4.9 pypi_0 pypi
[conda] pytorch-ranger 0.1.1 pypi_0 pypi
[conda] torch 1.9.0+cu111 pypi_0 pypi
[conda] torch-optimizer 0.1.0 pypi_0 pypi
[conda] torchaudio 0.9.0 pypi_0 pypi
[conda] torchmetrics 0.5.1 pypi_0 pypi
[conda] torchvision 0.10.0+cu111 pypi_0 pypi
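For what it's worth, CLIP's published weights are fp16, and several Half kernels (softmax among them) simply aren't implemented on CPU, so casting the perceptor to fp32 for CPU runs is the usual workaround. A sketch against the clip API, not necessarily how this repo wires it up:

import clip
import torch

device = torch.device('cpu')
perceptor, _ = clip.load('ViT-B/32', device=device, jit=False)
perceptor = perceptor.float().eval().requires_grad_(False)  # force fp32 for the CPU kernels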
In line 988: "AttributeError: 'int' object has no attribute 'stdin'"
ffmpeg command failed - check your installation
0%| | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\Caleb\VQGAN-CLIP\generate.py", line 988, in
im.save(p.stdin, 'PNG')
AttributeError: 'int' object has no attribute 'stdin'
Maybe I'm trying to make a video wrong, but the issue persists even with the provided example of the telephone box.
The -o flag works properly for image generation, but there is no specific information available on how to create a video with a custom name. Providing a file name with any extension makes the script fail with the following error:
ValueError: unknown file extension: .png'
On Windows we cannot use the zoom.sh script in the conda prompt, so I am using the command:
python generate.py -p "An apple in a bowl" -zvid -i 2000 -vl 10 -o "output/test.mp4"
Are there any specific requirements to get this working? I.e., do I need an NVIDIA GPU / CUDA?
I see it in the code, but I'm not sure how to disable it?
using cuda 11.2, built torch from source
Traceback (most recent call last):
File "/home/julianallchin/github/VQGAN-CLIP/generate.py", line 548, in <module>
perceptor = clip.load(args.clip_model, jit=jit)[0].eval().requires_grad_(False).to(device)
File "/home/julianallchin/anaconda3/envs/vqgan/lib/python3.9/site-packages/torch/jit/_script.py", line 915, in fail
raise RuntimeError(name + " is not supported on ScriptModules")
RuntimeError: requires_grad_ is not supported on ScriptModules
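This ScriptModule complaint usually appears when CLIP is loaded with jit=True on newer PyTorch; loading with jit=False returns an ordinary nn.Module that does support requires_grad_. A sketch (the model name and device stand in for whatever generate.py passes):

import clip
import torch

device = torch.device('cuda:0')
# jit=False gives a plain nn.Module rather than a TorchScript module.
perceptor = clip.load('ViT-B/32', jit=False)[0].eval().requires_grad_(False).to(device)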
So I'm trying to be brave and set this up on my Windows 10 machine running Conda since my Titan RTX GPU is on that box. I was able to install everything w/o any issues but when I try to run the example it bails out. Not 100% sure what the error is.
(vqgan) PS C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP> python generate.py -p "A painting of an apple in a fruit bowl"
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Traceback (most recent call last):
File "C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP\generate.py", line 546, in <module>
model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
File "C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP\generate.py", line 520, in load_vqgan_model
model.init_from_ckpt(checkpoint_path)
File "C:\Users\stiet\anaconda3\envs\vqgan\lib\site-packages\taming\models\vqgan.py", line 48, in init_from_ckpt
self.load_state_dict(sd, strict=False)
File "C:\Users\stiet\anaconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VQModel:
size mismatch for loss.discriminator.main.8.weight: copying a param with shape torch.Size([1, 256, 4, 4]) from checkpoint, the shape in current model is torch.Size([512, 256, 4, 4]).
size mismatch for quantize.embedding.weight: copying a param with shape torch.Size([16384, 256]) from checkpoint, the shape in current model is torch.Size([1024, 256]).
(vqgan) PS C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP> ls
Directory: C:\Users\stiet\Desktop\Work\AIStuff\VQGAN-CLIP
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 9/30/2021 3:52 PM checkpoints
d----- 9/30/2021 3:23 PM CLIP
d----- 9/30/2021 3:19 PM samples
d----- 9/30/2021 3:54 PM taming
d----- 9/30/2021 3:23 PM taming-transformers
-a---- 9/30/2021 3:19 PM 190 .gitignore
-a---- 9/30/2021 3:19 PM 5277 download_models.sh
-a---- 9/30/2021 3:19 PM 42380 generate.py
-a---- 9/30/2021 3:19 PM 1095 LICENSE
-a---- 9/30/2021 3:19 PM 1592 opt_tester.sh
-a---- 9/30/2021 3:19 PM 1474 random.sh
-a---- 9/30/2021 3:19 PM 13240 README.md
-a---- 9/30/2021 3:19 PM 1187 requirements.txt
-a---- 9/30/2021 3:19 PM 1544 video_styler.sh
-a---- 9/30/2021 3:19 PM 2376 vqgan.yml
-a---- 9/30/2021 3:19 PM 1444 zoom.sh
How can I fix this?
"CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 2.00 GiB total capacity; 1.13 GiB already allocated; 0 bytes free; 1.16 GiB reserved in total by PyTorch)"
I understand that I need to allocate more memory or change the batch parameters, but in which file should I change it? Or what command should I use? I'm a newbie, btw...
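(Not a definitive answer, but the -s flag seen in other commands in these issues, e.g. -s 250 250, sets the output size, and memory use grows roughly with the pixel count; on a 2 GiB card a smaller -s is probably the lever to pull rather than any batch parameter.)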
I don't know what happened; I had a working setup, then was tinkering with Facebook's faiss and gcc, and now I hit this problem.
python generate.py -p "The fashion of tomorrow"
/home/jp/Documents/gitWorkspace/VQGAN-CLIP/CLIP/clip/clip.py:23: UserWarning: PyTorch version 1.7.1 or higher is recommended
warnings.warn("PyTorch version 1.7.1 or higher is recommended")
Working with z of shape (1, 256, 16, 16) = 65536 dimensions.
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips/vgg.pth
VQLPIPSWithDiscriminator running with hinge loss.
Restored from checkpoints/vqgan_imagenet_f16_16384.ckpt
Traceback (most recent call last):
File "generate.py", line 361, in <module>
perceptor = clip.load(args.clip_model, jit=jit)[0].eval().requires_grad_(False).to(device)
File "/home/jp/miniconda3/lib/python3.8/site-packages/torch/jit/_script.py", line 919, in fail
raise RuntimeError(name + " is not supported on ScriptModules")
RuntimeError: requires_grad_ is not supported on ScriptModules
I'm on a 1.10 nightly build of PyTorch.
>>> print(torch.__version__)
1.10.0.dev20210715+cu111
>>> exit
Use exit() or Ctrl-D (i.e. EOF) to exit
>>> exit()
The pretrained model download links in the download script and README are unreachable for me: http://mirror.io.community
However, the links on the taming-transformers repo can be used for downloading the models:
https://github.com/CompVis/taming-transformers#overview-of-pretrained-models
The .yaml and .ckpt then have to be renamed accordingly.
If this is a common issue, the readme and download script should be updated.
What do these lines mean and why aren't they working?
FileNotFoundError Traceback (most recent call last)
in ()
3 #@markdown Once this has been run successfully you only need to run parameters and then the program to execute with new parameters
4 device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
----> 5 model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
6 perceptor = clip.load(args.clip_model, jit=False)[0].eval().requires_grad_(False).to(device)
7
/usr/local/lib/python3.7/dist-packages/omegaconf/omegaconf.py in load(file_)
181
182 if isinstance(file_, (str, pathlib.Path)):
--> 183 with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
184 obj = yaml.load(f, Loader=get_yaml_loader())
185 elif getattr(file_, "read", None):
FileNotFoundError: [Errno 2] No such file or directory: '/content/vqgan_imagenet_f16_16384.yaml'
Hello, I followed your video (thanks a lot, by the way; it seems I did not follow it well, actually).
Maybe you'll understand what I can do at this point:
(vqgan) C:\Users\Milaj\github\VQGAN-CLIP>python generate.py -p "A painting of an apple in a fruit bowl"
Traceback (most recent call last):
File "C:\Users\Milaj\github\VQGAN-CLIP\generate.py", line 466, in
model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
File "C:\Users\Milaj\github\VQGAN-CLIP\generate.py", line 436, in load_vqgan_model
config = OmegaConf.load(config_path)
File "C:\Users\Milaj\anaconda3\envs\vqgan\lib\site-packages\omegaconf\omegaconf.py", line 183, in load
with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Milaj\github\VQGAN-CLIP\checkpoints\vqgan_imagenet_f16_16384.yaml'
Thank you in advance; tell me if you need more info.
Excellent job converting to Python from Colab. Would you consider doing the same for her other guided diffusion notebook?
RiversHaveWings Guided Diffusion
https://github.com/replicate/cog makes it easy to build Docker containers for machine learning. A cog.yaml has to be configured and the interface code written, which looks pretty straightforward. The project could probably also be added here: https://replicate.ai/explore
Anyone who has Docker installed could then run it on their system as easily as executing something like this:
docker run -d -p 5000:5000 r8.im/nerdyrodent/VQGAN-CLIP@sha256:fe8d040a80609ff5643815e28bc3c488faf8870d968f19e045c4d0e043ffae59
curl http://localhost:5000/predict -X POST -F p="A painting of an apple in a fruit bowl"
When I use '--seed 42' on generate.py it performs as expected, but when using random.py it doesn't appear to be using seed 42, or at least the print command isn't listing the same value. It doesn't make sense that it's not behaving the same. Any ideas?
Hey! Would you mind adding a requirements.txt? I'm really just looking for the version #s of the relevant repos that are used here. It should be straightforward to extract from the output of "pip freeze". Thanks in advance!
Hi. Thanks for the repo. I was just trying to test it, but I keep running into this:
Traceback (most recent call last):
File "/home/paperspace/vqgan-clip/generate.py", line 552, in <module>
train(i)
File "/home/paperspace/vqgan-clip/generate.py", line 535, in train
lossAll = ascend_txt()
File "/home/paperspace/vqgan-clip/generate.py", line 514, in ascend_txt
iii = perceptor.encode_image(normalize(make_cutouts(out))).float()
File "/home/paperspace/anaconda3/envs/vqgan/lib/python3.9/site-packages/torchvision/transforms/transforms.py", line 163, in __call__
return F.normalize(tensor, self.mean, self.std, self.inplace)
File "/home/paperspace/anaconda3/envs/vqgan/lib/python3.9/site-packages/torchvision/transforms/functional.py", line 201, in normalize
raise TypeError('tensor is not a torch image.')
TypeError: tensor is not a torch image.
Any idea how to fix it? Really appreciate any help.
First of all, thank you so much for this notebook. It's my favorite version of the VQGAN + CLIP notebooks out there 😊.
As noted by @nerdyrodent in a previous issue, since a couple of days ago, no matter what model you choose to download, you'll get the message Could not resolve host: mirror.io.community.
If I specify the wikiart_16384 checkpoint, the following error occurs:
Traceback (most recent call last):
File "C:\Development\ml\VQGAN-CLIP\generate.py", line 364, in <module>
model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
File "C:\Development\ml\VQGAN-CLIP\generate.py", line 338, in load_vqgan_model
model.init_from_ckpt(checkpoint_path)
File "C:\Development\ml\VQGAN-CLIP\taming-transformers\taming\models\vqgan.py", line 52, in init_from_ckpt
self.load_state_dict(sd, strict=False)
File "C:\ProgramData\Miniconda3\envs\vqgan\lib\site-packages\torch\nn\modules\module.py", line 1406, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for VQModel:
size mismatch for loss.discriminator.main.8.weight: copying a param with shape torch.Size([512, 256, 4, 4]) from checkpoint, the shape in current model is torch.Size([1, 256, 4, 4]).
Is there a way to specify the initial model shape?
(base) PS C:\Users\Alex\vqgan-clip> python generate.py -p "A painting of an apple in a fruit bowl"
Traceback (most recent call last):
File "generate.py", line 546, in <module>
model = load_vqgan_model(args.vqgan_config, args.vqgan_checkpoint).to(device)
File "generate.py", line 516, in load_vqgan_model
config = OmegaConf.load(config_path)
File "C:\Users\Alex\anaconda3\lib\site-packages\omegaconf\omegaconf.py", line 184, in load
obj = yaml.load(f, Loader=get_yaml_loader())
File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\__init__.py", line 114, in load
return loader.get_single_data()
File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\constructor.py", line 49, in get_single_data
node = self.get_single_node()
File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\composer.py", line 36, in get_single_node
document = self.compose_document()
File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\composer.py", line 58, in compose_document
self.get_event()
File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\parser.py", line 118, in get_event
self.current_event = self.state()
File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\parser.py", line 193, in parse_document_end
token = self.peek_token()
File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\scanner.py", line 129, in peek_token
self.fetch_more_tokens()
File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\scanner.py", line 223, in fetch_more_tokens
return self.fetch_value()
File "C:\Users\Alex\anaconda3\lib\site-packages\yaml\scanner.py", line 577, in fetch_value
raise ScannerError(None, None,
yaml.scanner.ScannerError: mapping values are not allowed here
in "C:\Users\Alex\vqgan-clip\checkpoints\vqgan_imagenet_f16_16384.yaml", line 43, column 15
(base) PS C:\Users\Alex\vqgan-clip>
Another change I've made for myself is to break every n iterations (after the check-in) and await user input. If I input Y, it reloads the image from disk and reinitialises the optimiser (the same as you do for a zoom video). This way I can "guide" it quite forcefully: if I want a skull with glowing blue eyes, and the blue eyes are not picked up from the init image (or have dissolved into nothing) by the 50th step, I can paint them in. I can also "promote" features in the output by exaggerating their presence.
Since we're reinitialising the optimiser, we can presumably also switch up the prompts 'in the middle' of the run, when loss has 'stabilised'? Depending on how far you want to take this (and I'll be doing my own experimentation) maybe we can draw up a timeline and construct a video based on prompts that change over time.
I have a cut of this code from a week or two ago.
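For anyone curious, the loop I described is roughly this shape; load_image_as_z and make_optimiser are hypothetical stand-ins for the corresponding bits of generate.py (re-encoding an image into the latent z, and rebuilding the optimiser as the zoom-video path does):

pause_every = 50  # arbitrary choice

for i in range(1, iterations + 1):
    train(i)
    if i % pause_every == 0:
        answer = input('Reload edited output.png and continue? [Y/n] ')
        if answer.strip().lower() != 'n':
            z = load_image_as_z('output.png')  # picks up any manual paint-over
            opt = make_optimiser(z)            # reinitialise, as for a zoom video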
Funnily enough, I also added the option to run it on another GPU. When I do choose cuda:1, though, I get 2GB allocated on cuda:0, although that device is not specified anywhere in generate.py. Combined with disabling ECC (nvidia-smi -i 1 -e 0), this is fine, because I can get over 912 kibipixels (1280x720 or 1488x624), but it would be good to understand what, why and how.
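(One thing that might be worth trying: hiding the first card entirely with CUDA_VISIBLE_DEVICES=1 before launching, so that cuda:0 inside the process maps to the second GPU. If the mystery 2GB still appears, it is being allocated explicitly somewhere rather than by a stray default-device context.)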