
deep-daze's Introduction

Deep Daze

mist over green hills

shattered plates on the grass

cosmic love and attention

a time traveler in the crowd

life during the plague

meditative peace in a sunlit forest

a man painting a completely red image

a psychedelic experience on LSD

What is this?

Simple command line tool for text to image generation using OpenAI's CLIP and Siren. Credit goes to Ryan Murdock for the discovery of this technique (and for coming up with the great name)!

Original notebook Open In Colab

New simplified notebook Open In Colab

This will require that you have an Nvidia or AMD GPU.

  • Recommended: 16GB VRAM
  • Minimum Requirements: 4GB VRAM (Using VERY LOW settings, see usage instructions below)

Install

$ pip install deep-daze

Windows Install

Presuming Python is installed:

  • Open command prompt and navigate to the directory of your current version of Python
  pip install deep-daze

Examples

$ imagine "a house in the forest"

For Windows:

  • Open command prompt as administrator
  imagine "a house in the forest"

That's it.

If you have enough memory, you can get better quality by adding a --deeper flag

$ imagine "shattered plates on the ground" --deeper

Advanced

In true deep learning fashion, more layers will yield better results. The default is 16, but it can be increased to 32 depending on your resources.

$ imagine "stranger in strange lands" --num-layers 32

Usage

CLI

NAME
    imagine

SYNOPSIS
    imagine TEXT <flags>

POSITIONAL ARGUMENTS
    TEXT
        (required) A phrase of less than 77 tokens which you would like to visualize.

FLAGS
    --img=IMAGE_PATH
        Default: None
        Path to png/jpg image or PIL image to optimize on
    --encoding=ENCODING
        Default: None
        User-created custom CLIP encoding. If provided, it replaces any text or image input.
    --create_story=CREATE_STORY
        Default: False
        Creates a story by optimizing each epoch on a new sliding-window of the input words. If this is enabled, much longer texts than 77 tokens can be used. Requires save_progress to visualize the transitions of the story.
    --story_start_words=STORY_START_WORDS
        Default: 5
        Only used if create_story is True. How many words to optimize on for the first epoch.
    --story_words_per_epoch=STORY_WORDS_PER_EPOCH
        Default: 5
        Only used if create_story is True. How many words to add to the optimization goal per epoch after the first one.
    --story_separator=STORY_SEPARATOR
        Default: None
        Only used if create_story is True. Defines a separator like '.' that splits the text into groups for each epoch. The separator needs to appear in the text, otherwise it will be ignored.
    --lower_bound_cutout=LOWER_BOUND_CUTOUT
        Default: 0.1
        Lower bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should be smaller than 0.8.
    --upper_bound_cutout=UPPER_BOUND_CUTOUT
        Default: 1.0
        Upper bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should probably stay at 1.0.
    --saturate_bound=SATURATE_BOUND
        Default: False
        If True, the LOWER_BOUND_CUTOUT is linearly increased to 0.75 during training.
    --learning_rate=LEARNING_RATE
        Default: 1e-05
        The learning rate of the neural net.
    --num_layers=NUM_LAYERS
        Default: 16
        The number of hidden layers to use in the Siren neural net.
    --batch_size=BATCH_SIZE
        Default: 4
        The number of generated images to pass into Siren before calculating loss. Decreasing this can lower memory and accuracy.
    --gradient_accumulate_every=GRADIENT_ACCUMULATE_EVERY
        Default: 4
        Calculate a weighted loss over n samples for each iteration. Increasing this can help increase accuracy with lower batch sizes.
    --epochs=EPOCHS
        Default: 20
        The number of epochs to run.
    --iterations=ITERATIONS
        Default: 1050
        The number of times to calculate and backpropagate loss in a given epoch.
    --save_every=SAVE_EVERY
        Default: 100
        Generate an image every time iterations is a multiple of this number.
    --image_width=IMAGE_WIDTH
        Default: 512
        The desired resolution of the image.
    --deeper=DEEPER
        Default: False
        Uses a Siren neural net with 32 hidden layers.
    --overwrite=OVERWRITE
        Default: False
        Whether or not to overwrite existing generated images of the same name.
    --save_progress=SAVE_PROGRESS
        Default: False
        Whether or not to save images generated before Siren training is complete.
    --seed=SEED
        Type: Optional[]
        Default: None
        A seed to be used for deterministic runs.
    --open_folder=OPEN_FOLDER
        Default: True
        Whether or not to open a folder showing your generated images.
    --save_date_time=SAVE_DATE_TIME
        Default: False
        Save files with a timestamp prepended e.g. `%y%m%d-%H%M%S-my_phrase_here`
    --start_image_path=START_IMAGE_PATH
        Default: None
        The generator is trained first on a starting image before being steered towards the textual input
    --start_image_train_iters=START_IMAGE_TRAIN_ITERS
        Default: 50
        The number of steps for the initial training on the starting image
    --theta_initial=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the first layer of the network.
    --theta_hidden=THETA_HIDDEN
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the hidden layers of the network.
    --save_gif=SAVE_GIF
        Default: False
        Whether or not to save a GIF animation of the generation procedure. Only works if save_progress is set to True.

Priming

This technique was first devised and shared by Mario Klingemann; it allows you to prime the generator network with a starting image before it is steered towards the text.

Simply specify the path to the image you wish to use, and optionally the number of initial training steps.

$ imagine 'a clear night sky filled with stars' --start_image_path ./cloudy-night-sky.jpg

Primed starting image

Then trained with the prompt A pizza with green pepper.
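
For reference, here is a minimal Python sketch of the same kind of priming run as the command above; it assumes the CLI flags map one-to-one onto Imagine keyword arguments (start_image_path, start_image_train_iters):

from deep_daze import Imagine

imagine = Imagine(
    text = 'a clear night sky filled with stars',
    start_image_path = './cloudy-night-sky.jpg',   # prime the generator on this image first
    start_image_train_iters = 200,                 # number of initial training steps on the image
)
imagine()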

Optimize for the interpretation of an image

We can also feed in an image as an optimization goal, instead of only priming the generator network. Deepdaze will then render its own interpretation of that image:

$ imagine --img samples/Autumn_1875_Frederic_Edwin_Church.jpg

Original image:

The network's interpretation:

Original image:

The network's interpretation:

Optimize for text and image combined

$ imagine "A psychedelic experience." --img samples/hot-dog.jpg

The network's interpretation:

New: Create a story

The regular mode for text only allows up to 77 tokens. If you want to visualize a full story/paragraph/song/poem, set create_story to True.

Given the poem “Stopping by Woods On a Snowy Evening” by Robert Frost - "Whose woods these are I think I know. His house is in the village though; He will not see me stopping here To watch his woods fill up with snow. My little horse must think it queer To stop without a farmhouse near Between the woods and frozen lake The darkest evening of the year. He gives his harness bells a shake To ask if there is some mistake. The only other sound’s the sweep Of easy wind and downy flake. The woods are lovely, dark and deep, But I have promises to keep, And miles to go before I sleep, And miles to go before I sleep.".

We get:

Whose_woods_these_are_I_think_I_know._His_house_is_in_the_village_though._He_.mp4
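
A minimal Python sketch of such a story run, assuming Imagine accepts the same create_story parameters documented for the CLI flags above:

from deep_daze import Imagine

story = "Whose woods these are I think I know. His house is in the village though; ..."  # the full poem from above

imagine = Imagine(
    text = story,
    create_story = True,           # optimize on a sliding window over the text, epoch by epoch
    story_start_words = 5,         # words used for the first epoch
    story_words_per_epoch = 5,     # words added to the goal after each epoch
    save_progress = True,          # needed to visualize the transitions of the story
)
imagine()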

Python

Invoke deep_daze.Imagine in Python

from deep_daze import Imagine

imagine = Imagine(
    text = 'cosmic love and attention',
    num_layers = 24,
)
imagine()

Save progress every fourth iteration

Save images in the format insert_text_here.00001.png, insert_text_here.00002.png, ... up to (total_iterations / save_every)

imagine = Imagine(
    text=text,
    save_every=4,
    save_progress=True
)

Prepend current timestamp on each image.

Creates files with both the timestamp and the sequence number.

e.g. 210129-043928_328751_insert_text_here.00001.png, 210129-043928_512351_insert_text_here.00002.png, ...

imagine = Imagine(
    text=text,
    save_every=4,
    save_progress=True,
    save_date_time=True,
)

High GPU memory usage

If you have at least 16 GiB of VRAM available, you should be able to run these settings with some wiggle room.

imagine = Imagine(
    text=text,
    num_layers=42,
    batch_size=64,
    gradient_accumulate_every=1,
)

Average GPU memory usage

imagine = Imagine(
    text=text,
    num_layers=24,
    batch_size=16,
    gradient_accumulate_every=2
)

Very low GPU memory usage (less than 4 GiB)

If you are desperate to run this on a card with less than 8 GiB of VRAM, you can lower the image_width.

imagine = Imagine(
    text=text,
    image_width=256,
    num_layers=16,
    batch_size=1,
    gradient_accumulate_every=16 # Increase gradient_accumulate_every to correct for loss in low batch sizes
)

VRAM and speed benchmarks:

These experiments were conducted with an RTX 2060 Super and a Ryzen 7 3700X. We list the parameters first (bs = batch size), then the memory usage, and in some cases the training iterations per second:

For an image resolution of 512:

  • bs 1, num_layers 22: 7.96 GB
  • bs 2, num_layers 20: 7.5 GB
  • bs 16, num_layers 16: 6.5 GB

For an image resolution of 256:

  • bs 8, num_layers 48: 5.3 GB
  • bs 16, num_layers 48: 5.46 GB - 2.0 it/s
  • bs 32, num_layers 48: 5.92 GB - 1.67 it/s
  • bs 8, num_layers 44: 5 GB - 2.39 it/s
  • bs 32, num_layers 44, grad_acc 1: 5.62 GB - 4.83 it/s
  • bs 96, num_layers 44, grad_acc 1: 7.51 GB - 2.77 it/s
  • bs 32, num_layers 66, grad_acc 1: 7.09 GB - 3.7 it/s

@NotNANtoN recommends a batch size of 32 with 44 layers and training 1-8 epochs.
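
As a concrete starting point, that recommendation translates roughly into the following settings (a sketch based on the parameters documented above, not a benchmarked configuration):

from deep_daze import Imagine

imagine = Imagine(
    text = text,
    image_width = 256,
    num_layers = 44,
    batch_size = 32,
    gradient_accumulate_every = 1,
    epochs = 8,
)
imagine()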

Where is this going?

This is just a teaser. We will be able to generate images, sound, anything at will, with natural language. The holodeck is about to become real in our lifetimes.

Please join replication efforts for DALL-E for Pytorch or Mesh Tensorflow if you are interested in furthering this technology.

Alternatives

Big Sleep - CLIP and the generator from BigGAN

Citations

@misc{unpublished2021clip,
    title  = {CLIP: Connecting Text and Images},
    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},
    year   = {2021}
}
@misc{sitzmann2020implicit,
    title   = {Implicit Neural Representations with Periodic Activation Functions},
    author  = {Vincent Sitzmann and Julien N. P. Martel and Alexander W. Bergman and David B. Lindell and Gordon Wetzstein},
    year    = {2020},
    eprint  = {2006.09661},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}

deep-daze's People

Contributors

akx, dginev, donno2048, fliens, incog5, lorcalhost, lucidrains, notnanton, raymondterry, rexlow, russelldc, urasakikeisuke


deep-daze's Issues

Error #13

imagine "a crow with a crown"

OMP: Error #13: Assertion failure at kmp_csupport.cpp(597).
OMP: Hint Please submit a bug report with this message, compile and run commands used, and machine configuration info including native compiler and operating system versions. Faster response will be obtained by including all program sources. For information on submitting this issue, please see http://www.intel.com/software/products/support/.
[1] 25780 abort imagine "a crow with a crown"
SIGABORT 6

Python 3.7.3
MacOS BigSur 11.1

Can't run

I have installed the pip package and installed CUDA, but when I run it this happens. Any idea what the problem is? I'm new to coding, so I have no idea.

(base) PS C:\Users\jihad> imagine "a house in the forest"
Traceback (most recent call last):
  File "c:\users\jihad\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\jihad\anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\jihad\anaconda3\Scripts\imagine.exe\__main__.py", line 4, in <module>
  File "c:\users\jihad\anaconda3\lib\site-packages\deep_daze\__init__.py", line 1, in <module>
    from deep_daze.deep_daze import DeepDaze, Imagine
  File "c:\users\jihad\anaconda3\lib\site-packages\deep_daze\deep_daze.py", line 25, in <module>
    assert torch.cuda.is_available(), 'CUDA must be available in order to use Deep Daze'
AssertionError: CUDA must be available in order to use Deep Daze

CUDA runs out of memory

Has anyone run into a running out of GPU memory issue when running the imagine command? Below is the error I get.

RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 6.00 GiB total capacity; 4.47 GiB already allocated; 716.80 KiB free; 4.48 GiB reserved in total by PyTorch)

I tried to use both gc.collect() and torch.cuda.empty_cache() but neither worked.
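
If lowering the settings is an option, the low-memory configuration from the README above usually helps; for example (a sketch, not a guaranteed fix):

$ imagine "a house in the forest" --image_width=256 --num_layers=16 --batch_size=1 --gradient_accumulate_every=16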

[Discussion] New augmentations

Hey,

since we had the long discussion on the random cutout size sampling in #61, I played around a bit more with it. My issue with the current state of affairs is that for a prompt like "A llama wearing a scarf and glasses, sitting in a cozy cafe." there will be many llamas appearing, and the general texture of the llama fur will be placed everywhere. That makes sense: we optimize the SIREN network to maximize the CLIP similarity at randomly sampled squares of sizes down to 10% of the original size. As it is not possible to train without these augmentations (it would only generate a weird-looking adversarial example), I thought of using other augmentations.

I tried adding Gaussian noise to the image - it is better than not adding any augmentations, but not good.

Then I tried sampling around a normal distribution with mean 0.6 and std of 0.2 instead of a uniform distribution between 0.1 and 1.0. The images look quite different - the background looks less interesting as the low-level textures get less weight. But I can't say they definitely look "better".

I tested the prompts: Depression, Consciousness, Schizophrenia, "A psychedelic experience on LSD", Demon, and "A llama wearing a scarf and glasses, reading a book in a cozy cafe.".

The results for the Gaussian:

(Result images for the six prompts omitted.)

Lastly, I tried a simple change. Instead of averaging the loss over all random cutouts, I averaged the features of all cutouts to calculate a single loss. Here's where it gets interesting: the generated images look quite different now, often depicting clear scenes of locations. What is strange is that some symbols appear repeatedly across different prompts - a red and a green blob right next to each other, and some kind of company logo that appears quite regularly.

These are the results for the current version:
(Result images for the six prompts omitted.)

And these are the results for averaged features:
(Result images for the six prompts omitted.)

I also merged the Gaussian sampling with the feature averaging:
(Result images for the six prompts omitted.)

Let me know what you think or if you have any other ideas! If you can tell me a nice way to put images side-by-side here I can format it a bit better to make it easier to visually compare the results.

I was thinking of potentially including the feature-averaging approach - but I would experiment with averaging the features only of certain sizes. Furthermore, I need to experiment with the feature averaging when choosing different lower_bound_cutout values.

SIREN representation black or mostly black

In @vsitzmann's SIREN notebook
https://colab.research.google.com/github/vsitzmann/siren/blob/master/explore_siren.ipynb#scrollTo=PmtazNPbfZzY
This code:

total_steps = 500 # Since the whole image is our dataset, this just means 500 gradient descent steps.
steps_til_summary = 10

optim = torch.optim.Adam(lr=1e-4, params=img_siren.parameters())

model_input, ground_truth = next(iter(dataloader))
model_input, ground_truth = model_input.cuda(), ground_truth.cuda()

for step in range(total_steps):
    model_output, coords = img_siren(model_input)
    loss = ((model_output - ground_truth)**2).mean()

    if not step % steps_til_summary:
        print("Step %d, Total loss %0.6f" % (step, loss))
        img_grad = gradient(model_output, coords)
        img_laplacian = laplace(model_output, coords)

        fig, axes = plt.subplots(1,3, figsize=(18,6))
        axes[0].imshow(model_output.cpu().view(256,256).detach().numpy())
        axes[1].imshow(img_grad.norm(dim=-1).cpu().view(256,256).detach().numpy())
        axes[2].imshow(img_laplacian.cpu().view(256,256).detach().numpy())
        plt.show()

    # as in the original notebook, the optimization step follows
    optim.zero_grad()
    loss.backward()
    optim.step()

Outputs three variants of the SIREN representation of the image: the entire representation, the normalized representation, and the Laplacian representation (images omitted).

I'm not sure how much you've tested the --start_image feature, but I'm not really getting the same results as this. In particular, SIREN seems only capable of attaching to "highlights" in the image, and completely blackens shadows:

https://user-images.githubusercontent.com/3994972/106562573-17410d80-64f0-11eb-8c5f-0019d952efff.png

I'm not really sure what the difference is, but I'd really love to be able to get my initial SIREN representation up to the level of detail presented in that notebook. Let me know if you have any ideas.

Memory leak

It seems that the warmup step adds quite a bit of VRAM usage, although I'm not exactly sure why.

This line in particular:

self.model(self.encoded_text, dry_run = True) # do one warmup step due to potential issue with CLIP and CUDA

seems to balloon my vram usage from 10 GiB on a V100 with 28 layers to at least 16 GiB, often resulting in a crash. I used to be able to run deep-daze with 32 hidden layers no problem in colab.

Any idea as to why this adds so much?

Exposing SIREN's theta_initial and theta parameters.

Update: The initial image generated for delta 1,2,3,4,5,...41
https://user-images.githubusercontent.com/3994972/106395379-9f77c380-63c7-11eb-96ba-ccda3da22334.mp4

As discussed in another issue, deep-daze is sometimes overly sensitive to its initial input. In searching for a lead on how to resolve this issue, I decided to expose the SIREN hyperparameter denoted theta (I think) in the original notebook. So far I've found some interesting results, but I'm still working out what part of it is useful. It's probably just best if I show you some examples of different theta values on the same text input. For reference, I'm using @dginev's seed and text, as he mentioned not being able to get a "cheery" output out of the rather cheery phrase "seed of hope". You can find that discussion here: #9 (comment)

--seed=872073
text: "seed of hope"
iterations: 1000
epochs: 1
image_width: 256
num_layers: 32
batch_size: 64
gradient_accumulate_every: 1

This is the code I'm using to run through different theta parameters for the same input. Careful: it will do a full 1000-iteration run for each of 179 theta values.

#!/bin/bash

for ((theta=1;theta<180;theta++)); do
  echo "processing theta: " + $theta;
  imagine "seed of hope"\
     --seed=872072 \
     --num_layers=32 \
     --image_width=256 --save_progress=True --save_every=10 --epochs=1 \
     --batch_size=32 --gradient_accumulate_every=1 \
     --iterations=1000 \
     --overwrite=True \
     --theta_initial=$theta \
     --theta_hidden=$theta \
     --save_date_time=True;
  wait; # Important, otherwise the loop will continue before finishing and you'll run out of memory.
done

Here's the code if you'd like to try yourself. @lucidrains Would love your input on this. Not sure it's ready to be merged quite yet but let me know if you're interested in that. https://github.com/afiaka87/deep-daze/tree/theta_output_dir_params

Early results:

This is what I was able to run on my RTX 2070 this morning. Each image represents a change in the theta value by 1. The final image is theta=56. As you can see, it definitely gives different results. But I'm still not sure what to make of it.

(Progress images for theta = 1 through 56 omitted.)

[Suggestion] Begin with encoded image / implicit neural representation of user image

From what I can tell, SIREN should be very capable of encoding a supplied bitmap image to an implicit neural representation. I haven't figured out how to do it myself yet, but the ability to begin a session of deep-dazing with a specific image (encoded to an INR to some level of completion) should be very helpful for guiding the image generation, or perhaps even image modification. Or old Deep Dream style hallucinations.

[Rambling]
One of the first things I tried to do with the original notebook was make an emote. Well, it didn't work. It made a hazy half-remembered dream image of a screen with non-descript emotes on it. Then I realized if I stopped the training, and didn't generate a network, I could swap out the CLIP prompt and steer the ship so to speak. From there it was trying to get it to generate a yellow circle, orb, or ball, and that wasn't happening.

But what if it could begin with an image of a yellow circle? Or a yellow circle with eyes and a mouth? Would it manage to make an emote out of it when prompted "visceral nightmare emoji"? Or would it cover the yellow circle with strange shapes that have little to do with the supplied image or structure? I don't actually know. But at the very least it may end up with an aesthetic like the old Deep Dream, putting eyes and spider legs on everything.

Or perhaps something to force the generation to follow certain shapes by warping the initial -1 to 1 2D grid / mgrid that was in the old notebook.

Deep Daze vs Big Sleep?

Just checking, but Deep Daze and Big Sleep have the same goal and differ only in the image generation model, right?

Have you been able to determine whether SIREN or BigGAN generates better / higher-quality images?

Colab Notebook ignores ITERATIONS form input by default

from tqdm import trange
from IPython.display import Image, display

from deep_daze import Imagine

TEXT = 'an apple next to a fireplace' #@param {type:"string"}
NUM_LAYERS = 32 #@param {type:"number"}
SAVE_EVERY =  20#@param {type:"number"}
IMAGE_WIDTH = 512 #@param {type:"number"}
SAVE_PROGRESS = False #@param {type:"boolean"}
LEARNING_RATE = 1e-5 #@param {type:"number"}
ITERATIONS = 1050 #@param {type:"number"}

model = Imagine(
    text = TEXT,
    num_layers = NUM_LAYERS,
    save_every = SAVE_EVERY,
    image_width = IMAGE_WIDTH,
    lr = LEARNING_RATE,
    iterations = ITERATIONS,
    save_progress = SAVE_PROGRESS
)

for epoch in trange(20, desc = 'epochs'):
    for i in trange(1000, desc = 'iteration'): # Should respect `ITERATIONS` here instead of 1000
        model.train_step(epoch, i)

should be

for epoch in trange(20, desc = 'epochs'):
    for i in trange(ITERATIONS, desc = 'iteration'):

Size schedule

Hi!

I have some issues understanding the size scheduling. I care about it because in one of my projects I try to create an audio-visual mirror. There I need to continuously train on CLIP encodings of new images that are delivered via the webcam. Therefore I need to understand the size scheduling (I think it's the only kind of scheduling that happens) in order to modify it for my needs - I guess I'd need to remove it, generalize it, or make it dependent on the amount of change in the CLIP encoding between images.

I have some issues/questions regarding the scheduling:

  1. The scheduling is not dependent on the number of total batches. It only generates the schedule up to the total number of batches that are required. Is it simply not implemented yet? It seems to me that the thresholds (500, 1000 etc) and possibly the pieces_per_group would need to be modified based on the total number of batches.
  2. The scheduling partitions seem to change their ordering over time from descending to ascending in sizes: first partition is [4, 5, 3, 2, 1, 1], while last partition is [1, 1, 1, 2, 4, 7]. Yet, in line 215 the sampled sizes are sorted. That does not make any sense.
  3. Could someone explain the point of the scheduling?

I linked the relevant lines below:

def generate_size_schedule(self):
    batches = 0
    counter = 0
    self.scheduled_sizes = []
    while batches <= self.total_batches:
        counter += 1
        sizes = self.sample_sizes(counter)
        batches += len(sizes)
        self.scheduled_sizes.extend(sizes)

def sample_sizes(self, counter):
    pieces_per_group = 4
    # 6 piece schedule increasing in context as model saturates
    if counter < 500:
        partition = [4, 5, 3, 2, 1, 1]
    elif counter < 1000:
        partition = [2, 5, 4, 2, 2, 1]
    elif counter < 1500:
        partition = [1, 4, 5, 3, 2, 1]
    elif counter < 2000:
        partition = [1, 3, 4, 4, 2, 2]
    elif counter < 2500:
        partition = [1, 2, 2, 4, 4, 3]
    elif counter < 3000:
        partition = [1, 1, 2, 3, 4, 5]
    else:
        partition = [1, 1, 1, 2, 4, 7]

    dbase = .38
    step = .1
    width = self.image_width

    sizes = []
    for part_index in range(len(partition)):
        groups = partition[part_index]
        for _ in range(groups * pieces_per_group):
            sizes.append(torch.randint(
                int((dbase + step * part_index + .01) * width),
                int((dbase + step * (1 + part_index)) * width), ()))
    sizes.sort()
    return sizes

Colab Display Doesn't Work

It seems the "display" feature isn't working in COLAB and is generating URL's expecting a base64 format, but with only the filename appended.

For instance: "data:image/png;base64,./8bitvideogame.png"

"Priming" Learning rate 3e-4 not working for layers greater than 16

We discussed this elsewhere, but just to be rigorous -

As it stands, I think priming only works on about 16-20 layers. Otherwise, the loss gets stuck in the 0.08 range. I found it's able to escape this 0.08 value by lowering the learning rate.

Now what would really be nice is if we found good rates for certain layer counts. In the meantime, I just made it tweakable from the Imagine interface and the CLI. Here's the code -

#38

Method 'forward' is not defined

I installed the module via

$ pip install deep-daze

and just tried the provided example with

$ imagine "a house in the forest"

but after it loaded something for a few minutes (the first time I run the command) it throws this error

Traceback (most recent call last):
  File "/home/luca/anaconda3/bin/imagine", line 5, in <module>
    from deep_daze.cli import main
  File "/home/luca/anaconda3/lib/python3.7/site-packages/deep_daze/__init__.py", line 1, in <module>
    from deep_daze.deep_daze import DeepDaze, Imagine
  File "/home/luca/anaconda3/lib/python3.7/site-packages/deep_daze/deep_daze.py", line 39, in <module>
    perceptor, normalize_image = load()
  File "/home/luca/anaconda3/lib/python3.7/site-packages/deep_daze/clip.py", line 192, in load
    model.apply(patch_device)
  File "/home/luca/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 473, in apply
    module.apply(fn)
  File "/home/luca/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 473, in apply
    module.apply(fn)
  File "/home/luca/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 473, in apply
    module.apply(fn)
  [Previous line repeated 3 more times]
  File "/home/luca/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 474, in apply
    fn(self)
  File "/home/luca/anaconda3/lib/python3.7/site-packages/deep_daze/clip.py", line 183, in patch_device
    graphs = [module.graph] if hasattr(module, "graph") else []
  File "/home/luca/anaconda3/lib/python3.7/site-packages/torch/jit/_script.py", line 449, in graph
    return self._c._get_method("forward").graph
RuntimeError: Method 'forward' is not defined.

My system is:

Ubuntu 18.04.4 LTS
GeForce RTX 2070
pytorch 1.7.1
python version 3.7.1

Dealing With Punctuation

Filenames need to be sanitized of any punctuation before being saved. Any '.' characters, for instance, will be rendered unreadable by IPython.display, although they do manage to save on Linux. I don't think this will be the case on Windows though.

Anyway, I'd like to use punctuation as apparently it is considered by CLIP, but the filenames should be cleaned of it.

import string, re
def underscorify(value):
  no_punctuation = str(value.translate(str.maketrans('', '', string.punctuation)))
  spaces_to_one_underline = re.sub(r'[-\s]+', '_', no_punctuation).strip('-_') # strip gets rid of leading or trailing underscores
  return spaces_to_one_underline

This is a variation on the method django uses to create its stubs from text phrases. Strips all punctuation and converts any amount of spaces to a single underscore.

This will overwrite phrases' files using the same ordering of words but different punctuation. Would need to add in collision detection and append a 'num_collision' count to the file upon collision.
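
A quick hypothetical example of what the helper produces:

>>> underscorify("A pizza, with green pepper!")
'A_pizza_with_green_pepper'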

Control the starting seed?

I wanted to make a PR, but I couldn't figure out where would be the best place to intervene in the cascade of initialization calls up to Siren itself.

Compared to the original notebook, which randomizes its initial seed on each run, the deep-daze approach currently always seems to start with a very noir seed:

and amazingly CLIP is quite happy to navigate inside it and keep generating in that style (even when I would prefer a cheery light generation instead). Here are two examples of the types of generation one ends up with:

Ideally one would like to specify some hex color as "dominant" for the seed and expose that as a command line option, but I could do that myself if I found out where to look. Could you ( @lucidrains ) point me to the right general area where one would set a custom init seed for Siren?

References

Since it is not super obvious what this repository is about, and because Twitter timelines move fast, here are a few links for others:

Colab - pip install --upgrade might be best

!pip install --upgrade deep-daze

I think this is more appropriate considering this is an instance of Colab usage in which people are explicitly restarting the kernel to use cached versions (and since you are apparently quite active still with your updates).

For me, specifically, I didn't know why the save_progress kwarg wasn't available until I realized I needed to update.

There may be downsides to this approach as well - benefits to pinned versions and all that. Just a suggestion.

create_story feature generates washed out images

I'm seeing lots of commented-out normalizations. Any reason in particular those need to be gone? I noticed this feature wasn't exactly advertised anywhere (and still isn't accessible from the CLI). Does that mean it's not complete?

create_story not working

Has anyone been able to use create_story with the Colab notebooks? If so, how'd you do it? I can't get it to train past the first phrase.

Files overwritten after each epoch

Bug: Images saved with --save_every have the incorrect sequence number and start to overwrite themselves after each epoch.

This one's my bad again. I re-wrote the code for this and made the mistake of using the iteration instead of (self.iterations * epoch + iteration) // self.save_every for the filename sequence number.

Here's a PR that should fix it.

#34

(Suggestion) Include more useful parameters as form inputs in colab

I've included these form inputs in my personal copy of your Colab notebook. In particular, reducing the image_width parameter allows one to vastly increase the number of hidden_layers. By going to an image_width of 256 (instead of the default 512) I was able to run 32 hidden layers without problems on a T4.

from tqdm import trange
from IPython.display import Image, display

from deep_daze import Imagine

TEXT = 'blue marshmallow' #@param {type:"string"}
NUM_LAYERS = 16 #@param {type:"number"}
SAVE_EVERY =  20#@param {type:"number"}
IMAGE_WIDTH = 512 #@param {type:"number"}
SAVE_PROGRESS = False #@param {type:"boolean"}
LEARNING_RATE = 1e-5 #@param {type:"number"}
ITERATIONS = 1050 #@param {type:"number"}

model = Imagine(
    text = TEXT,
    num_layers = NUM_LAYERS,
    save_every = SAVE_EVERY,
    image_width = IMAGE_WIDTH,
    lr = LEARNING_RATE,
    iterations = ITERATIONS,
    save_progress = SAVE_PROGRESS
)

Feel free to include them in your copy if you'd like to.

Not sure if this is an easy fix or not

I've left my PC a few times and left it to keep rendering, but it sometimes stops and only resumes when I press Enter. Is there a way to circumvent this and have it run all the time?

Files are being overwritten after each epoch

The code for determining the number to append to the filename only considered the current iteration of the current epoch. Because of this, once a new epoch started running, files would start overwriting themselves. I fixed this by creating the number with (total_iterations * current_epoch + current_iteration) // save_every (pseudocode).
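
In other words, the saved-file index should look roughly like this (a sketch with hypothetical variable names, not the exact code from the PR):

sequence_number = (total_iterations * current_epoch + current_iteration) // save_every
filename = f"{text_stub}.{sequence_number:05d}.png"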

Here's a PR with a fix. Thanks.
#18

Need usage info for --theta_initial, --theta_hidden parameters.

Hey it looks like there's not any usage info for the new parameters. Here's what needs to be added to the README.md. Similar changes should be made to the docstring in cli.py so that --help presents useful descriptions.

FLAGS
...
    --theta_initial=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the first layer of the network.
    --theta_hidden=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the hidden layers of the network.

"RuntimeError: Method 'forward' is not defined."

I've tried to run the imagine command, but this is what I get every time I run the command.

(venv) C:\WINDOWS\system32>imagine "alone in the dark"
Traceback (most recent call last):
  File "c:\program files\python38\lib\runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\program files\python38\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Program Files\Python38\Scripts\imagine.exe\__main__.py", line 4, in <module>
  File "c:\program files\python38\lib\site-packages\deep_daze\__init__.py", line 1, in <module>
    from deep_daze.deep_daze import DeepDaze, Imagine
  File "c:\program files\python38\lib\site-packages\deep_daze\deep_daze.py", line 39, in <module>
    perceptor, normalize_image = load()
  File "c:\program files\python38\lib\site-packages\deep_daze\clip.py", line 192, in load
    model.apply(patch_device)
  File "c:\program files\python38\lib\site-packages\torch\nn\modules\module.py", line 473, in apply
    module.apply(fn)
  File "c:\program files\python38\lib\site-packages\torch\nn\modules\module.py", line 473, in apply
    module.apply(fn)
  File "c:\program files\python38\lib\site-packages\torch\nn\modules\module.py", line 473, in apply
    module.apply(fn)
  [Previous line repeated 3 more times]
  File "c:\program files\python38\lib\site-packages\torch\nn\modules\module.py", line 474, in apply
    fn(self)
  File "c:\program files\python38\lib\site-packages\deep_daze\clip.py", line 183, in patch_device
    graphs = [module.graph] if hasattr(module, "graph") else []
  File "c:\program files\python38\lib\site-packages\torch\jit\_script.py", line 449, in graph
    return self._c._get_method("forward").graph
RuntimeError: Method 'forward' is not defined.

I'm new to all of this, so it's kind of confusing. Is there any fix for this "RuntimeError: Method 'forward' is not defined."?

p(image | text)

Hello,

is it possible to get the log probability of a generated image given text using this model?

Doesn't save any images to disk

The current version (pip) doesn't save any images, no matter which save_every value is used.

Tested locally (Win 10) and on the simplified Colab notebook.

[Discussion] Use image instead of or additionally to text features

Hey, thanks a lot for this repo! I've been playing around with this a lot and by reducing the img size to 256 pixels I can generate some amazing images using 8GB of VRAM.

I was thinking of a project in which I combine image and text features into a single feature vector to then generate an image using the SIREN network representing both at the same time. For this, we would need to extract the features from a given img instead of a text.

In general, there is quite some inefficiency in the current code, as the text encoding is recalculated during each DeepDaze train_step, even though the text does not change.

I would recommend that DeepDaze (or the Imagine class) be able to take a CLIP feature vector as an input. This feature vector can simply be saved and used in the SIREN loss calculation. Using some kind of set_feature_vector method, this vector could be overridden. Furthermore, there should of course be a backwards-compatible mode that simply takes a text as an input and saves the corresponding feature vector, and there could also be the option to pass an image as an input.

If there is interest from @lucidrains or other people, I could submit a pull request for this once I'm done implementing it.
If, on the other hand, you have tested using image CLIP features instead of text features to generate SIREN images, please let me know!

[Suggestion and a bit of code] Forcing symmetry (mgrid adjustments)

In the original (CLIP & gradient ascent) notebook, there was a function named get_mgrid what set up a 2D tensor from -1 to 1 with the desired image size as the steps. I was able to modify this to enforce various layouts, such as horizontal symmetry. Admittedly, not super clear on how to do what I want with it since used to thinking about things like this as vectors in GLSL where you can just texcoord.x = abs((texcoord.x * 2.0) - 1.0); and call it a day.

For reference, my adjustment to make the horizontally mirrored canvas was to add this after mgrid = mgrid.reshape(-1, dim):
mgrid2 = torch.sub(torch.abs(torch.mul(mgrid, 2.0)), 1) to change -1, 0, 1 to -2, 0, 2, then to 2, 0, 2, then to 1, -1, 1.
mgrid = torch.add(torch.mul(mgrid, torch.tensor([1.0, 0.0])), torch.mul(mgrid2, torch.tensor([0.0, 1.0]))) in an attempt to mask out the X and Y coordinates and insert the mirrored one into the X coordinate, though looking at this it places it in the second slot, so I have no idea why that even worked if X comes before Y. I have very little knowledge of how these things are stored / accessed / laid out.
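
A small self-contained sketch of the same idea, assuming the get_mgrid layout from the SIREN notebook (coordinates in [-1, 1], reshaped to (-1, dim)); which axis ends up mirrored is an assumption on my part:

import torch

def get_mirrored_mgrid(sidelen, dim=2):
    # standard SIREN-style coordinate grid in [-1, 1]
    tensors = tuple(dim * [torch.linspace(-1, 1, steps=sidelen)])
    mgrid = torch.stack(torch.meshgrid(*tensors), dim=-1).reshape(-1, dim)
    # fold a coordinate onto itself: x in [-1, 1] -> abs(2x) - 1, which mirrors around 0
    mirrored = torch.abs(mgrid * 2.0) - 1.0
    out = mgrid.clone()
    out[:, 1] = mirrored[:, 1]  # mirror only the second coordinate
    return out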

An option for different layouts like this would be useful, if not simply entertaining for aesthetic purposes. Horizontal symmetry, vertical symmetry, radial symmetry (couldn't figure out how to make it work; I kept ending up with straight lines in the X or Y direction when I tried to convert X, Y to angle, distance), polar layout, cylindrical layout, arc layout, n-star symmetry, etc.

Bug - Fire can't inspect the new function I added?

$ imagine "rectangle" --save_every=4 --save_progress=True
Starting up...
Imagined image already exists, do you want to overwrite? (y/n) y
Imagining "rectangle" from the depths of my weights...
iteration:   0%|                                                                                                                                                                                                                                                              | 0/1050 [00:10<?, ?it/s]
epochs:   0%|                                                                                                                                                                                                                                                                   | 0/20 [00:10<?, ?it/s]
Traceback (most recent call last):
  File "/opt/conda/bin/imagine", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.7/site-packages/deep_daze/cli.py", line 75, in main
    fire.Fire(train)
  File "/opt/conda/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.7/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/opt/conda/lib/python3.7/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/deep_daze/cli.py", line 71, in train
    imagine()
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/deep_daze/deep_daze.py", line 315, in forward
    loss = self.train_step(epoch, i)
  File "/opt/conda/lib/python3.7/site-packages/deep_daze/deep_daze.py", line 301, in train_step
    self.generate_and_save_image(current_iteration=iteration)
  File "/opt/conda/lib/python3.7/site-packages/deep_daze/deep_daze.py", line 282, in generate_and_save_image
    self.replace_current_image(self)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 779, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'Imagine' object has no attribute 'replace_current_image'

I'm getting this on two different setups whenever I pass in --save_every and --save_progress=True. It looks like Fire is having trouble seeing the new method I added. @lucidrains I'm not super familiar with Fire. Do you know what I'm doing wrong here?

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/deep_daze/data/bpe_simple_vocab_16e6.txt'

I'm trying this out in Colab and facing the above error. Here's the full stack:
Traceback (most recent call last):
  File "/usr/local/bin/imagine", line 5, in <module>
    from deep_daze.cli import main
  File "/usr/local/lib/python3.6/dist-packages/deep_daze/__init__.py", line 1, in <module>
    from deep_daze.deep_daze import DeepDaze, Imagine
  File "/usr/local/lib/python3.6/dist-packages/deep_daze/deep_daze.py", line 11, in <module>
    from deep_daze.clip import load, tokenize, normalize_image
  File "/usr/local/lib/python3.6/dist-packages/deep_daze/clip.py", line 223, in <module>
    _tokenizer = SimpleTokenizer()
  File "/usr/local/lib/python3.6/dist-packages/deep_daze/clip.py", line 64, in __init__
    merges = Path(bpe_path).read_text().split('\n')
  File "/usr/lib/python3.6/pathlib.py", line 1196, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "/usr/lib/python3.6/pathlib.py", line 1183, in open
    opener=self._opener)
  File "/usr/lib/python3.6/pathlib.py", line 1037, in _opener
    return self._accessor.open(self, flags, mode)
  File "/usr/lib/python3.6/pathlib.py", line 387, in wrapped
    return strfunc(str(pathobj), *args)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.6/dist-packages/deep_daze/data/bpe_simple_vocab_16e6.txt'

Possible to change text as you train?

Big-sleep seems to have a self.set_text function that's used in this notebook:

https://colab.research.google.com/github/lots-of-things/Story2Hallucination/blob/main/Story2Hallucination_GIF.ipynb

to train big-sleep on a "sliding phrase" through a larger paragraph of words. This produces mostly nonsense visually, as it obviously needs far more time than the defaults to generate something accurate.

I'm curious if deep-daze would allow for this though. Is it possible to just change deep-daze's text parameter? Say, once per epoch you could feed it a different phrase?

There were no tensor arguments to this function

Not quite sure what I'm doing wrong yet. Any ideas?

imagine "A rainbow in the forest" --deeper --save_every=200 --open_folder=False --epochs=2

Starting up...
Imagining "A rainbow in the forest" from the depths of my weights...
loss: -44.91: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1050/1050 [12:29<00:00, 1.40it/s]
loss: -46.91: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1049/1050 [12:38<00:00, 1.38it/s]
epochs: 50%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 1/2 [25:08<25:08, 1508.57s/it]
Traceback (most recent call last):
  File "/home/nerdy/anaconda3/envs/big-sleep/bin/imagine", line 8, in <module>
    sys.exit(main())
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/deep_daze/cli.py", line 91, in main
    fire.Fire(train)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/deep_daze/cli.py", line 87, in train
    imagine()
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/deep_daze/deep_daze.py", line 366, in forward
    loss = self.train_step(epoch, i)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/deep_daze/deep_daze.py", line 319, in train_step
    loss = self.model(self.encoded_text)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nerdy/anaconda3/envs/big-sleep/lib/python3.7/site-packages/deep_daze/deep_daze.py", line 150, in forward
    image = torch.cat(pieces)
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].

CPU: registered at /pytorch/build/aten/src/ATen/CPUType.cpp:2127 [kernel]
CUDA: registered at /pytorch/build/aten/src/ATen/CUDAType.cpp:2983 [kernel]
QuantizedCPU: registered at /pytorch/build/aten/src/ATen/QuantizedCPUType.cpp:297 [kernel]
BackendSelect: fallthrough registered at /pytorch/aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Named: registered at /pytorch/aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
AutogradOther: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCPU: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradCUDA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradXLA: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse1: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse2: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
AutogradPrivateUse3: registered at /pytorch/torch/csrc/autograd/generated/VariableType_2.cpp:8078 [autograd kernel]
Tracer: registered at /pytorch/torch/csrc/autograd/generated/TraceType_2.cpp:9654 [kernel]
Autocast: registered at /pytorch/aten/src/ATen/autocast_mode.cpp:258 [kernel]
Batched: registered at /pytorch/aten/src/ATen/BatchingRegistrations.cpp:511 [backend fallback]
VmapMode: fallthrough registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]

Runtime Error

I'm running into an issue when using "imagine" at my prompt. This is my first time using something like Python, so bear with me.

RuntimeError: CUDA out of memory. Tried to allocate 128.00 MiB (GPU 0; 8.00 GiB total capacity; 5.98 GiB already allocated; 100.56 MiB free; 5.99 GiB reserved in total by PyTorch)

I'm not really understanding what the issue is here; I would think that 8 GB is enough.

Faulty normalization?!

Sooooo I was working on returning the image from every train_step. I noticed that, so far, the images were saved using save_image. There, I noticed that the image is calculated using the following lines:

img = normalize_image(self.model(self.clip_encoding, return_loss=False).cpu())
img.clamp_(0., 1.)

But the normalize_image, that is used in this line is defined here:

perceptor, normalize_image = load()

That means an image is returned which is normalized to be used as the input for CLIP, at least as far as I understand. That does not make any sense: surely the output of the SIREN net needs to be normalized before extracting the image features to calculate the loss, but that should not happen for the final saved image.

I'd suggest simply removing the normalize_image call. I'm playing around with this and it seems the generated images are now brighter - which makes sense given that, before, we unnecessarily subtracted 0.34-something per image.

That seems to be a major bug (although I kind of like the darker images too) @lucidrains @afiaka87
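
A minimal sketch of the suggested change, assuming torchvision's save_image is what the saving code uses:

from torchvision.utils import save_image

# skip the CLIP input normalization when saving the final image
img = self.model(self.clip_encoding, return_loss=False).cpu()
img.clamp_(0., 1.)
save_image(img, './output.png')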

Missing import `random` when trying to use a seed

Traceback (most recent call last):
  File "/home/landon/.virtualenvs/deep-daze-afiaka/bin/imagine", line 11, in <module>
    load_entry_point('deep-daze==0.3.2', 'console_scripts', 'imagine')()
  File "/home/landon/.virtualenvs/deep-daze-afiaka/lib/python3.8/site-packages/deep_daze-0.3.2-py3.8.egg/deep_daze/cli.py", line 69, in main
    fire.Fire(train)
  File "/home/landon/.virtualenvs/deep-daze-afiaka/lib/python3.8/site-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/landon/.virtualenvs/deep-daze-afiaka/lib/python3.8/site-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/landon/.virtualenvs/deep-daze-afiaka/lib/python3.8/site-packages/fire-0.4.0-py3.8.egg/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/landon/.virtualenvs/deep-daze-afiaka/lib/python3.8/site-packages/deep_daze-0.3.2-py3.8.egg/deep_daze/cli.py", line 46, in train
    imagine = Imagine(
  File "/home/landon/.virtualenvs/deep-daze-afiaka/lib/python3.8/site-packages/deep_daze-0.3.2-py3.8.egg/deep_daze/deep_daze.py", line 216, in __init__
    random.seed(seed)
NameError: name 'random' is not defined

Not sure which random this is supposed to be. Is it Python's random or torch.random? Happy to fix it.
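
If it is meant to be Python's standard-library random (which the random.seed(seed) call suggests), the fix would be a one-liner at the top of deep_daze/deep_daze.py:

import random  # needed for random.seed(seed) in __init__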

Colab Notebook Overwrites Each Saved Image

As it stands, the Colab notebook saves each iteration's image under the same filename, so the previous iteration's image is lost. A simple change of this line -

image = Image(f'./{TEXT}.png')

to

image = Image(f'./{TEXT}-{i}.png')

is a reasonable fix. This works because the variable i holds the current iteration.

Add Fourier Feature Mapping for Improved Quality

A pal of mine discovered a fantastic notebook by a coder using the GitHub handle eps696. The notebook adds Fourier feature mapping to CLIP+SIREN to great effect.

Here's the Original Notebook, and here's my Custom Version with some fun additions, such as prompts to minimize (subtract) and a prompt for painting finer details. All due credit: those ideas are also from eps696, in another of their notebooks.

Anyway, it definitely seems to help. Using as few as 16 SIREN layers, I've gotten this output:

cosmic love and attention
https://user-images.githubusercontent.com/3994972/109033636-77a91200-768c-11eb-8d5c-265f745cf496.mp4

mist over green hills
vlcsnap-2021-02-24-11h49m43s175

mist over green hills with the fine_details="trees"
mist_fine_detail

The drawbacks seem to be that the learning rate is much more fickle and may even need to be changed slightly on a per-phrase basis (if you get really unlucky). Increasing the number of SIREN layers to anything more than 20 causes lots of issues as well, and I'm pretty unclear as to why that is. I can't find a stable learning rate for those. Also, there is a fourier_scale parameter which eps696 left at 4, but I've found 2 to give better results.
