
face_diffusion

Diffusion Models for content generation are exciting as hell. In this project, I created a fake face generation model that one could practically train on a personal computer. Just as most of us entered the realm of deep learning with the MNIST project, for me this project serves as a starting point for content generation models.

I'm running an RTX 3060 on my PC, and I scraped 30,000 fake face images from thispersondoesnotexist to train my model. Here's what they look like:

[Figure: sample training images scraped from thispersondoesnotexist]

The gist of diffusion models is a pair of processes: forward and backward diffusion. Forward diffusion turns an image into pure noise; backward diffusion turns random noise back into an image. Our neural network is used in the backward process. Instead of learning to generate images directly, as a GAN does, our model learns about the noise and how to remove it.

We start with the forward diffusion process, which gradually adds random noise to an image until, after a fixed number of timesteps, nothing but noise remains. This is a Markov process: each image depends only on the one immediately before it. We denote the process $q$.

Let $x_0$ denote the original image and $T$ the total number of timesteps; $x_1$ through $x_T$ are progressively noisier versions of $x_0$. The noise added at step $t$ is described by

$$ \begin{align*} q(x_t \mid x_{t-1}) &= \mathcal{N}\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t I\right) \\ x_t &= \sqrt{1-\beta_t}\,x_{t-1}+\sqrt{\beta_t}\,\epsilon \end{align*} $$

where $x_t$ is the output, $\sqrt{1-\beta_t}\,x_{t-1}$ is the mean, $\beta_t I$ is the fixed variance, and $\epsilon \sim \mathcal{N}(0, I)$ is standard Gaussian noise (mean 0, standard deviation 1).
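As a rough illustration, here is the single step above in plain Python. This is a minimal sketch, not the project's code: it assumes an image is flattened into a list of floats (a real implementation would use GPU tensors), and `forward_step` / `forward_process` are hypothetical names.

```python
import math
import random


def forward_step(x_prev, beta_t, rng=random):
    """One forward diffusion step: draw x_t ~ q(x_t | x_{t-1}).

    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps,  eps ~ N(0, I)
    """
    scale = math.sqrt(1.0 - beta_t)   # shrinks the previous image
    noise_std = math.sqrt(beta_t)     # standard deviation of the added noise
    return [scale * x + noise_std * rng.gauss(0.0, 1.0) for x in x_prev]


def forward_process(x0, betas, rng=random):
    """Run the Markov chain x_0 -> x_1 -> ... -> x_T and return every state."""
    states = [list(x0)]
    for beta_t in betas:
        # Each new state depends only on the previous one.
        states.append(forward_step(states[-1], beta_t, rng))
    return states
```

Calling `forward_process(x0, betas)` with $T$ betas returns the $T + 1$ states $x_0, \dots, x_T$. With $\beta_t = 0$ a step leaves the image unchanged; with $\beta_t = 1$ it discards the image and returns pure noise.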

The variance $\beta_t$ sets how much noise is added at step $t$. We also define

$$ \begin{align*} \alpha_t &= 1-\beta_t \\ \bar{\alpha}_t &= \prod_{i=1}^{t} \alpha_i \end{align*} $$
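The definitions of $\alpha_t$ and $\bar{\alpha}_t$ map directly onto a running product. The README does not say which $\beta_t$ schedule was used, so the linear schedule below (from $10^{-4}$ to $0.02$ over $T = 1000$ steps, the values from the original DDPM paper) is an assumption, as are the function names.

```python
import operator
from itertools import accumulate


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced betas; the default values are assumed, not from this project."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alphas_and_alpha_bars(betas):
    """alpha_t = 1 - beta_t and alpha_bar_t = prod_{i=1}^{t} alpha_i.

    Lists are 0-indexed, so alpha_bars[t - 1] holds alpha_bar_t.
    """
    alphas = [1.0 - b for b in betas]
    alpha_bars = list(accumulate(alphas, operator.mul))  # running product
    return alphas, alpha_bars
```

Because every $\alpha_i < 1$, $\bar{\alpha}_t$ shrinks monotonically toward 0; with this schedule $\bar{\alpha}_T$ ends up well below $10^{-3}$, which is why $x_T$ is essentially pure noise.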

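Substituting $\alpha_t$ into the single-step formula and unrolling the recursion gives $x_t$ directly in terms of $x_0$. Each $\epsilon$ below is a standard Gaussian: at every step, two independent Gaussian noise terms merge into one, since $\alpha_t(1-\alpha_{t-1}) + (1-\alpha_t) = 1-\alpha_t\alpha_{t-1}$. This gives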

$$ \begin{align*} x_t &= \sqrt{1-\beta_t}\,x_{t-1}+\sqrt{\beta_t}\,\epsilon \\ &= \sqrt{\alpha_t}\,x_{t-1}+\sqrt{1-\alpha_t}\,\epsilon \\ &= \sqrt{\alpha_t\alpha_{t-1}}\,x_{t-2}+\sqrt{1-\alpha_t\alpha_{t-1}}\,\epsilon \\ &= \sqrt{\alpha_t\alpha_{t-1}\alpha_{t-2}}\,x_{t-3}+\sqrt{1-\alpha_t\alpha_{t-1}\alpha_{t-2}}\,\epsilon \\ &= \cdots \\ &= \sqrt{\bar{\alpha}_t}\,x_0+\sqrt{1-\bar{\alpha}_t}\,\epsilon \\ \Longrightarrow\ q(x_t \mid x_0) &= \mathcal{N}\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)I\right) \end{align*} $$

This closed form lets us compute the noised image at any timestep directly from $x_0$, without looping over the intermediate steps: sampling $x_t$ drops from $O(t)$ work to $O(1)$. Here's what the forward diffusion process looks like on an actual image:

[Figure: forward diffusion applied to a face image, from the original toward pure noise]
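A minimal sketch of the closed-form sampler, assuming the same list-of-floats representation; `q_sample` and `iterate_forward` are hypothetical names. `iterate_forward` applies the single-step formula $t$ times and is included only as a reference: both functions draw $x_t$ from the same distribution, but `q_sample` needs one noise draw per pixel instead of $t$.

```python
import math
import random


def q_sample(x0, t, alpha_bars, rng=random):
    """Closed-form forward diffusion: draw x_t ~ q(x_t | x_0) in one shot.

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    t is 1-indexed as in the text, so alpha_bar_t is alpha_bars[t - 1].
    """
    ab = alpha_bars[t - 1]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]


def iterate_forward(x0, t, betas, rng=random):
    """Reference O(t) version: apply the single-step formula t times."""
    x = list(x0)
    for beta in betas[:t]:
        x = [math.sqrt(1.0 - beta) * v + math.sqrt(beta) * rng.gauss(0.0, 1.0) for v in x]
    return x
```

Comparing the sample mean and variance of many scalar draws from each function against $\sqrt{\bar{\alpha}_t}\,x_0$ and $1-\bar{\alpha}_t$ is a quick sanity check of the derivation.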

I trained for 50 epochs on the 30,000 face images mentioned above, which took about 6 to 7 hours. I also sampled images every 5 epochs to track how training progressed. Here's what that looks like:

[Figure: images sampled every 5 epochs during training]
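The README does not spell out the training objective. Assuming the standard DDPM "simple" loss (Ho et al., 2020), which fits the description of a model that learns the noise, one training example could be scored as below: pick a random timestep, noise $x_0$ in one shot with the closed form, and compare the model's noise prediction against the true noise with mean squared error. `model(x_t, t)` stands in for the U-Net and, like `diffusion_loss`, is hypothetical.

```python
import math
import random


def diffusion_loss(model, x0, alpha_bars, rng=random):
    """One-sample estimate of the (assumed) DDPM 'simple' loss for one image.

    `model(x_t, t)` is a hypothetical noise predictor returning a list
    the same length as x_t.
    """
    T = len(alpha_bars)
    t = rng.randint(1, T)                     # uniform timestep, 1-indexed
    ab = alpha_bars[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]   # the noise the model must recover
    # Closed-form forward diffusion: no loop over intermediate steps.
    x_t = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    eps_pred = model(x_t, t)
    # Mean squared error between predicted and true noise.
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(x0)
```

An "oracle" model that returns the exact noise scores 0, while a model that always predicts zeros scores about 1 on average (the variance of the noise).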

And here are some images it can generate after 50 epochs of training:

[Figure: faces generated by the model after 50 epochs]
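Generating a face runs the backward process: start from pure noise $x_T$ and let the network remove a little noise at each step. The README does not give the update rule, so this sketch assumes the standard DDPM ancestral sampling step with $\sigma_t^2 = \beta_t$, namely $x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(x_t, t)\right) + \sqrt{\beta_t}\,z$ with $z \sim \mathcal{N}(0, I)$ and no noise on the final step. `sample` and `model` are hypothetical names.

```python
import math
import random


def sample(model, num_pixels, betas, rng=random):
    """Backward diffusion: denoise pure noise x_T step by step down to x_0.

    Assumes the standard DDPM update with sigma_t^2 = beta_t; `model(x_t, t)`
    is a hypothetical trained noise predictor (a U-Net in practice).
    """
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)  # alpha_bars[t - 1] is alpha_bar_t

    x = [rng.gauss(0.0, 1.0) for _ in range(num_pixels)]  # x_T ~ N(0, I)
    for t in range(len(betas), 0, -1):  # t = T, T-1, ..., 1
        beta, alpha, ab = betas[t - 1], alphas[t - 1], alpha_bars[t - 1]
        eps_pred = model(x, t)
        coef = beta / math.sqrt(1.0 - ab)
        # Mean of x_{t-1}: subtract the scaled predicted noise, then rescale.
        mean = [(xi - coef * ei) / math.sqrt(alpha) for xi, ei in zip(x, eps_pred)]
        if t > 1:
            # Fresh noise on every step except the last one.
            x = [m + math.sqrt(beta) * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x
```

If the "dataset" were a single image $c$ and `model` returned the exact noise $(x_t - \sqrt{\bar{\alpha}_t}\,c)/\sqrt{1-\bar{\alpha}_t}$, the last step would land exactly on $c$, a handy check that the coefficients are right.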

Again, keep in mind that this is only a starting point for generative diffusion models. There is plenty of room to improve the current version; a few ideas that come to mind:

  • Training for more epochs with larger batches on more powerful hardware
  • Using bigger U-Nets
  • Adding residual blocks
  • Incorporating attention units
  • Adding other conditioning signals, such as text descriptions

That said, given how little effort went into this model relative to what it can do, I think it shows how effective diffusion models are.

Contributors: emericen
