
Paper Collection on Diffusion Models


This repository contains a collection of papers on Diffusion Models.

It's a simple way for me to keep track of the papers I've read and the ones I want to read. Every paper will be summarized in a few sentences. Read papers are marked with a ⚡.

Feel free to browse through the papers and summaries. Happy reading!

Contributions

If you'd like to contribute to this repository by adding papers or improving summaries, please submit a pull request. Your contributions are greatly appreciated!

To add a new paper to the repository, follow these steps:

  • Summarize the paper in a few sentences.
  • Add the paper title, authors, and a link to the paper.

Contents

Resources

Introductory Posts

What are Diffusion Models?
Lilian Weng
[Website]
Jul 2021

DiffusionFastForward: 01-Diffusion-Theory
Mikolaj Czerkawski (@mikonvergence)
[Website]
Feb 2023

Diffusion Models as a kind of VAE
Angus Turner
[Website]
Jun 2021

Generative Modeling by Estimating Gradients of the Data Distribution
Yang Song
[Website]
May 2021

Introductory Papers

Understanding Diffusion Models: A Unified Perspective
Calvin Luo
arXiv 2022. [Paper]
A good intro to Diffusion Models from VAE to DDPM with the math from scratch.

How to Train Your Energy-Based Models
Yang Song, Diederik P. Kingma
arXiv 2022. [Paper]
A good intro focused on maximum likelihood estimation with MCMC sampling, Score Matching, and Noise Contrastive Estimation.

Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang, Zhilong Zhang, Shenda Hong, Wentao Zhang, Bin Cui
arXiv 2022. [Paper] [Collection]
Survey.

Introductory Videos

Diffusion Models | Paper Explanation | Math Explained
Outlier
[Video]
Explains the derivation of the DDPM loss well.

What are Diffusion Models?
Ari Seff
[Video]
Nice connection to VAE.

DiffusionFastForward
Mikolaj Czerkawski (@mikonvergence)
[Video]
A series of videos on Diffusion from DDPM to High-Resolution.

Must-Read

Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli
ICML 2015. [Paper] [Github]
The first paper. Proposes recovering data through a learned reverse diffusion process. Worth reading to understand the connection with statistical physics.

Denoising Diffusion Probabilistic Models
Jonathan Ho, Ajay Jain, Pieter Abbeel
NeurIPS 2020. [Paper] [Github] [Github2]
The comeback. Proposes sampling $x_t$ directly from $x_0$ in closed form and simplifies the diffusion loss.
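
The closed-form forward posterior $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar{\alpha}_t}\,x_0, (1-\bar{\alpha}_t)I)$ is what makes single-step training samples possible. A minimal NumPy sketch (function and variable names are my own, not from the paper's codebase):

```python
import numpy as np

def sample_xt(x0, t, betas, rng=None):
    """Sample x_t directly from x_0 using the cumulative noise schedule."""
    rng = rng or np.random.default_rng()
    alpha_bar = np.cumprod(1.0 - betas)            # \bar{\alpha}_t
    eps = rng.standard_normal(x0.shape)            # the noise the model learns to predict
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Usage: the linear schedule from the paper; at large t, x_t is nearly pure noise.
betas = np.linspace(1e-4, 0.02, 1000)
xt, eps = sample_xt(np.zeros((4, 4)), t=999, betas=betas)
```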

Improved Denoising Diffusion Probabilistic Models
Alex Nichol, Prafulla Dhariwal
ICML 2021. [Paper] [Github]
The main contribution is a method for learning the variance of the reverse process as an interpolation between $\beta_t$ and $\tilde{\beta}_t$.

Variational Diffusion Models
Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho
NeurIPS 2021. [Paper] [Github]
1 Jul 2021
A method to learn the noise schedule by modeling the signal-to-noise ratio $\mathrm{SNR}(t)=\alpha_t^2/\sigma_t^2$.

Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal, Alex Nichol
arXiv 2021. [Paper] [Github]
Proposes Classifier Guidance and other improvements to the model architecture.

Denoising Diffusion Implicit Models
Jiaming Song, Chenlin Meng, Stefano Ermon
ICLR 2021. [Paper] [Github]
Breaks the Markov chain constraint and makes the reverse process deterministic. This allows skipping steps, making sampling much faster. Somewhat mathematically involved, but a must-read.
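
The deterministic update can be sketched as follows, assuming a noise prediction `eps` has already been produced for the current step (names are illustrative, not from the official implementation). Because the update only needs $\bar{\alpha}$ at the two endpoints, `t_prev` may skip many intermediate steps:

```python
import numpy as np

def ddim_step(xt, t, t_prev, eps, alpha_bar):
    """One deterministic DDIM step (eta = 0) from x_t to x_{t_prev}."""
    # "Prediction" term: estimate of the clean sample x_0
    x0_pred = (xt - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
    # "Direction" term: re-noise x0_pred to the earlier timestep t_prev
    return np.sqrt(alpha_bar[t_prev]) * x0_pred + np.sqrt(1.0 - alpha_bar[t_prev]) * eps
```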

Generative Modeling by Estimating Gradients of the Data Distribution
Yang Song, Stefano Ermon
NeurIPS 2019. [Paper] [Project] [Github]
First paper of Score-Based Models.

Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole
ICLR 2021 (Oral). [Paper] [Github]
A paper that combines DDPM and Score-Based models under an SDE framework.

Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras, Miika Aittala, Timo Aila, Samuli Laine
arXiv 2022. [Paper]
A systematic study of the design space (sampling, training, preconditioning) with practical implementation improvements.

Classifier-Free Diffusion Guidance
Jonathan Ho, Tim Salimans
NeurIPS Workshop 2021. [Paper]
Instead of providing guidance from an external classifier (Classifier Guidance), the condition is fed directly into the UNet, and the model is trained with random condition dropout so a single network gives both conditional and unconditional predictions.
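
At sampling time the two predictions are blended with a guidance weight. A one-line sketch, using the common `eps_uncond + w * (eps_cond - eps_uncond)` parameterization (guidance-weight conventions differ by an offset across papers and implementations):

```python
def cfg_eps(eps_uncond, eps_cond, w):
    """Classifier-free guidance: blend unconditional and conditional noise predictions.

    w = 0 recovers the unconditional model, w = 1 the plain conditional model,
    and w > 1 extrapolates further toward the condition.
    """
    return eps_uncond + w * (eps_cond - eps_uncond)
```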

High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
CVPR 2022. [Paper]
Performs the diffusion process in a latent space instead of pixel space.

Personalization

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or
ICLR 2023 Spotlight [Paper] [Code] [Project Page]
Learns to generate and manipulate specific concepts by training a new word embedding.

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman
CVPR 2023 [Paper] [Code] [Project Page]
Learns to generate and manipulate specific concepts by fine-tuning the whole model with an additional prior-preservation loss.

Editing

Diffusion Models Already Have a Semantic Latent Space
Mingi Kwon, Jaeseok Jeong, Youngjung Uh
ICLR 2023 Spotlight [Paper]
Proposes an asymmetric reverse process that modifies only the "prediction" term of DDIM without affecting the "direction" term. Finds that the deepest feature maps of the UNet function as a semantic latent space.

Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Narek Tumanyan, Michal Geyer, Shai Bagon, Tali Dekel
CVPR 2023 [Paper]
Text-driven image-to-image translation. Saves the feature maps of the 4th layer and the self-attention maps of layers 4-11 during the denoising steps of the source image, then injects them during generation of the target image. This preserves the source's structure and layout.

SEGA: Instructing Text-to-Image Models using Semantic Guidance
Manuel Brack, Felix Friedrich, Dominik Hintersdorf, Lukas Struppek, Patrick Schramowski, Kristian Kersting
NeurIPS 2023 [Paper] [Diffusers]
Extension of CFG for more fine-grained, prompt2prompt-style control.

Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models
René Haas, Inbar Huberman-Spiegelglas, Rotem Mulayoff, Tomer Michaeli
[Paper]
PCA on h-space (as in the GANSpace paper) provides interpretable directions. Directions for a specific sample can further be found by taking the right singular vectors of the SVD of the Jacobian $J_t = \frac{\partial \epsilon_\theta}{\partial h_t}$.

Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Hang Li, Chengzhi Shen, Philip Torr, Volker Tresp, Jindong Gu
[Paper]
Adds a learnable vector to h-space, associated with an attribute of interest (e.g. smile).

Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau
[Paper] [Code]

A Latent Space of Stochastic Diffusion Models for Zero-Shot Image Editing and Guidance
Chen Henry Wu, Fernando De la Torre
CVPR 2023 [Paper] [Code]
Defines a latent space for stochastic diffusion models.

Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry
Yong-Hyun Park, Mingi Kwon, Jaewoong Choi, Junghyo Jo, Youngjung Uh
NeurIPS 2023 [Paper]
Since h-space exhibits local linearity and there is a mapping $f:\mathcal{X} \rightarrow \mathcal{H}$, the pullback metric can be used to carry the geometry of $\mathcal{H}$ back to $\mathcal{X}$. Interpretable directions in $\mathcal{X}$ can then be found through SVD in $\mathcal{H}$.

Zero-shot Image-to-Image Translation
Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, Jun-Yan Zhu
SIGGRAPH 2023 [Paper]
???

Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models
Zijian Zhang, Luping Liu, Zhijie Lin, Yichen Zhu, Zhou Zhao
[Paper]
The same idea as in the classic GAN interpretable-directions paper, but for diffusion models.

DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing
Yueming Lyu, Kang Zhao, Bo Peng, Yue Jiang, Yingya Zhang, Jing Dong
[Paper] [Code]

Inversion

Null-text Inversion for Editing Real Images using Guided Diffusion Models
Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or
[Paper]
Improves DDIM inversion with CFG > 1 by pushing the trajectory towards a pivot obtained with CFG = 1. Also optimizes a null-text embedding that can then be used for editing, e.g. with prompt2prompt.

Direct Inversion: Boosting Diffusion-Based Editing with 3 Lines of Code
Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu
[Paper] [Code]
???

An Edit Friendly DDPM Noise Space: Inversion and Manipulations
Inbar Huberman-Spiegelglas, Vladimir Kulikov, Tomer Michaeli
[Paper] [Code]
???

Art

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
Daniel Geng, Inbum Park, Andrew Owens
[Page] [Paper] [Code]
Diffusion models make optical illusions. A pretrained diffusion model is used to estimate the noise in different views (transformations) of an image. The noise estimates are aligned by applying the inverse view and averaged together; the averaged estimate is then used to take a diffusion step.
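
The averaging step can be sketched with invertible NumPy transforms standing in for the paper's views (the `eps_model` callable is a placeholder for a real denoiser, not an actual API):

```python
import numpy as np

def combined_eps(xt, views, inverse_views, eps_model):
    """Estimate noise in each transformed view, map estimates back, and average."""
    estimates = [inv(eps_model(view(xt))) for view, inv in zip(views, inverse_views)]
    return np.mean(estimates, axis=0)

# Usage with identity and 180-degree rotation as the two views.
views = [lambda x: x, lambda x: np.rot90(x, 2)]
invs  = [lambda x: x, lambda x: np.rot90(x, -2)]
```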

Video

Align Your Latents: High-Resolution Video Synthesis With Latent Diffusion Models
Andreas Blattmann, Robin Rombach, Huan Ling, Tim Dockhorn, Seung Wook Kim, Sanja Fidler
CVPR 2023 [Paper]

Other

In-Context Learning Unlocked for Diffusion Models
Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou
NeurIPS 2023 [Paper]
???

Geometric

Geometric Latent Diffusion Models for 3D Molecule Generation
Minkai Xu, Alexander S. Powers, Ron O. Dror, Stefano Ermon, Jure Leskovec
NeurIPS 2023 [Paper]

Riemannian Geometry

Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry
Yong-Hyun Park, Mingi Kwon, Jaewoong Choi, Junghyo Jo, Youngjung Uh
NeurIPS 2023 [Paper]
Since h-space exhibits local linearity and there is a mapping $f:\mathcal{X} \rightarrow \mathcal{H}$, the pullback metric can be used to carry the geometry of $\mathcal{H}$ back to $\mathcal{X}$. Interpretable directions in $\mathcal{X}$ can then be found through SVD in $\mathcal{H}$.

The Riemannian Geometry of Deep Generative Models
Hang Shao, Abhishek Kumar, P. Thomas Fletcher
CVPR 2018 [Paper] [Code]
Investigates the geometry of the latent and data spaces of VAEs. Constructs three algorithms: (1) geodesic paths in image data space, allowing interpolation; (2) parallel transport from one sample to another; (3) geodesic shooting, a way to create analogies ("$\alpha$ is to $\beta$ what $\gamma$ is to $x$") while remaining tangent to the data manifold.

Unpaired Image-to-Image Translation with Shortest Path Regularization
Shaoan Xie, Yanwu Xu, Mingming Gong, Kun Zhang
CVPR 2023 [Paper] [Code]
