
Modular Diffusion

PyPI version · Documentation · MIT license · Discord

Modular Diffusion provides an easy-to-use modular API to design and train custom Diffusion Models with PyTorch. Whether you're an enthusiast exploring Diffusion Models or a hardcore ML researcher, this framework is for you.

Features

  • ⚙️ Highly Modular Design: Effortlessly swap different components of the diffusion process, including the noise type, schedule type, denoising network, and loss function.
  • 📚 Growing Library of Pre-built Modules: Get started right away with our comprehensive selection of pre-built modules.
  • 🔨 Custom Module Creation Made Easy: Craft your own original modules by inheriting from a base class and implementing the required methods (see the sketch after this list).
  • 🤝 Integration with PyTorch: Built on top of PyTorch, Modular Diffusion enables you to develop custom modules using a familiar syntax.
  • 🌈 Broad Range of Applications: From generating high-quality images to implementing non-autoregressive text synthesis pipelines, the possibilities are endless.
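
As an illustration of custom module creation, here is a minimal sketch of what a custom schedule could look like. It is only a sketch: the Schedule import path, the attributes it provides, and the compute method assumed here are illustrative, so refer to the Getting Started Guide for the actual interfaces.

import torch
from torch import Tensor

from diffusion.schedule import Schedule  # assumed import path

# Hypothetical custom schedule: inherit from the base class and implement the
# method(s) it requires (assumed here to be a single compute()).
class InverseSquare(Schedule):
    def compute(self) -> Tensor:
        # One value per diffusion step, decaying as 1 / t**2 (illustrative only);
        # self.steps is assumed to be set by the base class constructor.
        t = torch.arange(1, self.steps + 1)
        return 1 / t**2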

Installation

Modular Diffusion officially supports Python 3.10+ and is available on PyPI:

pip install modular-diffusion

You also need to install the correct PyTorch distribution for your system.
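
For example, a CUDA 11.8 build can typically be installed with the command below; the exact command depends on your OS and accelerator, so use the selector on pytorch.org to generate the right one for your setup.

pip install torch --index-url https://download.pytorch.org/whl/cu118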

Note: Although Modular Diffusion works with later Python versions, we currently recommend Python 3.10. This is because torch.compile, which significantly speeds up the models, is not yet available for Python versions above 3.10.

Usage

With Modular Diffusion, you can build and train a custom Diffusion Model in just a few lines. First, load and normalize your dataset. We are using the dog pictures from AFHQ.
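
The snippets below assume imports along these lines (the exact diffusion.* module paths may vary between versions, so double-check them against the Getting Started Guide):

import torch
from einops import rearrange
from torchvision.datasets import ImageFolder
from torchvision.transforms import ToTensor
from torchvision.transforms.functional import resize
from torchvision.utils import save_image

import diffusion
from diffusion.data import Identity
from diffusion.loss import Simple
from diffusion.net import UNet
from diffusion.noise import Gaussian
from diffusion.schedule import Cosine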

x, _ = zip(*ImageFolder("afhq", ToTensor()))  # load the AFHQ images as tensors, discarding the labels
x = torch.stack(x)                            # AFHQ images share a common size, so they stack directly
x = resize(x, [h, w], antialias=False)        # h and w are the target height and width, e.g. 64
x = x * 2 - 1                                 # rescale pixel values from [0, 1] to [-1, 1]

Next, build your custom model using either Modular Diffusion's prebuilt modules or your custom modules.

model = diffusion.Model(
    data=Identity(x, batch=128, shuffle=True),               # feed the raw tensors in shuffled batches of 128
    schedule=Cosine(steps=1000),                             # cosine noise schedule with 1000 diffusion steps
    noise=Gaussian(parameter="epsilon", variance="fixed"),   # Gaussian noise predicting epsilon, with fixed variance
    net=UNet(channels=(1, 64, 128, 256)),                    # U-Net denoiser with the given channel widths
    loss=Simple(parameter="epsilon"),                        # simple loss on the predicted noise
)
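
Because every component is its own module, swapping one out means passing a different argument and nothing else changes. As a sketch, assuming a Linear schedule is available alongside Cosine (the import path and constructor arguments shown are illustrative; check the docs for the exact signature), the same model with a linear schedule would be:

from diffusion.schedule import Linear  # illustrative; see the docs for the exact module and signature

model = diffusion.Model(
    data=Identity(x, batch=128, shuffle=True),
    schedule=Linear(steps=1000),  # only this line changes
    noise=Gaussian(parameter="epsilon", variance="fixed"),
    net=UNet(channels=(1, 64, 128, 256)),
    loss=Simple(parameter="epsilon"),
)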

Now, train and sample from the model.

losses = [*model.train(epochs=400)]                    # train for 400 epochs, collecting the losses it reports
z = model.sample(batch=10)                             # sample 10 images, keeping the intermediate diffusion steps
z = z[torch.linspace(0, z.shape[0] - 1, 10).long()]    # pick 10 evenly spaced steps along the reverse process
z = rearrange(z, "t b c h w -> c (b h) (t w)")         # lay the samples out in a (sample x step) image grid
save_image((z + 1) / 2, "output.png")                  # map from [-1, 1] back to [0, 1] and write to disk

Finally, marvel at the results.

[Modular Diffusion teaser image]

Check out the Getting Started Guide to learn more, and find more examples in the project documentation.

Contributing

We appreciate your support and welcome your contributions! Please feel free to submit a pull request if you find a bug or typo you want to fix. If you want to contribute a new prebuilt module or feature, please start by opening an issue and discussing it with us. If you don't know where to begin, take a look at the open issues. Please read our Contributing Guide for more details.

License

This project is licensed under the MIT License.


modular-diffusion's Issues

Add image generation guide

Add a documentation page explaining how standard image generation practices with Diffusion Models translate to Modular Diffusion.

Add unit testing

As the library grows, it is important to make sure that nothing is broken when new features are added. After doing some research, I think a good way of doing this would be to create a Pytest class for each prebuilt module with a test function for each of the module's methods. As an example, I wrote this test for the Normal distribution module:

import pytest
import torch
from pytest import FixtureRequest

from diffusion.distribution import Normal  # import path assumed; adjust to the package layout


class TestNormal:

    @pytest.fixture(params=[(2, 2), (3, 3, 3), (4, 4, 4, 4)])
    def distribution(self, request: FixtureRequest) -> Normal:
        size = request.param  # parametrized over several tensor shapes
        mu, sigma = torch.zeros(size), torch.ones(size)
        return Normal(mu, sigma)

    def test_sample(self, distribution: Normal) -> None:
        z, epsilon = distribution.sample()
        assert z.shape == distribution.mu.shape
        assert epsilon.shape == distribution.mu.shape
        assert torch.allclose(z, distribution.mu + distribution.sigma * epsilon)

    @pytest.mark.parametrize("x", [0.0, 1.0, -1.0, 1e6])
    def test_nll(self, distribution: Normal, x: float) -> None:
        nll = distribution.nll(torch.full(distribution.mu.shape, x))
        assert nll.shape == distribution.mu.shape
        assert nll.shape == distribution.sigma.shape
        assert torch.allclose(
            nll, 0.5 * ((x - distribution.mu) / distribution.sigma)**2 +
            (distribution.sigma * 2.5066282746310002).log())  # 2.5066... = sqrt(2 * pi)

    @pytest.mark.parametrize("mu, sigma", [(0.0, 1.0), (-1.0, 3.0), (1e6, 2e6)])
    def test_dkl(self, distribution: Normal, mu: float, sigma: float) -> None:
        other = Normal(
            torch.full(distribution.mu.shape, mu),
            torch.full(distribution.sigma.shape, sigma),
        )
        dkl = distribution.dkl(other)
        assert dkl.shape == distribution.mu.shape
        assert torch.allclose(
            dkl,
            torch.log(other.sigma / distribution.sigma) +
            (distribution.sigma**2 +
             (distribution.mu - other.mu)**2) / (2 * other.sigma**2) - 0.5,
        )

I have very little experience with unit testing, so I appreciate all the help I can get with this.

Add dark mode to docs

It'd be nice to have a dark mode toggle at the top of the docs for those more sensitive to bright screens.

Improve U-Net and Transformer implementations

As the documentation states, the current implementations are neither the most effective nor the most efficient. The U-Net implementation was adapted from The Annotated Diffusion Model and the Transformer implementation was adapted from Peebles & Xie (2022) (the adaptive layer norm block). Although these produce good enough results, ideally the library would provide the best implementations available for general use.

From what I've read, I think a good choice for the U-Net implementation would be the one used in Imagen's text-to-image model, but there may well be more recent architectures that would be a better fit. For the Transformer, I'm really not sure right now. Any input on this would be greatly appreciated.

Gaussian Noise Model

I'm not sure why the definition def approximate(self, z: Tensor, t: Tensor, hat: Tensor) in the Gaussian noise module does not account for the two different parameters, x and epsilon. The approximate distribution it forms only covers the epsilon-parameterized case.

Could you please have a look?

Add text generation guide

Add a documentation page explaining how standard text generation practices with Diffusion Models translate to Modular Diffusion.

Allow custom guidance

Currently, ClassifierFree guidance is hardcoded in the Model class and the Guidance base class cannot be extended to create custom guidance behavior.
