The goal of this project is to develop a neural network that, given a sketch of an image and some partial colour information about it, generates a full-colour image.
We use a medium-sized dataset of around 40,000 images taken from various anime (see Dataset). The main source of inspiration, in both architecture and approach, was the paper Diffusart: Enhancing Line Art Colorization with Conditional Diffusion Models. The architecture is a direct adaptation of the implementation of the paper by Ho et al., through the code implementation by Niels Rogge and Kashif Rasul.
The main training loop is in train.py, and all of the configuration options are in config.yaml. The dataset used for the task is assumed to have three columns, where each row represents a training triplet: "full_colour", "sketch", and "sketch_and_scribbles_merged". See Dataset for an example training dataset.
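The expected layout can be sketched as follows. This is a hypothetical illustration of the three-column triplet format, not code from train.py; the paths and the `TrainingTriplet` name are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class TrainingTriplet:
    full_colour: str                  # path to the ground-truth colour image
    sketch: str                       # path to the line-art sketch
    sketch_and_scribbles_merged: str  # sketch overlaid with partial colour scribbles

# One row per triplet; column names match those expected by train.py.
rows = [
    TrainingTriplet(
        full_colour="images/0001_colour.png",
        sketch="images/0001_sketch.png",
        sketch_and_scribbles_merged="images/0001_merged.png",
    ),
]

expected_columns = {"full_colour", "sketch", "sketch_and_scribbles_merged"}
assert all(expected_columns <= set(vars(row)) for row in rows)
```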
The model uses a U-Net architecture. The explicit conditional information is concatenated to the noisy input, and the implicit partial colour information is introduced via cross-attention.
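The two conditioning paths can be sketched in a framework-agnostic way. This is a toy NumPy illustration of the shapes involved, not the actual U-Net code; the feature and hint dimensions are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, W = 1, 3, 8, 8                          # toy batch/channel/spatial sizes
noisy_image = rng.standard_normal((B, C, H, W))
sketch = rng.standard_normal((B, 1, H, W))       # 1-channel line art

# (1) Explicit conditioning: channel-wise concatenation -> 4-channel U-Net input.
unet_input = np.concatenate([noisy_image, sketch], axis=1)
assert unet_input.shape == (B, C + 1, H, W)

# (2) Implicit conditioning: single-head cross-attention where flattened U-Net
# features are the queries and the colour-hint embeddings supply keys/values.
def cross_attention(queries, keys, values):
    d = queries.shape[-1]
    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

features = rng.standard_normal((B, H * W, 16))   # flattened feature map
colour_hints = rng.standard_normal((B, 10, 16))  # embedded scribble tokens
attended = cross_attention(features, colour_hints, colour_hints)
assert attended.shape == features.shape          # attention preserves query shape
```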
Training the model took around 80 hours on a single RTX 3090 GPU. The average LPIPS score on the test set (300 examples), measured after sampling with 100 DDPM steps, was 0.1632.
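The sampling procedure can be sketched as standard DDPM ancestral sampling over 100 steps. This is a self-contained toy version: `denoise` is a stand-in for the trained U-Net's noise prediction, and the linear beta schedule is an assumption for the example; the real model would also receive the sketch and colour hints as conditioning.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100                                   # DDPM steps used at evaluation time
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoise(x, t):
    # Hypothetical noise predictor; the real U-Net also sees the sketch
    # (concatenated) and the colour hints (via cross-attention).
    return 0.1 * x

x = rng.standard_normal((3, 8, 8))        # start from pure Gaussian noise
for t in reversed(range(T)):
    eps = denoise(x, t)
    # x_{t-1} = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t) + sigma_t z
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    x = (x - coef * eps) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)

assert x.shape == (3, 8, 8)               # final sample keeps the image shape
```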
This project was done as part of the "Theory And Practice of Deep Learning" undergraduate course at Yonsei University. It is still a rough implementation of the ideas in the original Diffusart paper and, as such, may contain bugs and errors. You are welcome to propose changes via GitHub.