The goal of this project is to develop a neural network that, given a sketch of an image and some partial colour information about it, generates a full-colour image.
We use a medium-sized dataset of around 40,000 images taken from various anime (see Dataset). The main source of inspiration, in both architecture and approach, was the paper Diffusart: Enhancing Line Art Colorization with Conditional Diffusion Models. The architecture is a direct adaptation of the implementation of the paper by Ho et al., through the code implementation by Niels Rogge and Kashif Rasul.
The main training loop is in train.py, and all of the configuration options are in config.yaml. The dataset used for the task is assumed to have three columns, where each row represents a training triplet: "full_colour", "sketch", and "sketch_and_scribbles_merged". See Dataset for an example training dataset.
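The expected layout can be sketched as follows. This is a hypothetical illustration of the three-column triplet format, not code from train.py; the paths and the `TrainingTriplet` name are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class TrainingTriplet:
    full_colour: str                  # path to the ground-truth colour image
    sketch: str                       # path to the line-art sketch
    sketch_and_scribbles_merged: str  # sketch overlaid with partial colour scribbles

# One row per triplet; column names match those expected by train.py.
rows = [
    TrainingTriplet(
        full_colour="images/0001_colour.png",
        sketch="images/0001_sketch.png",
        sketch_and_scribbles_merged="images/0001_merged.png",
    ),
]

expected_columns = {"full_colour", "sketch", "sketch_and_scribbles_merged"}
assert all(expected_columns <= set(vars(row)) for row in rows)
```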
The model uses a U-Net architecture. The explicit conditional information is concatenated to the noisy input, and the implicit partial colour information is introduced via cross-attention.
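The two conditioning paths can be sketched in a framework-agnostic way. This is a toy NumPy illustration of the shapes involved, not the actual U-Net code; the feature and hint dimensions are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C, H, W = 1, 3, 8, 8                          # toy batch/channel/spatial sizes
noisy_image = rng.standard_normal((B, C, H, W))
sketch = rng.standard_normal((B, 1, H, W))       # 1-channel line art

# (1) Explicit conditioning: channel-wise concatenation -> 4-channel U-Net input.
unet_input = np.concatenate([noisy_image, sketch], axis=1)
assert unet_input.shape == (B, C + 1, H, W)

# (2) Implicit conditioning: single-head cross-attention where flattened U-Net
# features are the queries and the colour-hint embeddings supply keys/values.
def cross_attention(queries, keys, values):
    d = queries.shape[-1]
    scores = queries @ keys.transpose(0, 2, 1) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values

features = rng.standard_normal((B, H * W, 16))   # flattened feature map
colour_hints = rng.standard_normal((B, 10, 16))  # embedded scribble tokens
attended = cross_attention(features, colour_hints, colour_hints)
assert attended.shape == features.shape          # attention preserves query shape
```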
Training the model took around 80 hours on a single RTX 3090 GPU. The average LPIPS score on the test set (300 examples), measured after sampling with 100 DDPM steps, was 0.1632.
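The sampling procedure can be sketched as standard DDPM ancestral sampling over 100 steps. This is a self-contained toy version: `denoise` is a stand-in for the trained U-Net's noise prediction, and the linear beta schedule is an assumption for the example; the real model would also receive the sketch and colour hints as conditioning.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100                                   # DDPM steps used at evaluation time
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def denoise(x, t):
    # Hypothetical noise predictor; the real U-Net also sees the sketch
    # (concatenated) and the colour hints (via cross-attention).
    return 0.1 * x

x = rng.standard_normal((3, 8, 8))        # start from pure Gaussian noise
for t in reversed(range(T)):
    eps = denoise(x, t)
    # x_{t-1} = (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t) + sigma_t z
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    x = (x - coef * eps) / np.sqrt(alphas[t])
    if t > 0:
        x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)

assert x.shape == (3, 8, 8)               # final sample keeps the image shape
```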
This project was done as part of the "Theory And Practice of Deep Learning" undergraduate course at Yonsei University. It is still a rough implementation of the ideas in the original Diffusart paper and, as such, may contain bugs and errors. You are welcome to propose changes via GitHub.