distill-sd's Introduction

Installation

pip3 install -r requirements.txt

Code style

Python

We adopt PEP8 as the preferred code style.

We use the following tools for linting and formatting: flake8, yapf, and isort.

Style configurations of yapf and isort can be found in setup.cfg.

We use a pre-commit hook that runs on every commit. It checks and formats code with flake8, yapf, and isort, removes trailing whitespace, fixes end-of-file newlines, and sorts requirements.txt automatically. The configuration for the pre-commit hook is stored in .pre-commit-config.

After you clone the repository, you will need to install and initialize the pre-commit hook.

pip install -U pre-commit

From the repository folder

pre-commit install

After this, the code linters and formatters will be enforced on every commit.

Run locally

cd .

pip install virtualenv

virtualenv venv

source ./venv/bin/activate

pip install -r requirements.txt

pip install -e .

distill-sd's People

Contributors

gothos, harishsegmind, shreyas269, warlord-k

distill-sd's Issues

poor result

I am using the model from https://huggingface.co/segmind/small-sd, but I found that most of the generated results are poor.
What is wrong with my usage?

import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "segmind/small-sd",
    torch_dtype=torch.float16)
pipeline.to("cuda")
# pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config, use_karras_sigmas=False)

pipeline("A man staring ahead at the camera with a neutral expression",
         num_inference_steps=20,
         guidance_scale=7.5).images[0].save("my_image.png")

This is the result, which does not look natural at all.

ControlNet?

Hi, will you be training ControlNets with this approach? When I first tried it for SD 1.5 many months ago, I found that you get much more interesting outputs when you guide the composition than when you let SD handle it based on the raw prompt. In the same spirit, I think this could improve the performance of small models by allowing them to focus on low-frequency details.

distill_training error

Hi,

When I ran distill_training.py following your command, I encountered this error.

Traceback (most recent call last):
  File "/home/user/sdm-kd/distill_training.py", line 1199, in <module>
    main()
  File "/home/user/sdm-kd/distill_training.py", line 999, in main
    cast_hook(unet,KD_student,args.distill_level,False)
  File "/home/user/sdm-kd/distill_training.py", line 992, in cast_hook
    unet.down_blocks[i].register_forward_hook(getActivation(dicts,'d'+str(i),True))
  File "/home/user/anaconda3/envs/uvit/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1265, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'down_blocks'

Could you check this?

Thanks in advance.
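
PyTorch raises this kind of AttributeError when a submodule is looked up on a DistributedDataParallel wrapper rather than on the model it wraps, which the wrapper exposes as `.module`. The following is a minimal sketch of unwrapping before accessing `down_blocks`. The helper `unwrap` and the stand-in classes `FakeUNet` and `FakeDDP` are hypothetical names, used only so the example runs without PyTorch; this is not necessarily how distill_training.py should be patched.

```python
class FakeUNet:
    """Hypothetical stand-in for the student UNet (illustration only)."""

    def __init__(self):
        self.down_blocks = ["block0", "block1", "block2"]


class FakeDDP:
    """Hypothetical stand-in for a DistributedDataParallel-style wrapper.

    It keeps the real model in `.module` and does not forward
    attribute lookups such as `down_blocks` to it.
    """

    def __init__(self, module):
        self.module = module


def unwrap(model):
    """Follow `.module` until the underlying, unwrapped model is reached."""
    while hasattr(model, "module"):
        model = model.module
    return model


wrapped = FakeDDP(FakeUNet())
print(hasattr(wrapped, "down_blocks"))  # False: the wrapper has no such attribute
print(unwrap(wrapped).down_blocks)      # hooks would be registered on these blocks
```

When training with Accelerate, `accelerator.unwrap_model(unet)` serves the same purpose as the hypothetical `unwrap` helper here.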

Encountered a very confusing issue while resuming from checkpoint.

When I resumed training from a checkpoint, I encountered an error with load_state_dict() indicating that the loaded checkpoint is incompatible with the current UNet model structure, causing the loading process to fail. However, I am certain that the saved checkpoint is sd_small, and the UNet model used for resuming training is also sd_small, so this is really a very confusing issue.

The log is as follows:
Traceback (most recent call last):
  File "distill_training.py", line 1140, in <module>
    main()
  File "distill_training.py", line 899, in main
    accelerator.load_state(os.path.join(args.output_dir, path))
  File "/opt/anaconda3/lib/python3.7/site-packages/accelerate/accelerator.py", line 2347, in load_state
    hook(models, input_dir)
  File "distill_training.py", line 656, in load_model_hook
    model.load_state_dict(load_model.state_dict())
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1498, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
    Unexpected key(s) in state_dict: "down_blocks.0.attentions.1.norm.weight", "down_blocks.0.attentions.1.norm.bias",

Could you check this?

Thanks in advance.
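
"Unexpected key(s)" means the state dict being loaded contains parameters that the receiving model does not define. A quick way to see exactly where two models diverge is to compare their key sets before calling `load_state_dict()`. The sketch below does this on plain lists of key names; `diff_state_dict_keys` is a hypothetical helper, and the toy keys are illustrative rather than taken from a real checkpoint.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring load_state_dict's terms.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not define
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Toy example: the checkpoint carries one attention block more than the model.
model_keys = [
    "down_blocks.0.attentions.0.norm.weight",
    "down_blocks.0.attentions.0.norm.bias",
]
checkpoint_keys = model_keys + [
    "down_blocks.0.attentions.1.norm.weight",
    "down_blocks.0.attentions.1.norm.bias",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print(missing)     # []
print(unexpected)  # the two attentions.1 keys
```

With real models, the two arguments would be `model.state_dict().keys()` and `load_model.state_dict().keys()`; a non-empty `unexpected` list suggests the two UNets were built from different configurations.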

multi-gpu training

Hi, I'm impressed by your work.

Does the distill_training.py support multi-GPU training?

How to infer using the trained model?

Hi, thanks for your great work! I want to test the model trained with distill_training.py. The inference code is as follows.

import torch
from diffusers import DiffusionPipeline
from diffusers import DPMSolverMultistepScheduler
from torch import Generator


path = "sd-laion-art/"
# Insert your prompt below.
prompt = "Faceshot Portrait of pretty young (18-year-old) Caucasian wearing a high neck sweater, (masterpiece, extremely detailed skin, photorealistic, heavy shadow, dramatic and cinematic lighting, key light, fill light), sharp focus, BREAK epicrealism"
# Insert negative prompt below. We recommend using this negative prompt for best results.
negative_prompt = "(deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.4), text, close up, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck" 

torch.set_grad_enabled(False)
torch.backends.cudnn.benchmark = True

# Below code will run on gpu, please pass cpu everywhere as the device and set 'dtype' to torch.float32 for cpu inference.
with torch.inference_mode():
    gen = Generator("cuda")
    gen.manual_seed(1674753452)
    pipe = DiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16, safety_checker=None, requires_safety_checker=False)
    pipe.to('cuda')
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
    pipe.unet.to(device='cuda', dtype=torch.float16, memory_format=torch.channels_last)

    img = pipe(prompt=prompt,negative_prompt=negative_prompt, width=512, height=512, num_inference_steps=25, guidance_scale = 7, num_images_per_prompt=1, generator = gen).images[0]
    img.save("image.png")

However, the following error occurs.

ValueError: Cannot load <class 'diffusers.models.unet_2d_condition.UNet2DConditionModel'> from sd-laion-art/unet because the following keys are missing: 
 down_blocks.2.resnets.1.conv2.bias, up_blocks.2.resnets.2.norm2.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.bias, up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_k.weight, down_blocks.t...

Could you please tell me how to run inference with the model trained by distill_training.py? Thanks!

Getting broadcast error when trying to run distill_training.py

When I try to run the accelerate training script, I get a broadcast error.

The log is as follows:

08/19/2023 05:24:35 - INFO - __main__ - ***** Running training *****
08/19/2023 05:24:35 - INFO - __main__ -   Num examples = 20072
08/19/2023 05:24:35 - INFO - __main__ -   Num Epochs = 12
08/19/2023 05:24:35 - INFO - __main__ -   Instantaneous batch size per device = 1
08/19/2023 05:24:35 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 16
08/19/2023 05:24:35 - INFO - __main__ -   Gradient Accumulation steps = 4
08/19/2023 05:24:35 - INFO - __main__ -   Total optimization steps = 15000
Steps:   0%|                                                                          | 0/15000 [00:01<?, ?it/s, lr=1e-5, step_loss=44.7]
Traceback (most recent call last):
  File "/home/shkulkarni/segmind/distill-sd/distill_training.py", line 1208, in <module>
    main()
  File "/home/shkulkarni/segmind/distill-sd/distill_training.py", line 1100, in main
    ema_unet.step(unet.parameters())
  File "/opt/conda/envs/segmind/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/segmind/lib/python3.10/site-packages/diffusers/training_utils.py", line 194, in step
    s_param.sub_(one_minus_decay * (s_param - param))
RuntimeError: output with shape [320, 320, 1, 1] doesn't match the broadcast shape [320, 320, 3, 3]

@Gothos I suspect this might be related to LoRA version, or its extensions. The error seems to originate from the line ema_unet.step(unet.parameters()). Could you check this?

Thanks in advance.
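
The failing line updates each EMA copy in place, so the error implies that at some position the EMA copy and the live parameter have different shapes ([320, 320, 1, 1] versus [320, 320, 3, 3]). One way to narrow this down is to compare the two shape lists pairwise before the first EMA step. The sketch below works on plain tuples; `find_shape_mismatches` is a hypothetical helper and the shapes are toy values chosen to echo the message above, not read from the actual models.

```python
def find_shape_mismatches(shadow_shapes, param_shapes):
    """Return (index, shadow_shape, param_shape) for each pair that differs.

    Only the overlapping prefix is compared; a length difference between
    the two lists should be checked separately.
    """
    mismatches = []
    for i, (s, p) in enumerate(zip(shadow_shapes, param_shapes)):
        if tuple(s) != tuple(p):
            mismatches.append((i, tuple(s), tuple(p)))
    return mismatches


# Toy shapes: the second entry differs in kernel size (1x1 versus 3x3).
shadow_shapes = [(320, 4, 3, 3), (320, 320, 1, 1)]
param_shapes = [(320, 4, 3, 3), (320, 320, 3, 3)]

print(find_shape_mismatches(shadow_shapes, param_shapes))
# [(1, (320, 320, 1, 1), (320, 320, 3, 3))]
```

With real objects, the inputs could be built as `[p.shape for p in ema_unet.shadow_params]` and `[p.shape for p in unet.parameters()]`, assuming the EMA object exposes its copies as `shadow_params`. A mismatch at the first differing index would suggest that the EMA model and the trained UNet were created from different architectures or parameter orderings.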

Thank You, and Typo Correction Request for Hugging Face Blog

Hi!

I’m the author of BK-SDM, and I would like to express my appreciation for utilizing our work and releasing the codes and models. I believe your research definitely contributes to the community of small generative models. We have also mentioned your work in our GitHub repository :)

I have just noticed the blog post, https://huggingface.co/blog/sd_distillation, written by Yatharth Gupta @Warlord-K
There is a typo in the sentence below. Could you please revise it?

Original: Image taken from the paper “On Architectural Compression of Text-to-Image Diffusion Models” by Shinkook. et. al

Request to revise the authorship mention (by Shinkook. et. al) using

We prefer (1), but (2) would be also appreciated. Thank you for considering this request.
