ComfyUI_TGate

English | 简体中文

ComfyUI reference implementation for T-GATE.

T-GATE can speed up different diffusion models by 10%-50% while only slightly reducing the quality of the generated images and preserving the original composition.

The current implementation relies on some monkey patching. If an error occurs, make sure you are running the latest version.

If my work helps you, consider giving it a star.

Some of my other projects that may help you.

🌟 Changelog

  • [2024.5.23]: updated to latest ComfyUI version
  • [2024.5.15]:
    • Fix the SDXL x, y batch error. If you still encounter errors, try the TGate Apply(Deprecated) node.
  • [2024.5.06]:
    • Add use_cpu_cache to reduce GPU OOM problems.
    • Refactor the apply node and deprecate the old monkey-patching node. The legacy node will be removed after a few versions.
    • Add a simple node and advanced node.
  • [2024.4.30] 🔧 Fixed a cond-only sampling bug that caused an animatediff error. Thanks pamparamm.
  • [2024.4.29] 🔧 TL;DR: Improved performance, and T-GATE now only acts where it needs to.
    • Fixed a bug that caused TGateApply to affect other places where the model is used, even if TGateApply is turned off.
    • Fixed cross-attention results not being cached correctly, which made performance slightly lower than the git patch version.
  • [2024.4.26] 🎉 Native version released (no git patch needed anymore!).
  • [2024.4.18] Initial repo.

📚 Example workflows

The examples directory contains example workflows. The assets folder contains images generated with and without T-GATE.

example

| Origin result | T-GATE result |
| --- | --- |
| origin_result | tgate_result |

The T-GATE result image comes from the workflow embedded in the example image.

Comparison with AutomaticCFG

AutomaticCFG is another ComfyUI plugin: Your CFG won't be your CFG anymore. It is turned into a way to guide the CFG/final intensity/brightness/saturation, and it adds a 30% speed increase.

env: T4-8G

| | Origin | T-GATE 0.5 | AutomaticCFG | T-GATE 0.35 | AutomaticCFG fastest |
| --- | --- | --- | --- | --- | --- |
| result | origin_result | tgate_result | auto_cfg_boost | tgate_0_35 | auto_cfg_fatest |
| speed | 4.59it/s | 5.68it/s | 5.62it/s | 6.13it/s | 6.13it/s |

T-GATE performs best when the original composition must be preserved. If you don't need to preserve the composition, AutomaticCFG fastest delivers about the same speedup.
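To put the it/s figures above in relative terms, the following short sketch computes each configuration's gain over the unmodified run. The numbers are copied from the comparison table; the variable names are only illustrative.

```python
# Sampling speeds (iterations per second) from the comparison table above.
speeds = {
    "Origin": 4.59,
    "T-GATE 0.5": 5.68,
    "AutomaticCFG": 5.62,
    "T-GATE 0.35": 6.13,
    "AutomaticCFG fastest": 6.13,
}

baseline = speeds["Origin"]

# Relative gain in percent compared to the unmodified run.
gains = {name: (value / baseline - 1.0) * 100.0 for name, value in speeds.items()}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

On this T4 run, T-GATE at 0.5 comes out at a little under a quarter faster than the baseline, and T-GATE at 0.35 at roughly a third faster.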

📗 INSTALL

git clone https://github.com/JettHu/ComfyUI_TGate
# that's all!

📙 Major Features

  • Training-Free.
  • Supports CNN-based U-Net, Transformer, and Consistency Model architectures.
  • 10%-50% speed up for different diffusion models.

📖 Nodes reference

TGate Apply

Inputs

  • model, model loaded by Load Checkpoint and other MODEL loaders.

Configuration parameters

  • start_at: a percentage of the sampling steps. Defines the point in the generation at which the T-GATE cache starts being used.
  • use_cpu_cache: if multiple batches (e.g. animatediff) cause GPU OOM, set this to true, at the cost of some T-GATE performance.
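As a rough illustration of how start_at could act as a gate, the sketch below maps the percentage to a step index and reuses a cached attention output from that step onward. GatedCache, gate_step and the stand-in compute callable are hypothetical names invented for this example. This is not the node's actual implementation, and the real node may map the percentage to steps differently.

```python
from typing import Callable, Optional


def gate_step(start_at: float, total_steps: int) -> int:
    """Map a start_at fraction (0.0-1.0) to the first step index that uses the cache."""
    if not 0.0 <= start_at <= 1.0:
        raise ValueError("start_at must be between 0.0 and 1.0")
    return int(start_at * total_steps)


class GatedCache:
    """Hypothetical helper: compute normally before the gate, reuse the cache after it."""

    def __init__(self, start_at: float, total_steps: int) -> None:
        self.gate = gate_step(start_at, total_steps)
        self.cached: Optional[object] = None
        self.compute_calls = 0

    def __call__(self, step: int, compute: Callable[[], object]) -> object:
        if step >= self.gate and self.cached is not None:
            return self.cached  # past the gate: skip the expensive call
        self.cached = compute()  # before the gate: compute and remember the output
        self.compute_calls += 1
        return self.cached


# Assumed setup: 20 sampling steps, cache used from the halfway point.
cache = GatedCache(start_at=0.5, total_steps=20)
outputs = [cache(step, lambda s=step: f"attn@{s}") for step in range(20)]
```

In this toy run the stand-in computation is only called for the first half of the steps; every later step returns the last output produced before the gate.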

TGate Apply Advanced

Inputs

  • model, model loaded by Load Checkpoint and other MODEL loaders.

Configuration parameters

  • start_at: a percentage of the sampling steps. Defines the point in the generation at which the T-GATE cache starts being used.
  • only_cross_attention: [RECOMMENDED] defaults to True. Caches only the output of cross-attention; see the related issues.
  • use_cpu_cache: if multiple batches (e.g. animatediff) cause GPU OOM, set this to true, at the cost of some T-GATE performance.

Optional configuration

  • self_attn_start_at: also a percentage of the sampling steps; only takes effect when only_cross_attention is false. Defines the point in the generation at which the T-GATE cache starts being used for latent self-attention.
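The interaction between start_at, only_cross_attention and self_attn_start_at can be sketched as a small decision function. TGateConfig and cached_attention are hypothetical names used only for illustration, and the example values are assumptions rather than the node's defaults (apart from only_cross_attention, which the description above says defaults to True).

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class TGateConfig:
    """Hypothetical container mirroring the parameters described above."""

    start_at: float
    self_attn_start_at: float
    only_cross_attention: bool = True


def cached_attention(cfg: TGateConfig, progress: float) -> Set[str]:
    """Return which attention outputs would come from the cache at this progress (0.0-1.0)."""
    active: Set[str] = set()
    if progress >= cfg.start_at:
        active.add("cross_attention")
    # self_attn_start_at is ignored unless only_cross_attention is disabled.
    if not cfg.only_cross_attention and progress >= cfg.self_attn_start_at:
        active.add("self_attention")
    return active


recommended = TGateConfig(start_at=0.5, self_attn_start_at=0.8)
both = TGateConfig(start_at=0.5, self_attn_start_at=0.8, only_cross_attention=False)

print(cached_attention(recommended, 0.9))
print(cached_attention(both, 0.9))
```

With only_cross_attention left at True, the self-attention threshold never applies, which matches the note that self_attn_start_at only takes effect when only_cross_attention is false.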

TGate Apply(Deprecated)

This node is deprecated and will be removed after a few versions.

Inputs

  • model, model loaded by Load Checkpoint and other MODEL loaders.

Configuration parameters

  • start_at: a percentage of the sampling steps. Defines the point in the generation at which the T-GATE cache starts being used.
  • only_cross_attention: [RECOMMENDED] defaults to True. Caches only the output of cross-attention; see the related issues.
  • use_cpu_cache: if multiple batches (e.g. animatediff) cause GPU OOM, set this to true, at the cost of some T-GATE performance.

Optional configuration

  • self_attn_start_at: also a percentage of the sampling steps; only takes effect when only_cross_attention is false. Defines the point in the generation at which the T-GATE cache starts being used for latent self-attention.

🚀 Performance (from T-GATE)

| Model | MACs | Param | Latency | Zero-shot 10K-FID on MS-COCO |
| --- | --- | --- | --- | --- |
| SD-1.5 | 16.938T | 859.520M | 7.032s | 23.927 |
| SD-1.5 w/ TGATE | 9.875T | 815.557M | 4.313s | 20.789 |
| SD-2.1 | 38.041T | 865.785M | 16.121s | 22.609 |
| SD-2.1 w/ TGATE | 22.208T | 815.433M | 9.878s | 19.940 |
| SD-XL | 149.438T | 2.570B | 53.187s | 24.628 |
| SD-XL w/ TGATE | 84.438T | 2.024B | 27.932s | 22.738 |
| Pixart-Alpha | 107.031T | 611.350M | 61.502s | 38.669 |
| Pixart-Alpha w/ TGATE | 65.318T | 462.585M | 37.867s | 35.825 |
| DeepCache (SD-XL) | 57.888T | - | 19.931s | 23.755 |
| DeepCache w/ TGATE | 43.868T | - | 14.666s | 23.999 |
| LCM (SD-XL) | 11.955T | 2.570B | 3.805s | 25.044 |
| LCM w/ TGATE | 11.171T | 2.024B | 3.533s | 25.028 |
| LCM (Pixart-Alpha) | 8.563T | 611.350M | 4.733s | 36.086 |
| LCM w/ TGATE | 7.623T | 462.585M | 4.543s | 37.048 |

Latency is measured on a 1080 Ti commercial card.

The MACs and Params are calculated by calflops.

The FID is calculated by PytorchFID.
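For a quick sense of scale, the sketch below derives the latency speedup and the MACs reduction for four of the base models from the figures reported above. The dictionary layout and variable names are illustrative only; the values are copied from the table.

```python
# (MACs in T, latency in s) without and with TGATE, taken from the table above.
reported = {
    "SD-1.5": ((16.938, 7.032), (9.875, 4.313)),
    "SD-2.1": ((38.041, 16.121), (22.208, 9.878)),
    "SD-XL": ((149.438, 53.187), (84.438, 27.932)),
    "Pixart-Alpha": ((107.031, 61.502), (65.318, 37.867)),
}

speedup = {}         # baseline latency divided by TGATE latency
macs_reduction = {}  # percentage of MACs saved by TGATE

for model, ((base_macs, base_lat), (tgate_macs, tgate_lat)) in reported.items():
    speedup[model] = base_lat / tgate_lat
    macs_reduction[model] = (1.0 - tgate_macs / base_macs) * 100.0
    print(f"{model}: {speedup[model]:.2f}x faster, {macs_reduction[model]:.1f}% fewer MACs")
```

Of these four models, SD-XL shows the largest latency gain in the reported numbers, at close to a 1.9x speedup.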

📝 TODO

  • Result image quality is inconsistent with the original. Currently only attn2 (cross-attention) is cached.
  • Implement a native version that no longer relies on git patch.
  • Full compatibility with animatediff. Currently both plugins hook comfy.samplers.sampling_function, so T-GATE does not work correctly.
  • Compatibility with TiledDiffusion (issue #11).

🔍 Common problems

  • For Apple silicon users on the mps backend, certain torch and macOS versions may cause problems. Refer to the issue comment.

  • Fixed in 2024.4.29: T-GATE effects could not be removed properly. The images below show the result when the node is bypassed after being applied.

| 2024.4.26-29 | Updated on 2024.4.29 |
| --- | --- |
| before_fixed | after_fixed |
