gameltb / comfyui_stable_fast

Experimental usage of stable-fast and TensorRT.

License: MIT License

Python 100.00%

comfyui_stable_fast's Issues

TensorRT VAEDecode test failed

python main.py --listen 0.0.0.0 --port 5000 --disable-xformers

Workflow:

workflow_VAEDecode_trt.json

The following error is thrown:

Building TensorRT engine for : /app/custom_nodes/ComfyUI_stable_fast/tensorrt_engine_cache/73be59a4c039b7874c93c67755495ca5abe002922633db0e26a24b1e2146724a.trt
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[E] In node 457 with name: /decoder/block_1/norm1/InstanceNormalization and operator: InstanceNormalization (importInstanceNormalization): INVALID_NODE: Assertion failed: (inputDataType == DataType::kFLOAT || inputDataType == DataType::kHALF): Inputs must be either FLOAT or FLOAT16. Input type is BF16.
[!] Could not parse ONNX correctly
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
  File "/app/execution.py", line 153, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/app/execution.py", line 83, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/app/execution.py", line 76, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/app/nodes.py", line 267, in decode
    return (vae.decode(samples["samples"]), )
  File "/app/comfy/sd.py", line 237, in decode
    pixel_samples[x:x+batch_number] = torch.clamp((self.first_stage_model.decode(samples).to(self.output_device).float() + 1.0) / 2.0, min=0.0, max=1.0)
  File "/app/custom_nodes/ComfyUI_stable_fast/tensorrt_node.py", line 350, in __call__
    self.warmup(samples_in)
  File "/app/custom_nodes/ComfyUI_stable_fast/tensorrt_node.py", line 343, in warmup
    self(warmup_samples)
  File "/app/custom_nodes/ComfyUI_stable_fast/tensorrt_node.py", line 368, in __call__
    pixel_samples[x : x + batch_number] = self.tensorrt_module(
  File "/app/custom_nodes/ComfyUI_stable_fast/module/tensorrt_wrapper.py", line 152, in __call__
    engine = gen_engine(
  File "/app/custom_nodes/ComfyUI_stable_fast/module/tensorrt_wrapper.py", line 294, in gen_engine
    engine.build(
  File "/app/custom_nodes/ComfyUI_stable_fast/module/tensorrt_utilities.py", line 284, in build
    network = network_from_onnx_bytes(
  File "<string>", line 3, in network_from_onnx_bytes
  File "/opt/conda/lib/python3.10/site-packages/polygraphy/backend/base/loader.py", line 40, in __call__
    return self.call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/polygraphy/util/util.py", line 710, in wrapped
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/polygraphy/backend/trt/loader.py", line 185, in call_impl
    trt_util.check_onnx_parser_errors(parser, success)
  File "/opt/conda/lib/python3.10/site-packages/polygraphy/backend/trt/util.py", line 83, in check_onnx_parser_errors
    G_LOGGER.critical("Could not parse ONNX correctly")
  File "/opt/conda/lib/python3.10/site-packages/polygraphy/logger/logger.py", line 605, in critical
    raise ExceptionType(message) from None
polygraphy.exception.exception.PolygraphyException: Could not parse ONNX correctly

Apart from this, the TensorRT UNet run works fine.
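
The parser error above says that the InstanceNormalization importer accepts only FLOAT or FLOAT16 inputs, while the exported VAE graph carries BF16. One possible workaround, sketched here on the assumption that the export dtype can be chosen before the ONNX export, is to fall back to a supported dtype when BF16 is requested. The helper name `pick_export_dtype` and the string dtype names are hypothetical and not part of the extension.

```python
# Hypothetical helper: pick a dtype that the ONNX parser in the log above accepts.
# The log states that InstanceNormalization inputs must be FLOAT or FLOAT16.
SUPPORTED_BY_PARSER = {"float32", "float16"}


def pick_export_dtype(requested: str, fallback: str = "float16") -> str:
    """Return `requested` if the parser supports it, otherwise `fallback`."""
    if fallback not in SUPPORTED_BY_PARSER:
        raise ValueError(f"fallback {fallback!r} is not supported either")
    return requested if requested in SUPPORTED_BY_PARSER else fallback


print(pick_export_dtype("bfloat16"))  # BF16 is replaced by the fallback
print(pick_export_dtype("float32"))   # supported dtypes pass through unchanged
```

In a PyTorch setting this would roughly correspond to casting the model and the sample input to the chosen dtype before exporting. FP16 VAE decoding may overflow for some models, so passing `fallback="float32"` could be the safer choice.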

No longer works on new comfy..?

Hey there. I recently built an XL model engine and it worked. However, after updating ComfyUI to the latest version, I get this error regardless of the model base:
TensorRTEngineOriginModelPatcherWrapper_UnetPatch.patch_model() got an unexpected keyword argument 'patch_weights'
This happens with the simplest workflow, nothing custom.
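
The error above suggests that newer ComfyUI versions call `patch_model()` with a `patch_weights` keyword that the wrapper's override does not accept. A minimal sketch of a more tolerant override follows. `BasePatcher` and `WrapperPatcher` are hypothetical stand-ins rather than the real ComfyUI or extension classes, and `future_flag` is an invented keyword used only to show the filtering.

```python
import inspect


class BasePatcher:
    """Stand-in for the upstream patcher, which now accepts `patch_weights`."""

    def patch_model(self, device_to=None, patch_weights=True):
        return ("base", device_to, patch_weights)


class WrapperPatcher(BasePatcher):
    """Hypothetical wrapper that tolerates keyword arguments added upstream."""

    def patch_model(self, device_to=None, **kwargs):
        # Forward only the keywords that the parent implementation accepts,
        # so unknown keywords do not raise a TypeError.
        accepted = inspect.signature(super().patch_model).parameters
        forwarded = {k: v for k, v in kwargs.items() if k in accepted}
        return super().patch_model(device_to, **forwarded)


result = WrapperPatcher().patch_model("cuda:0", patch_weights=False, future_flag=1)
print(result)
```

Silently dropping unknown keywords keeps the node loading after an upstream change, but it can also hide behaviour the caller expected, so logging the dropped names may be worthwhile.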

ModuleNotFoundError: No module named 'sfast'

The wheels built correctly, but the extension fails to import.
I am running on Windows. There is no other relevant error (not connected).
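
One thing worth ruling out, offered only as a guess, is that the wheel was installed into a different Python interpreter than the one ComfyUI runs with. The sketch below reports whether a module is visible to the current interpreter. Running it with the same interpreter that starts ComfyUI is the assumption that makes the result meaningful.

```python
import importlib.util
import sys


def describe_module(name: str) -> str:
    """Report whether `name` can be located by the running interpreter."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found for {sys.executable}"
    return f"{name}: found at {spec.origin}"


print(describe_module("sfast"))
```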

Does stable-fast version 1.0.0 not support Windows?

Windows 10 system
Installation environment:
triton==2.0.0
torch==2.1.0+cu118
xformers==0.0.22.post7+cu118
CUDA==11.8
Python==3.10.6

ComfyUI_stable_fast
After updating to the latest version:

Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
File "D:\0AI\0StableDiffusion\ComfyUI\ComfyUI\execution.py", line 153, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "D:\0AI\0StableDiffusion\ComfyUI\ComfyUI\execution.py", line 83, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "D:\0AI\0StableDiffusion\ComfyUI\ComfyUI\execution.py", line 76, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "D:\0AI\0StableDiffusion\ComfyUI\ComfyUI\custom_nodes\ComfyUI_stable_fast\node.py", line 101, in apply_stable_fast
config = gen_stable_fast_config()
File "D:\0AI\0StableDiffusion\ComfyUI\ComfyUI\custom_nodes\ComfyUI_stable_fast\node.py", line 22, in gen_stable_fast_config
import triton
File "D:\0AI\0StableDiffusion\ComfyUI\python_embeded\lib\site-packages\triton\__init__.py", line 13, in <module>
from . import language
File "D:\0AI\0StableDiffusion\ComfyUI\python_embeded\lib\site-packages\triton\language\__init__.py", line 2, in <module>
from . import core, extern, libdevice, random
File "D:\0AI\0StableDiffusion\ComfyUI\python_embeded\lib\site-packages\triton\language\core.py", line 1141, in <module>
def abs(x):
File "D:\0AI\0StableDiffusion\ComfyUI\python_embeded\lib\site-packages\triton\runtime\jit.py", line 386, in jit
return JITFunction(args[0], **kwargs)
File "D:\0AI\0StableDiffusion\ComfyUI\python_embeded\lib\site-packages\triton\runtime\jit.py", line 315, in __init__
self.run = self._make_launcher()
File "D:\0AI\0StableDiffusion\ComfyUI\python_embeded\lib\site-packages\triton\runtime\jit.py", line 282, in _make_launcher
scope = {"version_key": version_key(), "get_cuda_stream": get_cuda_stream,
File "D:\0AI\0StableDiffusion\ComfyUI\python_embeded\lib\site-packages\triton\runtime\jit.py", line 82, in version_key
with open(triton._C.libtriton.__file__, "rb") as f:
AttributeError: partially initialized module 'triton' has no attribute '_C' (most likely due to a circular import)
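
In the traceback above, `import triton` fails with an `AttributeError` rather than an `ImportError`, so a guard that catches only `ImportError` would not help. A minimal sketch of a broader guard follows. The helper `try_import` and the flag `enable_triton` are hypothetical names, not the extension's own code.

```python
import importlib


def try_import(name: str):
    """Return the imported module, or None if importing it fails for any reason."""
    try:
        return importlib.import_module(name)
    except Exception as exc:  # ImportError, or AttributeError from a broken install
        print(f"{name} unavailable ({type(exc).__name__}: {exc}), skipping")
        return None


triton = try_import("triton")
enable_triton = triton is not None  # hypothetical flag for a later config step
```

Catching `Exception` is deliberately broad here because the failure mode of a half-working installation is hard to predict.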

Upscale problem (two devices, cuda:0 and cpu!)

Hi guys, I have a problem with UltimateSDUpscale: I can't use my model. I am on Manjaro Linux with https://github.com/chengzeyi/stable-fast/releases/download/nightly/stable_fast-1.0.5.dev20240509+torch230cu121-cp311-cp311-manylinux2014_x86_64.whl and PyTorch 2.3.0.

Python 3.11 with --disable-cuda-malloc

I am trying an upscale with SDXL. Should it work? (No LoRA, no ControlNet, CUDA_GRAPH set to false.)

0%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00,  5.69it/s]
  0%|                                                                                                                                                       | 0/5 [00:00<?, ?it/s]
!!! Exception during processing!!! The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/sfast/jit/overrides.py(21): __torch_function__
/home/noe/Documentos/ComfyUI/comfy/model_sampling.py(102): timestep
/home/noe/Documentos/ComfyUI/comfy/model_base.py(135): apply_model
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/utils.py(64): apply_model_uncond_cleanup_wrapper
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_stable_fast/module/comfy_trace/model_base.py(43): forward
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1522): _slow_forward
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1541): _call_impl
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1532): _wrapped_call_impl
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/sfast/jit/trace_helper.py(154): forward
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1522): _slow_forward
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1541): _call_impl
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1532): _wrapped_call_impl
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/jit/_trace.py(1088): trace_module
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/jit/_trace.py(820): trace
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/sfast/jit/utils.py(35): better_trace
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/sfast/jit/trace_helper.py(25): trace_with_kwargs
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_stable_fast/module/sfast_pipeline_compiler.py(94): __call__
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_stable_fast/node.py(65): __call__
/home/noe/Documentos/ComfyUI/comfy/samplers.py(226): calc_cond_batch
/home/noe/Documentos/ComfyUI/comfy/samplers.py(279): sampling_function
/home/noe/Documentos/ComfyUI/comfy/samplers.py(685): predict_noise
/home/noe/Documentos/ComfyUI/comfy/samplers.py(682): __call__
/home/noe/Documentos/ComfyUI/comfy/samplers.py(299): __call__
/home/noe/Documentos/ComfyUI/comfy/k_diffusion/sampling.py(635): sample_dpmpp_2m_sde
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/utils/_contextlib.py(115): decorate_context
/home/noe/Documentos/ComfyUI/comfy/k_diffusion/sampling.py(732): sample_dpmpp_2m_sde_gpu
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/utils/_contextlib.py(115): decorate_context
/home/noe/Documentos/ComfyUI/comfy/samplers.py(600): sample
/home/noe/Documentos/ComfyUI/comfy/samplers.py(695): inner_sample
/home/noe/Documentos/ComfyUI/comfy/samplers.py(716): sample
/home/noe/Documentos/ComfyUI/comfy/samplers.py(729): sample
/home/noe/Documentos/ComfyUI/comfy/sample.py(48): sample_custom
/home/noe/Documentos/ComfyUI/custom_nodes/comfyui-diffusion-cg/recenter.py(36): sample_center
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/utils.py(112): uncond_multiplier_check_cn_sample
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/control_reference.py(47): refcn_sample
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/animatediff/sampling.py(434): motion_sample
/home/noe/Documentos/ComfyUI/comfy_extras/nodes_custom_sampler.py(455): sample
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/modules/processing.py(95): sample
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/modules/processing.py(173): process_images
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py(180): linear_process
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py(245): start
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py(138): process
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py(565): run
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/nodes.py(151): upscale
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/nodes.py(213): upscale
/home/noe/Documentos/ComfyUI/execution.py(75): map_node_over_list
/home/noe/Documentos/ComfyUI/execution.py(82): get_output_data
/home/noe/Documentos/ComfyUI/execution.py(152): recursive_execute
/home/noe/Documentos/ComfyUI/execution.py(135): recursive_execute
/home/noe/Documentos/ComfyUI/execution.py(403): execute
/home/noe/Documentos/ComfyUI/custom_nodes/rgthree-comfy/__init__.py(217): rgthree_execute
/home/noe/Documentos/ComfyUI/main.py(121): prompt_worker
/home/noe/.pyenv/versions/3.11.7/lib/python3.11/threading.py(982): run
/home/noe/.pyenv/versions/3.11.7/lib/python3.11/threading.py(1045): _bootstrap_inner
/home/noe/.pyenv/versions/3.11.7/lib/python3.11/threading.py(1002): _bootstrap
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Traceback (most recent call last):
  File "/home/noe/Documentos/ComfyUI/execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/nodes.py", line 213, in upscale
    return super().upscale(image, model, positive, negative, vae, upscale_by, seed,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/nodes.py", line 151, in upscale
    processed = script.run(p=self.sdprocessing, _=None, tile_width=self.tile_width, tile_height=self.tile_height,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py", line 565, in run
    upscaler.process()
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py", line 138, in process
    self.image = self.redraw.start(self.p, self.image, self.rows, self.cols)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py", line 245, in start
    return self.linear_process(p, image, rows, cols)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py", line 180, in linear_process
    processed = processing.process_images(p)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/modules/processing.py", line 173, in process_images
    samples = sample(p.model, p.seed, p.steps, p.cfg, p.sampler_name, p.scheduler, positive_cropped,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/modules/processing.py", line 95, in sample
    (samples, _) = getattr(custom_sample, custom_sample.FUNCTION)(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 455, in sample
    samples = comfy.sample.sample_custom(model, noise, cfg, sampler, sigmas, positive, negative, latent_image, noise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise_seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/animatediff/sampling.py", line 434, in motion_sample
    return orig_comfy_sample(model, noise, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/control_reference.py", line 47, in refcn_sample
    return orig_comfy_sample(model, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/utils.py", line 112, in uncond_multiplier_check_cn_sample
    return orig_comfy_sample(model, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/comfyui-diffusion-cg/recenter.py", line 36, in sample_center
    return SAMPLE(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/sample.py", line 48, in sample_custom
    samples = comfy.samplers.sample(model, noise, positive, negative, cfg, model.load_device, sampler, sigmas, model_options=model.model_options, latent_image=latent_image, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/samplers.py", line 729, in sample
    return cfg_guider.sample(noise, latent_image, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/samplers.py", line 716, in sample
    output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/samplers.py", line 695, in inner_sample
    samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/samplers.py", line 600, in sample
    samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/k_diffusion/sampling.py", line 732, in sample_dpmpp_2m_sde_gpu
    return sample_dpmpp_2m_sde(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, eta=eta, s_noise=s_noise, noise_sampler=noise_sampler, solver_type=solver_type)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/k_diffusion/sampling.py", line 635, in sample_dpmpp_2m_sde
    denoised = model(x, sigmas[i] * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/samplers.py", line 299, in __call__
    out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/samplers.py", line 682, in __call__
    return self.predict_noise(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/samplers.py", line 685, in predict_noise
    return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/samplers.py", line 279, in sampling_function
    out = calc_cond_batch(model, conds, x, timestep, model_options)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/comfy/samplers.py", line 226, in calc_cond_batch
    output = model_options['model_function_wrapper'](model.apply_model, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_stable_fast/node.py", line 65, in __call__
    return self.stable_fast_model(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_stable_fast/module/sfast_pipeline_compiler.py", line 107, in __call__
    return traced_module(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/sfast/jit/trace_helper.py", line 133, in forward
    outputs = self.module(*self.convert_inputs(args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/sfast/jit/overrides.py(21): __torch_function__
/home/noe/Documentos/ComfyUI/comfy/model_sampling.py(102): timestep
/home/noe/Documentos/ComfyUI/comfy/model_base.py(135): apply_model
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/utils.py(64): apply_model_uncond_cleanup_wrapper
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_stable_fast/module/comfy_trace/model_base.py(43): forward
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1522): _slow_forward
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1541): _call_impl
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1532): _wrapped_call_impl
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/sfast/jit/trace_helper.py(154): forward
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1522): _slow_forward
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1541): _call_impl
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/nn/modules/module.py(1532): _wrapped_call_impl
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/jit/_trace.py(1088): trace_module
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/jit/_trace.py(820): trace
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/sfast/jit/utils.py(35): better_trace
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/sfast/jit/trace_helper.py(25): trace_with_kwargs
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_stable_fast/module/sfast_pipeline_compiler.py(94): __call__
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_stable_fast/node.py(65): __call__
/home/noe/Documentos/ComfyUI/comfy/samplers.py(226): calc_cond_batch
/home/noe/Documentos/ComfyUI/comfy/samplers.py(279): sampling_function
/home/noe/Documentos/ComfyUI/comfy/samplers.py(685): predict_noise
/home/noe/Documentos/ComfyUI/comfy/samplers.py(682): __call__
/home/noe/Documentos/ComfyUI/comfy/samplers.py(299): __call__
/home/noe/Documentos/ComfyUI/comfy/k_diffusion/sampling.py(635): sample_dpmpp_2m_sde
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/utils/_contextlib.py(115): decorate_context
/home/noe/Documentos/ComfyUI/comfy/k_diffusion/sampling.py(732): sample_dpmpp_2m_sde_gpu
/home/noe/.pyenv/versions/comfy7_311/lib/python3.11/site-packages/torch/utils/_contextlib.py(115): decorate_context
/home/noe/Documentos/ComfyUI/comfy/samplers.py(600): sample
/home/noe/Documentos/ComfyUI/comfy/samplers.py(695): inner_sample
/home/noe/Documentos/ComfyUI/comfy/samplers.py(716): sample
/home/noe/Documentos/ComfyUI/comfy/samplers.py(729): sample
/home/noe/Documentos/ComfyUI/comfy/sample.py(48): sample_custom
/home/noe/Documentos/ComfyUI/custom_nodes/comfyui-diffusion-cg/recenter.py(36): sample_center
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/utils.py(112): uncond_multiplier_check_cn_sample
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/control_reference.py(47): refcn_sample
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/animatediff/sampling.py(434): motion_sample
/home/noe/Documentos/ComfyUI/comfy_extras/nodes_custom_sampler.py(455): sample
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/modules/processing.py(95): sample
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/modules/processing.py(173): process_images
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py(180): linear_process
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py(245): start
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py(138): process
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/repositories/ultimate_sd_upscale/scripts/ultimate-upscale.py(565): run
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/nodes.py(151): upscale
/home/noe/Documentos/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale/nodes.py(213): upscale
/home/noe/Documentos/ComfyUI/execution.py(75): map_node_over_list
/home/noe/Documentos/ComfyUI/execution.py(82): get_output_data
/home/noe/Documentos/ComfyUI/execution.py(152): recursive_execute
/home/noe/Documentos/ComfyUI/execution.py(135): recursive_execute
/home/noe/Documentos/ComfyUI/execution.py(403): execute
/home/noe/Documentos/ComfyUI/custom_nodes/rgthree-comfy/__init__.py(217): rgthree_execute
/home/noe/Documentos/ComfyUI/main.py(121): prompt_worker
/home/noe/.pyenv/versions/3.11.7/lib/python3.11/threading.py(982): run
/home/noe/.pyenv/versions/3.11.7/lib/python3.11/threading.py(1045): _bootstrap_inner
/home/noe/.pyenv/versions/3.11.7/lib/python3.11/threading.py(1002): _bootstrap
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!


Prompt executed in 144.99 seconds
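
The error above reports tensors on both `cuda:0` and `cpu` inside the traced call, with the TorchScript trace pointing at `timestep` in `comfy/model_sampling.py`. If the cause is an input that was left on the CPU, one possible mitigation is to move all inputs to one device before calling the traced module. The sketch below is duck-typed so that it runs without PyTorch. `move_to_device` and `FakeTensor` are hypothetical names, and this would not help if the CPU tensor is a constant baked into the trace.

```python
def move_to_device(obj, device):
    """Recursively call .to(device) on tensor-like values in dicts, lists and tuples."""
    if hasattr(obj, "to") and callable(obj.to):
        return obj.to(device)
    if isinstance(obj, dict):
        return {k: move_to_device(v, device) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(move_to_device(v, device) for v in obj)
    return obj  # plain Python values are returned unchanged


class FakeTensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return FakeTensor(device)


kwargs = {
    "x": FakeTensor("cuda:0"),
    "timestep": FakeTensor("cpu"),
    "c": {"y": [FakeTensor("cpu")]},
}
moved = move_to_device(kwargs, "cuda:0")
print(moved["timestep"].device, moved["c"]["y"][0].device)
```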

Licensing

Is there a particular license you have in mind for this? I noticed https://github.com/chengzeyi/stable-fast is using the MIT license, and I'd prefer that one if it works for you.

ImportError: cannot import name '_enable_xformers' from 'sfast.compilers.stable_diffusion_pipeline_compiler'

`sfast.compilers.stable_diffusion_pipeline_compiler` is deprecated. Please use `sfast.compilers.diffusion_pipeline_compiler` instead.
ComfyUI_stable_fast: StableFast node import failed.
Traceback (most recent call last):
  File "/app/custom_nodes/ComfyUI_stable_fast/__init__.py", line 10, in <module>
    from .node import ApplyStableFastUnet
  File "/app/custom_nodes/ComfyUI_stable_fast/node.py", line 4, in <module>
    from .module.sfast_pipeline_compiler import build_lazy_trace_module
  File "/app/custom_nodes/ComfyUI_stable_fast/module/sfast_pipeline_compiler.py", line 6, in <module>
    from sfast.compilers.stable_diffusion_pipeline_compiler import (
ImportError: cannot import name '_enable_xformers' from 'sfast.compilers.stable_diffusion_pipeline_compiler' (/opt/conda/lib/python3.10/site-packages/sfast/compilers/stable_diffusion_pipeline_compiler.py)

As the message says, `sfast.compilers.stable_diffusion_pipeline_compiler` is deprecated; please use `sfast.compilers.diffusion_pipeline_compiler` instead.
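
A fallback import is one way to cope with a module path that moved between releases. The sketch below tries the new path named in the deprecation message first and then the old one. `import_first` is a hypothetical helper, and the log does not confirm whether `_enable_xformers` exists in the new module, so the imported names would still need checking.

```python
import importlib


def import_first(*module_names: str):
    """Import and return the first module in `module_names` that can be imported."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# Module paths taken from the deprecation message above, newest first.
CANDIDATES = (
    "sfast.compilers.diffusion_pipeline_compiler",
    "sfast.compilers.stable_diffusion_pipeline_compiler",
)

try:
    compiler = import_first(*CANDIDATES)
except ImportError as exc:
    compiler = None  # stable-fast is not installed in this environment
    print(exc)
```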

Demo for Apply TensorRT ControlNet

Hi,

I am wondering if there is a demo picture for "Apply TensorRT ControlNet". The asset folder only has pictures for LoRA and plain UNet inference. Thanks.

Jason

When loading the graph, the following node types were not found: PatchModelAddDownscale

When loading the graph, the following node types were not found:
PatchModelAddDownscale
Nodes that have failed to load will show as red on the graph.

How can I fix it?
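
To see which node types a workflow needs that the running install does not provide, the workflow file can be compared against the set of registered node names. The sketch below assumes the workflow JSON keeps its nodes in a top-level "nodes" list with a "type" field per node. The `registered` set is a made-up example; in a real install it would come from ComfyUI's node registry. A missing type may mean that the ComfyUI install, or a custom node pack, is older than the workflow.

```python
import json


def missing_node_types(workflow: dict, registered: set) -> list:
    """Return the node types used by `workflow` that are not in `registered`."""
    used = {node["type"] for node in workflow.get("nodes", []) if "type" in node}
    return sorted(used - registered)


workflow = json.loads(
    '{"nodes": [{"id": 1, "type": "KSampler"},'
    ' {"id": 2, "type": "PatchModelAddDownscale"}]}'
)
registered = {"KSampler", "VAEDecode"}  # example values only
print(missing_node_types(workflow, registered))
```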

Dynamic thresholding

When I use CFG Scale Fix, the model is reloaded when the sampler loads. The "FaceDetailer" face-repair plugin also causes the model to be reloaded.

ERROR IMPORTING sfast

I don't know what this means. I think this repository is too old to make use of the new wheels and torch.
ERROR IMPORTING sfast._C
Unable to load stable-fast C extension.
Is it compatible with your PyTorch installation?
Or is it compatible with your CUDA version?

ComfyUI_stable_fast: StableFast node import failed.
Traceback (most recent call last):
File "H:\ComfyUI\custom_nodes\ComfyUI_stable_fast\__init__.py", line 10, in <module>
from .node import ApplyStableFastUnet
File "H:\ComfyUI\custom_nodes\ComfyUI_stable_fast\node.py", line 2, in <module>
from sfast.compilers.diffusion_pipeline_compiler import CompilationConfig
File "C:\comfyVENV\Lib\site-packages\sfast\__init__.py", line 23, in <module>
import sfast._C as _C
ImportError: DLL load failed while importing _C: The specified procedure could not be found.
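
The message above asks whether the extension matches the installed PyTorch and CUDA versions. The stable-fast wheel filename quoted elsewhere on this page encodes those versions, so a quick comparison is possible. The sketch below parses the tags from that filename and compares them with example local versions (the ones listed in the Windows report above). The helper names are hypothetical, and the filename pattern is assumed from that single example.

```python
import re
import sys

# Matches e.g. "+torch230cu121-cp311-" in a stable-fast wheel filename.
WHEEL_TAGS = re.compile(r"\+torch(?P<torch>\d+)cu(?P<cuda>\d+)-cp(?P<python>\d+)-")


def wheel_tags(filename: str) -> dict:
    """Extract the torch, CUDA and Python tags from a wheel filename."""
    match = WHEEL_TAGS.search(filename)
    if match is None:
        raise ValueError(f"no torch/CUDA tags in {filename!r}")
    return match.groupdict()


def local_tags(torch_version: str, cuda_version: str, python=sys.version_info) -> dict:
    """Build comparable tags from local version strings such as '2.1.0+cu118'."""
    return {
        "torch": torch_version.split("+")[0].replace(".", ""),
        "cuda": cuda_version.replace(".", ""),
        "python": f"{python[0]}{python[1]}",
    }


wheel = "stable_fast-1.0.5.dev20240509+torch230cu121-cp311-cp311-manylinux2014_x86_64.whl"
want = wheel_tags(wheel)
have = local_tags("2.1.0+cu118", "11.8", (3, 10))  # example values only
mismatched = sorted(key for key in want if want[key] != have[key])
print(mismatched)
```

With PyTorch installed, the local values would normally come from `torch.__version__` and `torch.version.cuda`.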

Force quits but doesn't throw any useful error

After applying a simple stable-fast Unet like this

ComfyUI force quits with no apparent error except a few warnings

Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['alphas_cumprod', 'alphas_cumprod_prev', 'betas', 'log_one_minus_alphas_cumprod', 'model_ema.decay', 'model_ema.num_updates', 'posterior_log_variance_clipped', 'posterior_mean_coef1', 'posterior_mean_coef2', 'posterior_variance', 'sqrt_alphas_cumprod', 'sqrt_one_minus_alphas_cumprod', 'sqrt_recip_alphas_cumprod', 'sqrt_recipm1_alphas_cumprod', 'cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
triton not installed, skip
Requested to load SD1ClipModel
Loading 1 new model
Requested to load AutoencoderKL
Loading 1 new model
Requested to load BaseModel
Loading 1 new model
  0%|                                                                                           | 0/30 [00:00<?, ?it/s]enable_xformers_memory_efficient_attention() is not available. If you have enabled xformers by other means, ignore this warning.
F:\automatic\venv\lib\site-packages\sfast\jit\overrides.py:21: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return func(*args, **kwargs)
F:\automatic\venv\lib\site-packages\sfast\jit\overrides.py:21: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return func(*args, **kwargs)
F:\automatic\venv\lib\site-packages\sfast\jit\overrides.py:21: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return func(*args, **kwargs)
F:\automatic\venv\lib\site-packages\sfast\jit\overrides.py:21: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.

After removing stable-fast, inference works correctly.

Torch 2.1.2 cu118
Windows 11
xformers installed

An error related to TracerWarning has occurred during generation.

Triton is skipped because this is Windows.
The stable-fast used was stable_fast-0.0.9+torch210cu118-cp310-cp310-win_amd64.whl

used_workflow.json
comfyui.prev.log

** ComfyUI start up time: 2023-11-13 12:24:24.636300

Prestartup times for custom nodes:
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Total VRAM 10240 MB, total RAM 64664 MB
xformers version: 0.0.22.post7+cu118
Set vram state to: NORMAL_VRAM
Device: cuda:0 NVIDIA GeForce RTX 3080 : cudaMallocAsync
VAE dtype: torch.bfloat16
Using xformers cross attention
### Loading: ComfyUI-Manager (V0.30.4)
### ComfyUI Revision: 1677 [4aeef781]

Import times for custom nodes:
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI_stable_fast
   0.0 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-WD14-Tagger
   0.4 seconds: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager

Starting server

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: D:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json
got prompt
model_type EPS
adm 2816
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
missing {'cond_stage_model.clip_l.logit_scale', 'cond_stage_model.clip_l.text_projection'}
left over keys: dict_keys(['cond_stage_model.clip_l.transformer.text_model.embeddings.position_ids'])
triton not installed, skip
Requested to load SDXLClipModel
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Requested to load SDXL
Loading 1 new model
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:157: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  obj_type = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:216: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:226: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:212: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bytes(tensors[start].tolist()), start + 1
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:203: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return int(tensors[start].item()), start + 1
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:619: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert y.shape[0] == x.shape[0]
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:125: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\ComfyUI\comfy\ldm\modules\diffusionmodules\openaimodel.py:83: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
D:\ComfyUI_windows_portable\python_embeded\lib\site-packages\sfast\utils\flat_tensors.py:21: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return torch.tensor([num], dtype=torch.int64)

Commit cb7c3a2 in ComfyUI broke this node

Getting the following error during execution in SF node:

'PatchUNetModel' object has no attribute 'default_image_only_indicator'

This commit changed the possible values of image_only_indicator:

comfyanonymous/ComfyUI@cb7c3a2

How to use it in the workflow of SVD

Could you provide a screenshot showing how to use it with SVD? Thanks.

How to use controlNet

I have installed stable_fast and run a text-to-image workflow that includes the 'Apply StableFast Unet' node. The workflow runs successfully when I bypass the 'Apply ControlNet' step. However, with ControlNet enabled, the workflow fails with an exception during the KSampler run.

Error occurred when executing KSampler:

The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):

graph(%1, %2, %3, %4, %5, %6, %7, %8, %9, %10, %11, %12, %13, %14, %15):
%x = sfast::cudnn_convolution_bias_add(%1, %2, %3, %14, %15, %4, %5, %6, %7, %8, %9)
~~~~~ <--- HERE
return (%x)

RuntimeError: Expected tensor for argument #1 'input' to have the same type as tensor for argument #3 'z'; but type torch.cuda.HalfTensor does not equal torch.cuda.FloatTensor (while checking arguments for cudnn_convolution_bias_add_activation)


File "/app/execution.py", line 153, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "/app/execution.py", line 83, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "/app/execution.py", line 76, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "/app/nodes.py", line 1371, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
File "/app/nodes.py", line 1341, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "/app/comfy/sample.py", line 100, in sample
samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "/app/comfy/samplers.py", line 703, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "/app/comfy/samplers.py", line 608, in sample
samples = sampler.sample(model_wrap, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
File "/app/comfy/samplers.py", line 547, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/app/comfy/k_diffusion/sampling.py", line 580, in sample_dpmpp_2m
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/app/comfy/samplers.py", line 285, in forward
out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, model_options=model_options, seed=seed)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/app/comfy/samplers.py", line 272, in forward
return self.apply_model(*args, **kwargs)
File "/app/comfy/samplers.py", line 269, in apply_model
out = sampling_function(self.inner_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
File "/app/comfy/samplers.py", line 249, in sampling_function
cond_pred, uncond_pred = calc_cond_uncond_batch(model, cond, uncond_, x, timestep, model_options)
File "/app/comfy/samplers.py", line 221, in calc_cond_uncond_batch
output = model_options['model_function_wrapper'](model.apply_model, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
File "/app/custom_nodes/ComfyUI_stable_fast/node.py", line 65, in __call__
return self.stable_fast_model(
File "/app/custom_nodes/ComfyUI_stable_fast/module/sfast_pipeline_compiler.py", line 104, in __call__
return traced_module(**kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/sfast/jit/trace_helper.py", line 133, in forward
outputs = self.module(*self.convert_inputs(args, kwargs))
File "/opt/conda/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 40, in dynamic_graphed_callable
cached_callable = simple_make_graphed_callable(
File "/opt/conda/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 61, in simple_make_graphed_callable
return make_graphed_callable(func,
File "/opt/conda/lib/python3.10/site-packages/sfast/cuda/graphs.py", line 90, in make_graphed_callable
func(*tree_copy(example_inputs, detach=True),
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)

How can I use stable_fast with ControlNet? Thank you very much.

RuntimeError: Failed to execute cutlass gemm: Error Internal

Error occurred when executing KSampler:

The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):

graph(%input, %weight, %bias, %chunks, %dim, %approximate):
%output = sfast::cutlass_linear_geglu_unified(%input, %weight, %bias)
~~~~~ <--- HERE
return (%output)
RuntimeError: Failed to execute cutlass gemm: Error Internal


File "/content/drive/MyDrive/SDUI/execution.py", line 152, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "/content/drive/MyDrive/SDUI/execution.py", line 82, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "/content/drive/MyDrive/SDUI/execution.py", line 75, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "/content/drive/MyDrive/SDUI/nodes.py", line 1368, in sample
return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
File "/content/drive/MyDrive/SDUI/nodes.py", line 1338, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "/content/drive/MyDrive/SDUI/comfy/sample.py", line 100, in sample
samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "/content/drive/MyDrive/SDUI/comfy/samplers.py", line 703, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "/content/drive/MyDrive/SDUI/comfy/samplers.py", line 608, in sample
samples = sampler.sample(model_wrap, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
File "/content/drive/MyDrive/SDUI/comfy/samplers.py", line 547, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/content/drive/MyDrive/SDUI/comfy/k_diffusion/sampling.py", line 137, in sample_euler
denoised = model(x, sigma_hat * s_in, **extra_args)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/content/drive/MyDrive/SDUI/comfy/samplers.py", line 285, in forward
out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, model_options=model_options, seed=seed)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/content/drive/MyDrive/SDUI/comfy/samplers.py", line 272, in forward
return self.apply_model(*args, **kwargs)
File "/content/drive/MyDrive/SDUI/comfy/samplers.py", line 269, in apply_model
out = sampling_function(self.inner_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
File "/content/drive/MyDrive/SDUI/comfy/samplers.py", line 249, in sampling_function
cond_pred, uncond_pred = calc_cond_uncond_batch(model, cond, uncond_, x, timestep, model_options)
File "/content/drive/MyDrive/SDUI/comfy/samplers.py", line 221, in calc_cond_uncond_batch
output = model_options['model_function_wrapper'](model.apply_model, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
File "/content/drive/MyDrive/SDUI/custom_nodes/ComfyUI_stable_fast/node.py", line 65, in __call__
return self.stable_fast_model(
File "/content/drive/MyDrive/SDUI/custom_nodes/ComfyUI_stable_fast/module/sfast_pipeline_compiler.py", line 104, in __call__
return traced_module(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/sfast/jit/trace_helper.py", line 133, in forward
outputs = self.module(*self.convert_inputs(args, kwargs))
File "/usr/local/lib/python3.10/dist-packages/sfast/cuda/graphs.py", line 40, in dynamic_graphed_callable
cached_callable = simple_make_graphed_callable(
File "/usr/local/lib/python3.10/dist-packages/sfast/cuda/graphs.py", line 61, in simple_make_graphed_callable
return make_graphed_callable(func,
File "/usr/local/lib/python3.10/dist-packages/sfast/cuda/graphs.py", line 90, in make_graphed_callable
func(*tree_copy(example_inputs, detach=True),
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)

compare cuda graph true or false

Hi. I am wondering why a run with cuda graph set to false is slower than a run without the stable_fast node at all.

This is with cuda graph set to false:

This is without the stable_fast node:

This is my JSON:
workflow (43).json
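When comparing two configurations like this, it may help to time the first call separately from later calls, since the first call with stable-fast includes tracing and compilation. A minimal sketch of such a harness, where `fn` stands for one full generation and is a placeholder rather than a ComfyUI API:

```python
import time


def benchmark(fn, warmup=1, runs=5):
    """Time `fn`, reporting warm-up calls separately from steady-state calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    def timed_call():
        start = time.perf_counter()
        fn()
        return time.perf_counter() - start

    # Warm-up calls absorb one-off costs such as tracing and compilation.
    warmup_times = [timed_call() for _ in range(warmup)]
    # Steady-state calls are the ones worth comparing between setups.
    run_times = [timed_call() for _ in range(runs)]
    return {
        "warmup": warmup_times,
        "runs": run_times,
        "mean": sum(run_times) / len(run_times),
    }


# Note: for GPU work, `fn` should wait for the device to finish before it
# returns (for example by calling torch.cuda.synchronize()); otherwise only
# the kernel launch time is measured.
```

Comparing the `mean` values of the two setups, rather than single runs, gives a fairer picture of whether the node helps.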

Tensorrt error with SDXL

I am having trouble using TensorRT to load a larger model. I've tried SDXL and sd1b, and I get these two errors:

AssertionError: must specify y if and only if the model is class-conditional

And

RuntimeError: The serialized model is larger than the 2GiB limit imposed by the protobuf library. Therefore the output file must be a file path, so that the ONNX external data can be written to the same directory. Please specify the output file name.
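The second error is consistent with simple size arithmetic: once the weights alone exceed 2 GiB, the model cannot be serialized into a single protobuf message. A back-of-the-envelope sketch follows. The parameter counts in the comments are rough public figures used as assumptions, and graph overhead is ignored:

```python
PROTOBUF_LIMIT_BYTES = 2 * 1024**3  # the 2 GiB limit named in the error message

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_bytes(num_params, dtype):
    """Approximate size of the raw weights in bytes."""
    return num_params * BYTES_PER_PARAM[dtype]


def needs_external_data(num_params, dtype):
    """True if the weights alone already reach the protobuf limit."""
    return weight_bytes(num_params, dtype) >= PROTOBUF_LIMIT_BYTES


# Assumed, approximate UNet sizes: SD 1.5 about 0.86e9 parameters,
# SDXL about 2.6e9 parameters.
SD15_PARAMS = 860_000_000
SDXL_PARAMS = 2_600_000_000
```

Under these assumptions an SDXL UNet is over the limit even in fp16, which matches the message asking for an output file path so that the weights can be written as external data next to the model file.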

No module named 'sfast.utils.xformers_attention'

    from sfast.utils.xformers_attention import (
ModuleNotFoundError: No module named 'sfast.utils.xformers_attention'

Prompt executed in 2.32 seconds

No module named 'sfast' use https://github.com/chengzeyi/stable-fast/releases/tag/v1.0.1

stable_fast-1.0.1+torch211cu118-cp310-cp310-win_amd64.whl

stable_fast-1.0.1+torch211cu121-cp310-cp310-win_amd64.whl

stable_fast-1.0.1+torch212cu121-cp310-cp310-win_amd64.whl

REQUEST: More Official Node for Comfy

I wish you would develop this further. I love using it, but it is not keeping up with the development of ONNX and TensorRT.
Please!
Currently the node does not work if I enable CUDA Graph.

Otherwise it works.

too slow

Image generation is too slow on a 3090.

Model Quantization

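import torch

def quantize_unet(m):
    # The snippet asserts that diffusers is using the PEFT backend.
    from diffusers.utils import USE_PEFT_BACKEND
    assert USE_PEFT_BACKEND
    # Dynamically quantize every torch.nn.Linear layer to int8, in place.
    m = torch.quantization.quantize_dynamic(m, {torch.nn.Linear},
                                            dtype=torch.qint8,
                                            inplace=True)
    return m

# `model` is assumed to be an already-loaded diffusers pipeline.
model.unet = quantize_unet(model.unet)
if hasattr(model, 'controlnet'):
    model.controlnet = quantize_unet(model.controlnet)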

significant VRAM reduction for transformers
How should this be used?
What is the author's opinion on Model Quantization?

Good job! I have helped make stable-fast work with it.

Hi, friend. I am the maintainer of stable-fast.

I came across this project by chance and tried it.
Sadly, it did not work out of the box because stable-fast was incompatible with it.
I have adjusted my implementation to make it work and have seen a noticeable performance improvement of 30% with batch size 4 at 512x512 on SD 1.5.
It is not fine-tuned yet, so there is definitely a lot of room to improve the speed.

Hope we can work together to better integrate it.

Runtime error: module 'xformers.ops' has no attribute 'MemoryEfficientAttentionTritonFwdFlashBwOp'

I'm trying to use Apply Stable Fast UNet, but I get this error from the xformers library.

Am I missing a requirement?

I followed the installation process for stable-fast and for this node.
I'm using CUDA 12.1 on Linux.
My model is a custom TurboXL, but the issue seems to be related to xformers or to dependencies from the install.
I'm also using an LCM LoRA.

Here is the full error description:
`Error occurred when executing KSamplerAdvanced:

module 'xformers.ops' has no attribute 'MemoryEfficientAttentionTritonFwdFlashBwOp'

File "/workspace/ComfyUI/execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "/workspace/ComfyUI/execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "/workspace/ComfyUI/execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "/workspace/ComfyUI/nodes.py", line 1403, in sample
return common_ksampler(model, noise_seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise, disable_noise=disable_noise, start_step=start_at_step, last_step=end_at_step, force_full_denoise=force_full_denoise)
File "/workspace/ComfyUI/nodes.py", line 1339, in common_ksampler
samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
File "/workspace/ComfyUI/comfy/sample.py", line 100, in sample
samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "/workspace/ComfyUI/comfy/samplers.py", line 704, in sample
return sample(self.model, noise, positive, negative, cfg, self.device, sampler, sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
File "/workspace/ComfyUI/comfy/samplers.py", line 609, in sample
samples = sampler.sample(model_wrap, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
File "/workspace/ComfyUI/comfy/samplers.py", line 548, in sample
samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/workspace/ComfyUI/comfy/k_diffusion/sampling.py", line 745, in sample_lcm
denoised = model(x, sigmas[i] * s_in, **extra_args)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/ComfyUI/comfy/samplers.py", line 286, in forward
out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, model_options=model_options, seed=seed)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/ComfyUI/comfy/samplers.py", line 273, in forward
return self.apply_model(*args, **kwargs)
File "/workspace/ComfyUI/comfy/samplers.py", line 270, in apply_model
out = sampling_function(self.inner_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
File "/workspace/ComfyUI/comfy/samplers.py", line 250, in sampling_function
cond_pred, uncond_pred = calc_cond_uncond_batch(model, cond, uncond_, x, timestep, model_options)
File "/workspace/ComfyUI/comfy/samplers.py", line 222, in calc_cond_uncond_batch
output = model_options['model_function_wrapper'](model.apply_model, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
File "/workspace/ComfyUI/custom_nodes/ComfyUI_stable_fast/node.py", line 59, in __call__
self.stable_fast_model = build_lazy_trace_module(
File "/workspace/ComfyUI/custom_nodes/ComfyUI_stable_fast/module/sfast_pipeline_compiler.py", line 116, in build_lazy_trace_module
_enable_xformers(None)
File "/usr/local/lib/python3.10/dist-packages/sfast/compilers/diffusion_pipeline_compiler.py", line 321, in _enable_xformers
from sfast.libs.xformers.xformers_attention import xformers_memory_efficient_attention
File "/usr/local/lib/python3.10/dist-packages/sfast/libs/xformers/xformers_attention.py", line 14, in
ops.MemoryEfficientAttentionTritonFwdFlashBwOp:`

One more note: I thought this could be related to an xformers issue from 2022, but that should no longer be the case:
facebookresearch/xformers#611

main library versions installed:
xformers 0.0.25.dev761
triton 2.2.0
torch 2.2.0+cu121

No speed difference with the plugin installed

I have an up-to-date ComfyUI setup with a 6GB GTX 1660 Super, and the speed is exactly the same in every generation. Do I need to "enable" the extension somehow? I only cloned it with git into the custom_nodes folder.

To see the GUI go to: http://127.0.0.1:8188
FETCH DATA from: C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-Manager\extension-node-map.json
got prompt
model_type EPS
adm 0
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE
missing {'cond_stage_model.clip_l.text_projection', 'cond_stage_model.clip_l.logit_scale'}
left over keys: dict_keys(['model_ema.decay', 'model_ema.num_updates'])
Requested to load SD1ClipModel
Loading 1 new model
Requested to load BaseModel
Loading 1 new model
100%|███████████████████████████████████████████████████████| 25/25 [00:23<00:00,  1.09it/s]
Global Step: 300167
Using xformers attention in VAE
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
Using xformers attention in VAE

That was the first generation after starting Comfy; the subsequent ones all take 23 seconds as well.

Very slow compilation of SDXL model

Hi,
Compilation of the SDXL model is very slow on the first run. It takes 7 minutes (RTX 4060 16GB), while the SD1.5 model compiles in under a minute. Is this normal, or is something wrong?

RuntimeError: RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

I just followed the instructions to install on ComfyUI under WSL, and I receive an error that seems related to sfast_triton.
Edit: I removed the triton library, and it is working now.
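The "Failed to find C compiler" message points at the `CC` environment variable. A small sketch for checking what a build step would be likely to find, assuming it honours `CC` first and then looks for common compiler names on `PATH`. The candidate list and the helper name are assumptions, not Triton's actual lookup logic:

```python
import os
import shutil


def find_c_compiler(env=None, which=shutil.which, candidates=("gcc", "clang", "cc")):
    """Return the path of a usable C compiler, or None if none is found.

    If CC is set, only that compiler is considered; None then means that
    CC points at something that is not on PATH.
    """
    env = os.environ if env is None else env
    cc = env.get("CC")
    if cc:
        return which(cc)
    for name in candidates:
        path = which(name)
        if path:
            return path
    return None


# Usage: print(find_c_compiler()) inside the same WSL environment that runs
# ComfyUI; a result of None suggests installing a compiler or setting CC.
```

The `env` and `which` parameters exist only so the lookup can be exercised without touching the real environment.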

  0%|          | 0/4 [00:00<?, ?it/s]/root/miniconda3/lib/python3.11/site-packages/sfast/utils/flat_tensors.py:157: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  obj_type = tensors[start].item()
/root/miniconda3/lib/python3.11/site-packages/sfast/utils/flat_tensors.py:216: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
/root/miniconda3/lib/python3.11/site-packages/sfast/utils/flat_tensors.py:226: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  size = tensors[start].item()
/root/miniconda3/lib/python3.11/site-packages/sfast/utils/flat_tensors.py:212: TracerWarning: Converting a tensor to a Python list might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return bytes(tensors[start].tolist()), start + 1
/root/miniconda3/lib/python3.11/site-packages/sfast/utils/flat_tensors.py:203: TracerWarning: Converting a tensor to a Python number might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  return int(tensors[start].item()), start + 1
/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py:619: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert y.shape[0] == x.shape[0]
/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py:125: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/ldm/modules/diffusionmodules/openaimodel.py:83: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1] == self.channels
/root/miniconda3/lib/python3.11/site-packages/sfast/utils/flat_tensors.py:21: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  return torch.tensor([num], dtype=torch.int64)
  0%|          | 0/4 [03:07<?, ?it/s]
ERROR:root:!!! Exception during processing !!!
ERROR:root:Traceback (most recent call last):
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/execution.py", line 153, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/execution.py", line 83, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/execution.py", line 76, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/nodes.py", line 1237, in sample
    return common_ksampler(model, seed, steps, cfg, sampler_name, scheduler, positive, negative, latent_image, denoise=denoise)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/nodes.py", line 1207, in common_ksampler
    samples = comfy.sample.sample(model, noise, steps, cfg, sampler_name, scheduler, positive, negative, latent_image,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/custom_nodes/ComfyUI-Impact-Pack/modules/impact/hacky.py", line 22, in informative_sample
    raise e
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/custom_nodes/ComfyUI-Impact-Pack/modules/impact/hacky.py", line 9, in informative_sample
    return original_sample(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/sample.py", line 100, in sample
    samples = sampler.sample(noise, positive_copy, negative_copy, cfg=cfg, latent_image=latent_image, start_step=start_step, last_step=last_step, force_full_denoise=force_full_denoise, denoise_mask=noise_mask, sigmas=sigmas, callback=callback, disable_pbar=disable_pbar, seed=seed)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py", line 692, in sample
    return sample(self.model, noise, positive, negative, cfg, self.device, sampler(), sigmas, self.model_options, latent_image=latent_image, denoise_mask=denoise_mask, callback=callback, disable_pbar=disable_pbar, seed=seed)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py", line 598, in sample
    samples = sampler.sample(model_wrap, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py", line 558, in sample
    samples = getattr(k_diffusion_sampling, "sample_{}".format(sampler_name))(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **extra_options)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/k_diffusion/sampling.py", line 745, in sample_lcm
    denoised = model(x, sigmas[i] * s_in, **extra_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py", line 275, in forward
    out = self.inner_model(x, sigma, cond=cond, uncond=uncond, cond_scale=cond_scale, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py", line 265, in forward
    return self.apply_model(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py", line 262, in apply_model
    out = sampling_function(self.inner_model, x, timestep, uncond, cond, cond_scale, model_options=model_options, seed=seed)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py", line 250, in sampling_function
    cond, uncond = calc_cond_uncond_batch(model, cond, uncond, x, timestep, model_options)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py", line 226, in calc_cond_uncond_batch
    output = model_options['model_function_wrapper'](model.apply_model, {"input": input_x, "timestep": timestep_, "c": c, "cond_or_uncond": cond_or_uncond}).chunk(batch_chunks)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/g/ComfyUI_windows_portable/ComfyUI/custom_nodes/ComfyUI_stable_fast/node.py", line 65, in __call__
    return self.stable_fast_model(input_x, timestep_, **c)(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sfast/jit/trace_helper.py", line 108, in forward
    outputs = self.module(*convert_to_flat_tensors((args, kwargs)))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sfast/cuda/graphs.py", line 29, in dynamic_graphed_callable
    cached_callable = simple_make_graphed_callable(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sfast/cuda/graphs.py", line 44, in simple_make_graphed_callable
    return make_graphed_callable(callable,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/sfast/cuda/graphs.py", line 73, in make_graphed_callable
    callable(*copy.deepcopy(example_inputs),
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):

graph(%input, %num_groups, %weight, %bias, %eps, %cudnn_enabled):
    %y : Tensor = sfast_triton::group_norm_silu(%input, %num_groups, %weight, %bias, %eps)
                  ~~~~~~~~~~~~ <--- HERE
    return (%y)
RuntimeError: RuntimeError: Failed to find C compiler. Please specify via CC environment variable.

At:
  /root/miniconda3/lib/python3.11/site-packages/triton/common/build.py(73): _build
  /root/miniconda3/lib/python3.11/site-packages/triton/compiler/make_launcher.py(39): make_stub
  /root/miniconda3/lib/python3.11/site-packages/triton/compiler/compiler.py(425): compile
  <string>(63): group_norm_4d_channels_last_forward_collect_stats_kernel
  /root/miniconda3/lib/python3.11/site-packages/sfast/triton/__init__.py(35): new_func
  /root/miniconda3/lib/python3.11/site-packages/triton/runtime/autotuner.py(81): kernel_call
  /root/miniconda3/lib/python3.11/site-packages/triton/testing.py(104): do_bench
  /root/miniconda3/lib/python3.11/site-packages/triton/runtime/autotuner.py(84): _bench
  /root/miniconda3/lib/python3.11/site-packages/triton/runtime/autotuner.py(100): <dictcomp>
  /root/miniconda3/lib/python3.11/site-packages/triton/runtime/autotuner.py(100): run
  /root/miniconda3/lib/python3.11/site-packages/sfast/triton/__init__.py(35): new_func
  /root/miniconda3/lib/python3.11/site-packages/sfast/triton/ops/group_norm.py(318): group_norm_forward
  /root/miniconda3/lib/python3.11/site-packages/sfast/triton/torch_ops.py(186): forward
  /root/miniconda3/lib/python3.11/site-packages/torch/autograd/function.py(539): apply
  /root/miniconda3/lib/python3.11/site-packages/sfast/triton/torch_ops.py(224): group_norm_silu
  /root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /root/miniconda3/lib/python3.11/site-packages/sfast/cuda/graphs.py(73): make_graphed_callable
  /root/miniconda3/lib/python3.11/site-packages/sfast/cuda/graphs.py(44): simple_make_graphed_callable
  /root/miniconda3/lib/python3.11/site-packages/sfast/cuda/graphs.py(29): dynamic_graphed_callable
  /root/miniconda3/lib/python3.11/site-packages/sfast/jit/trace_helper.py(108): forward
  /root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /mnt/g/ComfyUI_windows_portable/ComfyUI/custom_nodes/ComfyUI_stable_fast/node.py(65): __call__
  /mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py(226): calc_cond_uncond_batch
  /mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py(250): sampling_function
  /mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py(262): apply_model
  /mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py(265): forward
  /root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py(275): forward
  /root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py(1527): _call_impl
  /root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py(1518): _wrapped_call_impl
  /mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/k_diffusion/sampling.py(745): sample_lcm
  /root/miniconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py(115): decorate_context
  /mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py(558): sample
  /mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py(598): sample
  /mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/samplers.py(692): sample
  /mnt/g/ComfyUI_windows_portable/ComfyUI/comfy/sample.py(100): sample
  /mnt/g/ComfyUI_windows_portable/ComfyUI/custom_nodes/ComfyUI-Impact-Pack/modules/impact/hacky.py(9): informative_sample
  /mnt/g/ComfyUI_windows_portable/ComfyUI/nodes.py(1207): common_ksampler
  /mnt/g/ComfyUI_windows_portable/ComfyUI/nodes.py(1237): sample
  /mnt/g/ComfyUI_windows_portable/ComfyUI/execution.py(76): map_node_over_list
  /mnt/g/ComfyUI_windows_portable/ComfyUI/execution.py(83): get_output_data
  /mnt/g/ComfyUI_windows_portable/ComfyUI/execution.py(153): recursive_execute
  /mnt/g/ComfyUI_windows_portable/ComfyUI/execution.py(136): recursive_execute
  /mnt/g/ComfyUI_windows_portable/ComfyUI/execution.py(136): recursive_execute
  /mnt/g/ComfyUI_windows_portable/ComfyUI/execution.py(377): execute
  /mnt/g/ComfyUI_windows_portable/ComfyUI/main.py(95): prompt_worker
  /root/miniconda3/lib/python3.11/threading.py(975): run
  /root/miniconda3/lib/python3.11/threading.py(1038): _bootstrap_inner
  /root/miniconda3/lib/python3.11/threading.py(995): _bootstrap



Prompt executed in 190.37 seconds
```
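The last error in the log comes from Triton, which could not find a C compiler while building the `group_norm_silu` kernel and asks for one via the `CC` environment variable. A minimal sketch for checking this from the same Python environment that runs ComfyUI is shown here. `find_c_compiler` is a hypothetical helper written for illustration, not part of Triton or stable-fast, and the lookup order (`CC` first, then `gcc`, `clang`, `cc` on `PATH`) is an assumption rather than Triton's exact logic.

```python
import os
import shutil


def find_c_compiler(env=None, which=shutil.which):
    """Return a C compiler candidate, or None if nothing is found.

    Hypothetical helper: checks the CC variable first, then looks for
    common compiler names on PATH. The order is an assumption.
    """
    env = os.environ if env is None else env
    cc = env.get("CC")
    if cc:
        # An explicit CC setting takes priority over anything on PATH.
        return cc
    for name in ("gcc", "clang", "cc"):
        path = which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    compiler = find_c_compiler()
    if compiler is None:
        print("No C compiler found; install one (e.g. gcc) or set CC.")
    else:
        print(f"C compiler candidate: {compiler}")
```

If this reports no compiler, installing one inside the environment shown in the log (the paths suggest WSL with a miniconda Python) or exporting `CC` before starting ComfyUI is the likely fix.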