
stable-diffusion-webui-distributed's Introduction

stable-diffusion-webui-distributed

This extension enables you to chain multiple webui instances together for txt2img and img2img generation tasks.

For those with multi-GPU setups: yes, this can be used for generation across all of those devices.

The main goal is to minimize the latency of high-batch-size requests made on the main sdwui instance.

(Diagram: master/slave architecture of the extension)

Contributions and feedback are much appreciated!
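To illustrate the core scheduling idea (each batch is split across workers in proportion to their benchmarked images-per-minute scores), here is a minimal sketch. The function name and numbers are illustrative, not the extension's actual code:

```python
def split_batch(total_images, worker_ipm):
    """Split a batch across workers proportionally to their benchmarked
    images-per-minute (avg_ipm) scores. Hypothetical helper, not the
    extension's real scheduler."""
    total_ipm = sum(worker_ipm.values())
    shares = {name: int(total_images * ipm / total_ipm)
              for name, ipm in worker_ipm.items()}
    # hand any rounding remainder to the fastest worker
    fastest = max(worker_ipm, key=worker_ipm.get)
    shares[fastest] += total_images - sum(shares.values())
    return shares

# e.g. with the avg_ipm values from the sample config further below
print(split_batch(16, {"master": 18.7, "laptop": 19.3}))
```

The faster worker ends up with the larger share, which is why an accurate benchmark matters.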

Installation

On the master instance:

  • Go to the Extensions tab and switch to the "Available" sub-tab. Then search for "Distributed" and hit Install on this extension.

On each slave instance:

  • Enable the API by passing --api, and ensure the instance is listening externally by passing --listen.
  • Ensure that all of the models, scripts, and anything else you might request from them are present.
    E.g., if you're using SD 1.5 on the controlling instance, then the SD 1.5 model should also be present on each slave instance. Otherwise, the remote will fall back to some other model that is present.
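Putting those two flags together, a slave launch might look like the following (the script name and port are illustrative and depend on your setup):

```shell
# Illustrative slave launch line; adapt script name/port to your install
./webui.sh --api --listen --port 7861
```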

If you want to easily sync models between your nodes, you might want to use something like rclone.
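As a sketch of such a sync (the remote name "slave1" and both paths are hypothetical and must be adapted to your setup):

```shell
# Hypothetical example: mirror the master's SD model directory to a slave.
# "slave1" is an rclone remote you would first set up with `rclone config`.
rclone sync ./models/Stable-diffusion slave1:stable-diffusion-webui/models/Stable-diffusion --progress
```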

Tips

  • If benchmarking fails, try hitting the Redo benchmark button under the script's Util tab.
  • If any remote is taking far too long to return its share of the batch, you can hit the Interrupt button in the Util tab.
  • If you think a worker is being under-utilized, you can raise the job timeout setting. However, doing this may be suboptimal in cases where the "slow" machine really is that slow. Alternatively, you may just need to re-benchmark or manually edit the config.

Command-line arguments

  • --distributed-skip-verify-remotes: disable verification of remote worker TLS certificates (useful if you are using self-signed certs, e.g. with auto tls-https)
  • --distributed-remotes-autosave: enable auto-saving of remote worker generations
  • --distributed-debug: enable debug information
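Combined, an illustrative master launch line (the script name is assumed; only the flags come from the list above) could be:

```shell
# Illustrative master launch enabling all three options
./webui.sh --distributed-skip-verify-remotes --distributed-remotes-autosave --distributed-debug
```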

stable-diffusion-webui-distributed's People

Contributors

aria1th, papuspartan, solareon


stable-diffusion-webui-distributed's Issues

Grid generation doesn't include slave output

When running loads that span multiple slaves, the output doesn't include a grid with all of the combined images; only the grid from a single slave is shown. Example from my 4 instances below.

(screenshot: grid output from the 4 instances)

error with multiprompt generation

*** Error running before_process: D:\non-docker\illyforge\extensions\stable-diffusion-webui-distributed\scripts\distributed.py
    Traceback (most recent call last):
      File "D:\non-docker\illyforge\modules\scripts.py", line 795, in before_process
        script.before_process(p, *script_args)
      File "D:\non-docker\illyforge\extensions\stable-diffusion-webui-distributed\scripts\distributed.py", line 342, in before_process
        payload_temp['subseed'] += prior_images
    TypeError: 'int' object is not iterable

I have a script that generates multiple images using different prompts (different seeds, subseeds, and negative prompts). I want to distribute the script over different GPUs and devices.
Unfortunately, I get the above error. Is there an easy way to hook into Distributed and add a queue of requests which it will automatically split based on the estimated ipm, when Distributed is enabled?

ips calculation error

I don't know if this is an error or on purpose.
The master is doing 60 it and the slave 80 it.
(screenshot: it/s readings)

Both run on the same GPU, and both have the same it/s when generating in single mode.

Running out of linux memory and OOM when attempting benchmark

Master: a 32GB RAM system running a 3080 Ti (12GB VRAM) on Windows 11, under Docker with the NVIDIA container toolkit in WSL2. It also fails when WSL2 is configured for 24GB, which gives it 6GB of swap as well. I'm wondering why 30GB of total memory isn't enough.

Two slaves, both with 3090s: one with 96GB RAM running Ubuntu 20.04, and another with 32GB RAM running WSL2 on Windows 11.

It's not clear how much memory is needed for the benchmark to take place. I guess I can give it a comical amount of swap, or just write the workers config file by hand.

about distributed-config.json

This is a summary from Discord.

Configuration of remote workers is now done using distributed-config.json in the root of this extension directory instead of from command-line arguments.

If the file is just empty JSON, it throws: ERROR config is corrupt or invalid JSON, unable to load

An example JSON config follows:

{
   "workers": [
      {
         "master": {
            "avg_ipm": 18.7014339699366,
            "master": true,
            "address": "localhost",
            "port": 7860,
            "eta_percent_error": [],
            "tls": false,
            "state": 1
         }
      },
      {
         "laptop": {
            "avg_ipm": 19.332550555969075,
            "master": false,
            "address": "192.168.1.83",
            "port": 7861,
            "eta_percent_error": [
               -6.019789958534711,
               -0.9360936846472658,
               7.7202976971435096,
               2.5322154055319075,
               -53.415720437075485
            ],
            "tls": true,
            "state": 1
         }
      }
   ],
   "benchmark_payload": {
      "prompt": "A herd of cows grazing at the bottom of a sunny valley",
      "negative_prompt": "",
      "steps": 20,
      "width": 512,
      "height": 512,
      "batch_size": 1
   },
   "job_timeout": 6
}
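The loading behavior described above can be sketched as follows. This is a hypothetical reconstruction, not the extension's actual code; the real validation in World.py may differ:

```python
import json

# keys the sample config above contains; assumed to be required
REQUIRED_KEYS = ("workers", "benchmark_payload", "job_timeout")

def load_config(path):
    """Load distributed-config.json, mirroring the 'config is corrupt
    or invalid JSON, unable to load' error for unreadable files."""
    with open(path) as f:
        try:
            config = json.load(f)
        except json.JSONDecodeError as e:
            raise ValueError("config is corrupt or invalid JSON, unable to load") from e
    # an empty or partial JSON object is also unusable
    missing = [k for k in REQUIRED_KEYS if k not in config]
    if missing:
        raise ValueError(f"config is missing keys: {missing}")
    return config
```

If startup reports the corrupt-config error, deleting the file (so it is regenerated) or fixing the JSON by hand are the usual remedies.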

Issues with a few other extensions authored by Haoming02

Benchmark attempts:

https://github.com/Haoming02/sd-webui-resharpen

Exception in thread master_benchmark:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 217, in benchmark_wrapped
    worker.avg_ipm = bench_func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 374, in benchmark_master
    process_images(master_bench_payload)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 785, in process_images
    res = process_images_inner(p)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 921, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 1257, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_common.py", line 272, in launch_sampling
    return func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\repositories\k-diffusion\k_diffusion\sampling.py", line 596, in sample_dpmpp_2m
    callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-resharpen\scripts\resharpen.py", line 13, in hijack_callback
    if not self.trajectory_enable:
AttributeError: 'KDiffusionSampler' object has no attribute 'trajectory_enable'

https://github.com/Haoming02/sd-webui-vectorscope-cc

Exception in thread master_benchmark:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 217, in benchmark_wrapped
    worker.avg_ipm = bench_func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 374, in benchmark_master
    process_images(master_bench_payload)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 785, in process_images
    res = process_images_inner(p)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 921, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 1257, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_common.py", line 272, in launch_sampling
    return func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\repositories\k-diffusion\k_diffusion\sampling.py", line 596, in sample_dpmpp_2m
    callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
  File "F:\stablediffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-vectorscope-cc\scripts\cc_callback.py", line 77, in cc_callback
    if not self.vec_cc["enable"]:
AttributeError: 'KDiffusionSampler' object has no attribute 'vec_cc'
DISTRIBUTED | INFO  benchmarking finished    world.py:256

https://github.com/Haoming02/sd-webui-diffusion-cg

Exception in thread master_benchmark:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 217, in benchmark_wrapped
    worker.avg_ipm = bench_func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 374, in benchmark_master
    process_images(master_bench_payload)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 785, in process_images
    res = process_images_inner(p)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 921, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 1257, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_common.py", line 272, in launch_sampling
    return func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\repositories\k-diffusion\k_diffusion\sampling.py", line 596, in sample_dpmpp_2m
    callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-diffusion-cg\scripts\diffusion_cg.py", line 29, in center_callback
    if not self.diffcg_enable or getattr(self.p, "_ad_inner", False):
AttributeError: 'KDiffusionSampler' object has no attribute 'diffcg_enable'
DISTRIBUTED | INFO  benchmarking finished    world.py:256

I'm not 100% sure I'm barking up the right tree here, given these are all written by the same author (@Haoming02), but I figured I'd try here first, as it's the first time I've seen a problem with them.

Compatibility with the control net extension

Is this script compatible with the ControlNet extension? Because I get this issue:
ERROR Failed to serialize payload: Worker.py:331
{'outpath_samples': 'outputs/txt2img-images', 'outpath_grids': 'outputs/txt2img-grids',
 'prompt': 'hjj', 'prompt_for_display': None, 'negative_prompt': '', 'styles': [],
 'seed': -1.0, 'subseed': -1, 'subseed_strength': 0, 'seed_resize_from_h': 0,
 'seed_resize_from_w': 0, 'sampler_name': 'Euler a', 'batch_size': 1, 'n_iter': 2,
 'steps': 20, 'cfg_scale': 7, 'width': 512, 'height': 512, 'restore_faces': False,
 'tiling': False, 'do_not_save_samples': False, 'do_not_save_grid': False,
 'extra_generation_params': {}, 'overlay_images': None, 'eta': None,
 'do_not_reload_embeddings': False, 'paste_to': None, 'color_corrections': None,
 'denoising_strength': None, 'sampler_noise_scheduler_override': None,
 'ddim_discretize': 'uniform', 's_min_uncond': 0, 's_churn': 0.0, 's_tmin': 0.0,
 's_tmax': 1e+308, 's_noise': 1.0, 'override_settings': {},
 'override_settings_restore_afterwards': True, 'is_using_inpainting_conditioning': False,
 'disable_extra_networks': False, 'scripts': None,
 'script_args': (6, False, False, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1,
 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, None,
 'Refresh models', <controlnet.py.UiControlNetUnit object at 0x7adb935bca90>,
 False, False, 'positive', 'comma', 0, False, False, '', '', 1, '', [], 0, '', [],
 0, '', [], True, False, False, False, 0, None, False, 50),
 'all_prompts': None, 'all_negative_prompts': None, 'all_seeds': None,
 'all_subseeds': None, 'iteration': 0, 'is_hr_pass': False, 'enable_hr': False,
 'hr_scale': 2, 'hr_upscaler': 'Latent', 'hr_second_pass_steps': 0,
 'hr_resize_x': 0, 'hr_resize_y': 0, 'hr_upscale_to_x': 0, 'hr_upscale_to_y': 0,
 'truncate_x': 0, 'truncate_y': 0, 'applied_old_hires_behavior_to': None}
Exception in thread Thread-24 (request):
Traceback (most recent call last):
File "/kaggle/working/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/Worker.py", line 332, in request
raise e
File "/kaggle/working/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/Worker.py", line 329, in request
json.dumps(payload)
File "/opt/conda/lib/python3.10/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/opt/conda/lib/python3.10/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/opt/conda/lib/python3.10/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/opt/conda/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type UiControlNetUnit is not JSON serializable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/opt/conda/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/kaggle/working/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/Worker.py", line 369, in request
raise InvalidWorkerResponse(e)
scripts.spartan.Worker.InvalidWorkerResponse: Object of type UiControlNetUnit is not JSON serializable

TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

Ever since trying to set this up, I've gotten this error. The console doesn't show anything as of now, but before, it would show that error and that some extensions are not supported (I've disabled them since then).

Never mind the "nothing in the console"; I managed to get the console log:

To create a public link, set `share=True` in `launch()`.
Startup time: 69.3s (import torch: 6.3s, import gradio: 1.5s, import ldm: 0.7s, other imports: 1.2s, setup codeformer: 0.1s, load scripts: 44.0s, load SD checkpoint: 2.8s, create ui: 10.0s, gradio launch: 2.5s).
WARNING  config reports invalid speed (0 ipm) for worker 'argon', setting default of 1 ipm. please re-benchmark    World.py:378
Error completing request

Traceback (most recent call last):
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\txt2img.py", line 53, in txt2img
    processed = modules.scripts.scripts_txt2img.run(p, *args)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\scripts.py", line 407, in run
    processed = script.run(p, *script_args)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\extensions\stable-diffusion-webui-distributed\scripts\extension.py", line 291, in run
    Script.world.optimize_jobs(payload)  # optimize work assignment before dispatching
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\extensions\stable-diffusion-webui-distributed\scripts\spartan\World.py", line 402, in optimize_jobs
    lag = self.job_stall(job.worker, payload=payload)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\extensions\stable-diffusion-webui-distributed\scripts\spartan\World.py", line 330, in job_stall
    lag = worker.batch_eta(payload=payload, quiet=True) - fastest_worker.batch_eta(payload=payload, quiet=True)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\extensions\stable-diffusion-webui-distributed\scripts\spartan\Worker.py", line 210, in batch_eta
    eta = (num_images / self.avg_ipm) * 60
TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

Edit: I might want to specify more: I'm running Python 3.10.10. If any further info is required, let me know.
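For context, the ETA formula visible in the traceback (Worker.py line 210) is eta = (num_images / avg_ipm) * 60; the crash happens when a worker was never benchmarked and its avg_ipm is still None. A hypothetical defensive version of that calculation, not the extension's actual code, might look like:

```python
def batch_eta(num_images, avg_ipm):
    """ETA in seconds for num_images given a worker's benchmarked
    images-per-minute. Guards against an unbenchmarked worker
    (avg_ipm of None), which otherwise raises the TypeError above."""
    if not avg_ipm:  # None or 0: the worker has no usable benchmark
        raise RuntimeError("worker has no benchmark; hit 'Redo benchmark' in the Util tab")
    return (num_images / avg_ipm) * 60

print(batch_eta(8, 16))  # 8 images at 16 ipm -> 30.0 seconds
```

In practice, re-running the benchmark (or filling in avg_ipm in distributed-config.json) resolves this error.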

ERROR config is corrupt or invalid JSON

When attempting to run stable diffusion with this extension enabled, it gives me an error stating "config is corrupt or invalid JSON World.py:527". I am attempting to run a slave instance on localhost with a different GPU than master. Unsure if I am specifying the slave GPU correctly, but my guess is it should not affect the "World.py" error that was thrown.

ips calculation impacted by model loading time

I have two 3090s, the slightly faster one (420w power limit) is on the slave instance and the master instance has the slower one (currently 300w power limit).

Well, first, I encountered a repeated error

webui-docker-auto-1  | Traceback (most recent call last):
webui-docker-auto-1  |   File "/stable-diffusion-webui/modules/call_queue.py", line 57, in f
webui-docker-auto-1  |     res = list(func(*args, **kwargs))
webui-docker-auto-1  |   File "/stable-diffusion-webui/modules/call_queue.py", line 37, in f
webui-docker-auto-1  |     res = func(*args, **kwargs)
webui-docker-auto-1  |   File "/stable-diffusion-webui/modules/txt2img.py", line 54, in txt2img
webui-docker-auto-1  |     processed = modules.scripts.scripts_txt2img.run(p, *args)
webui-docker-auto-1  |   File "/stable-diffusion-webui/modules/scripts.py", line 441, in run
webui-docker-auto-1  |     processed = script.run(p, *script_args)
webui-docker-auto-1  |   File "/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/extension.py", line 291, in run
webui-docker-auto-1  |     Script.world.optimize_jobs(payload)  # optimize work assignment before dispatching
webui-docker-auto-1  |   File "/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/World.py", line 402, in optimize_jobs
webui-docker-auto-1  |     lag = self.job_stall(job.worker, payload=payload)
webui-docker-auto-1  |   File "/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/World.py", line 330, in job_stall
webui-docker-auto-1  |     lag = worker.batch_eta(payload=payload, quiet=True) - fastest_worker.batch_eta(payload=payload, quiet=True)
webui-docker-auto-1  |   File "/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/Worker.py", line 210, in batch_eta
webui-docker-auto-1  |     eta = (num_images / self.avg_ipm) * 60
webui-docker-auto-1  | TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

I read this error message and got clued in that avg_ipm was None, so I manually ran the benchmark, which resolved the issue.

However I noticed that with the benchmark my slave was getting a worse ipm rating:

webui-docker-auto-1 | 1. 'master'(0.0.0.0:7860) - 51.36 ipm
webui-docker-auto-1 | 2. 'testrig_ftw3'(192.168.1.41:7860) - 39.91 ipm

This is surely because the slave instance is on a much inferior NVMe disk; I could watch it loading the model much more slowly.

Then, in the course of actual generation using the same model, no model-loading wait is incurred, and both rigs crank out images at the same rate.

I just wonder if we can address this somehow. Or maybe it's a non-issue. Regardless, even though they have quite different ipm ratings, the image allocation isn't really being impacted (I tested with 32 images via 4 batches of 8, and the speed is awesome: 24 seconds).
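The skew described above is simple arithmetic: if model-load time is counted in the benchmark's wall clock, a worker with a slow disk scores a lower ipm even though its GPU is just as fast. The numbers below are hypothetical, chosen only to roughly mirror the ratings quoted in this issue:

```python
def ipm(images, minutes):
    """Images per minute over a timed run."""
    return images / minutes

# Hypothetical: both GPUs render 8 benchmark images in 0.155 min,
# but the slave also spends 0.045 min loading the model from disk.
render_time = 0.155
load_time = 0.045

master_ipm = ipm(8, render_time)              # model already cached, ~51.6 ipm
slave_ipm = ipm(8, render_time + load_time)   # load time counted, ~40 ipm

# timing only after a discarded warmup pass would remove the skew
slave_warm_ipm = ipm(8, render_time)
```

One possible mitigation, then, is to run (and discard) a warmup generation before the timed benchmark passes.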

Compatibility with SD Forge (stable-diffusion-webui-forge)

First of all, thank you for this incredible extension. It seems very well designed and worked right away with very little configuration needed.

Assuming this is not already in the works, have you considered making the extension compatible with lllyasviel/stable-diffusion-webui-forge? That distribution is quite a bit faster than automatic1111, especially for lower RAM GPUs (50-75% speedup) which makes it a great platform to use on worker nodes.

As far as I can tell, almost everything is already working well, but there are some incompatibilities with ControlNet. One quick fix was to change the expected name from UiControlNetUnit to ControlNetUnit in distributed.py so that the ControlNet units are detected. After making that small change, a master server can now successfully send jobs that rely on ControlNet to workers running automatic1111. But there are still some issues sending those jobs to workers running on SD Forge, which I have not been able to figure out.

Anyway, I just wanted to see if this was on your radar. Thanks again for your hard work on this!
