
stable-diffusion-webui-distributed's Issues

ips calculation error

I don't know if this is a bug or intentional: the master is doing 60 it while the slave is doing 80 it.
[screenshot]

Both run on the same GPU model, and both show the same it/s when generating in single (non-distributed) mode.

Issues with a few other extensions authored by Haoming02

Benchmark attempts:

https://github.com/Haoming02/sd-webui-resharpen

Exception in thread master_benchmark:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 217, in benchmark_wrapped
    worker.avg_ipm = bench_func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 374, in benchmark_master
    process_images(master_bench_payload)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 785, in process_images
    res = process_images_inner(p)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 921, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 1257, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_common.py", line 272, in launch_sampling
    return func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\repositories\k-diffusion\k_diffusion\sampling.py", line 596, in sample_dpmpp_2m
    callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-resharpen\scripts\resharpen.py", line 13, in hijack_callback
    if not self.trajectory_enable:
AttributeError: 'KDiffusionSampler' object has no attribute 'trajectory_enable'

https://github.com/Haoming02/sd-webui-vectorscope-cc

Exception in thread master_benchmark:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 217, in benchmark_wrapped
    worker.avg_ipm = bench_func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 374, in benchmark_master
    process_images(master_bench_payload)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 785, in process_images
    res = process_images_inner(p)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 921, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 1257, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_common.py", line 272, in launch_sampling
    return func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\repositories\k-diffusion\k_diffusion\sampling.py", line 596, in sample_dpmpp_2m
    callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
  File "F:\stablediffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-vectorscope-cc\scripts\cc_callback.py", line 77, in cc_callback
    if not self.vec_cc["enable"]:
AttributeError: 'KDiffusionSampler' object has no attribute 'vec_cc'
DISTRIBUTED | INFO     benchmarking finished                                                                              world.py:256

https://github.com/Haoming02/sd-webui-diffusion-cg

Exception in thread master_benchmark:
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Program Files\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 217, in benchmark_wrapped
    worker.avg_ipm = bench_func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 374, in benchmark_master
    process_images(master_bench_payload)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 785, in process_images
    res = process_images_inner(p)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-controlnet\scripts\batch_hijack.py", line 59, in processing_process_images_hijack
    return getattr(processing, '__controlnet_original_process_images_inner')(p, *args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 921, in process_images_inner
    samples_ddim = p.sample(conditioning=p.c, unconditional_conditioning=p.uc, seeds=p.seeds, subseeds=p.subseeds, subseed_strength=p.subseed_strength, prompts=p.prompts)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\processing.py", line 1257, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_common.py", line 272, in launch_sampling
    return func()
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\modules\sd_samplers_kdiffusion.py", line 234, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args=self.sampler_extra_args, disable=False, callback=self.callback_state, **extra_params_kwargs))
  File "F:\stablediffusion\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\repositories\k-diffusion\k_diffusion\sampling.py", line 596, in sample_dpmpp_2m
    callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
  File "F:\stablediffusion\stable-diffusion-webui-1-8-0-rc\extensions\sd-webui-diffusion-cg\scripts\diffusion_cg.py", line 29, in center_callback
    if not self.diffcg_enable or getattr(self.p, "_ad_inner", False):
AttributeError: 'KDiffusionSampler' object has no attribute 'diffcg_enable'
DISTRIBUTED | INFO     benchmarking finished                                                                              world.py:256

I'm not 100% sure I'm barking up the right tree here, given that these extensions are all written by the same author (@Haoming02), but I figured I'd try here first since this is the first time I've seen a problem with them.
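All three tracebacks share a pattern: the extension's sampler-callback hijack reads a flag it normally sets on the sampler during job setup, but the benchmark thread samples without that setup, so the attribute is missing. A minimal sketch of a defensive fix, assuming the callbacks can treat a missing flag as "disabled" (class and function names below are stand-ins modeled on the tracebacks, not the extensions' actual code):

```python
class KDiffusionSampler:
    """Minimal stand-in for the webui sampler class (hypothetical)."""

def hijack_callback(sampler, step_info):
    # Reading the flag with getattr() and a default keeps the callback safe
    # even when the extension's setup step never ran (e.g. in a benchmark thread),
    # instead of raising AttributeError as in the tracebacks above.
    if not getattr(sampler, "trajectory_enable", False):
        return step_info  # extension disabled or never configured: pass through
    # ... extension-specific processing would go here ...
    return step_info

print(hijack_callback(KDiffusionSampler(), {"i": 0}))  # → {'i': 0}
```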

Getting stuck at "Distributed - injecting images 100%"

Hi, I first want to thank you for this project. I'm running into an issue where, after the image finishes generating, stable diffusion gets stuck at: Distributed - injecting images 100%.

I currently have 2 GPUs installed. One instance is running on device 0, the other on device 1, and I can confirm they are both being used via nvtop and nvidia-smi. Both instances are run from the same folder on different ports; I'm unsure if this is how it's supposed to be used.

I have installed the extension and, judging by the log, it seems to work. It generates an image but gets stuck at the mentioned status. There are no errors that I can see. While that status is displayed, the log reports that the 2nd instance is idle.

Am I doing something wrong? If so, can you expand a bit on what the proper usage of this extension looks like? Please let me know if you need more information. Thank you.

Output from the main instance:

DISTRIBUTED | INFO     Job distribution:                                                                                  world.py:554
                       1 * 1 iteration(s) + 1 complementary: 2 images total
                       'master' - 1 image(s) @ 2.97 ipm
                       'slave1' - 1 image(s) @ 3.40 ipm

DISTRIBUTED | WARNING  local script(s): [Hypertile], [Comments] seem to be unsupported by worker 'slave1'                worker.py:393

100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:14<00:00,  1.34it/s]
Total progress: 100%|█████████████████████████████████████████████████████████████████████████████████| 20/20 [00:13<00:00,  1.38it/s]

Running out of linux memory and OOM when attempting benchmark

Master: a 32GB RAM system running a 3080 Ti (12GB VRAM) on Windows 11, under Docker with the NVIDIA Container Toolkit in WSL2. It also fails when WSL2 is configured for 24GB, which gives it 6GB of swap as well. I'm wondering why 30GB of total memory isn't enough.

Two slaves, both with 3090s: one with 96GB RAM running Ubuntu 20.04, and another with 32GB RAM also running WSL2 on Windows 11.

It's not clear how much memory is needed for the benchmark to take place. I guess I can give it a comical amount of swap, or just write the workers config file by hand.

about distributed-config.json

This is a summary from Discord.

Configuration of remote workers is now done via distributed-config.json in the root of this extension's directory instead of via command-line arguments.

If the file contains only empty JSON, the extension throws: ERROR config is corrupt or invalid JSON, unable to load

An example config looks like the following:

{
   "workers": [
      {
         "master": {
            "avg_ipm": 18.7014339699366,
            "master": true,
            "address": "localhost",
            "port": 7860,
            "eta_percent_error": [],
            "tls": false,
            "state": 1
         }
      },
      {
         "laptop": {
            "avg_ipm": 19.332550555969075,
            "master": false,
            "address": "192.168.1.83",
            "port": 7861,
            "eta_percent_error": [
               -6.019789958534711,
               -0.9360936846472658,
               7.7202976971435096,
               2.5322154055319075,
               -53.415720437075485
            ],
            "tls": true,
            "state": 1
         }
      }
   ],
   "benchmark_payload": {
      "prompt": "A herd of cows grazing at the bottom of a sunny valley",
      "negative_prompt": "",
      "steps": 20,
      "width": 512,
      "height": 512,
      "batch_size": 1
   },
   "job_timeout": 6
}
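A minimal sketch of loading and sanity-checking such a config before use; the field names come from the example above, but the validation rules here are assumptions for illustration, not the extension's actual logic:

```python
import json

def load_config(path):
    # An empty or malformed file should fail loudly, mirroring the
    # "config is corrupt or invalid JSON" error described above.
    with open(path) as f:
        config = json.load(f)
    if not isinstance(config, dict) or "workers" not in config:
        raise ValueError("config is corrupt or invalid JSON, unable to load")
    for entry in config["workers"]:
        for name, worker in entry.items():
            # every worker needs an address and port to be reachable
            if "address" not in worker or "port" not in worker:
                raise ValueError(f"worker '{name}' is missing address/port")
    return config
```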

Compatibility with SD Forge (stable-diffusion-webui-forge)

First of all, thank you for this incredible extension. It seems very well designed and worked right away with very little configuration needed.

Assuming this is not already in the works, have you considered making the extension compatible with lllyasviel/stable-diffusion-webui-forge? That distribution is quite a bit faster than automatic1111, especially for lower RAM GPUs (50-75% speedup) which makes it a great platform to use on worker nodes.

As far as I can tell, almost everything is already working well, but there are some incompatibilities with ControlNet. One quick fix was to change the expected class name from UiControlNetUnit to ControlNetUnit in distributed.py so that the ControlNet units are detected. After that small change, a master server can successfully send jobs that rely on ControlNet to workers running automatic1111. But there are still some issues sending those jobs to workers running SD Forge, which I have not been able to figure out.
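The quick fix described above could be made version-tolerant by accepting either class name when detecting units. A hedged sketch (the helper and the stand-in class below are illustrative, not the extension's actual code):

```python
def is_controlnet_unit(obj):
    # ControlNet's unit class is named UiControlNetUnit in automatic1111
    # builds but ControlNetUnit in others (reportedly SD Forge), so
    # checking the type name against both covers either distribution.
    return type(obj).__name__ in ("UiControlNetUnit", "ControlNetUnit")

class ControlNetUnit:  # stand-in class for demonstration (hypothetical)
    pass

print(is_controlnet_unit(ControlNetUnit()))  # → True
```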

Anyway, I just wanted to see if this was on your radar. Thanks again for your hard work on this!

Can't run Redo Benchmarks

DISTRIBUTED | INFO     Redoing benchmarks...                                                                                 ui.py:56
DISTRIBUTED | DEBUG    failed to load options for worker 'Pinbot-Bride-VM'                                              worker.py:684
DISTRIBUTED | DEBUG    failed to load options for worker 'Pinbot-VM'                                                    worker.py:684
Traceback (most recent call last):
  File "C:\Users\Peter\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\Users\Peter\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "C:\Users\Peter\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\Users\Peter\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\Users\Peter\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "C:\Users\Peter\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "C:\Users\Peter\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "C:\Users\Peter\stable-diffusion-webui\extensions\stable-diffusion-webui-distributed\scripts\spartan\ui.py", line 57, in benchmark_btn
    self.world.benchmark(rebenchmark=True)
  File "C:\Users\Peter\stable-diffusion-webui\extensions\stable-diffusion-webui-distributed\scripts\spartan\world.py", line 243, in benchmark
    if worker.response.status_code != 200:
AttributeError: 'NoneType' object has no attribute 'status_code'
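The crash suggests worker.response is still None after option loading failed for those workers, and the benchmark path dereferences it anyway. A minimal sketch of a guard, assuming unreachable workers should simply be skipped (the Worker class below is a stand-in, not the extension's worker.py):

```python
class Worker:  # minimal stand-in for the extension's worker (hypothetical)
    def __init__(self, label, response=None):
        self.label = label
        self.response = response  # None until a request has completed

def healthy(worker):
    # Check for None before touching status_code so workers whose
    # requests never completed don't raise AttributeError.
    return worker.response is not None and worker.response.status_code == 200

print(healthy(Worker("Pinbot-VM")))  # → False
```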

ERROR config is corrupt or invalid JSON

When attempting to run stable diffusion with this extension enabled, it gives me the error "config is corrupt or invalid JSON World.py:527". I am attempting to run a slave instance on localhost with a different GPU than the master. I'm unsure whether I am specifying the slave GPU correctly, but my guess is that it should not affect the "World.py" error that was thrown.

Cannot install this plugin

Error log:

*** Error completing request
*** Arguments: ('https://github.com/papuSpartan/stable-diffusion-webui-distributed.git', ['ads', 'localization', 'installed'], 0, '') {}
    Traceback (most recent call last):
      File "/stable-diffusion-webui/modules/call_queue.py", line 57, in f
        res = list(func(*args, **kwargs))
      File "/stable-diffusion-webui/modules/ui_extensions.py", line 397, in install_extension_from_index
        ext_table, message = install_extension_from_url(None, url)
      File "/stable-diffusion-webui/modules/ui_extensions.py", line 339, in install_extension_from_url
        check_access()
      File "/stable-diffusion-webui/modules/ui_extensions.py", line 23, in check_access
        assert not shared.cmd_opts.disable_extension_access, "extension access disabled because of command line flags"
    AssertionError: extension access disabled because of command line flags

---

TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

Ever since trying to set this up, I've gotten this error. The console didn't show anything at first, but it eventually showed the error along with a message that some extensions are not supported (I've disabled them since then).

Never mind the "nothing in the console" part; I managed to capture the console log:

To create a public link, set `share=True` in `launch()`.
Startup time: 69.3s (import torch: 6.3s, import gradio: 1.5s, import ldm: 0.7s, other imports: 1.2s, setup codeformer: 0.1s, load scripts: 44.0s, load SD checkpoint: 2.8s, create ui: 10.0s, gradio launch: 2.5s).
                                  WARNING  config reports invalid speed (0 ipm) for worker 'argon', setting default of 1 ipm; please re-benchmark                 World.py:378
Error completing request

Traceback (most recent call last):
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\call_queue.py", line 56, in f
    res = list(func(*args, **kwargs))
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\txt2img.py", line 53, in txt2img
    processed = modules.scripts.scripts_txt2img.run(p, *args)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\modules\scripts.py", line 407, in run
    processed = script.run(p, *script_args)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\extensions\stable-diffusion-webui-distributed\scripts\extension.py", line 291, in run
    Script.world.optimize_jobs(payload)  # optimize work assignment before dispatching
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\extensions\stable-diffusion-webui-distributed\scripts\spartan\World.py", line 402, in optimize_jobs
    lag = self.job_stall(job.worker, payload=payload)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\extensions\stable-diffusion-webui-distributed\scripts\spartan\World.py", line 330, in job_stall
    lag = worker.batch_eta(payload=payload, quiet=True) - fastest_worker.batch_eta(payload=payload, quiet=True)
  File "Y:\AI-GIT-CLONE\REPAIR\stable-diffusion-webui\extensions\stable-diffusion-webui-distributed\scripts\spartan\Worker.py", line 210, in batch_eta
    eta = (num_images / self.avg_ipm) * 60
TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

Edit: I should probably specify more: I'm running Python 3.10.10. If any further info is required, let me know.
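The traceback shows the division failing because avg_ipm is still None before any benchmark has run. A defensive sketch of the ETA computation using the 1 ipm default that the warning above mentions (this is an assumed fix for illustration, not the extension's actual code):

```python
def batch_eta(num_images, avg_ipm):
    # Fall back to the 1 ipm default from the warning above when the
    # worker has never been benchmarked (avg_ipm is None or 0),
    # instead of dividing by None as in the traceback.
    ipm = avg_ipm if avg_ipm else 1
    return (num_images / ipm) * 60  # estimated seconds for the batch

print(batch_eta(4, None))  # → 240.0
```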

Compatibility with the control net extension

Is this script compatible with the ControlNet extension? I get this issue:
ERROR Failed to serialize payload:                                                                       Worker.py:331
{'outpath_samples': 'outputs/txt2img-images', 'outpath_grids': 'outputs/txt2img-grids', 'prompt': 'hjj',
 'prompt_for_display': None, 'negative_prompt': '', 'styles': [], 'seed': -1.0, 'subseed': -1, 'subseed_strength': 0,
 'seed_resize_from_h': 0, 'seed_resize_from_w': 0, 'sampler_name': 'Euler a', 'batch_size': 1, 'n_iter': 2, 'steps': 20,
 'cfg_scale': 7, 'width': 512, 'height': 512, 'restore_faces': False, 'tiling': False, 'do_not_save_samples': False,
 'do_not_save_grid': False, 'extra_generation_params': {}, 'overlay_images': None, 'eta': None,
 'do_not_reload_embeddings': False, 'paste_to': None, 'color_corrections': None, 'denoising_strength': None,
 'sampler_noise_scheduler_override': None, 'ddim_discretize': 'uniform', 's_min_uncond': 0, 's_churn': 0.0,
 's_tmin': 0.0, 's_tmax': 1e+308, 's_noise': 1.0, 'override_settings': {},
 'override_settings_restore_afterwards': True, 'is_using_inpainting_conditioning': False,
 'disable_extra_networks': False, 'scripts': None,
 'script_args': (6, False, False, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None', 1, 1, 'LoRA', 'None',
 1, 1, 'LoRA', 'None', 1, 1, None, 'Refresh models', <controlnet.py.UiControlNetUnit object at 0x7adb935bca90>,
 False, False, 'positive', 'comma', 0, False, False, '', '', 1, '', [], 0, '', [], 0, '', [], True, False, False,
 False, 0, None, False, 50), 'all_prompts': None, 'all_negative_prompts': None, 'all_seeds': None,
 'all_subseeds': None, 'iteration': 0, 'is_hr_pass': False, 'enable_hr': False, 'hr_scale': 2,
 'hr_upscaler': 'Latent', 'hr_second_pass_steps': 0, 'hr_resize_x': 0, 'hr_resize_y': 0, 'hr_upscale_to_x': 0,
 'hr_upscale_to_y': 0, 'truncate_x': 0, 'truncate_y': 0, 'applied_old_hires_behavior_to': None}
Exception in thread Thread-24 (request):
Traceback (most recent call last):
  File "/kaggle/working/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/Worker.py", line 332, in request
    raise e
  File "/kaggle/working/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/Worker.py", line 329, in request
    json.dumps(payload)
  File "/opt/conda/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/opt/conda/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/opt/conda/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/opt/conda/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type UiControlNetUnit is not JSON serializable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/kaggle/working/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/Worker.py", line 369, in request
    raise InvalidWorkerResponse(e)
scripts.spartan.Worker.InvalidWorkerResponse: Object of type UiControlNetUnit is not JSON serializable
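The root cause is that json.dumps has no built-in way to serialize the UiControlNetUnit object sitting inside script_args. One possible workaround is a fallback encoder that reduces unknown objects to their attribute dict; this is only a sketch (the stand-in class and its fields below are hypothetical), and the real extension may instead strip or convert the unit before dispatch:

```python
import json

class UiControlNetUnit:  # stand-in for ControlNet's class (hypothetical fields)
    def __init__(self):
        self.enabled = True
        self.weight = 1.0

class FallbackEncoder(json.JSONEncoder):
    # Any object json can't handle natively is reduced to its attribute
    # dict so the payload survives transport to the worker.
    def default(self, o):
        if hasattr(o, "__dict__"):
            return vars(o)
        return super().default(o)

payload = {"script_args": [6, False, UiControlNetUnit()]}
print(json.dumps(payload, cls=FallbackEncoder))
# → {"script_args": [6, false, {"enabled": true, "weight": 1.0}]}
```

Note that the receiving instance would still need to rebuild a real unit object from that dict, which is likely why the extension reports the payload instead of silently coercing it.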

error with multiprompt generation

*** Error running before_process: D:\non-docker\illyforge\extensions\stable-diffusion-webui-distributed\scripts\distributed.py
    Traceback (most recent call last):
      File "D:\non-docker\illyforge\modules\scripts.py", line 795, in before_process
        script.before_process(p, *script_args)
      File "D:\non-docker\illyforge\extensions\stable-diffusion-webui-distributed\scripts\distributed.py", line 342, in before_process
        payload_temp['subseed'] += prior_images
    TypeError: 'int' object is not iterable

I have a script that generates multiple images using different prompts (different seeds, subseeds, and negative prompts), and I want to distribute it over different GPUs and devices.
Unfortunately, I get the above error. Is there an easy way to hook into distributed and add a queue of requests that it will automatically split based on the estimated ipm when distributed is enabled?

ips calculation impacted by model loading time

I have two 3090s, the slightly faster one (420w power limit) is on the slave instance and the master instance has the slower one (currently 300w power limit).

Well, first, I encountered a repeated error

webui-docker-auto-1  | Traceback (most recent call last):
webui-docker-auto-1  |   File "/stable-diffusion-webui/modules/call_queue.py", line 57, in f
webui-docker-auto-1  |     res = list(func(*args, **kwargs))
webui-docker-auto-1  |   File "/stable-diffusion-webui/modules/call_queue.py", line 37, in f
webui-docker-auto-1  |     res = func(*args, **kwargs)
webui-docker-auto-1  |   File "/stable-diffusion-webui/modules/txt2img.py", line 54, in txt2img
webui-docker-auto-1  |     processed = modules.scripts.scripts_txt2img.run(p, *args)
webui-docker-auto-1  |   File "/stable-diffusion-webui/modules/scripts.py", line 441, in run
webui-docker-auto-1  |     processed = script.run(p, *script_args)
webui-docker-auto-1  |   File "/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/extension.py", line 291, in run
webui-docker-auto-1  |     Script.world.optimize_jobs(payload)  # optimize work assignment before dispatching
webui-docker-auto-1  |   File "/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/World.py", line 402, in optimize_jobs
webui-docker-auto-1  |     lag = self.job_stall(job.worker, payload=payload)
webui-docker-auto-1  |   File "/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/World.py", line 330, in job_stall
webui-docker-auto-1  |     lag = worker.batch_eta(payload=payload, quiet=True) - fastest_worker.batch_eta(payload=payload, quiet=True)
webui-docker-auto-1  |   File "/stable-diffusion-webui/extensions/stable-diffusion-webui-distributed/scripts/spartan/Worker.py", line 210, in batch_eta
webui-docker-auto-1  |     eta = (num_images / self.avg_ipm) * 60
webui-docker-auto-1  | TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'

I read this error message and realized that avg_ipm was None, so I manually ran the benchmark, which resolved the issue.

However, I noticed that the benchmark gave my slave a worse ipm rating:

webui-docker-auto-1 | 1. 'master'(0.0.0.0:7860) - 51.36 ipm
webui-docker-auto-1 | 2. 'testrig_ftw3'(192.168.1.41:7860) - 39.91 ipm

This is surely because the slave instance is on a much slower NVMe disk; I could watch it load the model noticeably more slowly.

During actual generation with the same model, no model-loading wait is incurred, and both rigs crank out images at the same rate.

I just wonder if we can address this somehow, or maybe it's a non-issue. Regardless, even though they have quite different ipm ratings, the image allocation isn't really being impacted (I tested with 32 images via 4 batches of 8, and the speed is awesome: 24 seconds).
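One way to keep one-time costs like model loading out of the ipm rating would be a discarded warm-up pass before timing begins. A sketch under the assumption that the benchmark can invoke generation more than once (the function and its parameters are illustrative, not the extension's benchmark code):

```python
import time

def benchmark_ipm(generate, images_per_run=2, runs=2):
    """Rate a worker in images per minute, excluding one-time startup costs."""
    generate(images_per_run)  # warm-up: absorbs model load/compile, not timed
    elapsed = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        generate(images_per_run)
        elapsed += time.perf_counter() - start
    # total images divided by total timed minutes
    return (images_per_run * runs) * 60 / elapsed
```

With this approach the slow-disk slave would only pay the model-load penalty once, during the untimed warm-up, so its rating should reflect steady-state generation speed.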

Grid generation doesn't include slave output

When running loads that span multiple slaves, the output doesn't include a grid with all of the combined images; only the grid from a single slave is shown. Example from my 4 instances below.

[screenshot]
