Comments (28)

toboshii commented on July 18, 2024

Getting the same issue on Linux installed using this guide/script.

Playground runs fine for me.

from stable-diffusion.

altryne commented on July 18, 2024

Yep, likewise, playground runs fine. Although, I can only see it from the public link not my internal one. That might be a misconfiguration on my end though.

Yeah, that's probably due to Gradio issues with debug mode.

I'm out of ideas; it's really hard to debug without any error at all 😅

The best I can do is suggest dropping random print("GOT HERE") or sys.exit() calls and seeing whether you reach that point, to figure out if it's possible to load these things inside Docker.
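
For what it's worth, the checkpoint tracing suggested above can look like this (a minimal sketch; the labels and the commented-out call are placeholders, not the actual webui.py code):

```python
def checkpoint(label):
    # flush=True so the message survives even if the process is killed right after
    print(f"GOT HERE: {label}", flush=True)

checkpoint("before model load")
# model = load_model_from_config(config, opt.ckpt)  # the suspect call
checkpoint("after model load")

# Or bail out early to confirm everything up to a given point works:
# import sys; sys.exit("reached the exit marker, so the crash is later")
```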

@hlky any other ideas?

oc013 commented on July 18, 2024

Good run from scratch on Ubuntu 20.04 below with latest pull. Maybe this can help you debug.

sd  | entrypoint.sh: Launching...
sd  | python -u scripts/webui.py --no-verify-input --optimized-turbo
sd  | Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth" to /opt/conda/envs/ldm/lib/python3.8/site-packages/facexlib/weights/detection_Resnet50_Final.pth
sd  | 
100%|██████████| 104M/104M [00:05<00:00, 19.3MB/s]
sd  | Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth" to /opt/conda/envs/ldm/lib/python3.8/site-packages/facexlib/weights/parsing_parsenet.pth
sd  | 
100%|██████████| 81.4M/81.4M [00:04<00:00, 19.6MB/s]
sd  | Loaded GFPGAN
sd  | Loaded RealESRGAN with model RealESRGAN_x4plus
sd  | Loading model from models/ldm/stable-diffusion-v1/model.ckpt
sd  | Global Step: 470000
sd  | UNet: Running in eps-prediction mode
sd  | CondStage: Running in eps-prediction mode
sd  | Downloading: "https://github.com/DagnyT/hardnet/raw/master/pretrained/train_liberty_with_aug/checkpoint_liberty_with_aug.pth" to /root/.cache/torch/hub/checkpoints/checkpoint_liberty_with_aug.pth
100%|██████████| 5.10M/5.10M [00:00<00:00, 17.1MB/s]
Downloading: 100%|██████████| 939k/939k [00:00<00:00, 7.05MB/s]
Downloading: 100%|██████████| 512k/512k [00:00<00:00, 4.68MB/s]
Downloading: 100%|██████████| 389/389 [00:00<00:00, 482kB/s]
Downloading: 100%|██████████| 905/905 [00:00<00:00, 1.04MB/s]
Downloading: 100%|██████████| 4.31k/4.31k [00:00<00:00, 5.01MB/s]
Downloading: 100%|██████████| 1.59G/1.59G [01:32<00:00, 18.4MB/s]
sd  | FirstStage: Running in eps-prediction mode
sd  | making attention of type 'vanilla' with 512 in_channels
sd  | Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
sd  | making attention of type 'vanilla' with 512 in_channels
sd  | Running on local URL:  http://localhost:7860/
sd  | 
sd  | To create a public link, set `share=True` in `launch()`.

Maybe set this up on your system without Docker to determine whether it's actually a Docker issue.

altryne commented on July 18, 2024

Can you try running webui.py directly, not through relauncher.py, and post the stack trace you get with the error?

ChrisAcrobat commented on July 18, 2024

Running python -u scripts/webui.py has provided two results for me:

Traceback (most recent call last):
  File "/sd/scripts/webui.py", line 3, in <module>
    from frontend.frontend import draw_gradio_ui
ModuleNotFoundError: No module named 'frontend'

That one may have been a badly timed clone of the repo. ☝️

Loaded GFPGAN
Loaded RealESRGAN with model RealESRGAN_x4plus
Loading model from models/ldm/stable-diffusion-v1/model.ckpt
Global Step: 470000
LatentDiffusion: Running in eps-prediction mode
Killed

altryne commented on July 18, 2024

Yeah, we split the frontend into its own module a while back, so the first issue is fixed by pulling the latest.
The second one doesn't throw any errors? Just Killed? 😩

yourjelly commented on July 18, 2024

I am getting the same issue in a docker running on linux.

altryne commented on July 18, 2024

@yourjelly it could be a Gradio issue with ports and such inside Docker.

If you don't mind testing: download webui_playground.py from https://github.com/hlky/stable-diffusion-webui, put it in the same directory, and run python webui_playground.py to see if that crashes as well.

yourjelly commented on July 18, 2024

Yep, likewise, playground runs fine. Although, I can only see it from the public link not my internal one. That might be a misconfiguration on my end though.

toboshii commented on July 18, 2024

In my case (and I assume the same for the others as well) it's because it's getting killed by the kernel OOM killer:

[1888018.327103] Out of memory: Killed process 2510239 (python) total-vm:20822252kB, anon-rss:6211836kB, file-rss:127360kB, shmem-rss:10240kB, UID:1000 pgtables:16556kB oom_score_adj:0

It's trying to allocate ~20GB of ram, and I only have about 6GB available.
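
As a quick sanity check before launching, you can compare available system memory against a rough requirement. A minimal sketch (Linux-only, since it parses /proc/meminfo; the 20 GB threshold is just taken from the OOM log above, not an official requirement):

```python
def available_ram_kb(meminfo_path="/proc/meminfo"):
    """Return MemAvailable in kB, parsed from a Linux /proc/meminfo-style file."""
    with open(meminfo_path) as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1])  # value is reported in kB
    raise RuntimeError("MemAvailable not found")

NEEDED_KB = 20 * 1024 * 1024  # ~20 GB, roughly what the OOM log shows the process using

# On a Linux box:
# if available_ram_kb() < NEEDED_KB:
#     print("likely not enough free RAM to load the model; expect the OOM killer")
```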

yourjelly commented on July 18, 2024

I have observed this too: it eats RAM before being killed.
I followed it to webui.py's model = load_model_from_config(config, opt.ckpt).
That takes it to util.py, where get_obj_from_str(string, reload=False) runs twice before it dies.
There might be more it gets through, but that's as close as I've gotten so far.
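
For context, a helper like get_obj_from_str is typically just a dotted-path import resolver, roughly along these lines (a sketch of the common pattern, not necessarily the repo's exact code). It's cheap by itself; the heavy RAM use comes later, when torch.load reads the checkpoint:

```python
import importlib

def get_obj_from_str(string, reload=False):
    # Split "package.module.ClassName" into the module path and attribute name
    module_name, cls_name = string.rsplit(".", 1)
    module = importlib.import_module(module_name)
    if reload:
        module = importlib.reload(module)
    return getattr(module, cls_name)

# For example, resolving a stdlib class from its dotted path:
OrderedDict = get_obj_from_str("collections.OrderedDict")
```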

I'm gonna stick another 16GB of ram in my server and see if it gets further.

yourjelly commented on July 18, 2024

Yep, I got past that point now; it was just trying to use too much RAM. It hovers around 10GB of usage afterwards.

altryne commented on July 18, 2024

@yourjelly have you tried running it with the --turbo or --optimized-turbo flags to see if it's better?

altryne commented on July 18, 2024

cc @toboshii

yourjelly commented on July 18, 2024

Will do, because my GPU ran out of memory when I tried txt2img:

!!Runtime error (txt2img)!! 
 CUDA out of memory. Tried to allocate 1024.00 MiB (GPU 0; 7.79 GiB total capacity; 5.64 GiB already allocated; 1008.06 MiB free; 5.75 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
exiting...calling os._exit(0)
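
The hint in that traceback refers to PyTorch's caching-allocator configuration. If fragmentation is the problem, it can be set via an environment variable before torch makes its first CUDA allocation (128 here is just an illustrative value, not a recommendation from the traceback):

```python
import os

# Must be set before the first CUDA allocation to take effect
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Equivalently, export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 in the shell (or entrypoint.sh) before launching webui.py.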

yourjelly commented on July 18, 2024

Optimized Turbo works using 98% of GPU ram

altryne commented on July 18, 2024

OK, progress! It no longer crashes without a notification, which is a good thing :)
I guess we found the issue; now the question is, given --turbo, should we close this issue?

JoshuaKimsey commented on July 18, 2024

Getting the same issue on Linux installed using this guide/script.

Script creator here; I'm really not sure what to make of this issue. It's not associated with my script itself, but it might be tied back to a faulty conda env. Make sure you're using the newest version of my script and that it pulls in the latest updates from the repo: choose no on the previous-parameters question, then yes on the do-you-want-to-update screen. If that fails, delete the conda env (conda env remove -n lsd), run the script again, select no on the previous parameters, and let it generate a new one.

If it still fails after that, then it is definitely something tied either to your conda installation itself or to a niche bug inside the Python code; most likely the latter at that point, since it will at least partially run.

ChrisAcrobat commented on July 18, 2024
[1888018.327103] Out of memory: Killed process 2510239 (python) total-vm:20822252kB, anon-rss:6211836kB, file-rss:127360kB, shmem-rss:10240kB, UID:1000 pgtables:16556kB oom_score_adj:0

@toboshii I haven't seen that error.

@yourjelly have you tried running it with the --turbo mode or the --optimized-turbo flags on and see if it's better?

I guess we found the issue, now question is, given --turbo, should we close this issue?

Is that a Docker Compose flag? Or could it be included where the 'Relaunch count' relaunch happens?
If the problem can't be optimised away, then this solution should at least be mentioned in the wiki.

I haven't tried the solution yet (or searched about it; I just woke up 🌄), but I will check it out as soon as I can!

yourjelly commented on July 18, 2024

I believe it should be added to the launch command, python -u scripts/webui.py --optimized-turbo, in the entrypoint.sh file for the Docker setup.
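
Concretely, that would mean editing the launch line inside the relaunch loop. A sketch assuming entrypoint.sh looks roughly like the logs suggest (this is not the file's actual contents):

```shell
#!/bin/bash
# Relaunch-loop sketch: add --optimized-turbo to the launch command
relaunch_count=0
while true; do
    echo "entrypoint.sh: Launching..."
    python -u scripts/webui.py --optimized-turbo
    relaunch_count=$((relaunch_count + 1))
    echo "entrypoint.sh: Process is ending. Relaunching in 0.5s..."
    echo "Relaunch count: ${relaunch_count}"
    sleep 0.5
done
```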

toboshii commented on July 18, 2024

Script creator here, I'm really not sure what to make of this issue?

I don't think it's anything off with your script; I only mentioned it to make it clear I was running on bare metal and using a "supported" method of installation (I hadn't just hacked stuff together myself 🤣). This was in a clean miniconda install; I tried rebuilding the env, same issue.

have you tried running it with the --turbo mode or the --optimized-turbo flags on and see if it's better?

--turbo doesn't seem to exist and --optimized-turbo made no difference in my case.

@toboshii I haven't seen that error.

Did you look in the kernel logs? If it's the same issue, you should be able to find it there using dmesg (assuming WSL provides for that; not sure, I haven't used Windows in like 12 years 😅).

All in all, I'm not really sure this is a "bug" or an "issue"; I think in my case, and maybe others', we just greatly underestimated the amount of memory needed to load the models. Like @yourjelly, I moved to trying it on another machine with 32GB free and had no issues. The original machine I'm trying it on (my desktop) has 16GB total and generally around 6-8GB free. From what I see on both machines, it needs a minimum of about 26GB free to load the model initially and then idles around 10GB, as @yourjelly mentioned. That seems pretty odd given the model is ~4GB, but honestly this is my first foray into AI stuff outside of colabs etc., so maybe this is expected.
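
The gap between the ~4GB checkpoint file and the observed peak is plausible if you count the copies held in RAM at once during loading: the unpickled state dict from torch.load plus a second set of weights in the instantiated fp32 model. A back-of-the-envelope sketch (the parameter count is an assumption, roughly a v1-class model including the EMA weight copy stored in the checkpoint, used only to illustrate the arithmetic):

```python
# Rough arithmetic, not a measurement: why a ~4GB checkpoint can peak far higher in RAM
params = 1.07e9            # assumed parameter count incl. EMA duplicates (illustrative)
bytes_per_param_fp32 = 4

state_dict_gb = params * bytes_per_param_fp32 / 1024**3  # checkpoint unpickled into RAM
model_gb = state_dict_gb                                 # second fp32 copy in the built model

print(f"~{state_dict_gb:.1f} GB (state dict) + ~{model_gb:.1f} GB (model) "
      f"held simultaneously before the dict can be freed")
```

That still doesn't account for the full ~26GB peak (Python overhead, CUDA context, and intermediate buffers add more), but it shows why the checkpoint's file size is a poor lower bound.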

ChrisAcrobat commented on July 18, 2024

dmesg in WSL: dmesg: read kernel buffer failed: Operation not permitted 🙂
I'll be monitoring the Windows logs, just to confirm the hypothesis.

hlky commented on July 18, 2024

@toboshii @altryne meant --optimized or --optimized-turbo

Ideally you need 8GB+ VRAM and 16GB+ RAM.

The --optimized option is designed for 4GB VRAM and --optimized-turbo for 6-8GB VRAM; both will increase RAM usage compared to running without either option.

ChrisAcrobat commented on July 18, 2024

My laptop has 16 GB of RAM installed and 8 GB of video memory. I don't want to compare, because I understand they are different projects with different goals, but with that hardware I have been able to start another Stable Diffusion project (the one other project I have tried).
EDIT: --optimized-turbo did not solve it for me.

ChrisAcrobat commented on July 18, 2024

Just read this #134. I will try again.

ChrisAcrobat commented on July 18, 2024

Sadly it still crashed.

sd | saved RealESRGAN_x4plus_anime_6B.pth
sd | entrypoint.sh: Launching...
sd | Loaded GFPGAN
sd | Loaded RealESRGAN with model RealESRGAN_x4plus
sd | Loading model from models/ldm/stable-diffusion-v1/model.ckpt
sd | Global Step: 470000
sd | UNet: Running in eps-prediction mode
sd | entrypoint.sh: Process is ending. Relaunching in 0.5s...
sd | /sd/entrypoint.sh: line 89:   559 Killed                  python -u scripts/webui.py --optimized-turbo
sd | entrypoint.sh: Launching...
sd | Relaunch count: 1

ChrisAcrobat commented on July 18, 2024

Slight progress: the UI now starts with --optimized, but with optimizedSD.ddpm.UNet removed. This of course means the UI is not working, but now I know it's only optimizedSD.ddpm.UNet (while using --optimized) that's causing problems for me.
Are there other models I can switch it out for? Any tips on how to do so?

ChrisAcrobat commented on July 18, 2024

I've seen that Docker has an argument for disabling the out-of-memory killer (--oom-kill-disable). I'm trying to get it to work and will reply later.
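
For reference, Docker's --oom-kill-disable is meant to be combined with a hard memory limit; without -m/--memory, disabling the OOM killer can starve the host. A hedged sketch of the run flags (the image name and limit are illustrative):

```shell
# Cap the container at 16 GB and disable the cgroup OOM killer inside that cap;
# processes that hit the limit will block on allocation instead of being killed.
docker run -m 16g --oom-kill-disable my-sd-image
```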
