Can't train a LoRA for SDXL from a fresh install (bmaltais/kohya_ss)

Comments (22)

eija06 commented on July 28, 2024

Looks like something is wrong with your CUDA installation. Did you install requirements.txt?

from kohya_ss.

bmaltais commented on July 28, 2024

Did you install CUDA 11.8 as per the README instructions?


Deejay85 commented on July 28, 2024

I installed not only Kohya from setup.bat, but also cuDNN, bitsandbytes, CUDA 11.8 (cuda_11.8.0_522.06_windows, to be precise), and the files required in the sd-scripts folder. I also updated pip from the venv and redid all of that... I'm not sure what else I need to do.


machineminded commented on July 28, 2024

I'm having a similar issue. I've been trying to get kohya to work for a few days, and I see a tangentially related error:

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 15744, 1, 512) (torch.float32)
     key         : shape=(1, 15744, 1, 512) (torch.float32)
     value       : shape=(1, 15744, 1, 512) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0

Same as @Deejay85 - CUDA 11.8, cuDNN... ran everything from setup.bat.


Deejay85 commented on July 28, 2024

I was hoping that, since someone else is having the same problem, an answer would be forthcoming, but apparently not. I tried installing the new version released today, and even reinstalled all the packages for Kohya just in case, but that didn't fix it either... I seem to be getting the same messages as before. I really don't know if the problem is on my end or on Kohya's end, but I wish the dev would take a look at it, so that I at least know where to start in fixing it.


Deejay85 commented on July 28, 2024

Didn't want to make a new thread, so I decided to bump the old one I made two weeks ago. I tried copying the newest release into Kohya, but that didn't make any difference, so even after two releases, I'm still having the same problems I did before.


machineminded commented on July 28, 2024

I ended up uninstalling anything related to Python, CUDA, NVIDIA, and Microsoft development (C++ redistributables), then reinstalled, and it fixed all of my issues. Before, I had both CUDA 11.8 and 12.x installed, and I'm guessing something went wrong there, so I stuck with CUDA 11.8 this time. But I'm not really sure - basically uninstalling and reinstalling fixed everything.
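
For anyone who wants to check for side-by-side CUDA toolkits before wiping everything, here is a minimal sketch. It assumes the usual convention of the NVIDIA Windows installer, which sets `CUDA_PATH` plus one `CUDA_PATH_V<major>_<minor>` environment variable per installed toolkit. `list_cuda_toolkits` is a hypothetical helper name, not something shipped with kohya_ss.

```python
import os
import re


def list_cuda_toolkits(env=None):
    """Return {version: path} for every CUDA_PATH_V<major>_<minor> variable found."""
    env = os.environ if env is None else env
    pattern = re.compile(r"^CUDA_PATH_V(\d+)_(\d+)$", re.IGNORECASE)
    toolkits = {}
    for name, path in env.items():
        match = pattern.match(name)
        if match:
            toolkits[f"{match.group(1)}.{match.group(2)}"] = path
    return toolkits


if __name__ == "__main__":
    found = list_cuda_toolkits()
    for version, path in sorted(found.items()):
        print(f"CUDA {version}: {path}")
    if len(found) > 1:
        print("More than one toolkit is registered; a mix-up between them is possible.")
    print("CUDA_PATH currently points to:", os.environ.get("CUDA_PATH", "<not set>"))
```

If this lists both an 11.8 and a 12.x entry, that would match the setup I had before the reinstall, though it does not prove the second toolkit was the cause.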


bmaltais commented on July 28, 2024

Yeah, so many things can break down within the software stack... This was the best thing to do. Glad it fixed things for you.


Agnusse commented on July 28, 2024

I am having the exact same issue. Did anyone find a solution that does not involve reinstalling everything?


Deejay85 commented on July 28, 2024

I uninstalled everything listed by machineminded, and mine is still producing exactly the same problems as before. Should I paste the entire log just to verify?


Deejay85 commented on July 28, 2024

Downloaded the newest version of Kohya, did a fresh install to a new directory, installed everything, and here are the results I got when I tried to train something:

07:34:51-795074 INFO     Kohya_ss GUI version: v24.1.4
fatal: not a git repository (or any of the parent directories): .git
07:34:52-077285 ERROR    Error during Git operation: Command '['git', 'submodule', 'update', '--init', '--recursive',
                         '--quiet']' returned non-zero exit status 128.
07:34:52-081194 INFO     nVidia toolkit detected
07:34:53-412725 INFO     Torch 2.1.2+cu118
07:34:53-437137 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8905
07:34:53-439117 INFO     Torch detected GPU: NVIDIA GeForce RTX 4090 VRAM 24564 Arch (8, 9) Cores 128
07:34:53-444947 INFO     Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit
                         (AMD64)]
07:34:53-447878 INFO     Verifying modules installation status from requirements_pytorch_windows.txt...
07:34:53-450808 INFO     Verifying modules installation status from requirements_windows.txt...
07:34:53-451785 WARNING  Package wrong version: bitsandbytes 0.41.2.post2 required 0.43.0
07:34:53-453735 INFO     Installing package: bitsandbytes==0.43.0
07:34:58-070392 INFO     Verifying modules installation status from requirements.txt...
07:35:06-749071 INFO     headless: False
07:35:06-783250 INFO     Using shell=True when running external commands...
Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 4.26.0, however version 4.29.0 is available, please upgrade.
--------
Exception in thread Thread-5 (_do_normal_analytics_request):
Traceback (most recent call last):
  File "M:\z\venv\lib\site-packages\httpx\_transports\default.py", line 69, in map_httpcore_exceptions
    yield
  File "M:\z\venv\lib\site-packages\httpx\_transports\default.py", line 233, in handle_request
    resp = self._pool.handle_request(req)
  File "M:\z\venv\lib\site-packages\httpcore\_sync\connection_pool.py", line 216, in handle_request
    raise exc from None
  File "M:\z\venv\lib\site-packages\httpcore\_sync\connection_pool.py", line 196, in handle_request
    response = connection.handle_request(
  File "M:\z\venv\lib\site-packages\httpcore\_sync\connection.py", line 101, in handle_request
    return self._connection.handle_request(request)
  File "M:\z\venv\lib\site-packages\httpcore\_sync\http11.py", line 143, in handle_request
    raise exc
  File "M:\z\venv\lib\site-packages\httpcore\_sync\http11.py", line 95, in handle_request
    self._send_request_body(**kwargs)
  File "M:\z\venv\lib\site-packages\httpcore\_sync\http11.py", line 166, in _send_request_body
    self._send_event(event, timeout=timeout)
  File "M:\z\venv\lib\site-packages\httpcore\_sync\http11.py", line 175, in _send_event
    self._network_stream.write(bytes_to_send, timeout=timeout)
  File "M:\z\venv\lib\site-packages\httpcore\_backends\sync.py", line 133, in write
    with map_exceptions(exc_map):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "M:\z\venv\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.WriteTimeout: The write operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "M:\z\venv\lib\site-packages\gradio\analytics.py", line 63, in _do_normal_analytics_request
    httpx.post(url, data=data, timeout=5)
  File "M:\z\venv\lib\site-packages\httpx\_api.py", line 319, in post
    return request(
  File "M:\z\venv\lib\site-packages\httpx\_api.py", line 106, in request
    return client.request(
  File "M:\z\venv\lib\site-packages\httpx\_client.py", line 827, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "M:\z\venv\lib\site-packages\httpx\_client.py", line 914, in send
    response = self._send_handling_auth(
  File "M:\z\venv\lib\site-packages\httpx\_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
  File "M:\z\venv\lib\site-packages\httpx\_client.py", line 979, in _send_handling_redirects
    response = self._send_single_request(request)
  File "M:\z\venv\lib\site-packages\httpx\_client.py", line 1015, in _send_single_request
    response = transport.handle_request(request)
  File "M:\z\venv\lib\site-packages\httpx\_transports\default.py", line 232, in handle_request
    with map_httpcore_exceptions():
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "M:\z\venv\lib\site-packages\httpx\_transports\default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.WriteTimeout: The write operation timed out
07:35:38-075101 INFO     Loading config...
07:35:43-538559 INFO     Start training LoRA Standard ...
07:35:43-540511 INFO     Validating lr scheduler arguments...
07:35:43-541488 INFO     Validating optimizer arguments...
07:35:43-542464 INFO     Validating M:/kohya_ss/Sampleimages/log existence and writability... SUCCESS
07:35:43-543440 INFO     Validating M:/kohya_ss/Sampleimages/model existence and writability... SUCCESS
07:35:43-544417 INFO     Validating M:/StableDiffusion/models/Stable-diffusion/SDXL/sd_xl_base_1.0.safetensors
                         existence... SUCCESS
07:35:43-545393 INFO     Validating M:/kohya_ss/Sampleimages/Images existence... SUCCESS
07:35:43-546370 INFO     Folder 4_giganticbreasts: 4 repeats found
07:35:43-547347 INFO     Folder 4_giganticbreasts: 115 images found
07:35:43-548324 INFO     Folder 4_giganticbreasts: 115 * 4 = 460 steps
07:35:43-551252 INFO     Regulatization factor: 1
07:35:43-553205 INFO     Total steps: 460
07:35:43-553205 INFO     Train batch size: 1
07:35:43-554183 INFO     Gradient accumulation steps: 1
07:35:43-555159 INFO     Epoch: 40
07:35:43-556136 INFO     Max train steps: 1600
07:35:43-556136 INFO     stop_text_encoder_training = 0
07:35:43-557111 INFO     lr_warmup_steps = 160
07:35:43-559066 INFO     Saving training config to
                         M:/kohya_ss/Sampleimages/model\giganticbreasts_20240512-073543.json...
07:35:43-561017 INFO     Executing command: M:\z\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode
                         default --gpu_ids 10de268488e21043 --mixed_precision bf16 --num_processes 1 --num_machines 1
                         --num_cpu_threads_per_process 2 M:/z/sd-scripts/sdxl_train_network.py --config_file
                         M:/kohya_ss/Sampleimages/model/config_lora-20240512-073543.toml
07:35:43-564925 INFO     Command executed.
2024-05-12 07:35:51 INFO     Loading settings from                                                    train_util.py:3744
                             M:/kohya_ss/Sampleimages/model/config_lora-20240512-073543.toml...
                    INFO     M:/kohya_ss/Sampleimages/model/config_lora-20240512-073543               train_util.py:3763
2024-05-12 07:35:51 INFO     prepare tokenizers                                                   sdxl_train_util.py:134
2024-05-12 07:35:53 INFO     update token length: 75                                              sdxl_train_util.py:159
                    INFO     Using DreamBooth method.                                               train_network.py:172
                    INFO     prepare images.                                                          train_util.py:1572
                    INFO     found directory M:\kohya_ss\Sampleimages\Images\4_giganticbreasts        train_util.py:1519
                             contains 115 image files
                    INFO     460 train images with repeating.                                         train_util.py:1613
                    INFO     0 reg images.                                                            train_util.py:1616
                    WARNING  no regularization images / 正則化画像が見つかりませんでした              train_util.py:1621
                    INFO     [Dataset 0]                                                              config_util.py:565
                               batch_size: 1
                               resolution: (1024, 1024)
                               enable_bucket: True
                               network_multiplier: 1.0
                               min_bucket_reso: 64
                               max_bucket_reso: 2048
                               bucket_reso_steps: 64
                               bucket_no_upscale: True

                               [Subset 0 of Dataset 0]
                                 image_dir: "M:\kohya_ss\Sampleimages\Images\4_giganticbreasts"
                                 image_count: 115
                                 num_repeats: 4
                                 shuffle_caption: True
                                 keep_tokens: 1
                                 keep_tokens_separator:
                                 secondary_separator: None
                                 enable_wildcard: False
                                 caption_dropout_rate: 0.0
                                 caption_dropout_every_n_epoches: 0
                                 caption_tag_dropout_rate: 0.0
                                 caption_prefix: None
                                 caption_suffix: None
                                 color_aug: False
                                 flip_aug: False
                                 face_crop_aug_range: None
                                 random_crop: False
                                 token_warmup_min: 1,
                                 token_warmup_step: 0,
                                 is_reg: False
                                 class_tokens: giganticbreasts
                                 caption_extension: .txt


                    INFO     [Dataset 0]                                                              config_util.py:571
                    INFO     loading image sizes.                                                      train_util.py:853
100%|█████████████████████████████████████████████████████████████████████████████| 115/115 [00:00<00:00, 39272.51it/s]
                    INFO     make buckets                                                              train_util.py:859
                    WARNING  min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is   train_util.py:876
                             set, because bucket reso is defined by image size automatically /
                             bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計
                             算されるため、min_bucket_resoとmax_bucket_resoは無視されます
                    INFO     number of images (including repeats) /                                    train_util.py:905
                             各bucketの画像枚数（繰り返し回数を含む）
                    INFO     bucket 0: resolution (576, 832), count: 4                                 train_util.py:910
                    INFO     bucket 1: resolution (576, 960), count: 4                                 train_util.py:910
                    INFO     bucket 2: resolution (640, 640), count: 4                                 train_util.py:910
                    INFO     bucket 3: resolution (704, 960), count: 4                                 train_util.py:910
                    INFO     bucket 4: resolution (704, 1280), count: 4                                train_util.py:910
                    INFO     bucket 5: resolution (704, 1344), count: 4                                train_util.py:910
                    INFO     bucket 6: resolution (704, 1408), count: 4                                train_util.py:910
                    INFO     bucket 7: resolution (768, 704), count: 4                                 train_util.py:910
                    INFO     bucket 8: resolution (768, 1152), count: 12                               train_util.py:910
                    INFO     bucket 9: resolution (768, 1216), count: 4                                train_util.py:910
                    INFO     bucket 10: resolution (768, 1344), count: 4                               train_util.py:910
                    INFO     bucket 11: resolution (832, 768), count: 4                                train_util.py:910
                    INFO     bucket 12: resolution (832, 896), count: 4                                train_util.py:910
                    INFO     bucket 13: resolution (832, 1024), count: 4                               train_util.py:910
                    INFO     bucket 14: resolution (832, 1088), count: 36                              train_util.py:910
                    INFO     bucket 15: resolution (832, 1152), count: 68                              train_util.py:910
                    INFO     bucket 16: resolution (832, 1216), count: 44                              train_util.py:910
                    INFO     bucket 17: resolution (896, 832), count: 4                                train_util.py:910
                    INFO     bucket 18: resolution (896, 1024), count: 16                              train_util.py:910
                    INFO     bucket 19: resolution (896, 1088), count: 40                              train_util.py:910
                    INFO     bucket 20: resolution (896, 1152), count: 40                              train_util.py:910
                    INFO     bucket 21: resolution (960, 960), count: 16                               train_util.py:910
                    INFO     bucket 22: resolution (960, 1024), count: 36                              train_util.py:910
                    INFO     bucket 23: resolution (1024, 896), count: 4                               train_util.py:910
                    INFO     bucket 24: resolution (1024, 960), count: 4                               train_util.py:910
                    INFO     bucket 25: resolution (1024, 1024), count: 36                             train_util.py:910
                    INFO     bucket 26: resolution (1088, 832), count: 8                               train_util.py:910
                    INFO     bucket 27: resolution (1088, 896), count: 4                               train_util.py:910
                    INFO     bucket 28: resolution (1152, 832), count: 8                               train_util.py:910
                    INFO     bucket 29: resolution (1152, 896), count: 8                               train_util.py:910
                    INFO     bucket 30: resolution (1216, 832), count: 12                              train_util.py:910
                    INFO     bucket 31: resolution (1280, 704), count: 4                               train_util.py:910
                    INFO     bucket 32: resolution (1344, 768), count: 8                               train_util.py:910
                    INFO     mean ar error (without repeats): 0.012568990147454271                     train_util.py:915
                    WARNING  clip_skip will be unexpected / SDXL学習ではclip_skipは動作しません   sdxl_train_util.py:343
                    INFO     preparing accelerator                                                  train_network.py:225
accelerator device: cpu
                    INFO     loading model for process 0/1                                         sdxl_train_util.py:30
                    INFO     load StableDiffusion checkpoint:                                      sdxl_train_util.py:70
                             M:/StableDiffusion/models/Stable-diffusion/SDXL/sd_xl_base_1.0.safete
                             nsors
                    INFO     building U-Net                                                       sdxl_model_util.py:192
2024-05-12 07:35:54 INFO     loading U-Net from checkpoint                                        sdxl_model_util.py:196
2024-05-12 07:36:06 INFO     U-Net: <All keys matched successfully>                               sdxl_model_util.py:202
                    INFO     building text encoders                                               sdxl_model_util.py:205
                    INFO     loading text encoders from checkpoint                                sdxl_model_util.py:258
                    INFO     text encoder 1: <All keys matched successfully>                      sdxl_model_util.py:272
2024-05-12 07:36:10 INFO     text encoder 2: <All keys matched successfully>                      sdxl_model_util.py:276
                    INFO     building VAE                                                         sdxl_model_util.py:279
                    INFO     loading VAE from checkpoint                                          sdxl_model_util.py:284
                    INFO     VAE: <All keys matched successfully>                                 sdxl_model_util.py:287
                    INFO     Enable xformers for U-Net                                                train_util.py:2660
Traceback (most recent call last):
  File "M:\z\sd-scripts\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "M:\z\sd-scripts\train_network.py", line 242, in train
    vae.set_use_memory_efficient_attention_xformers(args.xformers)
  File "M:\z\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 262, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "M:\z\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "M:\z\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "M:\z\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "M:\z\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "M:\z\venv\lib\site-packages\diffusers\models\attention_processor.py", line 260, in set_use_memory_efficient_attention_xformers
    raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "M:\z\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "M:\z\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "M:\z\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "M:\z\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['M:\\z\\venv\\Scripts\\python.exe', 'M:/z/sd-scripts/sdxl_train_network.py', '--config_file', 'M:/kohya_ss/Sampleimages/model/config_lora-20240512-073543.toml']' returned non-zero exit status 1.
07:36:12-768539 INFO     Training has ended.
Keyboard interruption in main thread... closing server.

In short, the same old song and dance. 😩 Any advice?


machineminded commented on July 28, 2024

ValueError: torch.cuda.is_available() should be True but is False.

Not sure what happened but you should be able to install torch+cu118 and resolve this. Check this link:

https://pytorch.org/get-started/locally/
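
To narrow it down, a quick check like the one below can be run with the kohya_ss venv activated. It is only a sketch: `cuda_report` is a hypothetical helper name, and it reports nothing beyond what `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()` return.

```python
import importlib.util


def cuda_report():
    """Collect the facts relevant to the 'is_available() should be True' error."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this interpreter at all.
        return {
            "torch_installed": False,
            "torch_version": None,
            "built_for_cuda": None,
            "cuda_available": False,
        }
    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,    # e.g. "2.1.2+cu118"
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is `None`, the installed wheel is a CPU-only build and reinstalling the cu118 build should help. If it prints `11.8` and `cuda_available` is `True` here, the torch install itself is probably fine and the GPU is being lost somewhere in how the training process is launched.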


b-fission commented on July 28, 2024

@Deejay85
This part of your log indicates the most likely problem: --gpu_ids 10de268488e21043

The GPU IDs option in your config seems to be junk text. (It's an option located under the Accelerate launch category.)
Leave it blank, and the training should be able to run.
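
A plausible reason a junk GPU ID produces exactly this error: as far as I understand, `accelerate launch` hands the `--gpu_ids` value to the training process through the `CUDA_VISIBLE_DEVICES` environment variable, and a value that does not name a real device hides every GPU from PyTorch. The sketch below illustrates the effect, assuming PyTorch is installed in the interpreter running it. `visible_gpu_count` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def visible_gpu_count(cuda_visible_devices):
    """Count the GPUs PyTorch sees in a child process with CUDA_VISIBLE_DEVICES set.

    Returns None if PyTorch cannot be imported or the child process fails.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=cuda_visible_devices)
    probe = "import torch; print(torch.cuda.device_count())"
    try:
        result = subprocess.run(
            [sys.executable, "-c", probe],
            env=env,
            capture_output=True,
            text=True,
            timeout=300,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    try:
        # The count is the last line printed by the probe.
        return int(result.stdout.strip().splitlines()[-1])
    except (ValueError, IndexError):
        return None


if __name__ == "__main__":
    print("GPUs visible with '0':", visible_gpu_count("0"))
    print("GPUs visible with junk text:", visible_gpu_count("10de268488e21043"))
```

On a single-GPU machine I would expect the junk value to report no visible GPUs, which would also explain the `accelerator device: cpu` line in the log.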


bmaltais commented on July 28, 2024

I might add an input validator and log a message if it does not match the expected pattern
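
Something along these lines, purely as a sketch and not the GUI's actual code. The accepted pattern (blank, `all`, or comma-separated device indices such as `0` or `0,1`) is an assumption about what the field should contain, and `validate_gpu_ids` is a hypothetical name.

```python
import logging
import re

log = logging.getLogger(__name__)

# Assumed pattern: comma-separated non-negative device indices, e.g. "0" or "0,1".
_GPU_IDS_PATTERN = re.compile(r"^\d+(\s*,\s*\d+)*$")


def validate_gpu_ids(value):
    """Return True if value looks like a usable GPU ID list; log a warning otherwise."""
    text = (value or "").strip()
    if text == "" or text.lower() == "all":
        return True  # blank or "all" means no restriction on devices
    if _GPU_IDS_PATTERN.match(text):
        return True
    log.warning(
        "GPU IDs value %r does not look like '0' or '0,1'; leave it blank to use the default.",
        text,
    )
    return False
```

With this check, the value from the log above (`10de268488e21043`) would be rejected with a warning before the training command is launched.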


Deejay85 commented on July 28, 2024

I tried leaving it blank, with spaces, dashes, and as two blocks of text separated only by a hyphen...none of that worked. I am using only one graphics card BTW, because 4090s don't grow on trees you know? 😜


b-fission commented on July 28, 2024

Supposing you kept the GPU ID field blank, does the log still show the same error as before (ValueError: torch.cuda.is_available() should be True but is False), or was it a different error?


Deejay85 commented on July 28, 2024

Same error. If you want I could copy/paste the new log.


b-fission commented on July 28, 2024

Sure, post your log output.


Deejay85 commented on July 28, 2024

18:54:24-292864 INFO Kohya_ss GUI version: v24.1.4
fatal: not a git repository (or any of the parent directories): .git
18:54:24-530151 ERROR Error during Git operation: Command '['git', 'submodule', 'update', '--init', '--recursive',
'--quiet']' returned non-zero exit status 128.
18:54:24-535034 INFO nVidia toolkit detected
18:54:25-865018 INFO Torch 2.1.2+cu118
18:54:25-885525 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8905
18:54:25-888454 INFO Torch detected GPU: NVIDIA GeForce RTX 4090 VRAM 24564 Arch (8, 9) Cores 128
18:54:25-892360 INFO Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit
(AMD64)]
18:54:25-895289 INFO Verifying modules installation status from requirements_pytorch_windows.txt...
18:54:25-898219 INFO Verifying modules installation status from requirements_windows.txt...
18:54:25-900172 INFO Verifying modules installation status from requirements.txt...
18:54:31-930997 INFO headless: False
18:54:31-969079 INFO Using shell=True when running external commands...
IMPORTANT: You are using gradio version 4.26.0, however version 4.29.0 is available, please upgrade.

Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Exception in thread Thread-5 (_do_normal_analytics_request):
Traceback (most recent call last):
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 69, in map_httpcore_exceptions
    yield
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 233, in handle_request
    resp = self._pool.handle_request(req)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\connection_pool.py", line 216, in handle_request
    raise exc from None
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\connection_pool.py", line 196, in handle_request
    response = connection.handle_request(
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\connection.py", line 101, in handle_request
    return self._connection.handle_request(request)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 143, in handle_request
    raise exc
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 95, in handle_request
    self._send_request_body(**kwargs)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 166, in _send_request_body
    self._send_event(event, timeout=timeout)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 175, in _send_event
    self._network_stream.write(bytes_to_send, timeout=timeout)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_backends\sync.py", line 133, in write
    with map_exceptions(exc_map):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.WriteTimeout: The write operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "M:\kohya_ss\venv\lib\site-packages\gradio\analytics.py", line 63, in _do_normal_analytics_request
    httpx.post(url, data=data, timeout=5)
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_api.py", line 319, in post
    return request(
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_api.py", line 106, in request
    return client.request(
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 827, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 914, in send
    response = self._send_handling_auth(
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 979, in _send_handling_redirects
    response = self._send_single_request(request)
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 1015, in _send_single_request
    response = transport.handle_request(request)
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 232, in handle_request
    with map_httpcore_exceptions():
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.WriteTimeout: The write operation timed out
18:54:53-247570 INFO Loading config...
18:55:04-303432 INFO Save...
18:55:07-492658 INFO Start training LoRA Standard ...
18:55:07-493635 INFO Validating lr scheduler arguments...
18:55:07-495588 INFO Validating optimizer arguments...
18:55:07-496565 INFO Validating M:/kohya_ss/Sampleimages/log existence and writability... SUCCESS
18:55:07-497541 INFO Validating M:/kohya_ss/Sampleimages/model existence and writability... SUCCESS
18:55:07-498521 INFO Validating M:/StableDiffusion/models/Stable-diffusion/SDXL/sd_xl_base_1.0.safetensors
existence... SUCCESS
18:55:07-499494 INFO Validating M:/kohya_ss/Sampleimages/Images existence... SUCCESS
18:55:07-500471 INFO Folder 4_giganticbreasts: 4 repeats found
18:55:07-501447 INFO Folder 4_giganticbreasts: 115 images found
18:55:07-502424 INFO Folder 4_giganticbreasts: 115 * 4 = 460 steps
18:55:07-504377 INFO Regulatization factor: 1
18:55:07-505353 INFO Total steps: 460
18:55:07-508283 INFO Train batch size: 1
18:55:07-512189 INFO Gradient accumulation steps: 1
18:55:07-516095 INFO Epoch: 40
18:55:07-517072 INFO Max train steps: 1600
18:55:07-518049 INFO stop_text_encoder_training = 0
18:55:07-519025 INFO lr_warmup_steps = 160
18:55:07-520978 INFO Saving training config to
M:/kohya_ss/Sampleimages/model\giganticbreasts_20240526-185507.json...
18:55:07-521953 INFO Executing command: M:\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no
--dynamo_mode default --mixed_precision bf16 --num_processes 1 --num_machines 1
--num_cpu_threads_per_process 2 M:/kohya_ss/sd-scripts/sdxl_train_network.py --config_file
M:/kohya_ss/Sampleimages/model/config_lora-20240526-185507.toml
18:55:07-526836 INFO Command executed.
2024-05-26 18:55:14 INFO Loading settings from train_util.py:3744
M:/kohya_ss/Sampleimages/model/config_lora-20240526-185507.toml...
INFO M:/kohya_ss/Sampleimages/model/config_lora-20240526-185507 train_util.py:3763
2024-05-26 18:55:14 INFO prepare tokenizers sdxl_train_util.py:134
2024-05-26 18:55:15 INFO update token length: 75 sdxl_train_util.py:159
INFO Using DreamBooth method. train_network.py:172
INFO prepare images. train_util.py:1572
INFO found directory M:\kohya_ss\Sampleimages\Images\4_giganticbreasts train_util.py:1519
contains 115 image files
INFO 460 train images with repeating. train_util.py:1613
INFO 0 reg images. train_util.py:1616
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1621
INFO [Dataset 0] config_util.py:565
batch_size: 1
resolution: (1024, 1024)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 64
max_bucket_reso: 2048
bucket_reso_steps: 64
bucket_no_upscale: True

                           [Subset 0 of Dataset 0]
                             image_dir: "M:\kohya_ss\Sampleimages\Images\4_giganticbreasts"
                             image_count: 115
                             num_repeats: 4
                             shuffle_caption: True
                             keep_tokens: 1
                             keep_tokens_separator:
                             secondary_separator: None
                             enable_wildcard: False
                             caption_dropout_rate: 0.0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1,
                             token_warmup_step: 0,
                             is_reg: False
                             class_tokens: giganticbreasts
                             caption_extension: .txt


                INFO     [Dataset 0]                                                              config_util.py:571
                INFO     loading image sizes.                                                      train_util.py:853

100%|█████████████████████████████████████████████████████████████████████████████| 115/115 [00:00<00:00, 39262.92it/s]
INFO make buckets train_util.py:859
WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is train_util.py:876
set, because bucket reso is defined by image size automatically /
bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計
算されるため、min_bucket_resoとmax_bucket_resoは無視されます
INFO number of images (including repeats) / train_util.py:905
各bucketの画像枚数（繰り返し回数を含む）
INFO bucket 0: resolution (576, 832), count: 4 train_util.py:910
INFO bucket 1: resolution (576, 960), count: 4 train_util.py:910
INFO bucket 2: resolution (640, 640), count: 4 train_util.py:910
INFO bucket 3: resolution (704, 960), count: 4 train_util.py:910
INFO bucket 4: resolution (704, 1280), count: 4 train_util.py:910
INFO bucket 5: resolution (704, 1344), count: 4 train_util.py:910
INFO bucket 6: resolution (704, 1408), count: 4 train_util.py:910
INFO bucket 7: resolution (768, 704), count: 4 train_util.py:910
INFO bucket 8: resolution (768, 1152), count: 12 train_util.py:910
INFO bucket 9: resolution (768, 1216), count: 4 train_util.py:910
INFO bucket 10: resolution (768, 1344), count: 4 train_util.py:910
INFO bucket 11: resolution (832, 768), count: 4 train_util.py:910
INFO bucket 12: resolution (832, 896), count: 4 train_util.py:910
INFO bucket 13: resolution (832, 1024), count: 4 train_util.py:910
INFO bucket 14: resolution (832, 1088), count: 36 train_util.py:910
INFO bucket 15: resolution (832, 1152), count: 68 train_util.py:910
INFO bucket 16: resolution (832, 1216), count: 44 train_util.py:910
INFO bucket 17: resolution (896, 832), count: 4 train_util.py:910
INFO bucket 18: resolution (896, 1024), count: 16 train_util.py:910
INFO bucket 19: resolution (896, 1088), count: 40 train_util.py:910
INFO bucket 20: resolution (896, 1152), count: 40 train_util.py:910
INFO bucket 21: resolution (960, 960), count: 16 train_util.py:910
INFO bucket 22: resolution (960, 1024), count: 36 train_util.py:910
INFO bucket 23: resolution (1024, 896), count: 4 train_util.py:910
INFO bucket 24: resolution (1024, 960), count: 4 train_util.py:910
INFO bucket 25: resolution (1024, 1024), count: 36 train_util.py:910
INFO bucket 26: resolution (1088, 832), count: 8 train_util.py:910
INFO bucket 27: resolution (1088, 896), count: 4 train_util.py:910
INFO bucket 28: resolution (1152, 832), count: 8 train_util.py:910
INFO bucket 29: resolution (1152, 896), count: 8 train_util.py:910
INFO bucket 30: resolution (1216, 832), count: 12 train_util.py:910
INFO bucket 31: resolution (1280, 704), count: 4 train_util.py:910
INFO bucket 32: resolution (1344, 768), count: 8 train_util.py:910
INFO mean ar error (without repeats): 0.012568990147454271 train_util.py:915
WARNING clip_skip will be unexpected / SDXL学習ではclip_skipは動作しません sdxl_train_util.py:343
INFO preparing accelerator train_network.py:225
accelerator device: cpu
INFO loading model for process 0/1 sdxl_train_util.py:30
INFO load StableDiffusion checkpoint: sdxl_train_util.py:70
M:/StableDiffusion/models/Stable-diffusion/SDXL/sd_xl_base_1.0.safetensors
2024-05-26 18:55:16 INFO building U-Net sdxl_model_util.py:192
INFO loading U-Net from checkpoint sdxl_model_util.py:196
2024-05-26 18:55:28 INFO U-Net: sdxl_model_util.py:202
INFO building text encoders sdxl_model_util.py:205
INFO loading text encoders from checkpoint sdxl_model_util.py:258
INFO text encoder 1: sdxl_model_util.py:272
2024-05-26 18:55:32 INFO text encoder 2: sdxl_model_util.py:276
INFO building VAE sdxl_model_util.py:279
INFO loading VAE from checkpoint sdxl_model_util.py:284
INFO VAE: sdxl_model_util.py:287
INFO Enable xformers for U-Net train_util.py:2660
Traceback (most recent call last):
File "M:\kohya_ss\sd-scripts\sdxl_train_network.py", line 185, in <module>
trainer.train(args)
File "M:\kohya_ss\sd-scripts\train_network.py", line 242, in train
vae.set_use_memory_efficient_attention_xformers(args.xformers)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 262, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 260, in set_use_memory_efficient_attention_xformers
raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "M:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
File "M:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "M:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
simple_launcher(args)
File "M:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['M:\kohya_ss\venv\Scripts\python.exe', 'M:/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'M:/kohya_ss/Sampleimages/model/config_lora-20240526-185507.toml']' returned non-zero exit status 1.
18:55:35-725043 INFO Training has ended.
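
The ValueError at the end of the log can be checked outside the trainer. Below is a minimal sketch, assuming it is run with the Python from the kohya_ss venv (M:\kohya_ss\venv\Scripts\python.exe in the log above). The function name cuda_report is made up for illustration. It only reports what PyTorch sees and changes nothing.

```python
def cuda_report() -> dict:
    """Collect the facts that decide whether training can use the GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the interpreter running this script.
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None means a CPU-only build of PyTorch was installed.
        "built_with_cuda": torch.version.cuda,
        # This is the same check that raised the ValueError in the log.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


report = cuda_report()
for name, value in report.items():
    print(f"{name}: {value}")
```

If cuda_available comes back True when the script is run directly, while the trainer still logs "accelerator device: cpu", the launcher configuration is a more likely suspect than the CUDA install itself.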

from kohya_ss.

b-fission commented on July 28, 2024

Can you look in the folder at C:\Users\yourname\.cache\huggingface\accelerate?
If you see a file called default_config.yaml there, delete it and see if that fixes the problem.
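
A cautious variant of the suggestion above is to rename the file instead of deleting it, so its contents can still be inspected afterwards. This is a minimal sketch, assuming the cache lives in the folder named in the comment. The function name and the .bak suffix are made up for illustration.

```python
from pathlib import Path
from typing import Optional

# Folder named in the comment above; adjust it if your cache lives elsewhere.
DEFAULT_DIR = Path.home() / ".cache" / "huggingface" / "accelerate"


def set_aside_accelerate_config(config_dir: Path = DEFAULT_DIR) -> Optional[Path]:
    """Rename default_config.yaml to default_config.yaml.bak.

    Returns the backup path, or None if there was no config file to move.
    """
    config = Path(config_dir) / "default_config.yaml"
    if not config.is_file():
        return None
    backup = config.with_name(config.name + ".bak")
    # Path.replace overwrites an older backup if one already exists.
    config.replace(backup)
    return backup
```

Calling set_aside_accelerate_config() with no arguments acts on the default folder. Renaming the .bak file back restores the previous state.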

from kohya_ss.

Deejay85 commented on July 28, 2024

Surprisingly it did. 🎉 Now if I only knew what value was messing it up.

from kohya_ss.

b-fission commented on July 28, 2024

It's probably the gpu_ids setting in that file. The default value is all, so I assume it was set to something else, which caused the problem here.
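
To see which value was in the file, the gpu_ids line can be read back from a saved copy. The sketch below is a deliberately simple line-based reader, not a real YAML parser: it only handles top-level `key: value` lines and ignores inline comments. The sample text is hypothetical and is not the poster's actual config. The "all" default is taken from the comment above.

```python
from typing import Optional


def read_top_level_key(yaml_text: str, key: str) -> Optional[str]:
    """Return the raw value of a top-level `key: value` line, or None."""
    for line in yaml_text.splitlines():
        # Skip blank lines, comments, and indented (nested) lines.
        if not line or line[0] in " \t#":
            continue
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            # Drop surrounding whitespace and quotes.
            return value.strip().strip("'\"")
    return None


def gpu_ids_looks_default(yaml_text: str) -> bool:
    """True if gpu_ids is absent or set to "all"."""
    value = read_top_level_key(yaml_text, "gpu_ids")
    return value is None or value == "all"


# Hypothetical file contents, for illustration only.
sample = "compute_environment: LOCAL_MACHINE\ngpu_ids: '1'\nmixed_precision: bf16\n"
print(read_top_level_key(sample, "gpu_ids"))  # prints 1
print(gpu_ids_looks_default(sample))          # prints False
```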

from kohya_ss.
