returned non-zero exit status 1. about kohya_ss HOT 15 CLOSED

bmaltais commented on July 28, 2024

returned non-zero exit status 1.

from kohya_ss.

Comments (15)

bmaltais commented on July 28, 2024

This is usually the result of a bad folder structure for the training. Did you try to do a dreambooth? If so, did you use the dreambooth folder prep tool to create it?

healthyfat commented on July 28, 2024

Yes. I used the Dreambooth Lora prepare data button in the Tool tab to create the training folders.

NaughtDZ commented on July 28, 2024

Same issue here. I wonder, do the reg pictures need a prompt .txt file?

bmaltais commented on July 28, 2024

Usually there is no need for .txt files in the reg folder.

qqualia commented on July 28, 2024

I was going to post that I'm also getting this error, but found that switching from a custom .safetensors model I'd downloaded to standard 1.5 fixed it. So I'm guessing there are errors around either the safetensors format or certain custom models?

Edit: Automatic 3 (ckpt) threw up an error too. So not sure.

mykeehu commented on July 28, 2024

Same problem here with DreamBooth training, what is the solution? "Division by zero"?

CUDA SETUP: Loading binary H:\Kohya-DB\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
Traceback (most recent call last):
  File "H:\Kohya-DB\kohya_ss\train_db.py", line 337, in <module>
    train(args)
  File "H:\Kohya-DB\kohya_ss\train_db.py", line 178, in train
    num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
ZeroDivisionError: division by zero
Traceback (most recent call last):
  File "C:\Users\Mykee\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Mykee\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "H:\Kohya-DB\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "H:\Kohya-DB\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "H:\Kohya-DB\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "H:\Kohya-DB\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['H:\\Kohya-DB\\kohya_ss\\venv\\Scripts\\python.exe', 'train_db.py', '--enable_bucket', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=H:\\Stable-Diffusion-Automatic\\Dreambooth\\onoffupftv\\train sorted BLIP\\sitting', '--resolution=512,512', '--output_dir=V:\\!SDModels\\Kohya', '--logging_dir=H:\\Stable-Diffusion-Automatic\\Dreambooth\\onoffupftv\\log', '--save_model_as=safetensors', '--output_name=oof16a', '--max_data_loader_n_workers=1', '--learning_rate=1e-5', '--lr_scheduler=constant', '--train_batch_size=2', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--max_data_loader_n_workers=1', '--xformers', '--use_8bit_adam']' returned non-zero exit status 1.

Ok, I was the stupid one. First you have to prepare the folder in the Tool tab and then load it. Even though the images were there, it needed the preparation.
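
For anyone hitting the same traceback: the division fails when the number of update steps per epoch is zero, which is what you get when the data loader finds no images in train_data_dir. Here is a minimal sketch of that arithmetic. It assumes steps per epoch is computed as ceil(batches / gradient accumulation steps); the helper name and the guard are illustrative and are not code from train_db.py.

```python
import math


def epochs_needed(max_train_steps, num_batches, grad_accum_steps=1):
    """Hypothetical helper mirroring the failing line in the traceback."""
    # Assumed formula: update steps per epoch = ceil(batches / accumulation steps).
    steps_per_epoch = math.ceil(num_batches / grad_accum_steps)
    if steps_per_epoch == 0:
        # An empty data loader (no images found) would otherwise divide by zero.
        raise ValueError(
            "no training batches found: check that train_data_dir points at "
            "the parent of the prepared image folders"
        )
    return math.ceil(max_train_steps / steps_per_epoch)


# 750 steps with 750 batches per epoch gives a single epoch.
print(epochs_needed(750, 750))
```

With zero batches the sketch raises a readable error instead of ZeroDivisionError, which points straight at the folder setup.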

marcotrani commented on July 28, 2024

Not sure, but this seems to be the same problem. I've tried a few tests but can't resolve it. This is the error; what can I do?

CUDA SETUP: Loading binary C:\ai\Kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 1500
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 750
num epochs / epoch数: 1
batch size per device / バッチサイズ: 2
total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ（並列学習、勾配合計含む）: 2
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 750
steps: 0%| | 0/750 [00:00<?, ?it/s]
epoch 1/1
Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
Traceback (most recent call last):
  File "C:\Users\i5Desktop7600k\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\i5Desktop7600k\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\ai\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\ai\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\ai\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\ai\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\ai\\Kohya\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--bucket_reso_steps=1', '--bucket_no_upscale', '--pretrained_model_name_or_path=C:/ai/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.ckpt', '--train_data_dir=C:/mauroprellilora/images formatted/150_mauroprelli/img', '--resolution=512,512', '--output_dir=C:/mauroprellilora/images formatted/150_mauroprelli/model', '--logging_dir=C:/mauroprellilora/images formatted/150_mauroprelli/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=test', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=750', '--save_every_n_epochs=1', '--mixed_precision=no', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.

healthyfat commented on July 28, 2024

C:/mauroprellilora/images formatted/150_mauroprelli/img

I think your folder structure is wrong.

Try placing the /150_mauroprelli folder, with the images in it, inside the /img folder, and give the trainer the path to the folder that contains the image folder. For example, the folder structure should be:

/img/150_mauroprelli

and the path you enter should be:

/img

That worked in my case.
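
If you want to check this before launching, here is a rough sketch of a layout check. It assumes the image folders follow the "<repeats>_<name>" naming seen in this thread (for example 150_mauroprelli) and that the training path must be their parent. The function name and the list of image extensions are my own picks and are not part of kohya_ss.

```python
import re
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your dataset uses.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}
# Assumed naming convention: "<repeats>_<name>", e.g. "150_mauroprelli".
CONCEPT_DIR = re.compile(r"^\d+_.+$")


def check_train_dir(train_data_dir):
    """Return a list of problems; an empty list means the layout looks fine."""
    root = Path(train_data_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    concept_dirs = [
        d for d in root.iterdir() if d.is_dir() and CONCEPT_DIR.match(d.name)
    ]
    if not concept_dirs:
        problems.append(f"{root} has no '<repeats>_<name>' subfolder")
        # Hint for a path that points at or below the image folder itself.
        if CONCEPT_DIR.match(root.name) or CONCEPT_DIR.match(root.parent.name):
            problems.append("the path looks one level too deep; try a parent folder")
    for d in concept_dirs:
        has_image = any(
            p.is_file() and p.suffix.lower() in IMAGE_EXTS for p in d.iterdir()
        )
        if not has_image:
            problems.append(f"{d} contains no images")
    return problems


for problem in check_train_dir("C:/mauroprellilora/images formatted/150_mauroprelli/img"):
    print(problem)
```

An empty result only means the folders are where the trainer is assumed to look; it does not rule out the other causes mentioned in this thread.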

Nezara commented on July 28, 2024

I think something is up with the folder preparation tool. The last two times I used it, it kept creating a recursive folder structure until Python shut it down.

NaughtDZ commented on July 28, 2024

In fact, I still ran into some problems with a version that others built from this project, but they were solved after I set the virtual memory on this partition to 16 GB (I have 32 GB of physical memory).

marcotrani commented on July 28, 2024

Thank you mate, I've just tried that but no luck. Bear in mind that the same technique on another PC with a 3080Ti (instead of a 1080Ti) works properly, even with folder names containing spaces such as /150_mauro prelli. I really don't know why it doesn't work on the Intel 7700k with the 1080Ti, with updated drivers and the same versions of everything (Python etc.).

Jocha00001 commented on July 28, 2024

I ran into the same situation. If you use a json file for the lora network weights, the cause may be that the json file is not being applied (this looks like a kohya bug: it used to work but does not now), so the corresponding parameters have to be filled in manually.

You can try entering them by hand. Hope this helps.

revolvedai commented on July 28, 2024

This is usually the result of a bad folder structure for the training. Did you try to do a dreambooth? If so, did you use the dreambooth folder prep tool to create it?

Could this error be worded differently so that the cause is more obvious to the casual user?
For example: "returned non-zero exit status of 1: check your folder paths"
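
To illustrate what I mean, here is a small sketch of a launcher wrapper that catches subprocess.CalledProcessError and adds a hint. The wrapper name and the wording of the hint are hypothetical and this is not how kohya_ss launches training; the child process below is only a stand-in that exits with status 1.

```python
import subprocess
import sys


def run_training(cmd):
    """Hypothetical wrapper: run a command and return a friendlier status line."""
    try:
        subprocess.run(cmd, check=True)
    except subprocess.CalledProcessError as err:
        # The exit status alone says little; point the user at likely causes.
        return (
            f"training exited with status {err.returncode}: "
            "scroll up for the first traceback, then check your folder paths"
        )
    return "training finished"


# Stand-in for the real training command: a child process that exits with 1.
print(run_training([sys.executable, "-c", "import sys; sys.exit(1)"]))
```

The real cause is still in the first traceback printed above the CalledProcessError, so the hint should send people there rather than replace it.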

rotabul commented on July 28, 2024

I want to share my experiences when solving the same issue. First I didn't notice I was on the Dreambooth tab instead of the Lora tab when starting the training. They seem to be the same but probably are not the same (I'm running it for the first time).

So after I switched to the Lora tab and started the training, the result was the same - an error. So I changed the "Optimizer" to AdamW instead of AdamW8bit (be sure you are on the Lora tab again) and started the training. And now it worked.

edit: I forgot to mention I also had to change the Mixed Precision option to fp16.

I hope it helps you to solve the problem as well.
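
Here is a rough sketch of the same two changes as a check on the launch arguments. It assumes you know your GPU's compute capability; the helper name is made up, and the 8-bit Adam note is only the workaround that worked for me, not a documented hardware limit. The bf16 rule relies on bf16 being natively supported from compute capability 8.0 (Ampere) onwards.

```python
def risky_flags(args, compute_capability):
    """Hypothetical helper: list launch flags worth changing on an older GPU."""
    notes = []
    if "--use_8bit_adam" in args:
        # Workaround from this thread, not a hard rule for any GPU model.
        notes.append("--use_8bit_adam: if training crashes, retry with plain AdamW")
    # bf16 is natively supported from compute capability 8.0 (Ampere) onwards.
    if "--mixed_precision=bf16" in args and compute_capability < (8, 0):
        notes.append("--mixed_precision=bf16: try fp16 on this GPU")
    return notes


# A GTX 1080 Ti reports compute capability 6.1; an RTX 3080 Ti reports 8.6.
for note in risky_flags(["--mixed_precision=bf16", "--use_8bit_adam"], (6, 1)):
    print(note)
```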

Cali4our commented on July 28, 2024

I fixed it by simply moving the folder that holds my images, log and output to my D drive.

For some reason, it doesn't let me train from the Desktop. Try putting the folder on another drive, in Documents, or anywhere other than the Desktop. It may work.
