Comments (15)
This is usually the result of a bad folder structure for the training. Did you try to do a dreambooth? If so, did you use the dreambooth folder prep tool to create it?
from kohya_ss.
Yes. I used Dreambooth Lora prepare data button in Tool tab to create training folders.
from kohya_ss.
Same issue here. Do the regularization images need a prompt .txt file?
from kohya_ss.
Usually there is no need for .txt files in the reg folder.
from kohya_ss.
I was going to post that I'm also getting this error, but found that switching from a custom .safetensors model I'd downloaded to standard 1.5 fixed it. So I'm guessing there are errors around either the safetensors format or certain custom models?
Edit: Automatic 3 (ckpt) threw up an error too. So not sure.
from kohya_ss.
Same problem here with DreamBooth training, what is the solution? "Division by zero"?
CUDA SETUP: Loading binary H:\Kohya-DB\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
Traceback (most recent call last):
File "H:\Kohya-DB\kohya_ss\train_db.py", line 337, in <module>
train(args)
File "H:\Kohya-DB\kohya_ss\train_db.py", line 178, in train
num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
ZeroDivisionError: division by zero
Traceback (most recent call last):
File "C:\Users\Mykee\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Mykee\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "H:\Kohya-DB\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "H:\Kohya-DB\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "H:\Kohya-DB\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "H:\Kohya-DB\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['H:\\Kohya-DB\\kohya_ss\\venv\\Scripts\\python.exe', 'train_db.py', '--enable_bucket', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=H:\\Stable-Diffusion-Automatic\\Dreambooth\\onoffupftv\\train sorted BLIP\\sitting', '--resolution=512,512', '--output_dir=V:\\!SDModels\\Kohya', '--logging_dir=H:\\Stable-Diffusion-Automatic\\Dreambooth\\onoffupftv\\log', '--save_model_as=safetensors', '--output_name=oof16a', '--max_data_loader_n_workers=1', '--learning_rate=1e-5', '--lr_scheduler=constant', '--train_batch_size=2', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--max_data_loader_n_workers=1', '--xformers', '--use_8bit_adam']' returned non-zero exit status 1.
Ok, I was the stupid one. First you have to prepare the folder in the Tool tab and then load it. Even though the images were there, it needed the preparation.
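The ZeroDivisionError in the traceback above is consistent with the trainer finding no usable images. Here is a minimal sketch of the arithmetic, assuming the steps per epoch are derived from the number of batches. Only the `num_train_epochs` expression appears in the traceback; the `steps_per_epoch` helper, its parameters, and the explicit check are hypothetical and not the project's actual code.

```python
import math


def steps_per_epoch(num_images: int, batch_size: int, grad_accum_steps: int = 1) -> int:
    """Hypothetical helper: optimizer updates per epoch for num_images images."""
    num_batches = math.ceil(num_images / batch_size)
    return math.ceil(num_batches / grad_accum_steps)


def num_train_epochs(max_train_steps: int, num_images: int, batch_size: int) -> int:
    num_update_steps_per_epoch = steps_per_epoch(num_images, batch_size)
    if num_update_steps_per_epoch == 0:
        # Without this check, the division below raises ZeroDivisionError.
        raise ValueError("no training images found; check the folder structure")
    # Same expression as the failing line in the traceback.
    return math.ceil(max_train_steps / num_update_steps_per_epoch)
```

With zero images the steps per epoch come out as 0, which is why preparing the folder first makes the error go away.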
from kohya_ss.
Not sure, but this seems to be the same problem. I've tried a few tests but can't resolve it. This is the error; what can I do?
CUDA SETUP: Loading binary C:\ai\Kohya\kohya_ss\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit Adam optimizer
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 1500
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 750
num epochs / epoch数: 1
batch size per device / バッチサイズ: 2
total train batch size (with parallel & distributed & accumulation) / 総バッチサイズ(並列学習、勾配合計含む): 2
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 750
steps: 0%| | 0/750 [00:00<?, ?it/s]epoch 1/1
Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
Traceback (most recent call last):
File "C:\Users\i5Desktop7600k\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\i5Desktop7600k\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\ai\Kohya\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\ai\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\ai\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\ai\Kohya\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\ai\\Kohya\\kohya_ss\\venv\\Scripts\\python.exe', 'train_network.py', '--bucket_reso_steps=1', '--bucket_no_upscale', '--pretrained_model_name_or_path=C:/ai/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.ckpt', '--train_data_dir=C:/mauroprellilora/images formatted/150_mauroprelli/img', '--resolution=512,512', '--output_dir=C:/mauroprellilora/images formatted/150_mauroprelli/model', '--logging_dir=C:/mauroprellilora/images formatted/150_mauroprelli/log', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-5', '--unet_lr=0.0001', '--network_dim=128', '--output_name=test', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=constant', '--train_batch_size=2', '--max_train_steps=750', '--save_every_n_epochs=1', '--mixed_precision=no', '--save_precision=fp16', '--seed=1234', '--caption_extension=.txt', '--cache_latents', '--max_data_loader_n_workers=1', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--use_8bit_adam', '--bucket_no_upscale']' returned non-zero exit status 1.
from kohya_ss.
C:/mauroprellilora/images formatted/150_mauroprelli/img
I think your folder structure is wrong.
Try placing the /150_mauroprelli folder (with the images in it) inside the /img folder, and use the path to the folder that contains the image folder. For example, the folder structure should be:
/img/150_mauroprelli
and the training path should be:
/img
That worked in my case.
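A quick way to check this layout before training is sketched below. It assumes, based on the folder names in this thread, that image subfolders are named `<repeats>_<name>` and sit directly under the training path. The function name `summarize_train_dir` and the list of image extensions are illustrative choices, not part of kohya_ss.

```python
import re
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp"}
# Assumed naming convention for image subfolders: "<repeats>_<name>".
SUBFOLDER_RE = re.compile(r"^(\d+)_(.+)$")


def summarize_train_dir(train_data_dir):
    """Return {subfolder_name: (repeats, image_count)} for matching subfolders."""
    root = Path(train_data_dir)
    summary = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        match = SUBFOLDER_RE.match(sub.name)
        if match is None:
            continue  # not a "<repeats>_<name>" folder
        images = [
            f for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTS
        ]
        summary[sub.name] = (int(match.group(1)), len(images))
    return summary
```

An empty result means the path points at the wrong level, for example at the image folder itself instead of its parent.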
from kohya_ss.
I think something is up with the folder preparation tool. The last two times I used it, it kept creating a recursive folder structure until Python shut it down.
from kohya_ss.
In fact, I still ran into some problems with a version of this project packaged by someone else, but the problem was solved after I set the virtual memory on that partition to 16G (I have 32G of physical memory).
from kohya_ss.
Thank you mate, I've just tried it but no luck. Note that when I use the same technique on another PC with a 3080Ti (instead of a 1080Ti), everything works properly, even with folder names that contain spaces, like /150_mauro prelli... I really don't know why it doesn't work on the Intel 7700k with the 1080Ti, with updated drivers and the same versions of everything (Python, etc.).
from kohya_ss.
I encountered the same situation as you, when using a json file in the lora network weight field.
It may be because the json file is not being applied (this looks like a kohya bug: it used to work but doesn't now), so the corresponding parameters need to be filled in manually.
You can try entering them manually to solve it. Hope this helps.
from kohya_ss.
This is usually the result of a bad folder structure for the training. Did you try to do a dreambooth? If so, did you use the dreambooth folder prep tool to create it?
Could this error be worded differently so it is more obvious to the casual user?
For example: "returned non-zero exit status of 1: check your folder paths"
from kohya_ss.
I want to share my experiences when solving the same issue. First I didn't notice I was on the Dreambooth tab instead of the Lora tab when starting the training. They seem to be the same but probably are not the same (I'm running it for the first time).
So after I switched to the Lora tab and started the training, the result was the same - an error. So I changed the "Optimizer" to AdamW instead of AdamW8bit (be sure you are on the Lora tab again) and started the training. And now it worked.
edit: I forgot to mention I also had to change the Mixed Precision option to fp16.
I hope it helps you to solve the problem as well.
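In terms of the launch commands shown in the error messages above, this change would roughly amount to dropping `--use_8bit_adam` and setting `--mixed_precision=fp16`. The sketch below assumes that the GUI's AdamW option corresponds to omitting `--use_8bit_adam`, which this thread does not confirm. The function `use_plain_adamw_fp16` is a hypothetical helper for illustration only.

```python
def use_plain_adamw_fp16(args):
    """Return a copy of args without --use_8bit_adam and with fp16 mixed precision."""
    adjusted = []
    for arg in args:
        if arg == "--use_8bit_adam":
            continue  # assumption: omitting this flag selects the regular optimizer
        if arg.startswith("--mixed_precision="):
            arg = "--mixed_precision=fp16"
        adjusted.append(arg)
    return adjusted
```

For example, passing `["train_network.py", "--mixed_precision=no", "--xformers", "--use_8bit_adam"]` returns the same list with `--mixed_precision=fp16` and no `--use_8bit_adam`.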
from kohya_ss.
I fixed it by simply moving the folder containing my images, log and output onto my D drive.
For some reason, it doesn't let me train from the Desktop. Try putting the folder on another drive, in Documents, or anywhere other than the Desktop. It may work.
from kohya_ss.