devilismyfriend / stabletuner
Finetuning SD in style.
License: GNU Affero General Public License v3.0
Hello, I encountered the following RuntimeError while loading the WD 1.4 model from CKPT on disk:
(base) PS C:\Users\alaith\git\github.com\devilismyfriend\StableTuner> .\StableTuner.cmd
Environment name is set as "ST" as per environment.yaml
anaconda3/miniconda3 detected in C:\Users\alaith\anaconda3
Starting conda environment "ST" from C:\Users\alaith\anaconda3
loading model from: C:/Users/alaith/git/github.com/AUTOMATIC1111/stable-diffusion-webui/models/Stable-diffusion/wd-1-4-anime_e1.ckpt
loading u-net: <All keys matched successfully>
loadint vae: <All keys matched successfully>
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Users\alaith\anaconda3\envs\ST\lib\tkinter\__init__.py", line 1921, in __call__
return self.func(*args)
File "C:\Users\alaith\anaconda3\envs\ST\lib\site-packages\customtkinter\windows\widgets\ctk_button.py", line 531, in _clicked
self._command()
File "C:\Users\alaith\git\github.com\devilismyfriend\StableTuner\scripts\configuration_gui.py", line 2673, in choose_model
convert = converters.Convert_SD_to_Diffusers(sd_file,model_path,prediction_type=prediction,version=version)
File "C:\Users\alaith\git\github.com\devilismyfriend\StableTuner\scripts\converters.py", line 61, in __init__
self.main()
File "C:\Users\alaith\git\github.com\devilismyfriend\StableTuner\scripts\converters.py", line 89, in main
text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(v2_model, self.checkpoint_path)
File "C:\Users\alaith\git\github.com\devilismyfriend\StableTuner\scripts\model_util.py", line 1205, in load_models_from_stable_diffusion_checkpoint
info = text_model.load_state_dict(converted_text_encoder_checkpoint)
File "C:\Users\alaith\anaconda3\envs\ST\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIPTextModel:
Unexpected key(s) in state_dict: "text_model.encoder.text_model.embeddings.position_ids".
I noticed someone else seemed to have a similar error a few days ago, but as that one was fixed I thought I should open another issue.
Running on Windows 10 at commit 94d2dd28ac1b70d91e76a521e06633b2ebced508 (the head of the main branch when this issue was created).
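The unexpected key is a `position_ids` buffer on the CLIP text encoder. That buffer holds a fixed range of token positions rather than learned weights, so a stale copy in a converted checkpoint can usually be dropped before loading. The sketch below is a hypothetical workaround, not StableTuner's actual fix: `drop_unexpected_position_ids` is an illustrative helper name, and plain Python values stand in for tensors so it runs without torch.

```python
def drop_unexpected_position_ids(state_dict):
    """Return a copy of a text-encoder state dict without `position_ids` entries.

    Hypothetical helper: `position_ids` is a fixed index buffer, so removing it
    does not discard any trained parameters.
    """
    return {
        key: value
        for key, value in state_dict.items()
        if not key.endswith("embeddings.position_ids")
    }


if __name__ == "__main__":
    # Strings and lists stand in for tensors in this sketch.
    converted = {
        "text_model.encoder.text_model.embeddings.position_ids": list(range(77)),
        "text_model.embeddings.token_embedding.weight": "<tensor>",
    }
    cleaned = drop_unexpected_position_ids(converted)
    print(sorted(cleaned))
    # With torch available, the cleaned dict would then be loaded as before:
    # info = text_model.load_state_dict(cleaned)
```

An alternative is `load_state_dict(..., strict=False)`, which reports unexpected keys instead of raising, but it would also hide genuinely missing weights, so filtering the one known key is the narrower change.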
One of the very recent changes after commit
wrecked the ability to train subjects in Dreambooth mode. I can A/B the repositories and there's a stark difference in likeness and quality. Possibly one of the CLIP changes?
Currently StableTuner defaults to a hardcoded torch.float16 value when converting Diffusers models to CKPT and safetensors. I trained with FlashAttention and used TF32 for the increased precision and better details, and would like to convert the result to an fp32 CKPT or safetensors file. A quick dialog asking which precision you want for the conversion would be nice. Personally, I just changed the value from torch.float16 to torch.float32 in converters.py and got the expected result.
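The trade-off behind this request can be seen without torch: `torch.float16` and `torch.float32` use the IEEE 754 half- and single-precision formats, which Python's `struct` module exposes as the `e` and `f` format codes. A minimal sketch, assuming a hypothetical `fp16`/`fp32` choice like the proposed dialog would offer, that packs the same values at each precision and reads them back:

```python
import struct

# "e" is IEEE 754 half precision (like torch.float16), "f" single precision (like torch.float32).
# The "fp16"/"fp32" keys are a hypothetical user-facing choice.
FORMATS = {"fp16": "e", "fp32": "f"}


def round_trip(values, precision):
    """Pack floats at the chosen precision, then unpack them again."""
    fmt = f"<{len(values)}{FORMATS[precision]}"
    packed = struct.pack(fmt, *values)
    return packed, list(struct.unpack(fmt, packed))


if __name__ == "__main__":
    weights = [0.1, 1e-5, 3.14159]
    for precision in ("fp16", "fp32"):
        packed, restored = round_trip(weights, precision)
        print(precision, f"{len(packed)} bytes", restored)
```

fp16 halves the stored size (2 bytes per value instead of 4) at the cost of rounding: 0.1, for example, comes back as 0.0999755859375 in half precision, which is the kind of detail an fp32 export keeps.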
Currently it seems you can only specify the height and width together, for a square shape. I would like to be able to specify height and width independently, for certain kinds of training data that, for example, use a 1:2 ratio (768x1536) and require the entire image to be visible to the model during each training step.
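Independent height and width largely come down to scaling the requested size into a pixel budget while keeping the aspect ratio. A sketch with a hypothetical `fit_resolution` helper (not a StableTuner function); rounding each side down to a multiple of 64 is an assumption, chosen because many SD trainers bucket in 64-pixel steps:

```python
import math


def fit_resolution(width, height, max_pixels, step=64):
    """Scale (width, height) down to at most max_pixels, keeping the aspect ratio.

    Hypothetical helper: each side is rounded down to a multiple of `step`.
    """
    scale = min(1.0, math.sqrt(max_pixels / (width * height)))
    new_w = max(step, int(width * scale) // step * step)
    new_h = max(step, int(height * scale) // step * step)
    return new_w, new_h


if __name__ == "__main__":
    print(fit_resolution(768, 1536, 768 * 1536))  # full budget: size unchanged
    print(fit_resolution(768, 1536, 768 * 768))   # smaller budget, still 1:2
```

With a 768x1536 budget the size is kept as-is; with a 768x768 budget the result is 512x1024, still 1:2 and within the pixel count.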
Can a GTX 1070 Ti with 8 GB be used?
If anyone with more experience has any ideas, I'd gladly appreciate them.
Are Diffusers models converted to CKPT not currently supported outside of StableTuner? I finally had a decent session and exported it to CKPT to make an X/Y plot across checkpoints in AUTOMATIC1111's web UI and see which epoch gave the best results, but it threw errors about a size mismatch:
Loading weights [5d943c6a] from D:\SDV2\models\Stable-diffusion\sdv2-warm-digitalart-300.ckpt
Traceback (most recent call last):
File "D:\SDV2\venv\lib\site-packages\gradio\routes.py", line 284, in run_predict
output = await app.blocks.process_api(
File "D:\SDV2\venv\lib\site-packages\gradio\blocks.py", line 982, in process_api
result = await self.call_function(fn_index, inputs, iterator)
File "D:\SDV2\venv\lib\site-packages\gradio\blocks.py", line 824, in call_function
prediction = await anyio.to_thread.run_sync(
File "D:\SDV2\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "D:\SDV2\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "D:\SDV2\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "D:\SDV2\modules\ui.py", line 1618, in <lambda>
fn=lambda value, k=k: run_settings_single(value, key=k),
File "D:\SDV2\modules\ui.py", line 1459, in run_settings_single
if not opts.set(key, value):
File "D:\SDV2\modules\shared.py", line 473, in set
self.data_labels[key].onchange()
File "D:\SDV2\modules\call_queue.py", line 15, in f
res = func(*args, **kwargs)
File "D:\SDV2\webui.py", line 63, in <lambda>
shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights()))
File "D:\SDV2\modules\sd_models.py", line 292, in reload_model_weights
load_model(checkpoint_info)
File "D:\SDV2\modules\sd_models.py", line 261, in load_model
load_model_weights(sd_model, checkpoint_info)
File "D:\SDV2\modules\sd_models.py", line 192, in load_model_weights
model.load_state_dict(sd, strict=False)
File "D:\SDV2\venv\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion: size mismatch for
model.diffusion_model.input_blocks.1.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]). size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]). size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]). size mismatch for model.diffusion_model.input_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]). size mismatch for model.diffusion_model.input_blocks.2.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]). size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]). size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]). size mismatch for model.diffusion_model.input_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]). size mismatch for model.diffusion_model.input_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). 
size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for model.diffusion_model.input_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for model.diffusion_model.input_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for model.diffusion_model.input_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for model.diffusion_model.input_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). 
size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for model.diffusion_model.input_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.input_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for model.diffusion_model.input_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.middle_block.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). 
size mismatch for model.diffusion_model.middle_block.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.output_blocks.3.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for model.diffusion_model.output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.output_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for model.diffusion_model.output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). 
size mismatch for model.diffusion_model.output_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for model.diffusion_model.output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for model.diffusion_model.output_blocks.6.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for model.diffusion_model.output_blocks.6.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for model.diffusion_model.output_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). 
size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for model.diffusion_model.output_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for model.diffusion_model.output_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for model.diffusion_model.output_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for model.diffusion_model.output_blocks.9.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]). size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]). 
size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]). size mismatch for model.diffusion_model.output_blocks.9.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]). size mismatch for model.diffusion_model.output_blocks.10.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]). size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]). size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]). size mismatch for model.diffusion_model.output_blocks.10.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]). size mismatch for model.diffusion_model.output_blocks.11.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]). size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]). size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]). 
size mismatch for model.diffusion_model.output_blocks.11.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
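The shapes in this log point to a model-version mismatch rather than a corrupt file: every checkpoint tensor has a cross-attention width of 1024 and 2-D `proj_in`/`proj_out` weights, whereas the model the web UI built expects 768 and 1x1 convolutions. Those are the SD 2.x layout (OpenCLIP ViT-H text encoder, linear projections) and the SD 1.x layout (CLIP ViT-L/14, convolutional projections) respectively, so the exported v2 checkpoint was being loaded with a v1 config. A small sketch of that check, working on a plain mapping of key to shape; `describe_unet` is a hypothetical helper name:

```python
CROSS_ATTN_KEY = "model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight"
PROJ_IN_KEY = "model.diffusion_model.input_blocks.1.1.proj_in.weight"


def describe_unet(shapes):
    """Guess the SD generation from a {key: shape tuple} mapping of a checkpoint."""
    context_dim = shapes[CROSS_ATTN_KEY][1]  # width of the text-encoder embeddings
    linear_proj_in = len(shapes[PROJ_IN_KEY]) == 2  # 2-D = linear, 4-D = 1x1 conv
    version = {768: "v1", 1024: "v2"}.get(context_dim, "unknown")
    return {"version": version, "context_dim": context_dim, "linear_proj_in": linear_proj_in}


if __name__ == "__main__":
    # Shapes copied from the error above: the checkpoint vs. the model the UI built.
    checkpoint = {CROSS_ATTN_KEY: (320, 1024), PROJ_IN_KEY: (320, 320)}
    expected = {CROSS_ATTN_KEY: (320, 768), PROJ_IN_KEY: (320, 320, 1, 1)}
    print("checkpoint:", describe_unet(checkpoint))
    print("web UI model:", describe_unet(expected))
    # With torch, and assuming the usual "state_dict" wrapper, real shapes could be read as:
    # sd = torch.load(path, map_location="cpu")["state_dict"]
    # shapes = {k: tuple(v.shape) for k, v in sd.items()}
```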
I'm trying to train my first model with the Dreambooth options, and each time I hit Train this happens:
anaconda3/miniconda3 detected in C:\Users\evil\anaconda3
Starting conda environment "ST" from C:\Users\evil\anaconda3
warning: redirecting to https://github.com/devilismyfriend/StableTuner.git/
Latest git hash: 80af7be
(ST) C:\Users\evil\Desktop\Work\StableTuner>accelerate "launch" "--mixed_precision=fp16" "scripts/trainer.py" "--attention=xformers" "--model_variant=base" "--disable_cudnn_benchmark" "--sample_step_interval=100" "--pretrained_model_name_or_path=stabilityai/stable-diffusion-2" "--pretrained_vae_name_or_path=" "--output_dir=models/new_model" "--seed=3434554" "--resolution=768" "--train_batch_size=38" "--num_train_epochs=100" "--mixed_precision=fp16" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=constant" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=25" "--n_save_sample=2" "--sample_height=768" "--sample_width=768" "--dataset_repeats=1" "--add_sample_prompt=a photograph of a cyberpunk male warrior by sami2023" "--sample_on_training_start"
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Booting Up StableTuner
Please wait a moment as we load up some stuff...
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
C:\Users\evil\anaconda3\envs\ST\lib\site-packages\diffusers\configuration_utils.py:195: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Creating Auto Bucketing Dataloader
Rounded resolution to: 768
Preloading images...
** Processing Y:/Stable Diffusion/Models/Sami/train768-ST: 100%|████████████████████| 14/14 [00:00<00:00, 147.24it/s]
** Number of buckets: 1
** Bucket (768, 768) found 14 images, will duplicate 24 images due to batch size 38
Number of image-caption pairs: 38
** Validation Set: val, steps: 1, repeats: 1
Loading Latent Cache from models\new_model\logs\latent_cache
Latents are ready.
Traceback (most recent call last):
File "C:\Users\evil\Desktop\Work\StableTuner\scripts\trainer.py", line 2808, in <module>
main()
File "C:\Users\evil\Desktop\Work\StableTuner\scripts\trainer.py", line 2175, in main
args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
ZeroDivisionError: division by zero
Traceback (most recent call last):
File "C:\Users\evil\anaconda3\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\evil\anaconda3\envs\ST\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\evil\anaconda3\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\evil\anaconda3\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\evil\anaconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\evil\anaconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\evil\\anaconda3\\envs\\ST\\python.exe', 'scripts/trainer.py', '--attention=xformers', '--model_variant=base', '--disable_cudnn_benchmark', '--sample_step_interval=100', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-2', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=768', '--train_batch_size=38', '--num_train_epochs=100', '--mixed_precision=fp16', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--train_text_encoder', '--concepts_list=stabletune_concept_list.json', '--num_class_images=200', '--save_every_n_epoch=25', '--n_save_sample=2', '--sample_height=768', '--sample_width=768', '--dataset_repeats=1', '--add_sample_prompt=a photograph of a cyberpunk male warrior by sami2023', '--sample_on_training_start']' returned non-zero exit status 1.
Press any key to continue . . .
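Line 2175 of trainer.py divides by `num_update_steps_per_epoch`, so that value must have been zero. Assuming it is derived from the number of batches the dataloader yields (as in the diffusers example training scripts), it drops to zero whenever no full batch can be formed, for instance when batches are built with `drop_last` and fewer samples remain than `--train_batch_size=38`. That is a plausible cause, not a confirmed one; the log does not show how many samples actually reached the training dataloader. A guarded sketch with hypothetical helper names:

```python
import math


def steps_per_epoch(num_samples, batch_size, grad_accum_steps=1, drop_last=True):
    """Optimizer steps per epoch; 0 if no full batch fits and drop_last is set."""
    if drop_last:
        num_batches = num_samples // batch_size
    else:
        num_batches = math.ceil(num_samples / batch_size)
    return math.ceil(num_batches / grad_accum_steps)


def epochs_for(max_train_steps, num_samples, batch_size, grad_accum_steps=1):
    """Epochs needed to reach max_train_steps, with a readable error instead of ZeroDivisionError."""
    per_epoch = steps_per_epoch(num_samples, batch_size, grad_accum_steps)
    if per_epoch == 0:
        raise ValueError(
            f"batch size {batch_size} exceeds the {num_samples} samples available; "
            "lower --train_batch_size or add images"
        )
    return math.ceil(max_train_steps / per_epoch)
```

If this is the cause, lowering `--train_batch_size` is the first thing to try.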
For those who encounter
Environment name is set as ST as per environment.yaml
anaconda3/miniconda3 not found. Install from here https://docs.conda.io/en/latest/miniconda.html
while having Anaconda/Miniconda installed in a non-default folder: create a custom-conda-path.txt in the root folder of StableTuner and edit it like this:
v_custom_path=<anaconda/miniconda path>
Please make this very clear for users.
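The launcher itself is a batch script, so the following is only a Python illustration of the expected file format: a single `v_custom_path=` line holding the install folder. Stripping surrounding quotes is an assumption about a likely pitfall; in a report further down, the detected path is echoed with quotes and is followed by "The filename, directory name, or volume label syntax is incorrect." The helper name `read_custom_conda_path` is hypothetical.

```python
def read_custom_conda_path(text):
    """Return the folder named by `v_custom_path=` in custom-conda-path.txt, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "v_custom_path":
            # Drop whitespace and any quotes wrapped around the path.
            return value.strip().strip('"').strip("'")
    return None


if __name__ == "__main__":
    example = 'v_custom_path="D:\\miniconda3"\n'
    print(read_custom_conda_path(example))  # D:\miniconda3
```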
Could the ability to right-click within text fields (for instance, to paste text) be implemented?
Currently, lots of Ctrl+V presses are needed.
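Standard Tk entry and text widgets already respond to the `<<Cut>>`, `<<Copy>>` and `<<Paste>>` virtual events, so a right-click menu only has to generate them. A minimal sketch with plain `tkinter` and a hypothetical `attach_edit_menu` helper; the StableTuner GUI uses customtkinter (per the traceback near the top), and whether this binds unchanged to its widgets is untested here. `<Button-3>` is the right mouse button on Windows and Linux; macOS typically reports it as `<Button-2>`.

```python
def attach_edit_menu(widget):
    """Give a Tk Entry/Text widget a right-click Cut/Copy/Paste menu."""
    # Imported lazily so this module still loads on Python builds without Tk.
    import tkinter as tk

    menu = tk.Menu(widget, tearoff=0)
    for label in ("Cut", "Copy", "Paste"):
        # Bind the event name now; a bare closure over `label` would see only "Paste".
        menu.add_command(
            label=label,
            command=lambda sequence=f"<<{label}>>": widget.event_generate(sequence),
        )

    def show_menu(event):
        widget.focus_set()
        try:
            menu.tk_popup(event.x_root, event.y_root)
        finally:
            menu.grab_release()

    widget.bind("<Button-3>", show_menu)
    return menu


if __name__ == "__main__":
    import tkinter as tk

    root = tk.Tk()
    entry = tk.Entry(root, width=50)
    entry.pack(padx=10, pady=10)
    attach_edit_menu(entry)
    root.mainloop()
```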
I may well have just overlooked it, but I am curious how the tool currently handles cropping/resizing of the provided dataset when working with non-square images.
It could be useful to have settings for choosing e.g. centre crop, randomised crop or stretch to fit, either per dataset or saved per image. Currently I'm not sure what it will do when given non-square images in the dataset!
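How StableTuner itself treats non-square images isn't stated here (the launch commands in these reports do show `--use_bucketing` and `--aspect_mode=dynamic`). As a sketch of the three options proposed, a hypothetical `crop_box` helper can compute the region to keep before resizing; the returned `(left, top, right, bottom)` tuple is the box format Pillow's `Image.crop` accepts.

```python
import random


def crop_box(width, height, target_w, target_h, mode="center", rng=random):
    """Region of a width x height image to keep before resizing to target_w x target_h.

    mode: "stretch" keeps the whole image (the resize then distorts it),
    "center" trims equally from both edges, "random" picks a random offset.
    """
    if mode == "stretch":
        return (0, 0, width, height)
    target_ratio = target_w / target_h
    if width / height > target_ratio:
        crop_w, crop_h = round(height * target_ratio), height  # too wide: trim the sides
    else:
        crop_w, crop_h = width, round(width / target_ratio)  # too tall: trim top and bottom
    if mode == "center":
        left, top = (width - crop_w) // 2, (height - crop_h) // 2
    elif mode == "random":
        left = rng.randint(0, width - crop_w)
        top = rng.randint(0, height - crop_h)
    else:
        raise ValueError(f"unknown crop mode: {mode!r}")
    return (left, top, left + crop_w, top + crop_h)


if __name__ == "__main__":
    print(crop_box(1000, 500, 512, 512))  # (250, 0, 750, 500)
    print(crop_box(1000, 500, 512, 512, mode="random", rng=random.Random(0)))
```

A random crop shows the model different parts of a wide image across epochs, while a centre crop is deterministic; stretching keeps the whole frame but changes proportions.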
Traceback (most recent call last):
File "C:\Users\123ky\.conda\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\123ky\.conda\envs\ST\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\123ky\.conda\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\123ky\.conda\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\123ky\.conda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\123ky\.conda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\123ky\.conda\envs\ST\python.exe', 'scripts/trainer.py', '--attention=xformers', '--model_variant=inpainting', '--normalize_masked_area_loss', '--unmasked_probability=0.0', '--max_denoising_strength=1.0', '--disable_cudnn_benchmark', '--use_text_files_as_captions', '--sample_step_interval=50', '--stop_text_encoder_training=15', '--pretrained_model_name_or_path=runwayml/stable-diffusion-inpainting', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=512', '--train_batch_size=5', '--num_train_epochs=50', '--mixed_precision=fp16', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--regenerate_latent_cache', '--train_text_encoder', '--use_image_names_as_captions', '--concepts_list=stabletune_concept_list.json', '--num_class_images=200', '--save_every_n_epoch=15', '--n_save_sample=1', '--sample_height=512', '--sample_width=512', '--dataset_repeats=2', '--add_sample_prompt=Alex Grey Art', '--sample_on_training_start']' returned non-zero exit status 1.
Press any key to continue . . .
Any idea what I'm doing wrong? I have a 3070 with 8 GB of VRAM, and I'm also overclocking.
I get through the install process and am then met with an error saying xformers failed to install because it is not a supported wheel on this platform (Windows 10, RTX 3090).
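pip reports "not a supported wheel on this platform" when a wheel's tags don't match the running interpreter. The xformers wheel in the install log further down is tagged `cp310` / `win_amd64`, i.e. CPython 3.10 on 64-bit Windows, so a different Python version in the environment would trigger exactly this error. A rough check, simplified to ignore compressed tag sets such as `py2.py3` and the ABI tag; the helper names are hypothetical, and `packaging.tags` is the complete tool for this job:

```python
import sys
import sysconfig


def wheel_tags(filename):
    """Split a wheel filename into its (python, abi, platform) tags."""
    # name-version[-build]-python-abi-platform.whl: the tags are the last three fields.
    stem = filename[: -len(".whl")]
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def interpreter_tags():
    """CPython and platform tags for the running interpreter, e.g. ("cp310", "win_amd64")."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def looks_installable(filename, python_tag, platform_tag):
    """Simplified tag match: exact Python and platform tags only."""
    wheel_python, _abi, wheel_platform = wheel_tags(filename)
    return wheel_python == python_tag and wheel_platform == platform_tag


if __name__ == "__main__":
    wheel = "xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl"
    here = interpreter_tags()
    print(wheel_tags(wheel), here, looks_installable(wheel, *here))
```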
Hi all, I'm trying to install StableTuner as explained, but I get this when I run the .bat file.
Also, my Miniconda is not in C:\; it's in D:.
I've taken into consideration that I need to make custom-conda-path.txt in the folder, but this is what I get when I run it:
Environment name is set as ST as per environment.yaml
anaconda3/miniconda3 detected in "e:\Users\SPYBG\miniconda3"
Found Anaconda
The filename, directory name, or volume label syntax is incorrect.
'conda' is not recognized as an internal or external command,
operable program or batch file.
The filename, directory name, or volume label syntax is incorrect.
Traceback (most recent call last):
File "E:\StableTuner\scripts\windows_install.py", line 9, in <module>
import requests
ModuleNotFoundError: No module named 'requests'
Press any key to continue . . .
[!] xformers NOT installed.
Installing xformers with: pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
Installing xformers
Traceback (most recent call last):
File "C:\Users\zhang\StableTuner\scripts\windows_install.py", line 85, in <module>
check_versions()
File "C:\Users\zhang\StableTuner\scripts\windows_install.py", line 76, in check_versions
run(f"pip install {x_cmd}", desc="Installing xformers")
File "C:\Users\zhang\StableTuner\scripts\windows_install.py", line 32, in run
raise RuntimeError(message)
RuntimeError: Error running command.
Command: pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
Error code: 1
stdout: Collecting xformers==0.0.14.dev0
stderr: WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))': /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))': /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))': /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))': /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))': /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
ERROR: Could not install packages due to an OSError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))
I was able to install all required packages and get StableTuner up and running, and running the installation script again completed successfully. But when I try to train with 8-bit Adam enabled, StableTuner crashes with an error.
cudatoolkit is installed in the conda env. Is WSL required to run StableTuner with 8-bit Adam? I found a similar issue in another repository:
d8ahazard/sd_dreambooth_extension#3
IMPORTANT: when 8-bit Adam is disabled, training starts successfully, but it runs out of VRAM after the first few steps (I have 15GB of VRAM).
I was able to install all required packages and get StableTuner up and running. I ran the installation script again and it completed successfully. But when I try to train, StableTuner crashes with an error:
accelerate "launch" "--mixed_precision=no" "scripts/trainer.py" "--model_variant=base" "--disable_cudnn_benchmark" "--sample_step_interval=500" "--pretrained_model_name_or_path=C:/StableTuner/models/wd-1-3-penultimate-ucg-cont" "--pretrained_vae_name_or_path=" "--output_dir=models/new_model" "--seed=3434554" "--resolution=512" "--train_batch_size=24" "--num_train_epochs=100" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--use_8bit_adam" "--gradient_checkpointing" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=constant" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=5" "--n_save_sample=1" "--sample_height=512" "--sample_width=512" "--dataset_repeats=1" "--sample_on_training_start" "--clip_penultimate"
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Booting Up StableTuner
Please wait a moment as we load up some stuff...
C:\ProgramData\Anaconda3\lib\site-packages\accelerate\accelerator.py:321: UserWarning: log_with=tensorboard was passed but no supported trackers are currently installed.
warnings.warn(f"log_with={log_with} was passed but no supported trackers are currently installed.")
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('C')}
warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: C:\ProgramData\Anaconda3\envs\ST did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
File "C:\diffusion\StableTuner\scripts\trainer.py", line 2380, in <module>
main()
File "C:\diffusion\StableTuner\scripts\trainer.py", line 1530, in main
import bitsandbytes as bnb
File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
from .autograd._functions import (
File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
import bitsandbytes.functional as F
File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py", line 118, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs aboveto fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
https://github.com/TimDettmers/bitsandbytes/issues
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\Scripts\accelerate-script.py", line 9, in <module>
sys.exit(main())
File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\ProgramData\Anaconda3\python.exe', 'scripts/trainer.py', '--model_variant=base', '--disable_cudnn_benchmark', '--sample_step_interval=500', '--pretrained_model_name_or_path=C:/StableTuner/models/wd-1-3-penultimate-ucg-cont', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=512', '--train_batch_size=24', '--num_train_epochs=100', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--train_text_encoder', '--concepts_list=stabletune_concept_list.json', '--num_class_images=200', '--save_every_n_epoch=5', '--n_save_sample=1', '--sample_height=512', '--sample_width=512', '--dataset_repeats=1', '--sample_on_training_start', '--clip_penultimate']' returned non-zero exit status 1.
Cudatoolkit is installed in the conda env. Is WSL required to run StableTuner? I found a similar issue in the bitsandbytes repository, and the developer said that libcudart.so is not supported on Windows.
OS: Windows 10
It used to work fine, but since a few updates it no longer works.
I deleted the ST env and installed again - same issue
Traceback (most recent call last):
File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\tkinter\__init__.py", line 1921, in __call__
return self.func(*args)
File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\site-packages\customtkinter\windows\widgets\ctk_button.py", line 531, in _clicked
self._command()
File "D:\StableTuner\scripts\configuration_gui.py", line 1907, in <lambda>
self.play_generate_image_button = ctk.CTkButton(self.playground_frame_subframe, text="Generate Image", command=lambda: self.play_generate_image(self.play_model_entry.get(), self.play_prompt_entry.get(), self.play_negative_prompt_entry.get(), self.play_seed_entry.get(), self.play_scheduler_variable.get(), int(self.play_resolution_slider_height.get()), int(self.play_resolution_slider_width.get()), self.play_cfg_slider.get(), self.play_steps_slider.get()))
File "D:\StableTuner\scripts\configuration_gui.py", line 2271, in play_generate_image
del self.play_current_image
AttributeError: play_current_image
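The traceback suggests that play_generate_image runs `del self.play_current_image` before any image has been generated in the session. A minimal sketch of a guard, assuming the only problem is that the attribute may not exist yet; PlaygroundState and clear_current_image are hypothetical stand-ins, not StableTuner's code:

```python
class PlaygroundState:
    """Hypothetical stand-in for the GUI object that holds the last playground image."""

    def clear_current_image(self) -> bool:
        # Only delete the attribute if a previous generation actually set it;
        # an unconditional `del` raises AttributeError on the first run.
        if hasattr(self, "play_current_image"):
            del self.play_current_image
            return True
        return False


state = PlaygroundState()
first = state.clear_current_image()   # nothing to delete yet
state.play_current_image = object()   # pretend an image was generated
second = state.clear_current_image()  # now the old image is removed
```

Initialising the attribute to None in the constructor would be an equally valid alternative.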
Gave the most recent version a try, and after downloading/extracting/overwriting, the result was the following:
conda activate diffusers
cd C:\StableTuner
(diffusers) C:\StableTuner>python configuration_gui.py
Traceback (most recent call last):
File "C:\StableTuner\configuration_gui.py", line 13, in <module>
from scripts import converters
File "C:\StableTuner\scripts\converters.py", line 44, in <module>
import model_util
ModuleNotFoundError: No module named 'model_util'
The remedy was editing line 44 of C:\StableTuner\scripts\converters.py to read:
from scripts import model_util
The UI was able to launch after that.
Also, unless it's intentional, there is a small typo in the UI: it reads StableTune.
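The error occurs because a bare `import model_util` only resolves when the scripts folder itself is on sys.path, while launching configuration_gui.py from the repository root only makes `scripts` importable as a package. One way to tolerate both launch locations is to try the package-qualified name first and fall back to the bare one. A generic sketch using importlib; the helper name import_first is hypothetical:

```python
import importlib
from types import ModuleType


def import_first(*names: str) -> ModuleType:
    """Return the first module in `names` that imports successfully."""
    errors = []
    for name in names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            errors.append(f"{name}: {exc}")
    raise ModuleNotFoundError("none of the candidates could be imported: " + "; ".join(errors))


# In converters.py this could look like:
#     model_util = import_first("scripts.model_util", "model_util")
# Demonstrated here with a stdlib module so it runs anywhere:
json_mod = import_first("scripts_nonexistent_pkg.model_util", "json")
```

One caveat of this design: if a candidate exists but fails on one of its own missing dependencies, that error is also treated as "try the next name".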
Another issue, disregard if already known.
It seems that if there is no Internet connection, ckpt models cannot be converted to diffusers?
StableTuner still tries to fetch the .yaml files even if the .yaml is already alongside the model.
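A sketch of the requested behaviour, assuming the config sits next to the checkpoint with the same file stem (model.ckpt and model.yaml, the same naming AUTOMATIC1111 uses for per-checkpoint configs). The helper find_local_config is hypothetical; a None result would be the only case where the converter needs to fetch a config from the network:

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_local_config(checkpoint_path: str) -> Optional[Path]:
    """Return a .yaml config stored next to the checkpoint, if there is one."""
    candidate = Path(checkpoint_path).with_suffix(".yaml")  # model.ckpt -> model.yaml
    if candidate.is_file():
        return candidate
    return None  # no local config: the caller would fall back to downloading one


# Usage: no config at first, then one placed beside the checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "wd-1-4-anime_e1.ckpt"
    ckpt.write_bytes(b"")
    before = find_local_config(str(ckpt))
    (Path(tmp) / "wd-1-4-anime_e1.yaml").write_text("model: {}\n")
    after = find_local_config(str(ckpt))
```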
This could help when training with captions longer than 75 tokens. The caption text can be split on commas (for example) into tags, and the tag order can be shuffled every epoch, so the part that gets truncated is different between epochs.
EveryDream has the option to exclude the first ("title") tag from this, and then to uniformly shuffle the remaining tags, or else to specify a probability for each tag to be shuffled in a separate json file.
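A minimal sketch of the uniform shuffle described above, assuming tags are separated by commas and the first ("title") tag is optionally kept in place. The function name and rng parameter are illustrative, not EveryDream's or StableTuner's API; calling it once per epoch gives a fresh order each time:

```python
import random
from typing import Optional


def shuffle_caption_tags(caption: str, keep_first: bool = True,
                         rng: Optional[random.Random] = None) -> str:
    """Split a caption on commas and shuffle the tags, optionally keeping the first fixed."""
    rng = rng or random.Random()
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    head, tail = (tags[:1], tags[1:]) if keep_first else ([], tags)
    rng.shuffle(tail)  # uniform shuffle of the remaining tags
    return ", ".join(head + tail)


caption = "woman, red car, leaning, street, night"
out = shuffle_caption_tags(caption, rng=random.Random(0))
```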
This could be extended to deal only with the truncation problem, without otherwise shuffling the tags: for captions longer than 75 tokens, the first N tokens (or comma-separated tags) could be held fixed, while the rest of the caption could be populated randomly from the truncated section every epoch.
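The truncation-only variant could look roughly like this: the first N tags stay fixed, and the remaining slots up to a budget are filled with a random sample of the later tags. To keep the sketch self-contained it counts tags rather than CLIP tokens; a real implementation would measure length with the tokenizer against the 75-token limit. All names are hypothetical:

```python
import random
from typing import List, Optional


def fill_truncated_tags(tags: List[str], fixed: int, budget: int,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Keep the first `fixed` tags, then fill up to `budget` tags from a random sample of the rest.

    Captions that already fit within the budget are returned unchanged.
    """
    if len(tags) <= budget:
        return list(tags)
    rng = rng or random.Random()
    head = tags[:min(fixed, budget)]  # never fix more tags than fit
    rest = tags[len(head):]
    picked = rng.sample(rest, budget - len(head))
    return head + picked


tags = ["a", "b", "c", "d", "e", "f"]
out = fill_truncated_tags(tags, fixed=2, budget=4, rng=random.Random(1))
```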
EDIT: I see talk of extended token limits in the PR. Maybe relevant?
Will it work? What's the RAM requirement?
This could be a user error.
When I used SD 2.1 768 as the base with a proper 768 training set, it created the sample images fine in the folders, but at the end none of the saved models would produce an image in the playground, and none of the converted checkpoints would work in A1111. The images were all blank (black).
This is curious, especially because the sample images were all fine, so it did work at some point during training.
What could be the issue?
(No problem with the 2.1 512 base.)
BTW, I absolutely love the interface and how easy it is to run!
Does StableTuner use the optimization methods that ColossalAI uses?
Ability to train with batch size=8 with only 9gb vram
Much faster learning
Stable Diffusion with Colossal-AI provides 6.5x faster training and pretraining cost saving, the hardware cost of fine-tuning can be almost 7X cheaper (from RTX3090/4090 24GB to RTX3050/2070 8GB)
Hi,
I already have some experience with dreambooth (e.g. with the automatic1111 extension), which works on a 12GB VRAM card.
It also works when I train the text_encoder.
But the same settings fail on StableTuner. Here are my typical VRAM-saving settings:
{
"add_controlled_seed_to_sample": [],
"model_path": "C:\\data\\codes\\stabledif\\StableTuner\\models\\v1-5-pruned",
"vae_path": "",
"output_path": "models/new_model",
"send_telegram_updates": 0,
"telegram_token": "",
"telegram_chat_id": "",
"resolution": "512",
"batch_size": "1",
"train_epocs": "100",
"mixed_precision": "fp16",
"use_8bit_adam": 1,
"use_gradient_checkpointing": 1,
"accumulation_steps": "1",
"learning_rate": "2e-6",
"warmup_steps": "0",
"learning_rate_scheduler": "constant",
"use_latent_cache": 0,
"save_latent_cache": 0,
"regenerate_latent_cache": 0,
"train_text_encoder": 1,
"with_prior_loss_preservation": 1,
"prior_loss_preservation_weight": "1.0",
"use_image_names_as_captions": 1,
"auto_balance_concept_datasets": 0,
"add_class_images_to_dataset": 0,
"number_of_class_images": "200",
"save_every_n_epochs": "20",
"number_of_samples_to_generate": "5",
"sample_height": "512",
"sample_width": "512",
"sample_random_aspect_ratio": 0,
"sample_on_training_start": 1,
"aspect_ratio_bucketing": 0,
"seed": "3434554",
"dataset_repeats": "1",
"limit_text_encoder_training": "30%",
"use_text_files_as_captions": 1,
"ckpt_version": null,
"convert_to_ckpt_after_training": 1,
"execute_post_conversion": 1,
"disable_cudnn_benchmark": 1,
"sample_step_interval": "500",
"conditional_dropout": "",
"clip_penultimate": 0,
"use_ema": 0,
"aspect_ratio_bucketing_mode": "Dynamic Fill",
"dynamic_bucketing_mode": "Duplicate"
}
I loaded the model into StableTuner, set up the concept and specified all the settings. When I try to run the training, I get an error:
loading text encoder:
copy scheduler/tokenizer config from: runwayml/stable-diffusion-v1-5
Diffusers model saved.
(ST) C:\StableTuner>accelerate "launch" "--mixed_precision=fp16" "scripts/trainer.py" "--model_variant=base" "--disable_cudnn_benchmark" "--use_text_files_as_captions" "--sample_step_interval=500" "--pretrained_model_name_or_path=C:\StableTuner\models\wd-1-3-penultimate-ucg-cont" "--pretrained_vae_name_or_path=" "--output_dir=models/new_model" "--seed=3434554" "--resolution=512" "--train_batch_size=24" "--num_train_epochs=100" "--mixed_precision=fp16" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--use_8bit_adam" "--gradient_checkpointing" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=constant" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=5" "--n_save_sample=1" "--sample_height=512" "--sample_width=512" "--dataset_repeats=1" "--sample_on_training_start"
Fatal error in launcher: Unable to create process using '"C:\ProgramData\miniconda3\python.exe" "C:\Users\adam\AppData\Roaming\Python\Python310\Scripts\accelerate.exe" "launch" "--mixed_precision=fp16" "scripts/trainer.py" "--model_variant=base" "--disable_cudnn_benchmark" "--use_text_files_as_captions" "--sample_step_interval=500" "--pretrained_model_name_or_path=C:\StableTuner\models\wd-1-3-penultimate-ucg-cont" "--pretrained_vae_name_or_path=" "--output_dir=models/new_model" "--seed=3434554" "--resolution=512" "--train_batch_size=24" "--num_train_epochs=100" "--mixed_precision=fp16" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--use_8bit_adam" "--gradient_checkpointing" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=constant" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=5" "--n_save_sample=1" "--sample_height=512" "--sample_width=512" "--
Press any key to continue...
OS: Windows 10
I tried reinstalling several times and removed conda, Python and all environments. The problem still exists.
Model: waifu-diffusion. The problem occurs with all models (tried SD 1.4, SD 1.5 and SD 2.1).
Hi!
I love the export to cloud option, it would be great if it also included an export to shell script toggle option.
All of the commands are already there just wrapped within a notebook.
It would be a great way to edit/modify a model for training within the existing interface: Export to Cloud --> upload to a Linux box and "press play on tape". It might be a useful way to offer a Linux option without having to worry about a GUI yet.
Obviously, it would need a script to create and set up the environment first.
First of all, thank you for all the hard work with StableTuner. It's really, really awesome!
Well, I've left issues on the Automatic UI and on its dreambooth extension asking them to implement proper standard Stable Diffusion fine-tuning (rather than the various variants they have right now). I'm opening an issue here asking you folks to add Linux support, on the assumption that neither of them will actually implement it (I don't know why they wouldn't, but I figure if they didn't do it before, maybe they never will).
I'm really not excited by the idea of switching over to my Windows machine just to do regular fine-tuning if I don't want to inflict on myself the pain of sketchy random notebooks.
Please PLEASE get this running on Linux ASAP. Right now we Linux users are stuck in a pretty annoying situation when trying to make custom models.
Install seemed to go fine, I selected my model and added my concepts and clicked begin training.
The only thing I see in the log that might be an issue is:
ImportError: DLL load failed while importing _lzma: The specified module could not be found.
Small issue: my setup is dark themed, and the cursor in the editable fields of StableTuner's GUI stays solid black. Is it possible to make it blue or something?
It may just be my computer/network connection, but the automatic CUDNN 8.6 installation doesn't work; it says it takes too long to respond and then errors out.
I got around this by downloading the .ZIP file from NVIDIA's website for the Windows version of CUDNN 8.6, unzipped it, renamed it to "cudnn_windows", and put it in the "resources" folder.
Hey! Just curious whether it's on the roadmap to enable multi-GPU support, or potentially to choose which GPU to train on. Thank you!
Unable to use 'Resume From Last Session', receive the following error:
NotADirectoryError: [WinError 267] The directory name is invalid: 'models/marinmarin/marinmarin_v1001c\stabletuner_768_e100_01-03-14-17.json'
I've swapped '/'s for '\'s with no success, and of course I confirmed that the exact path and filename are in fact correct.
Apart from loading the newly trained dreambooth model, I think it would be great to have a dedicated way to resume from a given config, where StableTuner loads the output model and continues training it without changing anything.
This is more of a nice-to-have than a necessary feature, but I hope it will be added to this project.
It seems as though only basic functionality is needed if one plans to use the export-for-cloud function, since the software isn't actually run on their PC.
Is it possible to install only what's required for the "export for cloud computing VMs" option?
I don't know where else to put this request. I would love to see this project made compatible with M1 Macs. It's just a request; I can't really control anything. I think that the more tools available to train and fine-tune these models, the better.
When args.train_text_encoder is false, CachedLatentsDataset.cache.text_encoder_cache contains pre-encoded token embeddings rather than token IDs. However, CachedLatentsDataset.empty_tokens is still filled with token IDs. When args.conditional_dropout is nonzero, batches selected for conditional dropout therefore get token IDs where embeddings are expected, and training crashes due to mismatched tensor sizes.
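The bug comes down to a representation mismatch: whatever replaces a dropped caption must have the same form as the cached entries, i.e. token IDs when the text encoder is trained and pre-encoded embeddings when it is frozen. A sketch of that rule, with encode standing in for a pass through the frozen text encoder and plain nested lists standing in for tensors so it runs without torch; every name here is hypothetical, not StableTuner's code:

```python
from typing import Callable, List, Sequence, Union

Embedding = List[List[float]]


def build_empty_condition(
    empty_token_ids: Sequence[int],
    train_text_encoder: bool,
    encode: Callable[[Sequence[int]], Embedding],
) -> Union[List[int], Embedding]:
    """Return the 'empty caption' in the same representation as the caption cache."""
    if train_text_encoder:
        # Cache holds token IDs; the text encoder runs inside the training step.
        return list(empty_token_ids)
    # Cache holds pre-encoded embeddings, so encode the empty caption once, up front.
    return encode(empty_token_ids)


def fake_encode(ids: Sequence[int]) -> Embedding:
    # Toy encoder: one 2-dimensional "embedding" per token.
    return [[float(i), 0.0] for i in ids]


empty_ids = [49406, 49407]  # CLIP start/end-of-text token IDs
as_ids = build_empty_condition(empty_ids, True, fake_encode)
as_embeddings = build_empty_condition(empty_ids, False, fake_encode)
```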
I looked into fixing this but gave up for the moment so I'm reporting it as a bug.
If you set the limit to 100%, it breaks when it reaches the 100% mark, with an error about not being able to save the frozen text encoder. Seems like a minor annoyance.
Before you ask: I was testing the effect of the encoder limit % on the resulting trained model.
In the interface on the General Settings tab, when you change the 'Quick Select Model', the Diffusers model path updates, however the VAE model path does not.
I'm trying to convert a ckpt that works well in AUTO1111, but it fails in ST:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\tkinter\__init__.py", line 1921, in __call__
return self.func(*args)
File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\site-packages\customtkinter\windows\widgets\ctk_button.py", line 531, in _clicked
self._command()
File "D:\StableTuner\scripts\configuration_gui.py", line 1859, in <lambda>
self.convert_ckpt_to_diffusers_button = ctk.CTkButton(self.toolbox_frame_subframe, text="Convert CKPT To Diffusers", command=lambda:self.convert_ckpt_to_diffusers())
File "D:\StableTuner\scripts\configuration_gui.py", line 2212, in convert_ckpt_to_diffusers
version, prediction = self.get_sd_version(ckpt_path)
File "D:\StableTuner\scripts\configuration_gui.py", line 2527, in get_sd_version
checkpoint = checkpoint["state_dict"]
KeyError: 'state_dict'
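The KeyError indicates that this checkpoint keeps its weights at the top level instead of nesting them under a "state_dict" key, which would plausibly explain why it loads elsewhere but not here. A minimal sketch of a lookup that accepts both layouts, shown on plain dicts; in get_sd_version the input would be whatever torch.load returned, and the helper name unwrap_state_dict is hypothetical:

```python
from typing import Any, Dict


def unwrap_state_dict(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Return the weights dict whether or not it is nested under 'state_dict'."""
    inner = checkpoint.get("state_dict")
    if isinstance(inner, dict):
        return inner
    # No wrapper: the top-level dict already maps parameter names to tensors.
    return checkpoint


nested = {"state_dict": {"model.diffusion_model.out.2.bias": 0.0}, "global_step": 1}
flat = {"model.diffusion_model.out.2.bias": 0.0}
```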
I'm receiving a weird error in the latest pull:
Traceback (most recent call last):
File "D:\StableTuner\scripts\trainer.py", line 16, in <module>
import keyboard
ModuleNotFoundError: No module named 'keyboard'
Traceback (most recent call last):
File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\Desktop\Documents\Anaconda\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
I have a dataset of images + captions in .txt files. Seems like that isn't enough - if I leave the "Data" section empty with no concepts defined, the training script finds no images and crashes. However, my images can contain more than one concept per image, so I'm not sure how to properly set this up.
Let's say I have an image of a woman leaning on a car, with a .txt caption "woman leaning on car", and I want to finetune the model+text encoder to recognize both "car", "leaning" and "woman". How do I actually set this up in the data section? Should I create three separate "concepts" for "car", "leaning" and "woman", and add the same image to all of them? Just add everything to some single "DummyConcept" if it just gets ignored if using captions? Is this type of training entirely unsupported?
I'm just really lost here.
As you already support Xformers it would be great to see some support for Flash-attention as it can help lower the vram requirements.
For example, the sd_dreambooth extension now has flash attention, which enables fine-tuning on lower-end GPUs.
https://github.com/HazyResearch/flash-attention
Thanks!
When selecting 'fp32' for precision, the GUI should pass '--mixed_precision=no' instead of '--mixed_precision=fp32':
trainer.py: error: argument --mixed_precision: invalid choice: 'fp32' (choose from 'no', 'fp16', 'bf16')
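A minimal sketch of the mapping this report asks for: the GUI's precision choice is translated into one of the values the trainer's argument parser accepts ('no', 'fp16', 'bf16', per the error message above). The function name is hypothetical:

```python
VALID_MIXED_PRECISION = ("no", "fp16", "bf16")


def to_mixed_precision_flag(gui_choice: str) -> str:
    """Translate the GUI precision setting into a --mixed_precision argument."""
    value = gui_choice.strip().lower()
    if value == "fp32":
        value = "no"  # full precision means mixed precision is disabled
    if value not in VALID_MIXED_PRECISION:
        raise ValueError(f"unsupported precision {gui_choice!r}; expected fp32, fp16 or bf16")
    return f"--mixed_precision={value}"
```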
Traceback (most recent call last):
File "C:\Users\revis\.conda\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\revis\.conda\envs\ST\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\revis\.conda\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\revis\.conda\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\revis\.conda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\revis\.conda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
Can a T4 16GB be used?
Does this use Python environment files, so that it doesn't install all the scripts directly into your base system?
I use the latest version.
It seems that only when I use dataset repeats, the training images are learned inverted.
Not all of them are; normal images are mixed in with them in the output.
If I turn off dataset repeats, the images are not inverted, so perhaps I am not mistaken,
but I apologize if this is a problem specific to my environment.
Can't install, getting this error:
stderr: ERROR: Wheel 'torch' located at C:\Users\dmitr\AppData\Local\Temp\pip-unpack-4t9mrkdc\torch-1.13.1-cp310-cp310-win_amd64.whl is invalid.
Getting the following after latents are cached and training attempts to begin:
"Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
Traceback (most recent call last):
File "C:\Users\unres\miniconda3\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\unres\miniconda3\envs\ST\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\unres\miniconda3\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\unres\miniconda3\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\unres\miniconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\unres\miniconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\unres\miniconda3\envs\ST\python.exe', 'scripts/trainer.py', '--attention=xformers', '--model_variant=base', '--disable_cudnn_benchmark', '--sample_step_interval=500', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=512', '--train_batch_size=24', '--num_train_epochs=100', '--mixed_precision=fp16', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--train_text_encoder', '--concepts_list=stabletune_concept_list.json', '--num_class_images=4420', '--save_every_n_epoch=5', '--n_save_sample=1', '--sample_height=512', '--sample_width=512', '--dataset_repeats=1']' returned non-zero exit status 1."
I've only seen 3080 and 3090 mentioned on the main page. Does this repo not support the older Tesla data center 24GB cards like Automatic1111 and InvokeAI do?