stabletuner's People

Contributors

allenbenz, devilismyfriend, entmike, grimig, jordach, nerogar, progamergov, rossm

stabletuner's Issues

RuntimeError thrown while loading CKPT

Hello, I encountered the following RuntimeError while loading the WD 1.4 model from CKPT on disk:

(base) PS C:\Users\alaith\git\github.com\devilismyfriend\StableTuner> .\StableTuner.cmd
Environment name is set as "ST" as per environment.yaml
anaconda3/miniconda3 detected in C:\Users\alaith\anaconda3
Starting conda environment "ST" from C:\Users\alaith\anaconda3
loading model from: C:/Users/alaith/git/github.com/AUTOMATIC1111/stable-diffusion-webui/models/Stable-diffusion/wd-1-4-anime_e1.ckpt
loading u-net: <All keys matched successfully>
loadint vae: <All keys matched successfully>
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\alaith\anaconda3\envs\ST\lib\tkinter\__init__.py", line 1921, in __call__
    return self.func(*args)
  File "C:\Users\alaith\anaconda3\envs\ST\lib\site-packages\customtkinter\windows\widgets\ctk_button.py", line 531, in _clicked
    self._command()
  File "C:\Users\alaith\git\github.com\devilismyfriend\StableTuner\scripts\configuration_gui.py", line 2673, in choose_model
    convert = converters.Convert_SD_to_Diffusers(sd_file,model_path,prediction_type=prediction,version=version)
  File "C:\Users\alaith\git\github.com\devilismyfriend\StableTuner\scripts\converters.py", line 61, in __init__
    self.main()
  File "C:\Users\alaith\git\github.com\devilismyfriend\StableTuner\scripts\converters.py", line 89, in main
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(v2_model, self.checkpoint_path)
  File "C:\Users\alaith\git\github.com\devilismyfriend\StableTuner\scripts\model_util.py", line 1205, in load_models_from_stable_diffusion_checkpoint
    info = text_model.load_state_dict(converted_text_encoder_checkpoint)
  File "C:\Users\alaith\anaconda3\envs\ST\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CLIPTextModel:
        Unexpected key(s) in state_dict: "text_model.encoder.text_model.embeddings.position_ids".

I noticed someone else seemed to have a similar error a few days ago, but as that one was fixed I thought I should open another issue.

Running on Windows 10 at commit 94d2dd28ac1b70d91e76a521e06633b2ebced508 (head of main branch as of this issue being created).
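
For anyone hitting the same wall, here is a minimal sketch (not necessarily the project's fix) of one way to tolerate the extra key: "position_ids" is a non-learned buffer in CLIPTextModel, so it can be dropped from the converted state dict before loading. `text_model` and `sd` are assumed to be the CLIPTextModel and converted state dict from the traceback above.

    def load_text_encoder_tolerant(text_model, sd):
        # Drop the stray buffer key that load_state_dict reports as unexpected.
        sd.pop("text_model.encoder.text_model.embeddings.position_ids", None)
        info = text_model.load_state_dict(sd, strict=False)
        print(info)  # surfaces any remaining missing/unexpected keys
        return text_model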

Change wrecked ability to train subjects

One of the very recent changes after commit cbe04ac wrecked the ability to train subjects in Dreambooth mode. I can A/B the repositories and there's a stark difference in likeness and quality. Possibly one of the CLIP changes?

Feature request: Add torch.float32 option for converting diffusers

Currently StableTuner defaults to a hardcoded torch.float16 value when converting diffusers to ckpt and safetensors. I trained with Flash Attention and used TF32 for the increased precision and better details, and I would like to convert the result to an fp32 ckpt or safetensors file. A quick dialog asking which precision you want for the conversion would be nice. Personally, I just changed the value from torch.float16 to torch.float32 in converters.py and got the expected result.
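
As a minimal sketch of that suggestion (the function name is hypothetical, not StableTuner's converter API), the conversion could take the target dtype as a parameter instead of hardcoding torch.float16:

    import torch

    def cast_state_dict(state_dict: dict, dtype: torch.dtype = torch.float32) -> dict:
        # Cast all floating-point tensors to `dtype`; leave integer buffers untouched.
        return {k: (v.to(dtype) if torch.is_floating_point(v) else v)
                for k, v in state_dict.items()}

A dialog (or a CLI flag) would then just choose between torch.float16 and torch.float32 before the state dict is written out.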

Support training with aspect ratios other than square

Currently it seems like you can only specify the height and width dimensions together, as a single square size. I would like to be able to independently specify different height and width values, for certain kinds of training data that, for example, use a 1:2 ratio (768x1536) and require the entire image to be visible to the model during each training step.
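
For illustration only (this is not StableTuner code), independent height and width would amount to a non-square resize target, e.g. with torchvision:

    from torchvision import transforms

    # torchvision takes (height, width), so 1:2 data at 768x1536 would be:
    resize = transforms.Resize((1536, 768))

so the whole image stays visible rather than being cropped to a square.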

Accelerate not found when clicking train

(screenshot attached)

  1. Tried fresh install 3 times. Tried miniconda and anaconda fresh install.
  2. Tried also adding anaconda to the path upon installation, still did not work
  3. I also have python 3.10 installed, not sure if that has anything to do with it.

If anyone with more experience has any ideas, I'd greatly appreciate it.

StableTuner ckpt throwing size mismatch in AUTO1111 Web-SD

Are ckpts converted from diffusers not currently supported outside of StableTuner? I finally had a decent session and exported it to ckpt, to do an x/y graph between checkpoints in AUTO's GUI and see which epoch had the better results, but it threw errors about size mismatches.

Loading weights [5d943c6a] from D:\SDV2\models\Stable-diffusion\sdv2-warm-digitalart-300.ckpt
Traceback (most recent call last):
  File "D:\SDV2\venv\lib\site-packages\gradio\routes.py", line 284, in run_predict
    output = await app.blocks.process_api(
  File "D:\SDV2\venv\lib\site-packages\gradio\blocks.py", line 982, in process_api
    result = await self.call_function(fn_index, inputs, iterator)
  File "D:\SDV2\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\SDV2\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\SDV2\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "D:\SDV2\modules\ui.py", line 1618, in <lambda>
    fn=lambda value, k=k: run_settings_single(value, key=k),
  File "D:\SDV2\modules\ui.py", line 1459, in run_settings_single
    if not opts.set(key, value):
  File "D:\SDV2\modules\shared.py", line 473, in set
    self.data_labels[key].onchange()
  File "D:\SDV2\modules\call_queue.py", line 15, in f
    res = func(*args, **kwargs)
  File "D:\SDV2\webui.py", line 63, in <lambda>
    shared.opts.onchange("sd_model_checkpoint", wrap_queued_call(lambda: modules.sd_models.reload_model_weights()))
  File "D:\SDV2\modules\sd_models.py", line 292, in reload_model_weights
    load_model(checkpoint_info)
  File "D:\SDV2\modules\sd_models.py", line 261, in load_model
    load_model_weights(sd_model, checkpoint_info)
  File "D:\SDV2\modules\sd_models.py", line 192, in load_model_weights
    model.load_state_dict(sd, strict=False)
  File "D:\SDV2\venv\lib\site-packages\torch\nn\modules\module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.1.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.2.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.input_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.input_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.middle_block.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.middle_block.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.3.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.3.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.3.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.4.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.4.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.4.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.5.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.5.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 1024]) from checkpoint, the shape in current model is torch.Size([1280, 768]).
        size mismatch for model.diffusion_model.output_blocks.5.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.6.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.6.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.6.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.7.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.7.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.7.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.8.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.8.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 1024]) from checkpoint, the shape in current model is torch.Size([640, 768]).
        size mismatch for model.diffusion_model.output_blocks.8.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.9.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.9.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.9.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.10.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.10.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.10.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.11.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.11.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([320, 1024]) from checkpoint, the shape in current model is torch.Size([320, 768]).
        size mismatch for model.diffusion_model.output_blocks.11.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).

Trying to train but getting returned non-zero exit status 1

I'm trying to train my first model with the Dreambooth options; each time I hit train, this happens:

anaconda3/miniconda3 detected in C:\Users\evil\anaconda3
Starting conda environment "ST" from C:\Users\evil\anaconda3
warning: redirecting to https://github.com/devilismyfriend/StableTuner.git/
Latest git hash: 80af7be

(ST) C:\Users\evil\Desktop\Work\StableTuner>accelerate "launch" "--mixed_precision=fp16" "scripts/trainer.py" "--attention=xformers"  "--model_variant=base"  "--disable_cudnn_benchmark"  "--sample_step_interval=100"  "--pretrained_model_name_or_path=stabilityai/stable-diffusion-2"  "--pretrained_vae_name_or_path="  "--output_dir=models/new_model"  "--seed=3434554"  "--resolution=768"  "--train_batch_size=38"  "--num_train_epochs=100"  "--mixed_precision=fp16"  "--use_bucketing"  "--aspect_mode=dynamic"  "--aspect_mode_action_preference=add"  "--gradient_accumulation_steps=1"  "--learning_rate=3e-6"  "--lr_warmup_steps=0"  "--lr_scheduler=constant"  "--train_text_encoder"  "--concepts_list=stabletune_concept_list.json"  "--num_class_images=200"  "--save_every_n_epoch=25"  "--n_save_sample=2"  "--sample_height=768"  "--sample_width=768"  "--dataset_repeats=1"  "--add_sample_prompt=a photograph of a cyberpunk male warrior by sami2023"  "--sample_on_training_start"
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
 Booting Up StableTuner
 Please wait a moment as we load up some stuff...
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
C:\Users\evil\anaconda3\envs\ST\lib\site-packages\diffusers\configuration_utils.py:195: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
  deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
 Creating Auto Bucketing Dataloader
  Rounded resolution to: 768
  Preloading images...
  ** Processing Y:/Stable Diffusion/Models/Sami/train768-ST: 100%|████████████████████| 14/14 [00:00<00:00, 147.24it/s]
 ** Number of buckets: 1
  ** Bucket (768, 768) found 14 images, will duplicate 24 images due to batch size 38
  Number of image-caption pairs: 38

  ** Validation Set: val, steps: 1, repeats: 1

 Loading Latent Cache from models\new_model\logs\latent_cache
 Latents are ready.
Traceback (most recent call last):
  File "C:\Users\evil\Desktop\Work\StableTuner\scripts\trainer.py", line 2808, in <module>
    main()
  File "C:\Users\evil\Desktop\Work\StableTuner\scripts\trainer.py", line 2175, in main
    args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
ZeroDivisionError: division by zero
Traceback (most recent call last):
  File "C:\Users\evil\anaconda3\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\evil\anaconda3\envs\ST\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\evil\anaconda3\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\evil\anaconda3\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\evil\anaconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\evil\anaconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\\Users\\evil\\anaconda3\\envs\\ST\\python.exe', 'scripts/trainer.py', '--attention=xformers', '--model_variant=base', '--disable_cudnn_benchmark', '--sample_step_interval=100', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-2', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=768', '--train_batch_size=38', '--num_train_epochs=100', '--mixed_precision=fp16', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--train_text_encoder', '--concepts_list=stabletune_concept_list.json', '--num_class_images=200', '--save_every_n_epoch=25', '--n_save_sample=2', '--sample_height=768', '--sample_width=768', '--dataset_repeats=1', '--add_sample_prompt=a photograph of a cyberpunk male warrior by sami2023', '--sample_on_training_start']' returned non-zero exit status 1.
Press any key to continue . . .
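
For what it's worth, the division at trainer.py line 2175 in the traceback could guard against the empty-dataloader case. A hedged sketch (the names are taken from the traceback; the surrounding trainer code is assumed):

    import math

    def compute_num_train_epochs(max_train_steps: int, num_update_steps_per_epoch: int) -> int:
        # Guard against the case seen above, where num_update_steps_per_epoch came
        # out as 0, likely because the batch size exceeded the usable dataset size.
        if num_update_steps_per_epoch == 0:
            raise ValueError("0 update steps per epoch: check batch size vs. dataset size")
        return math.ceil(max_train_steps / num_update_steps_per_epoch)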

Anaconda custom path

For those who encounter

Environment name is set as ST as per environment.yaml
anaconda3/miniconda3 not found. Install from here https://docs.conda.io/en/latest/miniconda.html

while having anaconda/miniconda set up in a non-default folder: create a custom-conda-path.txt in the root folder of StableTuner and edit it like:

v_custom_path=<anaconda/miniconda path>
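
For example (hypothetical path), with Miniconda installed at D:\miniconda3 the file would contain the single line:

v_custom_path=D:\miniconda3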

Please make this very clear for users.

Cropping Behaviour

I may well just have overlooked it, but I am curious how the tool currently handles cropping/resizing of the dataset provided, when working with non-square images?

It could be useful to have settings for choosing e.g. centre crop or randomised crop or stretch to fit, either for datasets or saved per image. Currently not sure what it will do when given non-square images in the dataset!
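
To illustrate the options being asked about (a torchvision sketch, not what StableTuner actually does):

    from torchvision import transforms

    size = 512
    centre_crop = transforms.Compose([transforms.Resize(size), transforms.CenterCrop(size)])
    random_crop = transforms.Compose([transforms.Resize(size), transforms.RandomCrop(size)])
    stretch = transforms.Resize((size, size))  # distorts the aspect ratio to fit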

Getting "sample_on_training_start']' returned non-zero exit status 1"

Traceback (most recent call last):
  File "C:\Users\123ky\.conda\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\123ky\.conda\envs\ST\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\123ky\.conda\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\123ky\.conda\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\123ky\.conda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\123ky\.conda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\123ky\.conda\envs\ST\python.exe', 'scripts/trainer.py', '--attention=xformers', '--model_variant=inpainting', '--normalize_masked_area_loss', '--unmasked_probability=0.0', '--max_denoising_strength=1.0', '--disable_cudnn_benchmark', '--use_text_files_as_captions', '--sample_step_interval=50', '--stop_text_encoder_training=15', '--pretrained_model_name_or_path=runwayml/stable-diffusion-inpainting', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=512', '--train_batch_size=5', '--num_train_epochs=50', '--mixed_precision=fp16', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--regenerate_latent_cache', '--train_text_encoder', '--use_image_names_as_captions', '--concepts_list=stabletune_concept_list.json', '--num_class_images=200', '--save_every_n_epoch=15', '--n_save_sample=1', '--sample_height=512', '--sample_width=512', '--dataset_repeats=2', '--add_sample_prompt=Alex Grey Art', '--sample_on_training_start']' returned non-zero exit status 1.
Press any key to continue . . .
Any idea what I'm doing wrong? I have a 3070 with 8GB VRAM; I'm also overclocking.

Is there an issue with xformers?

I get through the install process and then I'm met with the "xformers failed to install because it's not a supported wheel on this platform" error. (Windows 10, RTX 3090.)

ModuleNotFoundError: No module named 'requests' during installation

Hi guys, I'm trying to install StableTuner as you explained, but I get this when I run the bat file.
Also, my miniconda is not in C:\; it's in D:.
I've taken into consideration that I need to make a custom-conda-path.txt in the folder, but this is what I get when I run it:

Environment name is set as ST as per environment.yaml
anaconda3/miniconda3 detected in "e:\Users\SPYBG\miniconda3"
Found Anaconda
The filename, directory name, or volume label syntax is incorrect.
'conda' is not recognized as an internal or external command,
operable program or batch file.
The filename, directory name, or volume label syntax is incorrect.
Traceback (most recent call last):
  File "E:\StableTuner\scripts\windows_install.py", line 9, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'
Press any key to continue . . .

[!] xformers NOT installed.

[!] xformers NOT installed.
Installing xformers with: pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
Installing xformers
Traceback (most recent call last):
  File "C:\Users\zhang\StableTuner\scripts\windows_install.py", line 85, in <module>
    check_versions()
  File "C:\Users\zhang\StableTuner\scripts\windows_install.py", line 76, in check_versions
    run(f"pip install {x_cmd}", desc="Installing xformers")
  File "C:\Users\zhang\StableTuner\scripts\windows_install.py", line 32, in run
    raise RuntimeError(message)
RuntimeError: Error running command.
Command: pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
Error code: 1
stdout: Collecting xformers==0.0.14.dev0

stderr: WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))': /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
WARNING: Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))': /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))': /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))': /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)'))': /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
ERROR: Could not install packages due to an OSError: HTTPSConnectionPool(host='github.com', port=443): Max retries exceeded with url: /C43H66N12O12S2/stable-diffusion-webui/releases/download/f/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:997)')))

8bit Adam not working on windows without wsl

I was able to install all the required packages and get StableTuner up and running. The installation script was run again and completed successfully. But when trying to train with 8bit Adam enabled, StableTuner crashes with an error.

Cudatoolkit is installed in the conda env. Is WSL required to run StableTuner with 8bit Adam? I found a similar issue in this repository:
d8ahazard/sd_dreambooth_extension#3

IMPORTANT: when 8bit Adam is disabled, training starts successfully, but a VRAM OOM happens after the first few steps (I have 15GB of VRAM).

When trying to run training, StableTuner crashes with the following error:

accelerate "launch" "--mixed_precision=no" "scripts/trainer.py" "--model_variant=base" "--disable_cudnn_benchmark" "--sample_step_interval=500" "--pretrained_model_name_or_path=C:/StableTuner/models/wd-1-3-penultimate-ucg-cont" "--pretrained_vae_name_or_path=" "--output_dir=models/new_model" "--seed=3434554" "--resolution=512" "--train_batch_size=24" "--num_train_epochs=100" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--use_8bit_adam" "--gradient_checkpointing" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=constant" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=5" "--n_save_sample=1" "--sample_height=512" "--sample_width=512" "--dataset_repeats=1" "--sample_on_training_start" "--clip_penultimate"
The following values were not passed to `accelerate launch` and had defaults used instead:
        `--num_processes` was set to a value of `1`
        `--num_machines` was set to a value of `1`
        `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Booting Up StableTuner
Please wait a moment as we load up some stuff...
C:\ProgramData\Anaconda3\lib\site-packages\accelerate\accelerator.py:321: UserWarning: log_with=tensorboard was passed but no supported trackers are currently installed.
warnings.warn(f"log_with={log_with} was passed but no supported trackers are currently installed.")
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('C')}
warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: C:\ProgramData\Anaconda3\envs\ST did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py:101: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
  File "C:\diffusion\StableTuner\scripts\trainer.py", line 2380, in <module>
    main()
  File "C:\diffusion\StableTuner\scripts\trainer.py", line 1530, in main
    import bitsandbytes as bnb
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from .autograd._functions import (
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
    import bitsandbytes.functional as F
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
    from .cextension import COMPILED_WITH_CUDA, lib
  File "C:\ProgramData\Anaconda3\lib\site-packages\bitsandbytes\cextension.py", line 118, in <module>
    raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs aboveto fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
https://github.com/TimDettmers/bitsandbytes/issues
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\Scripts\accelerate-script.py", line 9, in <module>
    sys.exit(main())
  File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\ProgramData\Anaconda3\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\ProgramData\Anaconda3\python.exe', 'scripts/trainer.py', '--model_variant=base', '--disable_cudnn_benchmark', '--sample_step_interval=500', '--pretrained_model_name_or_path=C:/StableTuner/models/wd-1-3-penultimate-ucg-cont', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=512', '--train_batch_size=24', '--num_train_epochs=100', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--train_text_encoder', '--concepts_list=stabletune_concept_list.json', '--num_class_images=200', '--save_every_n_epoch=5', '--n_save_sample=1', '--sample_height=512', '--sample_width=512', '--dataset_repeats=1', '--sample_on_training_start', '--clip_penultimate']' returned non-zero exit status 1.

Cudatoolkit is installed in the conda env. Is WSL required to run StableTuner? I found a similar issue in the bitsandbytes repository, and the developer said that libcudart.so is not supported on Windows.

OS: Windows 10

The Playground no longer works

It used to work fine, but as of a few updates ago it no longer works.
I deleted the ST env and reinstalled; same issue.

Traceback (most recent call last):
  File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\tkinter\__init__.py", line 1921, in __call__
    return self.func(*args)
  File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\site-packages\customtkinter\windows\widgets\ctk_button.py", line 531, in _clicked
    self._command()
  File "D:\StableTuner\scripts\configuration_gui.py", line 1907, in <lambda>
    self.play_generate_image_button = ctk.CTkButton(self.playground_frame_subframe, text="Generate Image", command=lambda: self.play_generate_image(self.play_model_entry.get(), self.play_prompt_entry.get(), self.play_negative_prompt_entry.get(), self.play_seed_entry.get(), self.play_scheduler_variable.get(), int(self.play_resolution_slider_height.get()), int(self.play_resolution_slider_width.get()), self.play_cfg_slider.get(), self.play_steps_slider.get()))
  File "D:\StableTuner\scripts\configuration_gui.py", line 2271, in play_generate_image
    del self.play_current_image
AttributeError: play_current_image

No module named model_util

Gave the most recent version a try, and after downloading/extracting/overwriting, the result was the following:

conda activate diffusers
cd C:\StableTuner

(diffusers) C:\StableTuner>python configuration_gui.py
Traceback (most recent call last):
  File "C:\StableTuner\configuration_gui.py", line 13, in <module>
    from scripts import converters
  File "C:\StableTuner\scripts\converters.py", line 44, in <module>
    import model_util
ModuleNotFoundError: No module named 'model_util'

The remedy was editing line 44 of C:\StableTuner\scripts\converters.py to:

from scripts import model_util

The UI was able to launch after that.

Also, unless intentional: a small typo in the UI, it reads "StableTune".
Another issue, disregard if already known.

It seems that if there is no Internet connection, ckpt models cannot be converted to diffusers?
The .yamls are still being fetched remotely even if the .yaml is already alongside the model.

Feature Request: Randomize tags after the first comma

This could help when training with long captions > 75 tokens in length. The caption text can be split on commas (for example) into tags, and the tag order can be shuffled every epoch, so the part that gets truncated is different between epochs.

EveryDream has the option to exclude the first ("title") tag from this, and then to uniformly shuffle the remaining tags, or else to specify a probability for each tag to be shuffled in a separate json file.

This could be extended to only deal with the problem of truncation, and not shuffle the tags otherwise. For captions with >75 tokens, the first N tokens (or comma separated tags) could be held fixed, while the rest of the caption text could be populated randomly from the truncated section every epoch.
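
A minimal sketch of the proposed shuffling (assuming comma-separated caption tags; this is illustrative, not EveryDream's or StableTuner's code):

    import random

    def shuffle_caption(caption: str, keep_first: int = 1) -> str:
        # Keep the first ("title") tag fixed and shuffle the rest,
        # so a different tail gets truncated each epoch.
        tags = [t.strip() for t in caption.split(",")]
        head, tail = tags[:keep_first], tags[keep_first:]
        random.shuffle(tail)
        return ", ".join(head + tail)

    # shuffle_caption("portrait of a woman, red hair, smiling, bokeh")
    # -> e.g. "portrait of a woman, bokeh, smiling, red hair"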

EDIT: I see talk of extended token limits in the PR. Maybe relevant?

Blank screen when using SD 2.1 768

This could be a user error.
When I used SD 2.1 768 as the base with a correct 768 training set, it created the sample images in the folders just fine, but in the end no saved set would produce any image in the playground, nor would any converted checkpoint in A1111. The images would be all blank (black).

This is curious, especially because the sample images were all fine, so it did work at some point during training.

What could be the issue?

(no problem with 2.1 512 base)

BTW, I absolutely love the interface and how easy it is to run!

Question. ColossalAI

Does StableTuner use the optimization methods that ColossalAI uses?
Ability to train with batch size 8 with only 9GB VRAM
Much faster learning

Stable Diffusion with Colossal-AI provides 6.5x faster training and pretraining cost savings; the hardware cost of fine-tuning can be almost 7x cheaper (from RTX 3090/4090 24GB to RTX 3050/2070 8GB).

Text Encoder out of memory

Hi,
I already have some experience with Dreambooth (e.g. with the automatic1111 extension), which works on a 12GB VRAM card.
It also works when I train the text encoder.

But the same settings fail on StableTuner; here are my typical VRAM-saving settings:

{
   "add_controlled_seed_to_sample": [],
    "model_path": "C:\\data\\codes\\stabledif\\StableTuner\\models\\v1-5-pruned",
    "vae_path": "",
    "output_path": "models/new_model",
    "send_telegram_updates": 0,
    "telegram_token": "",
    "telegram_chat_id": "",
    "resolution": "512",
    "batch_size": "1",
    "train_epocs": "100",
    "mixed_precision": "fp16",
    "use_8bit_adam": 1,
    "use_gradient_checkpointing": 1,
    "accumulation_steps": "1",
    "learning_rate": "2e-6",
    "warmup_steps": "0",
    "learning_rate_scheduler": "constant",
    "use_latent_cache": 0,
    "save_latent_cache": 0,
    "regenerate_latent_cache": 0,
    "train_text_encoder": 1,
    "with_prior_loss_preservation": 1,
    "prior_loss_preservation_weight": "1.0",
    "use_image_names_as_captions": 1,
    "auto_balance_concept_datasets": 0,
    "add_class_images_to_dataset": 0,
    "number_of_class_images": "200",
    "save_every_n_epochs": "20",
    "number_of_samples_to_generate": "5",
    "sample_height": "512",
    "sample_width": "512",
    "sample_random_aspect_ratio": 0,
    "sample_on_training_start": 1,
    "aspect_ratio_bucketing": 0,
    "seed": "3434554",
    "dataset_repeats": "1",
    "limit_text_encoder_training": "30%",
    "use_text_files_as_captions": 1,
    "ckpt_version": null,
    "convert_to_ckpt_after_training": 1,
    "execute_post_conversion": 1,
    "disable_cudnn_benchmark": 1,
    "sample_step_interval": "500",
    "conditional_dropout": "",
    "clip_penultimate": 0,
    "use_ema": 0,
    "aspect_ratio_bucketing_mode": "Dynamic Fill",
    "dynamic_bucketing_mode": "Duplicate"
}

Fatal error in launcher

I loaded the model into StableTuner, set up the concept, and specified all the settings. When I try to run the training, I get this error:

loading text encoder:
copy scheduler/tokenizer config from: runwayml/stable-diffusion-v1-5
Diffusers model saved.

(ST) C:\StableTuner>accelerate "launch" "--mixed_precision=fp16" "scripts/trainer.py" "--model_variant=base" "--disable_cudnn_benchmark" "--use_text_files_as_captions" "--sample_step_interval=500" "--pretrained_model_name_or_path=C:\StableTuner\models\wd-1-3-penultimate-ucg-cont" "--pretrained_vae_name_or_path=" "--output_dir=models/new_model" "--seed=3434554" "--resolution=512" "--train_batch_size=24" "--num_train_epochs=100" "--mixed_precision=fp16" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--use_8bit_adam" "--gradient_checkpointing" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=constant" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=5" "--n_save_sample=1" "--sample_height=512" "--sample_width=512" "--dataset_repeats=1" "--sample_on_training_start"
Fatal error in launcher: Unable to create process using '"C:\ProgramData\miniconda3\python.exe" "C:\Users\adam\AppData\Roaming\Python\Python310\Scripts\accelerate.exe" "launch" "--mixed_precision=fp16" "scripts/trainer.py" "--model_variant=base" "--disable_cudnn_benchmark" "--use_text_files_as_captions" "--sample_step_interval=500" "--pretrained_model_name_or_path=C:\StableTuner\models\wd-1-3-penultimate-ucg-cont" "--pretrained_vae_name_or_path=" "--output_dir=models/new_model" "--seed=3434554" "--resolution=512" "--train_batch_size=24" "--num_train_epochs=100" "--mixed_precision=fp16" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--use_8bit_adam" "--gradient_checkpointing" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=constant" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=5" "--n_save_sample=1" "--sample_height=512" "--sample_width=512" "--
Press any key to continue...

OS: Windows 10
Tried reinstalling several times; removed conda, Python, and all envs. The problem still exists.
Model: waifu-diffusion, but this problem occurs with all models (tried SD 1.4, SD 1.5 and SD 2.1).

Feature request: Export to Cloud option / Script

Hi!

I love the export to cloud option; it would be great if it also included an "export to shell script" toggle.
All of the commands are already there, just wrapped within a notebook.

It would make a great way to edit/modify a model for training within the existing interface: Export to Cloud, upload to a Linux box, and "press play on tape". It might be a useful way to offer a Linux option without having to worry about a GUI yet.

Obviously, it needs a script to create and set up the environment first.

Linux support soon please

First of all, thank you for all the hard work with StableTuner. It's really, really awesome!

Well, I've left issues on the Automatic UI and on its Dreambooth extension asking them to implement proper, standard Stable Diffusion fine-tuning (rather than the various variants they have right now). I'm opening an issue here asking you folks to add Linux support, on the assumption that neither of them will actually implement it (I don't know why they wouldn't, but I figure if they didn't do it before, maybe they never will).

I'm really, really not excited by the idea of having to switch over to my Windows machine just to do regular fine-tuning, unless I want to inflict the pain of sketchy random notebooks on myself.

Please PLEASE get this running on Linux ASAP. Right now we Linux users are stuck in a pretty annoying situation when trying to make custom models.

Crash when starting training. LZMA Module cannot be found

The install seemed to go fine. I selected my model, added my concepts, and clicked begin training.

The only thing I see in the log that might be an issue is:

ImportError: DLL load failed while importing _lzma: The specified module could not be found.

GUI QOL

Small issue: my setup is dark-themed, and the editable fields within StableTuner's GUI leave the cursor as solid black. Wondering if it's possible to make it blue or something?

CUDNN 8.6 Install appears to be broken

It may just be my computer/network connection, but the automatic cuDNN 8.6 installation doesn't work; it says the server takes too long to respond and errors out.

I got around this by downloading the .ZIP file from NVIDIA's website for the Windows version of CUDNN 8.6, unzipped it, renamed it to "cudnn_windows", and put it in the "resources" folder.

NotADirectoryError: [WinError 267] The directory name is invalid

Unable to use 'Resume From Last Session'; I receive the following error:

NotADirectoryError: [WinError 267] The directory name is invalid: 'models/marinmarin/marinmarin_v1001c\stabletuner_768_e100_01-03-14-17.json'

I've swapped '/'s for '\'s with no success, and of course I confirmed that the exact path and filename are in fact correct.

Easy resume a training

Apart from loading the newly trained Dreambooth model, I think it would be great to have a dedicated way to simply resume from a given config, where StableTuner loads the output model and continues training it without changing anything.

More of a nice-to-have than a necessary feature, but I hope it makes it into this project.

M1 support

Don't know where else to put this request. I would love to see this project made compatible with M1 Macs. It's just a request; I can't really control anything. I think that the more tools available to train and fine-tune these models, the better.

Crash when train_text_encoder is false and conditional_dropout > 0

When args.train_text_encoder is false, CachedLatentsDataset.cache.text_encoder_cache contains preencoded token embeddings rather than token IDs. However, CachedLatentsDataset.empty_tokens is still filled with token_ids. When args.conditional_dropout is nonzero, this results in batches selected for conditional dropout getting token IDs where embeddings are expected, and the training crashes due to mismatching tensor sizes.

I looked into fixing this but gave up for the moment so I'm reporting it as a bug.
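
A hedged sketch of the fix direction implied by the report (the names CachedLatentsDataset and empty_tokens come from the report itself; the tokenizer/text_encoder wiring is assumed): when the text encoder is frozen, the conditional-dropout "empty" entry should be pre-encoded, just like the cached captions are.

    import torch

    def make_empty_conditioning(tokenizer, text_encoder, train_text_encoder: bool):
        # Tokenize the empty prompt to the model's max length.
        ids = tokenizer("", padding="max_length",
                        max_length=tokenizer.model_max_length,
                        return_tensors="pt").input_ids
        if train_text_encoder:
            return ids  # trainer encodes token IDs itself in this mode
        with torch.no_grad():
            return text_encoder(ids)[0]  # match the pre-encoded embedding cache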

Crash when using limit text encoder at 100%

If you set the limit to 100%, it breaks when it reaches the 100% mark, with an error about not being able to save the frozen text encoder. Seems like a minor annoyance.
Before you ask: I was testing the effect of the encoder limit % on the resulting trained set.

CKPT to Diffusers error

Trying to convert a ckpt that works well in AUTO1111, but it fails in ST:

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\tkinter\__init__.py", line 1921, in __call__
    return self.func(*args)
  File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\site-packages\customtkinter\windows\widgets\ctk_button.py", line 531, in _clicked
    self._command()
  File "D:\StableTuner\scripts\configuration_gui.py", line 1859, in <lambda>
    self.convert_ckpt_to_diffusers_button = ctk.CTkButton(self.toolbox_frame_subframe, text="Convert CKPT To Diffusers", command=lambda:self.convert_ckpt_to_diffusers())
  File "D:\StableTuner\scripts\configuration_gui.py", line 2212, in convert_ckpt_to_diffusers
    version, prediction = self.get_sd_version(ckpt_path)
  File "D:\StableTuner\scripts\configuration_gui.py", line 2527, in get_sd_version
    checkpoint = checkpoint["state_dict"]
KeyError: 'state_dict'
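
If it helps triage: some ckpt files store the weights at the top level without a "state_dict" wrapper, which is exactly what would trigger this KeyError. A hedged sketch of a fallback lookup (not necessarily ST's fix) that tolerates both layouts:

    def get_state_dict(checkpoint: dict) -> dict:
        # Some checkpoints nest weights under "state_dict", others don't.
        return checkpoint.get("state_dict", checkpoint)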

Latest pull - issue

Receiving a weird error in the latest pull:
Traceback (most recent call last):
  File "D:\StableTuner\scripts\trainer.py", line 16, in <module>
    import keyboard
ModuleNotFoundError: No module named 'keyboard'
Traceback (most recent call last):
  File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\Desktop\Documents\Anaconda\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\Desktop\Documents\Anaconda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)

Proper setup for multi-concept finetuning?

I have a dataset of images plus captions in .txt files. It seems that isn't enough: if I leave the "Data" section empty with no concepts defined, the training script finds no images and crashes. However, my images can contain more than one concept per image, so I'm not sure how to properly set this up.

Let's say I have an image of a woman leaning on a car, with a .txt caption "woman leaning on car", and I want to finetune the model and text encoder to recognize "car", "leaning", and "woman". How do I actually set this up in the data section? Should I create three separate "concepts" for "car", "leaning", and "woman", and add the same image to all of them? Or just add everything to a single "DummyConcept", if the concept is ignored when using captions? Is this type of training entirely unsupported?

I'm just really lost here.

Bug: When selecting 'fp32' for precision, an invalid value for --mixed_precision is passed to the command line

When 'fp32' is selected for precision, the GUI should pass '--mixed_precision=no' instead of '--mixed_precision=fp32':

trainer.py: error: argument --mixed_precision: invalid choice: 'fp32' (choose from 'no', 'fp16', 'bf16')
Traceback (most recent call last):
File "C:\Users\revis.conda\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\revis.conda\envs\ST\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\Users\revis.conda\envs\ST\Scripts\accelerate.exe_main
.py", line 7, in
File "C:\Users\revis.conda\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\revis.conda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\revis.conda\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
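
A minimal sketch of the mapping the GUI presumably needs (a hypothetical function, not StableTuner's actual code):

    def mixed_precision_flag(gui_choice: str) -> str:
        # accelerate only accepts 'no', 'fp16', or 'bf16'; 'fp32' means "no mixed precision".
        value = "no" if gui_choice == "fp32" else gui_choice
        return f"--mixed_precision={value}"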

Not issue just a question

Does this utilise Python env files, so that it doesn't install all the scripts directly to your base system?

Is the learned image inverted even though it is not set?

I use the latest version.
I feel that only when I use dataset repeats, the training images are learned inverted.
Not all of them are, but normal images are mixed into the output.

If I turn off dataset repeats, the images are not inverted, so perhaps I am not mistaken,
but I apologize if this is a problem specific to my environment.

Cannot install

Can't install; I'm getting this error:
stderr: ERROR: Wheel 'torch' located at C:\Users\dmitr\AppData\Local\Temp\pip-unpack-4t9mrkdc\torch-1.13.1-cp310-cp310-win_amd64.whl is invalid.

Getting Error trying to start training (potential lack of support for Datacenter Tesla P40 24GB card)

Getting the following after latents are cached and training attempts to begin:

"Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
Traceback (most recent call last):
  File "C:\Users\unres\miniconda3\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\unres\miniconda3\envs\ST\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\unres\miniconda3\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\unres\miniconda3\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "C:\Users\unres\miniconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "C:\Users\unres\miniconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\unres\miniconda3\envs\ST\python.exe', 'scripts/trainer.py', '--attention=xformers', '--model_variant=base', '--disable_cudnn_benchmark', '--sample_step_interval=500', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--pretrained_vae_name_or_path=', '--output_dir=models/new_model', '--seed=3434554', '--resolution=512', '--train_batch_size=24', '--num_train_epochs=100', '--mixed_precision=fp16', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--train_text_encoder', '--concepts_list=stabletune_concept_list.json', '--num_class_images=4420', '--save_every_n_epoch=5', '--n_save_sample=1', '--sample_height=512', '--sample_width=512', '--dataset_repeats=1']' returned non-zero exit status 1."

I've only seen 3080 and 3090 mentioned on the main page. Does this repo not support the older Tesla data center 24GB cards like Automatic1111 and InvokeAI do?
