
sd-scripts's People

Contributors

ai-casanova, akx, bmaltais, bootsoflagrangian, breakcore2, cjangcjengh, ddpn08, dependabot[bot], disty0, fannovel16, feffy380, fireicewolf, hkingauditore, isotr0py, kohakublueleaf, kohya-ss, laksjdjf, linaqruf, mgz-dev, nameless1117, p1atdev, rockerboo, sdbds, shirayu, space-nuko, tingtingin, tomj2ee, tsukimiya, u-haru, xzuyn


sd-scripts's Issues

Example of training LoRA

Hi, thank you for the nice work.

I have been trying to train a LoRA, but have not yet succeeded.
To be precise, the loss goes down and last.safetensors is generated, but applying it hardly changes the generated images.

I have tried for days with different parameters, but have not been able to identify the problem.

I uploaded the training data and scripts at https://github.com/shirayu/example_lora_training .
Any advice would be appreciated.

The command for training: https://github.com/shirayu/example_lora_training/blob/cdf08770e41d0cf82ee5c7e20dc1dfaed8ea824b/train_lora.zsh
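
One quick way to rule out a silent failure is to check whether the trained weights actually moved away from their initialization: in this LoRA implementation the `lora_up` matrices start at zero, so near-zero `lora_up` norms after training would mean the network effectively learned nothing, and applying it would barely change the output. A minimal sketch, assuming the kohya-ss key layout inside `last.safetensors` and that the `safetensors` package is installed:

```python
# Sanity check: inspect the trained LoRA weights in last.safetensors.
# Assumes the kohya-ss key layout ("lora_unet_*" / "lora_te_*" modules
# with ".lora_up.weight" / ".lora_down.weight" tensors).
from safetensors.torch import load_file

state_dict = load_file("last.safetensors")

for key, tensor in state_dict.items():
    if key.endswith(".lora_up.weight"):
        # lora_up is zero-initialized, so a norm near 0 means this
        # module learned essentially nothing during training.
        norm = tensor.float().norm().item()
        print(f"{key}: shape={tuple(tensor.shape)}, norm={norm:.6f}")
```

If the norms are all tiny, the learning rate and the network dimension/multiplier settings are the first things to revisit; if they look healthy, the problem is more likely on the inference side, in how the file is loaded and merged.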

Add `accelerate` torch.compile() support for faster training on PyTorch 2.0

When selecting the following in `accelerate config`:

Do you wish to optimize your script with torch dynamo?[yes/NO]:yes
---------------------------------------------------------------------------------------------------------
Which dynamo backend would you like to use?
Please select a choice using the arrow or number keys, and selecting with enter
    eager
    aot_eager
 ➔  inductor
    nvfuser
    aot_nvfuser
    aot_cudagraphs
    ofi
    fx2trt
    onnxrt
    ipex
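
For reference, this backend selection roughly corresponds to wrapping the prepared models with TorchDynamo. A minimal stand-alone sketch of what the `inductor` choice enables (the `torch.nn.Linear` stand-in is hypothetical; in the real run it would be the text encoder or U-Net prepared by `accelerate`):

```python
import torch

# Hypothetical stand-in module; in the actual training run the text
# encoder and U-Net prepared by accelerate would be compiled instead.
model = torch.nn.Linear(768, 768).cuda()

# Roughly what choosing the "inductor" dynamo backend does: the forward
# pass is traced by TorchDynamo and compiled with the inductor backend.
compiled = torch.compile(model, backend="inductor")

x = torch.randn(4, 768, device="cuda")
y = compiled(x)  # the first call triggers graph capture and compilation
```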

The LoRA training script errors out with:

steps:   0%|                                                                    | 0/1600 [00:00<?, ?it/s]epoch 1/2
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py:674 in             │
│ call_user_compiler                                                                               │
│                                                                                                  │
│   671 │   │   │   elif config.DO_NOT_USE_legacy_non_fake_example_inputs:                         │
│   672 │   │   │   │   compiled_fn = compiler_fn(gm, self.example_inputs())                       │
│   673 │   │   │   else:                                                                          │
│ ❱ 674 │   │   │   │   compiled_fn = compiler_fn(gm, self.fake_example_inputs())                  │
│   675 │   │   │   _step_logger()(logging.INFO, f"done compiler function {name}")                 │
│   676 │   │   │   assert callable(compiled_fn), "compiler_fn did not return callable"            │
│   677 │   │   except Exception as e:                                                             │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/debug_utils.py:1032 in             │
│ debug_wrapper                                                                                    │
│                                                                                                  │
│   1029 │   │   │   │   │   )                                                                     │
│   1030 │   │   │   │   │   raise                                                                 │
│   1031 │   │   else:                                                                             │
│ ❱ 1032 │   │   │   compiled_gm = compiler_fn(gm, example_inputs, **kwargs)                       │
│   1033 │   │                                                                                     │
│   1034 │   │   return compiled_gm                                                                │
│   1035                                                                                           │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:398 in compile_fx  │
│                                                                                                  │
│   395 │   │   # TODO: can add logging before/after the call to create_aot_dispatcher_function    │
│   396 │   │   # in torch._functorch/aot_autograd.py::aot_module_simplified::aot_function_simpl   │
│   397 │   │   # once torchdynamo is merged into pytorch                                          │
│ ❱ 398 │   │   return aot_autograd(                                                               │
│   399 │   │   │   fw_compiler=fw_compiler,                                                       │
│   400 │   │   │   bw_compiler=bw_compiler,                                                       │
│   401 │   │   │   decompositions=select_decomp_table(),                                          │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/optimizations/training.py:78 in    │
│ compiler_fn                                                                                      │
│                                                                                                  │
│    75 │   │   try:                                                                               │
│    76 │   │   │   # NB: NOT cloned!                                                              │
│    77 │   │   │   with enable_aot_logging():                                                     │
│ ❱  78 │   │   │   │   cg = aot_module_simplified(gm, example_inputs, **kwargs)                   │
│    79 │   │   │   │   counters["aot_autograd"]["ok"] += 1                                        │
│    80 │   │   │   │   return eval_frame.disable(cg)                                              │
│    81 │   │   except Exception:                                                                  │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py:2355 in         │
│ aot_module_simplified                                                                            │
│                                                                                                  │
│   2352 │   full_args.extend(params_flat)                                                         │
│   2353 │   full_args.extend(args)                                                                │
│   2354 │                                                                                         │
│ ❱ 2355 │   compiled_fn = create_aot_dispatcher_function(                                         │
│   2356 │   │   functional_call,                                                                  │
│   2357 │   │   full_args,                                                                        │
│   2358 │   │   aot_config,                                                                       │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py:94 in time_wrapper        │
│                                                                                                  │
│     91 │   │   if key not in compilation_metrics:                                                │
│     92 │   │   │   compilation_metrics[key] = []                                                 │
│     93 │   │   t0 = time.time()                                                                  │
│ ❱   94 │   │   r = func(*args, **kwargs)                                                         │
│     95 │   │   latency = time.time() - t0                                                        │
│     96 │   │   # print(f"Dynamo timer: key={key}, latency={latency:.2f} sec")                    │
│     97 │   │   compilation_metrics[key].append(latency)                                          │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py:2052 in         │
│ create_aot_dispatcher_function                                                                   │
│                                                                                                  │
│   2049 │   │   compiler_fn = partial(aot_wrapper_dedupe, compiler_fn=compiler_fn)                │
│   2050 │   │   # You can put more passes here                                                    │
│   2051 │   │                                                                                     │
│ ❱ 2052 │   │   compiled_fn = compiler_fn(flat_fn, fake_flat_tensor_args, aot_config)             │
│   2053 │   │                                                                                     │
│   2054 │   │   if not hasattr(compiled_fn, '_boxed_call'):                                       │
│   2055 │   │   │   compiled_fn = make_boxed_func(compiled_fn)                                    │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py:1273 in         │
│ aot_wrapper_dedupe                                                                               │
│                                                                                                  │
│   1270 │   # or not                                                                              │
│   1271 │   try:                                                                                  │
│   1272 │   │   with enable_python_dispatcher():                                                  │
│ ❱ 1273 │   │   │   fw_metadata, _out, _num_aliasing_metadata_outs = run_functionalized_fw_and_c  │
│   1274 │   │   │   │   flat_fn                                                                   │
│   1275 │   │   │   )(*flat_args)                                                                 │
│   1276 │   except RuntimeError as e:                                                             │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py:289 in inner    │
│                                                                                                  │
│    286 │   │                                                                                     │
│    287 │   │   torch._enable_functionalization(reapply_views=True)                               │
│    288 │   │   try:                                                                              │
│ ❱  289 │   │   │   outs = f(*f_args)                                                             │
│    290 │   │   finally:                                                                          │
│    291 │   │   │   torch._disable_functionalization()                                            │
│    292                                                                                           │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py:2327 in         │
│ functional_call                                                                                  │
│                                                                                                  │
│   2324 │   │   │   │   │   │   "ignore", "Anomaly Detection has been enabled."                   │
│   2325 │   │   │   │   │   )                                                                     │
│   2326 │   │   │   │   │   with torch.autograd.detect_anomaly(check_nan=False):                  │
│ ❱ 2327 │   │   │   │   │   │   out = Interpreter(mod).run(*args[params_len:], **kwargs)          │
│   2328 │   │   │   else:                                                                         │
│   2329 │   │   │   │   out = mod(*args[params_len:], **kwargs)                                   │
│   2330                                                                                           │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/fx/interpreter.py:136 in run               │
│                                                                                                  │
│   133 │   │   │   │   continue                                                                   │
│   134 │   │   │                                                                                  │
│   135 │   │   │   try:                                                                           │
│ ❱ 136 │   │   │   │   self.env[node] = self.run_node(node)                                       │
│   137 │   │   │   except Exception as e:                                                         │
│   138 │   │   │   │   msg = f"While executing {node.format_node()}"                              │
│   139 │   │   │   │   msg = '{}\n\n{}'.format(e.args[0], msg) if e.args else str(msg)            │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/fx/interpreter.py:177 in run_node          │
│                                                                                                  │
│   174 │   │   │   args, kwargs = self.fetch_args_kwargs_from_env(n)                              │
│   175 │   │   │   assert isinstance(args, tuple)                                                 │
│   176 │   │   │   assert isinstance(kwargs, dict)                                                │
│ ❱ 177 │   │   │   return getattr(self, n.op)(n.target, args, kwargs)                             │
│   178 │                                                                                          │
│   179 │   # Main Node running APIs                                                               │
│   180 │   @compatibility(is_backward_compatible=True)                                            │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/fx/interpreter.py:294 in call_module       │
│                                                                                                  │
│   291 │   │   assert isinstance(target, str)                                                     │
│   292 │   │   submod = self.fetch_attr(target)                                                   │
│   293 │   │                                                                                      │
│ ❱ 294 │   │   return submod(*args, **kwargs)                                                     │
│   295 │                                                                                          │
│   296 │   @compatibility(is_backward_compatible=True)                                            │
│   297 │   def output(self, target : 'Target', args : Tuple[Argument, ...], kwargs : Dict[str,    │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1482 in _call_impl    │
│                                                                                                  │
│   1479 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
│   1480 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
│   1481 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1482 │   │   │   return forward_call(*args, **kwargs)                                          │
│   1483 │   │   # Do not call functions when jit is used                                          │
│   1484 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1485 │   │   backward_pre_hooks = []                                                           │
│                                                                                                  │
│ /home/alpha/clone/sd-scripts/networks/lora.py:44 in forward                                      │
│                                                                                                  │
│    41 │   del self.org_module                                                                    │
│    42                                                                                            │
│    43   def forward(self, x):                                                                    │
│ ❱  44 │   return self.org_forward(x) + self.lora_up(self.lora_down(x)) * self.multiplier         │
│    45                                                                                            │
│    46                                                                                            │
│    47 def create_network(multiplier, network_dim, vae, text_encoder, unet, **kwargs):            │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1482 in _call_impl    │
│                                                                                                  │
│   1479 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
│   1480 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
│   1481 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1482 │   │   │   return forward_call(*args, **kwargs)                                          │
│   1483 │   │   # Do not call functions when jit is used                                          │
│   1484 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1485 │   │   backward_pre_hooks = []                                                           │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/nn/modules/linear.py:114 in forward        │
│                                                                                                  │
│   111 │   │   │   init.uniform_(self.bias, -bound, bound)                                        │
│   112 │                                                                                          │
│   113 │   def forward(self, input: Tensor) -> Tensor:                                            │
│ ❱ 114 │   │   return F.linear(input, self.weight, self.bias)                                     │
│   115 │                                                                                          │
│   116 │   def extra_repr(self) -> str:                                                           │
│   117 │   │   return 'in_features={}, out_features={}, bias={}'.format(                          │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_inductor/overrides.py:37 in               │
│ __torch_function__                                                                               │
│                                                                                                  │
│    34 │   │   │   and replacements[func] in replacements_using_triton_random                     │
│    35 │   │   ):                                                                                 │
│    36 │   │   │   return replacements[func](*args, **kwargs)                                     │
│ ❱  37 │   │   return func(*args, **kwargs)                                                       │
│    38                                                                                            │
│    39                                                                                            │
│    40 patch_functions = AutogradMonkeypatch                                                      │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py:825 in          │
│ __torch_dispatch__                                                                               │
│                                                                                                  │
│    822 │   │   │   ), f"{args} {kwargs}"                                                         │
│    823 │   │   │   return converter(self, args[0])                                               │
│    824 │   │                                                                                     │
│ ❱  825 │   │   args, kwargs = self.validate_and_convert_non_fake_tensors(                        │
│    826 │   │   │   func, converter, args, kwargs                                                 │
│    827 │   │   )                                                                                 │
│    828                                                                                           │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py:973 in          │
│ validate_and_convert_non_fake_tensors                                                            │
│                                                                                                  │
│    970 │   │   │   │   return converter(self, x)                                                 │
│    971 │   │   │   return x                                                                      │
│    972 │   │                                                                                     │
│ ❱  973 │   │   return tree_map_only(                                                             │
│    974 │   │   │   torch.Tensor,                                                                 │
│    975 │   │   │   validate,                                                                     │
│    976 │   │   │   (args, kwargs),                                                               │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/utils/_pytree.py:259 in tree_map_only      │
│                                                                                                  │
│   256 │   ...                                                                                    │
│   257                                                                                            │
│   258 def tree_map_only(ty: TypeAny, fn: FnAny[Any], pytree: PyTree) -> PyTree:                  │
│ ❱ 259 │   return tree_map(map_only(ty)(fn), pytree)                                              │
│   260                                                                                            │
│   261 def tree_all(pred: Callable[[Any], bool], pytree: PyTree) -> bool:                         │
│   262 │   flat_args, _ = tree_flatten(pytree)                                                    │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/utils/_pytree.py:195 in tree_map           │
│                                                                                                  │
│   192                                                                                            │
│   193 def tree_map(fn: Any, pytree: PyTree) -> PyTree:                                           │
│   194 │   flat_args, spec = tree_flatten(pytree)                                                 │
│ ❱ 195 │   return tree_unflatten([fn(i) for i in flat_args], spec)                                │
│   196                                                                                            │
│   197 Type2 = Tuple[Type[T], Type[S]]                                                            │
│   198 TypeAny = Union[Type[Any], Tuple[Type[Any], ...]]                                          │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/utils/_pytree.py:195 in <listcomp>         │
│                                                                                                  │
│   192                                                                                            │
│   193 def tree_map(fn: Any, pytree: PyTree) -> PyTree:                                           │
│   194 │   flat_args, spec = tree_flatten(pytree)                                                 │
│ ❱ 195 │   return tree_unflatten([fn(i) for i in flat_args], spec)                                │
│   196                                                                                            │
│   197 Type2 = Tuple[Type[T], Type[S]]                                                            │
│   198 TypeAny = Union[Type[Any], Tuple[Type[Any], ...]]                                          │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/utils/_pytree.py:244 in inner              │
│                                                                                                  │
│   241 │   │   @functools.wraps(f)                                                                │
│   242 │   │   def inner(x: T) -> Any:                                                            │
│   243 │   │   │   if isinstance(x, ty):                                                          │
│ ❱ 244 │   │   │   │   return f(x)                                                                │
│   245 │   │   │   else:                                                                          │
│   246 │   │   │   │   return x                                                                   │
│   247 │   │   return inner                                                                       │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py:965 in validate │
│                                                                                                  │
│    962 │   │   │   │   │   │   f"Can't call metadata mutating ops on non-Fake Tensor inputs. Fo  │
│    963 │   │   │   │   │   )                                                                     │
│    964 │   │   │   │   if not self.allow_non_fake_inputs:                                        │
│ ❱  965 │   │   │   │   │   raise Exception(                                                      │
│    966 │   │   │   │   │   │   f"Please convert all Tensors to FakeTensors first or instantiate  │
│    967 │   │   │   │   │   │   f"with 'allow_non_fake_inputs'. Found in {func}(*{args}, **{kwar  │
│    968 │   │   │   │   │   )                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
Exception: Please convert all Tensors to FakeTensors first or instantiate FakeTensorMode with
'allow_non_fake_inputs'. Found in aten._to_copy.default(*(Parameter containing:
tensor([[ 0.0292,  0.0266,  0.0296,  ...,  0.0353, -0.0317, -0.0230],
        [ 0.0112, -0.0135,  0.0291,  ..., -0.0087,  0.0124,  0.0297],
        [-0.0299,  0.0291, -0.0143,  ..., -0.0097,  0.0106, -0.0191],
        [-0.0344, -0.0083,  0.0227,  ...,  0.0093,  0.0345, -0.0343]],
       device='cuda:0', requires_grad=True),), **{'dtype': torch.float16})

While executing %self_text_model_encoder_layers_0_self_attn_q_proj : [#users=1] =
call_module[target=self_text_model_encoder_layers_0_self_attn_q_proj](args =
(%self_text_model_encoder_layers_0_layer_norm1,), kwargs = {})
Original traceback:
  File "/home/alpha/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line
209, in forward
    query_states = self.q_proj(hidden_states) * self.scale
 |   File "/home/alpha/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py",
line 317, in forward
    hidden_states, attn_weights = self.self_attn(
 |   File "/home/alpha/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py",
line 574, in forward
    layer_outputs = encoder_layer(
 |   File "/home/alpha/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py",
line 643, in forward
    encoder_outputs = self.encoder(
 |   File "/home/alpha/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py",
line 722, in forward
    return self.text_model(


The above exception was the direct cause of the following exception:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/alpha/clone/sd-scripts/train_network.py:419 in <module>                                    │
│                                                                                                  │
│   416 │   │   │   │   │     help="only training Text Encoder part / Text Encoder関連部分のみ学   │
│   417                                                                                            │
│   418   args = parser.parse_args()                                                               │
│ ❱ 419   train(args)                                                                              │
│   420                                                                                            │
│                                                                                                  │
│ /home/alpha/clone/sd-scripts/train_network.py:283 in train                                       │
│                                                                                                  │
│   280 │   │   with torch.set_grad_enabled(train_text_encoder):                                   │
│   281 │   │     # Get the text embedding for conditioning                                        │
│   282 │   │     input_ids = batch["input_ids"].to(accelerator.device)                            │
│ ❱ 283 │   │     encoder_hidden_states = train_util.get_hidden_states(args, input_ids, tokenize   │
│   284 │   │                                                                                      │
│   285 │   │   # Sample noise that we'll add to the latents                                       │
│   286 │   │   noise = torch.randn_like(latents, device=latents.device)                           │
│                                                                                                  │
│ /home/alpha/clone/sd-scripts/library/train_util.py:1257 in get_hidden_states                     │
│                                                                                                  │
│   1254   if args.clip_skip is None:                                                              │
│   1255 │   encoder_hidden_states = text_encoder(input_ids)[0]                                    │
│   1256   else:                                                                                   │
│ ❱ 1257 │   enc_out = text_encoder(input_ids, output_hidden_states=True, return_dict=True)        │
│   1258 │   encoder_hidden_states = enc_out['hidden_states'][-args.clip_skip]                     │
│   1259 │   if weight_dtype is not None:                                                          │
│   1260 │     # this is required for additional network training                                  │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/nn/modules/module.py:1482 in _call_impl    │
│                                                                                                  │
│   1479 │   │   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks   │
│   1480 │   │   │   │   or _global_backward_pre_hooks or _global_backward_hooks                   │
│   1481 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1482 │   │   │   return forward_call(*args, **kwargs)                                          │
│   1483 │   │   # Do not call functions when jit is used                                          │
│   1484 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1485 │   │   backward_pre_hooks = []                                                           │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/accelerate/utils/operations.py:490 in __call__   │
│                                                                                                  │
│   487 │   │   update_wrapper(self, model_forward)                                                │
│   488 │                                                                                          │
│   489 │   def __call__(self, *args, **kwargs):                                                   │
│ ❱ 490 │   │   return convert_to_fp32(self.model_forward(*args, **kwargs))                        │
│   491 │                                                                                          │
│   492 │   def __getstate__(self):                                                                │
│   493 │   │   raise pickle.PicklingError(                                                        │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/amp/autocast_mode.py:14 in                 │
│ decorate_autocast                                                                                │
│                                                                                                  │
│    11 │   @functools.wraps(func)                                                                 │
│    12 │   def decorate_autocast(*args, **kwargs):                                                │
│    13 │   │   with autocast_instance:                                                            │
│ ❱  14 │   │   │   return func(*args, **kwargs)                                                   │
│    15 │   decorate_autocast.__script_unsupported = '@autocast() decorator is not supported in    │
│    16 │   return decorate_autocast                                                               │
│    17                                                                                            │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:83 in forward        │
│                                                                                                  │
│    80 │   │   return getattr(self._orig_mod, name)                                               │
│    81 │                                                                                          │
│    82 │   def forward(self, *args, **kwargs):                                                    │
│ ❱  83 │   │   return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)                    │
│    84                                                                                            │
│    85                                                                                            │
│    86 def remove_from_cache(f):                                                                  │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:212 in _fn           │
│                                                                                                  │
│   209 │   │   │   dynamic_ctx = enable_dynamic(self.dynamic)                                     │
│   210 │   │   │   dynamic_ctx.__enter__()                                                        │
│   211 │   │   │   try:                                                                           │
│ ❱ 212 │   │   │   │   return fn(*args, **kwargs)                                                 │
│   213 │   │   │   finally:                                                                       │
│   214 │   │   │   │   set_eval_frame(prior)                                                      │
│   215 │   │   │   │   dynamic_ctx.__exit__(None, None, None)                                     │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py:333 in catch_errors  │
│                                                                                                  │
│   330 │   │   │   │   │   return hijacked_callback(frame, cache_size, hooks)                     │
│   331 │   │                                                                                      │
│   332 │   │   with compile_lock:                                                                 │
│ ❱ 333 │   │   │   return callback(frame, cache_size, hooks)                                      │
│   334 │                                                                                          │
│   335 │   catch_errors._torchdynamo_orig_callable = callback  # type: ignore[attr-defined]       │
│   336 │   return catch_errors                                                                    │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py:480 in            │
│ _convert_frame                                                                                   │
│                                                                                                  │
│   477 │   def _convert_frame(frame: types.FrameType, cache_size: int, hooks: Hooks):             │
│   478 │   │   counters["frames"]["total"] += 1                                                   │
│   479 │   │   try:                                                                               │
│ ❱ 480 │   │   │   result = inner_convert(frame, cache_size, hooks)                               │
│   481 │   │   │   counters["frames"]["ok"] += 1                                                  │
│   482 │   │   │   return result                                                                  │
│   483 │   │   except (NotImplementedError, Unsupported):                                         │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py:103 in _fn        │
│                                                                                                  │
│   100 │   │   prior_fwd_from_src = torch.fx.graph_module._forward_from_src                       │
│   101 │   │   torch.fx.graph_module._forward_from_src = fx_forward_from_src_skip_result          │
│   102 │   │   try:                                                                               │
│ ❱ 103 │   │   │   return fn(*args, **kwargs)                                                     │
│   104 │   │   finally:                                                                           │
│   105 │   │   │   torch._C._set_grad_enabled(prior_grad_mode)                                    │
│   106 │   │   │   torch.random.set_rng_state(rng_state)                                          │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/utils.py:94 in time_wrapper        │
│                                                                                                  │
│     91 │   │   if key not in compilation_metrics:                                                │
│     92 │   │   │   compilation_metrics[key] = []                                                 │
│     93 │   │   t0 = time.time()                                                                  │
│ ❱   94 │   │   r = func(*args, **kwargs)                                                         │
│     95 │   │   latency = time.time() - t0                                                        │
│     96 │   │   # print(f"Dynamo timer: key={key}, latency={latency:.2f} sec")                    │
│     97 │   │   compilation_metrics[key].append(latency)                                          │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py:339 in            │
│ _convert_frame_assert                                                                            │
│                                                                                                  │
│   336 │   │   global initial_grad_state                                                          │
│   337 │   │   initial_grad_state = torch.is_grad_enabled()                                       │
│   338 │   │                                                                                      │
│ ❱ 339 │   │   return _compile(                                                                   │
│   340 │   │   │   frame.f_code,                                                                  │
│   341 │   │   │   frame.f_globals,                                                               │
│   342 │   │   │   frame.f_locals,                                                                │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py:400 in _compile   │
│                                                                                                  │
│   397 │   try:                                                                                   │
│   398 │   │   for attempt in itertools.count():                                                  │
│   399 │   │   │   try:                                                                           │
│ ❱ 400 │   │   │   │   out_code = transform_code_object(code, transform)                          │
│   401 │   │   │   │   orig_code_map[out_code] = code                                             │
│   402 │   │   │   │   break                                                                      │
│   403 │   │   │   except exc.RestartAnalysis:                                                    │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py:341 in  │
│ transform_code_object                                                                            │
│                                                                                                  │
│   338 │   instructions = cleaned_instructions(code, safe)                                        │
│   339 │   propagate_line_nums(instructions)                                                      │
│   340 │                                                                                          │
│ ❱ 341 │   transformations(instructions, code_options)                                            │
│   342 │                                                                                          │
│   343 │   fix_vars(instructions, code_options)                                                   │
│   344                                                                                            │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py:387 in transform  │
│                                                                                                  │
│   384 │   │   │   export,                                                                        │
│   385 │   │   │   mutated_closure_cell_contents,                                                 │
│   386 │   │   )                                                                                  │
│ ❱ 387 │   │   tracer.run()                                                                       │
│   388 │   │   output = tracer.output                                                             │
│   389 │   │   assert output is not None                                                          │
│   390 │   │   assert output.output_instructions                                                  │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py:1692 in run    │
│                                                                                                  │
│   1689 │                                                                                         │
│   1690 │   def run(self):                                                                        │
│   1691 │   │   _step_logger()(logging.INFO, f"torchdynamo start tracing {self.f_code.co_name}")  │
│ ❱ 1692 │   │   super().run()                                                                     │
│   1693 │                                                                                         │
│   1694 │   def match_nested_cell(self, name, cell):                                              │
│   1695 │   │   """Match a cell in this method to one in a function we are inlining"""            │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py:538 in run     │
│                                                                                                  │
│    535 │   │   │   while (                                                                       │
│    536 │   │   │   │   self.instruction_pointer is not None                                      │
│    537 │   │   │   │   and not self.output.should_exit                                           │
│ ❱  538 │   │   │   │   and self.step()                                                           │
│    539 │   │   │   ):                                                                            │
│    540 │   │   │   │   pass                                                                      │
│    541 │   │   except BackendCompilerFailed:                                                     │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py:501 in step    │
│                                                                                                  │
│    498 │   │   try:                                                                              │
│    499 │   │   │   if not hasattr(self, inst.opname):                                            │
│    500 │   │   │   │   unimplemented(f"missing: {inst.opname}")                                  │
│ ❱  501 │   │   │   getattr(self, inst.opname)(inst)                                              │
│    502 │   │   │                                                                                 │
│    503 │   │   │   return inst.opname != "RETURN_VALUE"                                          │
│    504 │   │   except BackendCompilerFailed:                                                     │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py:1758 in        │
│ RETURN_VALUE                                                                                     │
│                                                                                                  │
│   1755 │   │   │   f"torchdynamo done tracing {self.f_code.co_name} (RETURN_VALUE)",             │
│   1756 │   │   )                                                                                 │
│   1757 │   │   log.debug("RETURN_VALUE triggered compile")                                       │
│ ❱ 1758 │   │   self.output.compile_subgraph(self)                                                │
│   1759 │   │   self.output.add_output_instructions([create_instruction("RETURN_VALUE")])         │
│   1760                                                                                           │
│   1761                                                                                           │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py:551 in             │
│ compile_subgraph                                                                                 │
│                                                                                                  │
│   548 │   │   │   output = []                                                                    │
│   549 │   │   │   if count_calls(self.graph) != 0 or len(pass2.graph_outputs) != 0:              │
│   550 │   │   │   │   output.extend(                                                             │
│ ❱ 551 │   │   │   │   │   self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)    │
│   552 │   │   │   │   )                                                                          │
│   553 │   │   │   │                                                                              │
│   554 │   │   │   │   if len(pass2.graph_outputs) != 0:                                          │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py:598 in             │
│ compile_and_call_fx_graph                                                                        │
│                                                                                                  │
│   595 │   │                                                                                      │
│   596 │   │   assert_no_fake_params_or_buffers(gm)                                               │
│   597 │   │   with tracing(self.tracing_context):                                                │
│ ❱ 598 │   │   │   compiled_fn = self.call_user_compiler(gm)                                      │
│   599 │   │   compiled_fn = disable(compiled_fn)                                                 │
│   600 │   │                                                                                      │
│   601 │   │   counters["stats"]["unique_graphs"] += 1                                            │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/torch/_dynamo/output_graph.py:679 in             │
│ call_user_compiler                                                                               │
│                                                                                                  │
│   676 │   │   │   assert callable(compiled_fn), "compiler_fn did not return callable"            │
│   677 │   │   except Exception as e:                                                             │
│   678 │   │   │   compiled_fn = gm.forward                                                       │
│ ❱ 679 │   │   │   raise BackendCompilerFailed(self.compiler_fn, e) from e                        │
│   680 │   │   return compiled_fn                                                                 │
│   681 │                                                                                          │
│   682 │   def fake_example_inputs(self) -> List[torch.Tensor]:                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
BackendCompilerFailed: compile_fx raised Exception: Please convert all Tensors to FakeTensors first or
instantiate FakeTensorMode with 'allow_non_fake_inputs'. Found in aten._to_copy.default(*(Parameter
containing:
tensor([[ 0.0292,  0.0266,  0.0296,  ...,  0.0353, -0.0317, -0.0230],
        [ 0.0112, -0.0135,  0.0291,  ..., -0.0087,  0.0124,  0.0297],
        [-0.0299,  0.0291, -0.0143,  ..., -0.0097,  0.0106, -0.0191],
        [-0.0344, -0.0083,  0.0227,  ...,  0.0093,  0.0345, -0.0343]],
       device='cuda:0', requires_grad=True),), **{'dtype': torch.float16})

While executing %self_text_model_encoder_layers_0_self_attn_q_proj : [#users=1] =
call_module[target=self_text_model_encoder_layers_0_self_attn_q_proj](args =
(%self_text_model_encoder_layers_0_layer_norm1,), kwargs = {})
Original traceback:
  File "/home/alpha/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line
209, in forward
    query_states = self.q_proj(hidden_states) * self.scale
 |   File "/home/alpha/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py",
line 317, in forward
    hidden_states, attn_weights = self.self_attn(
 |   File "/home/alpha/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py",
line 574, in forward
    layer_outputs = encoder_layer(
 |   File "/home/alpha/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py",
line 643, in forward
    encoder_outputs = self.encoder(
 |   File "/home/alpha/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py",
line 722, in forward
    return self.text_model(


Set torch._dynamo.config.verbose=True for more information


You can suppress this exception and fall back to eager by setting:
    torch._dynamo.config.suppress_errors = True

steps:   0%|                                                                    | 0/1600 [00:03<?, ?it/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/alpha/.local/bin/accelerate:8 in <module>                                                  │
│                                                                                                  │
│   5 from accelerate.commands.accelerate_cli import main                                          │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(main())                                                                         │
│   9                                                                                              │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py:45 in main │
│                                                                                                  │
│   42 │   │   exit(1)                                                                             │
│   43 │                                                                                           │
│   44 │   # Run                                                                                   │
│ ❱ 45 │   args.func(args)                                                                         │
│   46                                                                                             │
│   47                                                                                             │
│   48 if __name__ == "__main__":                                                                  │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/accelerate/commands/launch.py:1104 in            │
│ launch_command                                                                                   │
│                                                                                                  │
│   1101 │   elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA  │
│   1102 │   │   sagemaker_launcher(defaults, args)                                                │
│   1103 │   else:                                                                                 │
│ ❱ 1104 │   │   simple_launcher(args)                                                             │
│   1105                                                                                           │
│   1106                                                                                           │
│   1107 def main():                                                                               │
│                                                                                                  │
│ /home/alpha/.local/lib/python3.10/site-packages/accelerate/commands/launch.py:567 in             │
│ simple_launcher                                                                                  │
│                                                                                                  │
│    564 │   process = subprocess.Popen(cmd, env=current_env)                                      │
│    565 │   process.wait()                                                                        │
│    566 │   if process.returncode != 0:                                                           │
│ ❱  567 │   │   raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)       │
│    568                                                                                           │
│    569                                                                                           │
│    570 def multi_gpu_launcher(args):                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/usr/bin/python', 'train_network.py',
'--pretrained_model_name_or_path=/home/alpha/Storage/AIModels/Stable-diffusion/panatomy05full_0.7-AIModel
s_Anything-V3.0-pruned-fp16_0.3-Weighted_sum-merged.ckpt',
'--train_data_dir=/home/alpha/Storage/TrainingData/test/training_data',
'--output_dir=/home/alpha/Storage/TrainingOutput/test/', '--prior_loss_weight=1.0',
'--resolution=512,512', '--train_batch_size=1', '--learning_rate=1e-5', '--max_train_steps=1600',
'--use_8bit_adam', '--xformers', '--mixed_precision=fp16', '--cache_latents', '--save_precision=fp16',
'--save_model_as=safetensors', '--clip_skip=2', '--network_module=networks.lora']' returned non-zero exit
status 1.

(The same arguments work with TorchDynamo disabled.)

Maybe torch.compile() needs to be added conditionally and manually, instead of automatically with accelerate?
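
For anyone who wants to experiment before proper support lands, here is a minimal sketch of compiling the model manually instead of through accelerate's TorchDynamo plumbing (assumes PyTorch 2.0; `unet` is a placeholder for whatever module the training script builds):

import torch

# Only opt in when torch.compile() actually exists (PyTorch 2.0+).
if hasattr(torch, "compile"):
    unet = torch.compile(unet, backend="inductor")  # `unet` is a placeholder module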

Accelerator acting like my GPU is not present?

accelerator.py not detecting my GPU

File "E:\sd-scripts\train_network.py", line 1453, in <module>
train(args)
File "E:\sd-scripts\train_network.py", line 1017, in train
accelerator = Accelerator(gradient_accumulation_steps=args.gradient_accumulation_steps, mixed_precision=args.mixed_precision,
File "E:\sd-scripts\venv\lib\site-packages\accelerate\accelerator.py", line 355, in __init__
raise ValueError(err.format(mode="fp16", requirement="a GPU"))
ValueError: fp16 mixed precision requires a GPU

(venv) PS E:\sd-scripts> python
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x000002DA22E42F50>
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce RTX 4090'
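
A hedged guess at the cause: `accelerate launch` honors whatever was saved by `accelerate config`, so answers like "CPU only" or "no" to the GPU questions will trigger this error even though torch itself clearly sees the 4090 (as the session above shows). Re-running `accelerate config` and choosing the GPU/fp16 options is the usual first thing to try.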

DB Warning "accelerate does not support gradient_accumulation_steps when training multiple models"

When running the DB training script with gradient accumulation, it warns that "accelerate does not support gradient_accumulation_steps when training multiple models" (U-Net and text encoder).

But this same warning does not appear when using the finetune script.
Is this an actual issue for both scripts or just the db one?

Also I'm not even training the text encoder, so I'm wondering if I should be concerned at all?
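
For context, accelerate's accumulation API is per-model, which is where the warning comes from; below is a minimal sketch of the single-model pattern (placeholder names, not this repo's actual loop):

from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)
unet, optimizer, dataloader = accelerator.prepare(unet, optimizer, dataloader)  # placeholders

for batch in dataloader:
    with accelerator.accumulate(unet):    # accumulate() wraps exactly one model
        loss = compute_loss(unet, batch)  # placeholder loss function
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()

If only the U-Net is being trained, the multi-model caveat should not bite, but that is a reading of the accelerate docs, not a confirmed answer for these scripts.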

'--max_token_length None' doesn't seem to work

PowerShell 7.3.1
PS D:\git\sd-scripts> .\venv\Scripts\activate
(venv) PS D:\git\sd-scripts> accelerate launch .\train_network.py --max_token_length "None"
usage: train_network.py [-h] [--v2] [--v_parameterization]
                        [--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH]
                        [--train_data_dir TRAIN_DATA_DIR] [--shuffle_caption] [--caption_extension CAPTION_EXTENSION]
                        [--caption_extention CAPTION_EXTENTION] [--keep_tokens KEEP_TOKENS] [--color_aug] [--flip_aug]
                        [--face_crop_aug_range FACE_CROP_AUG_RANGE] [--random_crop] [--debug_dataset]
                        [--resolution RESOLUTION] [--cache_latents] [--enable_bucket]
                        [--min_bucket_reso MIN_BUCKET_RESO] [--max_bucket_reso MAX_BUCKET_RESO]
                        [--reg_data_dir REG_DATA_DIR] [--in_json IN_JSON] [--dataset_repeats DATASET_REPEATS]
                        [--output_dir OUTPUT_DIR] [--output_name OUTPUT_NAME]
                        [--save_precision {None,float,fp16,bf16}] [--save_every_n_epochs SAVE_EVERY_N_EPOCHS]
                        [--save_last_n_epochs SAVE_LAST_N_EPOCHS] [--save_state] [--resume RESUME]
                        [--train_batch_size TRAIN_BATCH_SIZE] [--max_token_length {None,150,225}] [--use_8bit_adam]
                        [--mem_eff_attn] [--xformers] [--vae VAE] [--learning_rate LEARNING_RATE]
                        [--max_train_steps MAX_TRAIN_STEPS] [--seed SEED] [--gradient_checkpointing]
                        [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS] [--mixed_precision {no,fp16,bf16}]
                        [--full_fp16] [--clip_skip CLIP_SKIP] [--logging_dir LOGGING_DIR] [--log_prefix LOG_PREFIX]
                        [--lr_scheduler LR_SCHEDULER] [--lr_warmup_steps LR_WARMUP_STEPS]
                        [--prior_loss_weight PRIOR_LOSS_WEIGHT] [--save_model_as {None,ckpt,pt,safetensors}]
                        [--unet_lr UNET_LR] [--text_encoder_lr TEXT_ENCODER_LR] [--network_weights NETWORK_WEIGHTS]
                        [--network_module NETWORK_MODULE] [--network_dim NETWORK_DIM]
                        [--network_args [NETWORK_ARGS ...]] [--network_train_unet_only]
                        [--network_train_text_encoder_only]
train_network.py: error: argument --max_token_length: invalid int value: 'None'
Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\git\sd-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\git\sd-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\git\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "D:\git\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\git\\sd-scripts\\venv\\Scripts\\python.exe', '.\\train_network.py', '--max_token_length', 'None']' returned non-zero exit status 2.
(venv) PS D:\git\sd-scripts>
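
The flag is parsed as an int, so the literal string "None" can never match; simply omitting the flag is what selects the default. A tiny runnable reproduction of the argparse behavior (not this repo's exact parser definition):

import argparse

parser = argparse.ArgumentParser()
# choices as shown in the usage text above; with type=int the string "None" cannot parse
parser.add_argument("--max_token_length", type=int, default=None, choices=[None, 150, 225])

print(parser.parse_args([]).max_token_length)                             # None (flag omitted)
print(parser.parse_args(["--max_token_length", "150"]).max_token_length)  # 150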

What does face aug do

I looked at the code and it seems that face aug trains on the faces.

What I want to know is whether it trains on the faces in addition to the full images, or only on the faces. Also, how does the random crop setting relate to this?

Feature Request - Interrupt and Image Generation previews.

Much like the stable diffusion A1111 repo and the other dreambooth scripts out there, I would like to suggest adding an interrupt function and support for image generation every N epochs or steps.

Features

  • Generate image previews of all concepts every X steps or epochs (lets the user decide when the model is fully trained and prevents overtraining)
  • Allow interrupting / saving at the current step
  • Allow save states to be written every X steps (this could be set to the same interval as the preview generation)
  • Toggle between steps / epochs for save states and image generation

I have had my PC crash during training on occasion, or automatic Windows updates run while I was away from the PC, so being able to control the save interval in steps would be useful; the end of an epoch is often too many steps away to reach before something bad happens.

Question. Webp support

Is there support for the WebP format for training images? WebP provides smaller file sizes and better quality than JPEG, and there are no JPEG artifacts to degrade the training results.
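
For what it's worth, image loading here goes through Pillow, and typical Pillow wheels ship with WebP support; below is a quick self-check (the filename is just an example), though whether the scripts' file globbing picks up .webp is worth verifying in the code:

from PIL import Image

# If this opens and converts cleanly, your Pillow build can read WebP.
img = Image.open("sample.webp").convert("RGB")  # hypothetical file
print(img.size)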

ModuleNotFoundError: No module named 'albumentations'

Traceback (most recent call last):
File "C:\Users\Siddhesh\Desktop\kohya_ss\train_network.py", line 21, in
import albumentations as albu
ModuleNotFoundError: No module named 'albumentations'
Traceback (most recent call last):
File "C:\Users\Siddhesh\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Siddhesh\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\Siddhesh\AppData\Local\Programs\Python\Python310\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\Siddhesh\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\Siddhesh\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\Siddhesh\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Siddhesh\AppData\Local\Programs\Python\Python310\python.exe', 'train_network.py', '--cache_latents', '--enable_bucket', '--use_8bit_adam', '--xformers', '--pretrained_model_name_or_path=C:/Users/Siddhesh/Desktop/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned.ckpt', '--train_data_dir=C:/Users/Siddhesh/Desktop/test\img', '--resolution=512,512', '--output_dir=C:/Users/Siddhesh/Desktop/test\model', '--train_batch_size=1', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=800', '--use_8bit_adam', '--xformers', '--mixed_precision=fp16', '--save_every_n_epochs=1', '--seed=1234', '--save_precision=fp16', '--logging_dir=C:/Users/Siddhesh/Desktop/test\log', '--network_module=networks.lora', '--text_encoder_lr=1e-06', '--unet_lr=0.0001', '--network_dim=4']' returned non-zero exit status 1.
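
The first line is the actual problem: albumentations is missing from the Python environment that accelerate launched (note the traceback runs from the system Python in AppData, not a venv). It is listed in this repo's requirements, so re-running, from the environment you train in:

pip install -r requirements.txt

(or, more narrowly, pip install albumentations) should clear it; the rest of the traceback is just accelerate relaying the child-process failure.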

Cache latents optionally

It would be great to make caching the image latents optional. It takes 12 hours to convert 100k images with left/right flip enabled on a 3090 :/ (2.6 it/s). If I'm doing quick test runs of 1 or 2 epochs, this is not worth the time currently.

If that is too much work, something as simple as improving the speed would be fine as well, as my GPU is never at 100% while processing.
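
No opinion on the feature itself, but on the speed point: if the bottleneck is per-image VAE calls, batching the encode usually helps. A rough sketch with diffusers' AutoencoderKL (illustrative names, not the repo's actual caching code):

import torch

@torch.no_grad()
def cache_latents_batched(vae, images, batch_size=8):
    # `images` is a list of preprocessed CHW tensors; batching keeps the GPU busy
    latents = []
    for i in range(0, len(images), batch_size):
        batch = torch.stack(images[i:i + batch_size]).to(vae.device, dtype=vae.dtype)
        latents.append(vae.encode(batch).latent_dist.sample() * 0.18215)  # SD latent scale
    return torch.cat(latents)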

lr_schedulers currently do not take in num_cycles or power parameters

Current version of diffusers.optimization.get_scheduler used does not expose power parameter (for polynomial) or num_cycles (for cosine_with_restarts).

This means, for example:

  • the current implementation of cosine, and cosine_with_restarts produce the same scheduler as the num_cycles defaults to 1.
  • the current implementation of polynomial only produces a linear scheduler as the power defaults to 1.

This is fixed in a future implementation of diffusers: huggingface/diffusers@d87cc15#diff-8702f762e46a3b5363085930b0b045de554909d32560864031ca7b12ddd349d5

Posting this as an issue for awareness and also as something to look into in the future if the repo is tested and updated for a later version of diffusers which includes this patch from the above diffusers commit.
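
Until then, a hedged workaround is to bypass get_scheduler() and call the underlying factories directly, which already expose these parameters (runnable sketch with a dummy optimizer):

import torch
from diffusers.optimization import (
    get_cosine_with_hard_restarts_schedule_with_warmup,
    get_polynomial_decay_schedule_with_warmup,
)

optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-4)

# cosine with actual restarts (num_cycles > 1)
scheduler = get_cosine_with_hard_restarts_schedule_with_warmup(
    optimizer, num_warmup_steps=100, num_training_steps=1600, num_cycles=3)

# or a true polynomial decay (power != 1):
# scheduler = get_polynomial_decay_schedule_with_warmup(
#     optimizer, num_warmup_steps=100, num_training_steps=1600, power=2.0)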

Saving training states at different intervals than trained models

Hello,

I want my training to be resumable, but I don't want to create a state directory every time a model file is created ("save_every_n_epochs" parameter).

Is there currently a way to separate those two jobs?
Like a different parameter for state dir creation "save_state_every_n_epochs"?

Normally for training I would want only the last state to be saved. So setting "save_state_every_n_epochs" to something high like 9999 should save only the last state (basically the same behavior models have now with the "save_every_n_epochs" parameter).

If there is currently no way to do it, would you consider implementing it?

P.S.
Also thanks a lot for creating a very fast and uncomplicated way to fine-tune models :)
LoRA training is amazingly fast with it.
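
Not a maintainer answer, but the change looks small; here is a hypothetical sketch of what the epoch loop could do with the proposed save_state_every_n_epochs argument (all names assumed, not existing options):

import os

# hypothetical: checkpoints and resumable states on independent intervals
if (epoch + 1) % args.save_every_n_epochs == 0:
    save_model(...)  # placeholder for the existing model-saving path
if args.save_state and (epoch + 1) % args.save_state_every_n_epochs == 0:
    accelerator.save_state(os.path.join(args.output_dir, f"epoch-{epoch + 1}-state"))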

CUDA_SETUP: WARNING!

It's not Dreambooth!!! Hello, I have a problem. I tried to train LoRA with this script https://github.com/derrian-distro/LoRA_Easy_Training_Scripts but got an error: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)! I have already installed CUDA; I uninstalled CUDA 12 and installed version 11.6 and cuDNN v8.7.0, but it still didn't help. I also have Anaconda installed, so maybe I need to enter its path somewhere.

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\bitsandbytes\cuda_setup\paths.py:27: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(
WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
CUDA SETUP: Loading binary C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
Traceback (most recent call last):
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\lora_train_popup.py", line 432, in <module>
main()
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\lora_train_popup.py", line 197, in main
train_network.train(args)
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\train_network.py", line 114, in train
import bitsandbytes as bnb
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
from .autograd._functions import (
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\bitsandbytes\autograd\_functions.py", line 5, in <module>
import bitsandbytes.functional as F
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\bitsandbytes\functional.py", line 13, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\bitsandbytes\cextension.py", line 41, in <module>
lib = CUDALibrary_Singleton.get_instance().lib
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\bitsandbytes\cextension.py", line 37, in get_instance
cls._instance.initialize()
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\bitsandbytes\cextension.py", line 31, in initialize
self.lib = ct.cdll.LoadLibrary(binary_path)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 452, in LoadLibrary
return self._dlltype(name)
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\ctypes\__init__.py", line 364, in __init__
if '/' in name or '\\' in name:
TypeError: argument of type 'WindowsPath' is not iterable
Traceback (most recent call last):
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Admin\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Artem\ai\SD-вещи\kohya-ss-sd-scripts\sd-scripts\venv\Scripts\python.exe', 'lora_train_popup.py']' returned non-zero exit status 1.
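
The final TypeError ("argument of type 'WindowsPath' is not iterable") is the stock bitsandbytes build tripping over Windows paths; this repo's README ships patched files for exactly this case. From the sd-scripts folder with the venv active (PowerShell, as in the README):

cp .\bitsandbytes_windows\*.dll .\venv\Lib\site-packages\bitsandbytes\
cp .\bitsandbytes_windows\cextension.py .\venv\Lib\site-packages\bitsandbytes\cextension.py
cp .\bitsandbytes_windows\main.py .\venv\Lib\site-packages\bitsandbytes\cuda_setup\main.py

Note the traceback above runs from this repo's own venv, so the files must be copied into that same venv.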

Enable Training on 6GB Cards... with DeepSpeed?

I am trying to squeeze training onto my 6GB laptop RTX 2060, and can't quite manage it with the "low memory" config:

accelerate launch --num_cpu_threads_per_process 8 train_db.py \
--pretrained_model_name_or_path="/home/alpha/Storage/AIModels/Stable-diffusion/panatomy05full_0.7-AIModels_Anything-V3.0-pruned-fp16_0.3-Weighted_sum-merged.ckpt" \
--train_data_dir="/home/alpha/Storage/TrainingData/test/training_data" \
--output_dir="/home/alpha/Storage/TrainingOutput/test/" \
--prior_loss_weight=1.0 \
--resolution=512 \
--train_batch_size=1 \
--learning_rate=1e-6 \
--max_train_steps=1600 \
--use_8bit_adam \
--xformers \
--mixed_precision="fp16" \
--cache_latents \
--gradient_checkpointing \
--save_precision="fp16" \
--full_fp16 \
--save_model_as="safetensors" \

So I figured I would investigate DeepSpeed CPU offloading via the accelerate config... but I keep running into errors on both the git version and the 0.7.7 release from PyPI. Here is an error from the PyPI release:

Traceback (most recent call last):
  File "/home/alpha/clone/sd-scripts/train_db.py", line 332, in <module>
    train(args)
  File "/home/alpha/clone/sd-scripts/train_db.py", line 154, in train
    unet, text_encoder, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 619, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/accelerator.py", line 805, in _prepare_deepspeed
    engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/__init__.py", line 125, in initialize
    engine = DeepSpeedEngine(args=args,
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 330, in __init__
    self._configure_optimizer(optimizer, model_parameters)
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1210, in _configure_optimizer
    self.optimizer = self._configure_zero_optimizer(basic_optimizer)
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1455, in _configure_zero_optimizer
    optimizer = DeepSpeedZeroOptimizer(
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 532, in __init__
    self._param_slice_mappings = self._create_param_mapping()
  File "/home/alpha/.local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 544, in _create_param_mapping
    lp_name = self.param_names[lp]
KeyError: <exception str() failed>
[2023-01-12 13:13:52,241] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 5398
[2023-01-12 13:13:52,244] [ERROR] [launch.py:324:sigkill_handler] ['/usr/bin/python', '-u', 'train_db.py', '--pretrained_model_name_or_path=/home/alpha/Storage/AIModels/Stable-diffusion/panatomy05full_0.7-AIModels_Anything-V3.0-pruned-fp16_0.3-Weighted_sum-merged.ckpt', '--train_data_dir=/home/alpha/Storage/TrainingData/test/training_data', '--output_dir=/home/alpha/Storage/TrainingOutput/test/', '--prior_loss_weight=1.0', '--resolution=512', '--train_batch_size=1', '--learning_rate=1e-6', '--max_train_steps=1600', '--use_8bit_adam', '--xformers', '--mixed_precision=fp16', '--cache_latents', '--gradient_checkpointing', '--save_precision=fp16', '--full_fp16', '--save_model_as=safetensors'] exits with return code = 1
Traceback (most recent call last):
  File "/home/alpha/.local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 827, in launch_command
    deepspeed_launcher(args)
  File "/home/alpha/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 540, in deepspeed_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['deepspeed', '--no_local_rank', '--num_gpus', '1', 'train_db.py', '--pretrained_model_name_or_path=/home/alpha/Storage/AIModels/Stable-diffusion/panatomy05full_0.7-AIModels_Anything-V3.0-pruned-fp16_0.3-Weighted_sum-merged.ckpt', '--train_data_dir=/home/alpha/Storage/TrainingData/test/training_data', '--output_dir=/home/alpha/Storage/TrainingOutput/test/', '--prior_loss_weight=1.0', '--resolution=512', '--train_batch_size=1', '--learning_rate=1e-6', '--max_train_steps=1600', '--use_8bit_adam', '--xformers', '--mixed_precision=fp16', '--cache_latents', '--gradient_checkpointing', '--save_precision=fp16', '--full_fp16', '--save_model_as=safetensors']' returned non-zero exit status 1.

Is there anything in particular that needs to be changed for this repo to support DeepSpeed? Or maybe there is some other tweak to squeeze LoRA onto 6GB?
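
Not a DeepSpeed answer, but on the second question: for LoRA specifically, train_network.py already exposes --gradient_checkpointing and --network_train_unet_only (both visible in the usage dumps elsewhere on this page), and combining those with --cache_latents and batch size 1 is the usual first recipe for ~6GB cards; whether it actually fits on a laptop 2060 is untested here.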

[Question?] different result from gen_img_diffusers.py and AUTOMATIC1111 web ui

I don't think this is an issue with this repo, but I am curious why it shows different results with the same parameters. Any ideas or input?

AUTOMATIC1111 web ui

[attached image: 00363-3771183235-christmas Award winning beautiful portrait commission of a zwx supermodel with a beautiful hyperdetailed attractive outfit and f]

gen_img_diffusers.py

[attached image: im_20221225010148_000_3771183235]

Here is the prompt:

python gen_img_diffusers.py --outdir ./images_output --xformers --fp16 --max_embeddings_multiples 1 --vae stabilityai/sd-vae-ft-mse --prompt "christmas Award winning beautiful portrait commission of a zwx supermodel with a beautiful hyperdetailed attractive outfit and face wearing a golden red and green winter cozy outfit with red background and white snow falling around. character design by charlie bowater, ross tran, and makoto shinkai, detailed, inked, western comic book art --n ((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))" --ckpt= ./analog-20-supermodel-2800-zwx.ckpt --sampler k_euler_a --steps 80 --scale 8 --images_per_prompt 1 --seed 3771183235

I am using a custom model. You may try the same prompt on your models and see.

PS: This is @aivandroid from Twitter. Nice to see you here 🥳

"OSError: [WinError 1455]" when attempting to start the training.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\NAME\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\NAME\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\NAME\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\NAME\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\NAME\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\NAME\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\NAME\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\NAME\sd-scripts\train_network.py", line 8, in <module>
    import torch
  File "C:\Users\NAME\sd-scripts\venv\lib\site-packages\torch\__init__.py", line 129, in <module>
    raise err
OSError: [WinError 1455] The paging file is too small for this operation to complete. Error loading "C:\Users\NAME\sd-scripts\venv\lib\site-packages\torch\lib\cusolver64_11.dll" or one of its dependencies.

Running this on an RTX 2060 with 6GB VRAM. I don't know if specs have an influence.
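
For the record, WinError 1455 is about Windows virtual memory rather than VRAM: the paging file is too small for PyTorch to map its large CUDA DLLs. Enlarging the pagefile (System Properties → Advanced → Performance settings → Virtual memory), or letting Windows manage it on a drive with free space, is the usual fix.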

returned non-zero exit status 2 + no such file or directory ...PC\\AppData\\Local\\Programs\\Python\\Python310\\Scripts\\accelerate.exe\\__main__.py

I tried running the lora train popup script and got through the questions, but when it tried to run, it gave me this error.
I think this is about as far as I have ever gotten, but I have hit the "no such file or directory" error with "accelerate.exe\__main__.py" before, so whatever is causing that seems to be the issue. I did pip freeze and got a long list, which the rentry LoRA training page said was bad, so I followed its instructions, but I still get the same results even after reinstalling everything in the venv and moving over the bitsandbytes files. Honestly, at this point I'm stumped. I have tried the kohya-ss GUI LoRA script with no luck there either, because I think I had the same issue. Any help would be greatly appreciated. Thanks so much.

(venv) C:\SD-SCRIPTS\sd-scripts>accelerate launch --num_cpu_threads_per_process 11 lora_train_popup.py
usage: lora_train_popup.py [-h] [--v2] [--v_parameterization]
[--pretrained_model_name_or_path PRETRAINED_MODEL_NAME_OR_PATH]
[--train_data_dir TRAIN_DATA_DIR] [--shuffle_caption]
[--caption_extension CAPTION_EXTENSION] [--caption_extention CAPTION_EXTENTION]
[--keep_tokens KEEP_TOKENS] [--color_aug] [--flip_aug]
[--face_crop_aug_range FACE_CROP_AUG_RANGE] [--random_crop] [--debug_dataset]
[--resolution RESOLUTION] [--cache_latents] [--enable_bucket]
[--min_bucket_reso MIN_BUCKET_RESO] [--max_bucket_reso MAX_BUCKET_RESO]
[--reg_data_dir REG_DATA_DIR] [--in_json IN_JSON] [--dataset_repeats DATASET_REPEATS]
[--output_dir OUTPUT_DIR] [--output_name OUTPUT_NAME]
[--save_precision {None,float,fp16,bf16}] [--save_every_n_epochs SAVE_EVERY_N_EPOCHS]
[--save_last_n_epochs SAVE_LAST_N_EPOCHS] [--save_state] [--resume RESUME]
[--train_batch_size TRAIN_BATCH_SIZE] [--max_token_length {None,150,225}] [--use_8bit_adam]
[--mem_eff_attn] [--xformers] [--vae VAE] [--learning_rate LEARNING_RATE]
[--max_train_steps MAX_TRAIN_STEPS] [--max_train_epochs MAX_TRAIN_EPOCHS]
[--max_data_loader_n_workers MAX_DATA_LOADER_N_WORKERS] [--seed SEED]
[--gradient_checkpointing] [--gradient_accumulation_steps GRADIENT_ACCUMULATION_STEPS]
[--mixed_precision {no,fp16,bf16}] [--full_fp16] [--clip_skip CLIP_SKIP]
[--logging_dir LOGGING_DIR] [--log_prefix LOG_PREFIX] [--lr_scheduler LR_SCHEDULER]
[--lr_warmup_steps LR_WARMUP_STEPS] [--prior_loss_weight PRIOR_LOSS_WEIGHT] [--no_metadata]
[--save_model_as {None,ckpt,pt,safetensors}] [--unet_lr UNET_LR]
[--text_encoder_lr TEXT_ENCODER_LR] [--network_weights NETWORK_WEIGHTS]
[--network_module NETWORK_MODULE] [--network_dim NETWORK_DIM]
[--network_args [NETWORK_ARGS ...]] [--network_train_unet_only]
[--network_train_text_encoder_only]
lora_train_popup.py: error: argument --max_token_length: invalid choice: 75 (choose from None, 150, 225)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\My PC\AppData\Local\Programs\Python\Python310\lib\runpy.py:196 in │
│ _run_module_as_main │
│ │
│ 193 │ main_globals = sys.modules["__main__"].__dict__ │
│ 194 │ if alter_argv: │
│ 195 │ │ sys.argv[0] = mod_spec.origin │
│ ❱ 196 │ return _run_code(code, main_globals, None, │
│ 197 │ │ │ │ │ "__main__", mod_spec) │
│ 198 │
│ 199 def run_module(mod_name, init_globals=None, │
│ │
│ C:\Users\My PC\AppData\Local\Programs\Python\Python310\lib\runpy.py:86 in _run_code │
│ │
│ 83 │ │ │ │ │ loader = loader, │
│ 84 │ │ │ │ │ package = pkg_name, │
│ 85 │ │ │ │ │ spec = mod_spec) │
│ ❱ 86 │ exec(code, run_globals) │
│ 87 │ return run_globals │
│ 88 │
│ 89 def _run_module_code(code, init_globals=None, │
│ │
│ C:\Users\My PC\AppData\Local\Programs\Python\Python310\Scripts\accelerate.exe\__main__.py:7 │
│ in <module> │
│ │
│ [Errno 2] No such file or directory: "C:\Users\My │
│ PC\AppData\Local\Programs\Python\Python310\Scripts\accelerate.exe\__main__.py" │
│ │
│ C:\Users\My │
│ PC\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\accelerate_cli. │
│ py:45 in main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if __name__ == "__main__": │
│ │
│ C:\Users\My │
│ PC\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py:1104 │
│ in launch_command │
│ │
│ 1101 │ elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA │
│ 1102 │ │ sagemaker_launcher(defaults, args) │
│ 1103 │ else: │
│ ❱ 1104 │ │ simple_launcher(args) │
│ 1105 │
│ 1106 │
│ 1107 def main(): │
│ │
│ C:\Users\My │
│ PC\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py:567 │
│ in simple_launcher │
│ │
│ 564 │ process = subprocess.Popen(cmd, env=current_env) │
│ 565 │ process.wait() │
│ 566 │ if process.returncode != 0: │
│ ❱ 567 │ │ raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) │
│ 568 │
│ 569 │
│ 570 def multi_gpu_launcher(args): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['C:\\Users\\My PC\\AppData\\Local\\Programs\\Python\\Python310\\python.exe',
'lora_train_popup.py']' returned non-zero exit status 2.

CUDA Error: no kernel image is available for execution on the device

Hi there,
I am brand new to neural networks, and when I tried to feed a test video into the computer, this is what I got:
nvidia@nvidia-desktop:~$ "/home/nvidia/test.sh"
ci: Using default 'data/ci.txt'
coord: Using default 'data/coord.txt'
Polaris Object Detection
layer filters size input output
0 conv 32 3 x 3 / 1 480 x 352 x 3 -> 480 x 352 x 32 0.292 BFLOPs
1 max 2 x 2 / 2 480 x 352 x 32 -> 240 x 176 x 32
2 conv 64 3 x 3 / 1 240 x 176 x 32 -> 240 x 176 x 64 1.557 BFLOPs
3 max 2 x 2 / 2 240 x 176 x 64 -> 120 x 88 x 64
4 conv 128 3 x 3 / 1 120 x 88 x 64 -> 120 x 88 x 128 1.557 BFLOPs
5 conv 64 1 x 1 / 1 120 x 88 x 128 -> 120 x 88 x 64 0.173 BFLOPs
6 conv 128 3 x 3 / 1 120 x 88 x 64 -> 120 x 88 x 128 1.557 BFLOPs
7 max 2 x 2 / 2 120 x 88 x 128 -> 60 x 44 x 128
8 conv 256 3 x 3 / 1 60 x 44 x 128 -> 60 x 44 x 256 1.557 BFLOPs
9 conv 128 1 x 1 / 1 60 x 44 x 256 -> 60 x 44 x 128 0.173 BFLOPs
10 conv 256 3 x 3 / 1 60 x 44 x 128 -> 60 x 44 x 256 1.557 BFLOPs
11 max 2 x 2 / 2 60 x 44 x 256 -> 30 x 22 x 256
12 conv 512 3 x 3 / 1 30 x 22 x 256 -> 30 x 22 x 512 1.557 BFLOPs
13 conv 256 1 x 1 / 1 30 x 22 x 512 -> 30 x 22 x 256 0.173 BFLOPs
14 conv 512 3 x 3 / 1 30 x 22 x 256 -> 30 x 22 x 512 1.557 BFLOPs
15 conv 256 1 x 1 / 1 30 x 22 x 512 -> 30 x 22 x 256 0.173 BFLOPs
16 conv 512 3 x 3 / 1 30 x 22 x 256 -> 30 x 22 x 512 1.557 BFLOPs
17 max 2 x 2 / 2 30 x 22 x 512 -> 15 x 11 x 512
18 conv 1024 3 x 3 / 1 15 x 11 x 512 -> 15 x 11 x1024 1.557 BFLOPs
19 conv 512 1 x 1 / 1 15 x 11 x1024 -> 15 x 11 x 512 0.173 BFLOPs
20 conv 1024 3 x 3 / 1 15 x 11 x 512 -> 15 x 11 x1024 1.557 BFLOPs
21 conv 512 1 x 1 / 1 15 x 11 x1024 -> 15 x 11 x 512 0.173 BFLOPs
22 conv 1024 3 x 3 / 1 15 x 11 x 512 -> 15 x 11 x1024 1.557 BFLOPs
23 conv 1024 3 x 3 / 1 15 x 11 x1024 -> 15 x 11 x1024 3.114 BFLOPs
24 conv 1024 3 x 3 / 1 15 x 11 x1024 -> 15 x 11 x1024 3.114 BFLOPs
25 route 16
26 conv 64 1 x 1 / 1 30 x 22 x 512 -> 30 x 22 x 64 0.043 BFLOPs
27 reorg / 2 30 x 22 x 64 -> 15 x 11 x 256
28 route 27 24
29 conv 1024 3 x 3 / 1 15 x 11 x1280 -> 15 x 11 x1024 3.893 BFLOPs
30 conv 35 1 x 1 / 1 15 x 11 x1024 -> 15 x 11 x 35 0.012 BFLOPs
31 detection
mask_scale: Using default '1.000000'
CUDA Error: no kernel image is available for execution on the device
polarisnnet: ./src/cuda.c:36: check_error: Assertion `0' failed.
/home/nvidia/test.sh: line 4: 4502 Aborted (core dumped) ./polarisnnet detector line data/test/t.data data/test/t.cfg data/test/t.weights data/test/t.mp4
Do you guys know what happened?
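
Hedged guess, since this looks like a Darknet fork rather than these scripts: "no kernel image is available" from cuda.c usually means the binary was compiled without CUDA kernels for your GPU's compute capability, and rebuilding with a matching -gencode in the Makefile fixes it, e.g. (illustrative values only; substitute the architecture of your device):

ARCH= -gencode arch=compute_62,code=[sm_62,compute_62]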

No such file or directory: venv\\Scripts\\accelerate.exe\\__main__.py

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│                                                                                                  │
│ G:\AI\sd-scripts\lora_train_popup.py:432 in <module>                                             │
│                                                                                                  │
│   429                                                                                            │
│   430                                                                                            │
│   431 if __name__ == "__main__":                                                                 │
│ ❱ 432 │   main()                                                                                 │
│   433                                                                                            │
│ G:\AI\sd-scripts\lora_train_popup.py:189 in main                                                 │
│                                                                                                  │
│   186 │   arg_class = ArgStore()                                                                 │
│   187 │   ret = mb.askyesno(message="Do you want to load a json config file?")                   │
│   188 │   if ret:                                                                                │
│ ❱ 189 │   │   load_json(ask_file("json to load from", {"json"}), arg_class)                      │
│   190 │   │   arg_class = ask_elements_trunc(arg_class)                                          │
│   191 │   else:                                                                                  │
│   192 │   │   arg_class = ask_elements(arg_class)                                                │
│                                                                                                  │
│ G:\AI\sd-scripts\lora_train_popup.py:403 in load_json                                            │
│                                                                                                  │
│   400 │   with open(path) as f:                                                                  │
│   401 │   │   json_obj = json.loads(f.read())                                                    │
│   402 │   print("json loaded, setting variables...")                                             │
│ ❱ 403 │   obj.net_dim = json_obj["net_dim"]                                                      │
│   404 │   obj.scheduler = json_obj["scheduler"]                                                  │
│   405 │   obj.warmup_lr_ratio = json_obj["warmup_lr_ratio"]                                      │
│   406 │   obj.learning_rate = json_obj["learning_rate"]                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'net_dim'
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│                                                                                                  │
│ C:\Users\satya\AppData\Local\Programs\Python\Python310\lib\runpy.py:196 in _run_module_as_main   │
│                                                                                                  │
│   193 │   main_globals = sys.modules["__main__"].__dict__                                        │
│   194 │   if alter_argv:                                                                         │
│   195 │   │   sys.argv[0] = mod_spec.origin                                                      │
│ ❱ 196 │   return _run_code(code, main_globals, None,                                             │
│   197 │   │   │   │   │    "__main__", mod_spec)                                                 │
│   198                                                                                            │
│   199 def run_module(mod_name, init_globals=None,                                                │
│ C:\Users\satya\AppData\Local\Programs\Python\Python310\lib\runpy.py:86 in _run_code              │
│                                                                                                  │
│    83 │   │   │   │   │      __loader__ = loader,                                                │
│    84 │   │   │   │   │      __package__ = pkg_name,                                             │
│    85 │   │   │   │   │      __spec__ = mod_spec)                                                │
│ ❱  86 │   exec(code, run_globals)                                                                │
│    87 │   return run_globals                                                                     │
│    88                                                                                            │
│    89 def _run_module_code(code, init_globals=None,                                              │
│                                                                                                  │
│ G:\AI\sd-scripts\venv\Scripts\accelerate.exe\__main__.py:7 in <module>                           │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ 'G:\\AI\\sd-scripts\\venv\\Scripts\\accelerate.exe\\__main__.py'                                 │
│                                                                                                  │
│ G:\AI\sd-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py:45 in main         │
│                                                                                                  │
│   42 │   │   exit(1)                                                                             │
│   43 │                                                                                           │
│   44 │   # Run                                                                                   │
│ ❱ 45 │   args.func(args)                                                                         │
│   46                                                                                             │
│   47                                                                                             │
│   48 if __name__ == "__main__":                                                                  │
│                                                                                                  │
│ G:\AI\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py:1104 in launch_command     │
│                                                                                                  │
│   1101 │   elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA  │
│   1102 │   │   sagemaker_launcher(defaults, args)                                                │
│   1103 │   else:                                                                                 │
│ ❱ 1104 │   │   simple_launcher(args)                                                             │
│   1105                                                                                           │
│   1106                                                                                           │
│   1107 def main():                                                                               │
│                                                                                                  │
│ G:\AI\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py:567 in simple_launcher     │
│                                                                                                  │
│    564 │   process = subprocess.Popen(cmd, env=current_env)                                      │
│    565 │   process.wait()                                                                        │
│    566 │   if process.returncode != 0:                                                           │
│ ❱  567 │   │   raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)       │
│    568                                                                                           │
│    569                                                                                           │
│    570 def multi_gpu_launcher(args):                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['G:\\AI\\sd-scripts\\venv\\Scripts\\python.exe', 'lora_train_popup.py']' returned non-zero

I can't start training

Traceback (most recent call last):
File "G:\kohya\sd-scripts\train_db.py", line 1229, in
train(args)
File "G:\kohya\sd-scripts\train_db.py", line 1043, in train
encoder_hidden_states = text_encoder.text_model.final_layer_norm(encoder_hidden_states)
File "G:\kohya\sd-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "G:\kohya\sd-scripts\venv\lib\site-packages\torch\nn\modules\normalization.py", line 189, in forward
return F.layer_norm(
File "G:\kohya\sd-scripts\venv\lib\site-packages\torch\nn\functional.py", line 2503, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: expected scalar type Float but found Half
steps: 0%| | 0/3000 [00:34<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\username\miniconda3\envs\kohya\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\username\miniconda3\envs\kohya\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "G:\kohya\sd-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "G:\kohya\sd-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "G:\kohya\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "G:\kohya\sd-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['G:\kohya\sd-scripts\venv\Scripts\python.exe', 'train_db.py', '--pretrained_model_name_or_path=G:\stable-diffusion-webui\models\Stable-diffusion\Anything-V3.0-pruned.ckpt', '--train_data_dir=G:\kohya\dataset\train', '--output_dir=G:\kohya\results', '--prior_loss_weight=1.0', '--resolution=512', '--train_batch_size=1', '--learning_rate=1e-6', '--max_train_steps=3000', '--use_8bit_adam', '--xformers', '--mixed_precision=fp16', '--cache_latents', '--caption_extention=.txt', '--clip_skip=2', '--full_fp16', '--gradient_checkpointing']' returned non-zero exit status 1.

I'm using Windows 10 and a 3080 10GB.
Could using conda be causing this problem?

Thanks in advance.
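
A hedged guess at the cause rather than conda: with --full_fp16 the text encoder runs in fp16, but the hidden states reaching final_layer_norm (the --clip_skip path in the traceback) arrive in a different dtype, so layer_norm sees mixed Float/Half inputs. One workaround sketch, casting to the norm's own dtype at the line the traceback points to:

encoder_hidden_states = text_encoder.text_model.final_layer_norm(
    encoder_hidden_states.to(text_encoder.text_model.final_layer_norm.weight.dtype))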

No data found. Please verify arguments

accelerate launch --num_cpu_threads_per_process=16 "train_network.py" --pretrained_model_name_or_path="C:/Programs/stable-diffusion-webui/models/Stable-diffusion/sd-v1-4.ckpt" --train_data_dir="C:/Users/pdept/Desktop/AI pics/training/512x512" --resolution=512,512 --output_dir="C:/Users/pdept/Desktop/AI pics/training/Nowy folder" --use_8bit_adam --xformers --logging_dir="" --network_module=networks.lora --text_encoder_lr=5e-5 --unet_lr=1e-3 --network_dim=8 --output_name="last" --learning_rate="1e-5" --lr_scheduler="cosine" --train_batch_size="1" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --seed="1234" --cache_latents --max_data_loader_n_workers="1" --gradient_checkpointing --xformers --use_8bit_adam
prepare tokenizer
Use DreamBooth method.
prepare train images.
0 train images with repeating.
loading image sizes.
0it [00:00, ?it/s]
prepare dataset
No data found. Please verify arguments / 画像がありません。引数指定を確認してください
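
With the DreamBooth method, images are not read from --train_data_dir directly: they must sit in subfolders named <repeats>_<identifier>, and "0 train images with repeating" is the telltale that no such folder was found. A hypothetical layout ("mychar" is just an example identifier):

C:/Users/pdept/Desktop/AI pics/training/512x512/
    10_mychar/          <- "10" = repeats per image, "mychar" = the training identifier
        image001.png
        image001.txt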

"The paging file is too small for this operation to complete."

This is the error I get while trying to train with the script. Is my 1070 Ti with 8GB VRAM the issue, or did I mess up applying the script, or is it a dependency issue?

steps:   0%|                                                                                    | 0/25 [00:00<?, ?it/s]epoch 1/1
Traceback (most recent call last):
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 62, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The paging file is too small for this operation to complete.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 125, in _main
    prepare(preparation_data)
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 289, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 96, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\...\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\...\sd-scripts\lora_train_popup.py", line 8, in <module>
    import train_network
  File "C:\Users\...\sd-scripts\train_network.py", line 9, in <module>
    from accelerate.utils import set_seed
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\__init__.py", line 7, in <module>
    from .accelerator import Accelerator
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\accelerator.py", line 27, in <module>
    from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\checkpointing.py", line 24, in <module>
    from .utils import (
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\utils\__init__.py", line 103, in <module>
    from .megatron_lm import (
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\accelerate\utils\megatron_lm.py", line 32, in <module>
    from transformers.modeling_outputs import (
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\transformers\__init__.py", line 30, in <module>
    from . import dependency_versions_check
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\transformers\dependency_versions_check.py", line 17, in <module>
    from .utils.versions import require_version, require_version_core
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\transformers\utils\__init__.py", line 34, in <module>
    from .generic import (
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\transformers\utils\generic.py", line 33, in <module>
    import tensorflow as tf
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\tensorflow\__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\tensorflow\python\__init__.py", line 36, in <module>
    from tensorflow.python import pywrap_tensorflow as _pywrap_tensorflow
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 77, in <module>
    raise ImportError(
ImportError: Traceback (most recent call last):
  File "C:\Users\...\sd-scripts\venv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 62, in <module>
    from tensorflow.python._pywrap_tensorflow_internal import *
ImportError: DLL load failed while importing _pywrap_tensorflow_internal: The paging file is too small for this operation to complete.


Failed to load the native TensorFlow runtime.
See https://www.tensorflow.org/install/errors for some common causes and solutions.
If you need help, create an issue at https://github.com/tensorflow/tensorflow/issues and include the entire stack trace above this error message.
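
Side note: the failing import is TensorFlow, which transformers only loads because it happens to be installed in that venv; these scripts train with PyTorch. So besides enlarging the Windows paging file (the direct fix for WinError 1455), uninstalling TensorFlow from the venv (pip uninstall tensorflow) is a hedged way to sidestep this particular crash.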

Feature Request - Flash Attention

Hi!

Would be great to see some support for flash-attention (flash-attn in pip) as you already support xformers.
https://github.com/HazyResearch/flash-attention

My understanding, from using it on some other projects, is that it can lower the VRAM requirements a little below xformers'.
For example, the sd_dreambooth extension supports flash attention (d8ahazard/sd_dreambooth_extension#283).
It would be useful for people who are unable to get xformers to work at all for training (for example, I can use it to generate images without issues, but training with xformers CUDA-errors out).

Thanks!

LORA caption training: extremely long pauses between epochs

For some reason there is a large delay when epochs change, making training much slower. What could cause this?
My settings:
accelerate launch --num_cpu_threads_per_process 10 train_network.py --pretrained_model_name_or_path=B:\AIimages\stable-diffusion-webui\models\Stable-diffusion\model.ckpt --train_data_dir=B:\AIimages\training\data --output_dir=B:\train\out\ --in_json=B:\AIimages\training\data\meta_lat.json --resolution=512,512 --prior_loss_weight=1.0 --train_batch_size=4 --learning_rate=1e-3 --max_train_steps=15000 --use_8bit_adam --xformers --gradient_checkpointing --mixed_precision=fp16 --save_every_n_epochs=10 --network_module=networks.lora --shuffle_caption --unet_lr=3e-4 --text_encoder_lr=3e-5 --lr_scheduler=constant --save_model_as=safetensors --seed=115
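
A likely suspect is the DataLoader workers: on Windows, worker processes are re-spawned at every epoch (re-importing everything) unless kept persistent, so the pause grows with worker count. Lowering --max_data_loader_n_workers (even to 0; the option appears in the usage dumps elsewhere on this page) is a cheap experiment to confirm.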

Winerror 2 The system cannot find the file specified

Anytime I try to start a training session I get hit with the below error. I have no idea what file it's not finding. I tried to follow the installation instructions to the letter, but I'm very new to all this, so it's quite possible I messed something up. Thanks for the help.

max_train_steps = 0
stop_text_encoder_training = 0
lr_warmup_steps = 0
accelerate launch --num_cpu_threads_per_process=16 "train_db.py" --cache_latents --enable_bucket --use_8bit_adam --xformers --pretrained_model_name_or_path=E:/Ai/stable-diffusion-webui/models/Stable-diffusion/NAImodel.ckpt --train_data_dir="C:/Users/user/kohya_ss/PQ2/image" --resolution=512,512 --output_dir=C:/Users/user/kohya_ss/PQ2/model --train_batch_size=1 --learning_rate=1e-06 --lr_scheduler=constant --lr_warmup_steps=0 --max_train_steps=0 --use_8bit_adam --xformers --mixed_precision=fp16 --save_every_n_epochs=1 --seed=1234 --save_precision=fp16 --logging_dir=C:/Users/user/kohya_ss/PQ2/log --caption_extention=
Traceback (most recent call last):
File "C:\Users\user\kohya_ss\venv\lib\site-packages\gradio\routes.py", line 321, in run_predict
output = await app.blocks.process_api(
File "C:\Users\user\kohya_ss\venv\lib\site-packages\gradio\blocks.py", line 1015, in process_api
result = await self.call_function(fn_index, inputs, iterator, request)
File "C:\Users\user\kohya_ss\venv\lib\site-packages\gradio\blocks.py", line 856, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\user\kohya_ss\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Users\user\kohya_ss\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Users\user\kohya_ss\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Users\user\kohya_ss\dreambooth_gui.py", line 413, in train_model
subprocess.run(run_cmd)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\subprocess.py", line 503, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.10_3.10.2544.0_x64__qbz5n2kfra8p0\lib\subprocess.py", line 1440, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,

Feature: Print() where captions are being sourced from for clarity

Hey again. Today I realized that captions can be sourced from 3 different locations, but we can't tell which one from the training log.

  • metadata file
  • directory name
  • captions file

There is currently no distinction in the logs which source the captions are coming from. It would be great to have a simple print line that states this clearly, to make sure you are training on the correct captions.

Ex: Earlier I was training LoRA DreamBooth and was accidentally using directory-name captions instead of the captions file, but had no idea until training was done.
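
A minimal sketch of the requested log line, assuming a caption_source variable set wherever the dataset code decides between the three sources (the variable name is hypothetical):

# Hypothetical sketch: announce which of the three caption sources is in use.
# The dataset code would set caption_source at the point where it chooses
# between the metadata file, the directory name, and the captions file.
caption_source = "captions file"  # or "metadata file" / "directory name"
print(f"captions sourced from: {caption_source}")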

no kernel image is available for execution on the device

Error no kernel image is available for execution on the device at line 89 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
Traceback (most recent call last):
File "C:\Users\Siddhesh\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Siddhesh\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\Users\Siddhesh\Desktop\kohya_ss\venv\Scripts\accelerate.exe_main.py", line 7, in
File "C:\Users\Siddhesh\Desktop\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\Siddhesh\Desktop\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\Siddhesh\Desktop\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\Siddhesh\Desktop\kohya_ss\venv\Scripts\python.exe', 'train_network.py', '--cache_latents', '--enable_bucket', '--use_8bit_adam', '--xformers', '--pretrained_model_name_or_path=C:/Users/Siddhesh/Desktop/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned.ckpt', '--train_data_dir=C:/Users/Siddhesh/Desktop/test\img', '--resolution=512,512', '--output_dir=C:/Users/Siddhesh/Desktop/test\model', '--train_batch_size=1', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=400', '--use_8bit_adam', '--xformers', '--mixed_precision=fp16', '--save_every_n_epochs=1', '--seed=1234', '--save_precision=fp16', '--logging_dir=C:/Users/Siddhesh/Desktop/test\log', '--network_module=networks.lora', '--text_encoder_lr=1e-06', '--unet_lr=0.0001', '--network_dim=4']' returned non-zero exit status 1.

(venv) PS C:\Users\Siddhesh\Desktop\kohya_ss> python
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.

import torch
import sys
print('A', sys.version)
A 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)]
print('B', torch.version)
B 1.12.1+cu116
print('C', torch.cuda.is_available())
C True
print('D', torch.backends.cudnn.enabled)
D True
device = torch.device('cuda')
print('E', torch.cuda.get_device_properties(device))
E _CudaDeviceProperties(name='NVIDIA GeForce GTX 1060 6GB', major=6, minor=1, total_memory=6143MB, multi_processor_count=10)
print('F', torch.tensor([1.0, 2.0]).cuda())
F tensor([1., 2.], device='cuda:0')

EDIT: Error no kernel image is available for execution on the device at line 89 in file D:\ai\tool\bitsandbytes\csrc\ops.cu

^^ The bolded part (the D:\ path) must be the error, because I have no D:\ drive on my system!

Text encoder training does not stop partway through

sd-scripts/train_db.py

Lines 1011 to 1013 in d9bb4aa

if stop_text_encoder_training:
print(f"stop text encoder training at step {global_step}")
text_encoder.train(False)

train(False) is a feature that, for example, disables dropout layers not used at inference; it does not appear to stop parameter updates.
https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval

Gradient computation can be disabled with
text_encoder.requires_grad_(False)
but apparently the gradients merely become zero, and the parameters are still updated by Adam's remaining momentum.

I confirmed that training had not stopped by checking the hash of text_encoder/pytorch_model.bin.
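
A minimal sketch of the difference, with toy Linear layers standing in for the text encoder and U-Net (the rebuilt optimizer at the end is an illustrative workaround, not the scripts' actual code):

import torch

text_encoder = torch.nn.Linear(4, 4)
unet = torch.nn.Linear(4, 4)
optimizer = torch.optim.Adam(
    list(text_encoder.parameters()) + list(unet.parameters()), lr=1e-3
)

# train(False) only switches dropout/batchnorm to inference behavior;
# the parameters still receive gradients and still get updated.
text_encoder.train(False)

# requires_grad_(False) stops new gradients, but as noted above Adam's
# stored momentum (exp_avg / exp_avg_sq) can still move the weights.
text_encoder.requires_grad_(False)

# One way to stop updates completely is to rebuild the optimizer over
# the remaining trainable parameters only (losing Adam state is the cost).
optimizer = torch.optim.Adam(unet.parameters(), lr=1e-3)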

Set initial epoch number when resuming

Currently the epoch starts from 1 when resuming. It would be better to be able to set an arbitrary number via an argument.
(Or have it taken automatically from the file name; a hypothetical sketch follows.)
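
A hypothetical sketch of the "taken from the file name" variant; the epoch-NNNNNN-state naming pattern is an assumption for illustration, not a documented format:

import re

def infer_initial_epoch(resume_path: str, fallback: int = 1) -> int:
    # Pull the epoch number out of a path like "output/epoch-000010-state".
    m = re.search(r"epoch-(\d+)", resume_path)
    return int(m.group(1)) + 1 if m else fallback

print(infer_initial_epoch("output/epoch-000010-state"))  # -> 11
print(infer_initial_epoch("output/last-state"))          # -> 1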

Add "repeat" feature for fine tuning

Each concept can have a different repeat count in the DreamBooth method, but the metadata .json (fine-tuning method) does not have this feature.

One idea is that the folder name for fine tuning could carry repeats like <repeat>_<concept>, where the concept part is ignored. If the repeat is not provided, it becomes 1. The merge_captions or merge_dd_tags script would then append the repeat value to the JSON. A sketch of that parsing rule follows.
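
A minimal sketch of that parsing rule (illustrative only, not the actual script code):

def parse_repeat(folder_name: str) -> int:
    # "<repeat>_<concept>" -> repeat; anything else defaults to 1.
    head, sep, _ = folder_name.partition("_")
    return int(head) if sep and head.isdigit() else 1

print(parse_repeat("10_mychar"))  # -> 10
print(parse_repeat("mychar"))     # -> 1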

TypeError: 'type' object is not subscriptable

An error called "TypeError: 'type' object is not subscriptable" occurs when using 'train_network.py' to finetune model.

The whole message is:

import network module: networks.lora
Traceback (most recent call last):
File "/root/autodl-tmp/sd-scripts/train_network.py", line 1455, in
train(args)
File "/root/autodl-tmp/sd-scripts/train_network.py", line 1092, in train
network = network_module.create_network(1.0, args.network_dim, vae, text_encoder, unet, **net_kwargs)
File "/root/autodl-tmp/sd-scripts/networks/lora.py", line 50, in create_network
network = LoRANetwork(text_encoder, unet, multiplier=multiplier, lora_dim=network_dim)
File "/root/autodl-tmp/sd-scripts/networks/lora.py", line 66, in init
def create_modules(prefix, root_module: torch.nn.Module, target_replace_modules) -> list[LoRAModule]:
TypeError: 'type' object is not subscriptable
Traceback (most recent call last):
File "/root/miniconda3/bin/accelerate", line 8, in
sys.exit(main())
File "/root/miniconda3/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/root/miniconda3/lib/python3.8/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/root/miniconda3/lib/python3.8/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/bin/python', '/root/autodl-tmp/sd-scripts/train_network.py', '--pretrained_model_name_or_path=/root/autodl-tmp/finalmodel.ckpt', '--in_json=/root/autodl-tmp/liangxing-lat.json', '--shuffle_caption', '--keep_tokens=1', '--train_data_dir=/root/autodl-tmp/liangxing', '--dataset_repeats=10', '--output_dir=/root/autodl-tmp/liangxing-lora-test', '--save_precision=float', '--save_model_as=ckpt', '--save_every_n_epochs=1', '--save_state', '--color_aug', '--flip_aug', '--resolution=640,640', '--train_batch_size=4', '--max_token_length=225', '--learning_rate=1e-4', '--prior_loss_weight=1.0', '--seed=2998', '--unet_lr=1e-4', '--text_encoder_lr=1e-6', '--max_train_steps=8955', '--gradient_checkpointing', '--gradient_accumulation_steps=2', '--mixed_precision=no', '--clip_skip=2', '--logging_dir=logs', '--lr_scheduler=polynomial', '--lr_warmup_steps=450', '--network_module=networks.lora']' returned non-zero exit status 1.

It seems to be related to the 'networks.lora' module. How can I fix this? By the way, '--network_module' cannot be set to 'None', otherwise another error appears:

import network module: None
Traceback (most recent call last):
File "/root/autodl-tmp/sd-scripts/train_network.py", line 1455, in
train(args)
File "/root/autodl-tmp/sd-scripts/train_network.py", line 1084, in train
network_module = importlib.import_module(args.network_module)
File "/root/miniconda3/lib/python3.8/importlib/init.py", line 118, in import_module
if name.startswith('.'):
AttributeError: 'NoneType' object has no attribute 'startswith'
Traceback (most recent call last):
File "/root/miniconda3/bin/accelerate", line 8, in
sys.exit(main())
File "/root/miniconda3/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "/root/miniconda3/lib/python3.8/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
simple_launcher(args)
File "/root/miniconda3/lib/python3.8/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/bin/python', '/root/autodl-tmp/sd-scripts/train_network.py', '--pretrained_model_name_or_path=/root/autodl-tmp/finalmodel.ckpt', '--in_json=/root/autodl-tmp/liangxing-lat.json', '--shuffle_caption', '--keep_tokens=1', '--train_data_dir=/root/autodl-tmp/liangxing', '--dataset_repeats=10', '--output_dir=/root/autodl-tmp/liangxing-lora-test', '--save_precision=float', '--save_model_as=ckpt', '--save_every_n_epochs=1', '--save_state', '--color_aug', '--flip_aug', '--resolution=640,640', '--train_batch_size=4', '--max_token_length=225', '--learning_rate=1e-4', '--prior_loss_weight=1.0', '--seed=2998', '--unet_lr=1e-4', '--text_encoder_lr=1e-6', '--max_train_steps=8955', '--gradient_checkpointing', '--gradient_accumulation_steps=2', '--mixed_precision=no', '--clip_skip=2', '--logging_dir=logs', '--lr_scheduler=polynomial', '--lr_warmup_steps=450']' returned non-zero exit status 1.
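
For what it's worth, the traceback paths show Python 3.8, and built-in generics such as list[LoRAModule] in annotations are only subscriptable from Python 3.9 (PEP 585). A minimal sketch of the failure and two workarounds, assuming that is the cause; upgrading to Python 3.9+ would also resolve it:

from __future__ import annotations  # fix 1: lazy annotations (PEP 563)

from typing import List

class LoRAModule:  # stand-in for the class in networks/lora.py
    pass

# Without the __future__ import above, this def line raises
# "TypeError: 'type' object is not subscriptable" on Python 3.8.
def create_modules(prefix, root_module, target_replace_modules) -> list[LoRAModule]:
    return []

# Fix 2: typing.List is subscriptable on Python 3.8 as well.
def create_modules_compat(prefix, root_module, target_replace_modules) -> List[LoRAModule]:
    return []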

Can somebody modify this to run on Colab?

I don't have a decent video card for training. If someone could modify this to run on Colab, it would be much appreciated. Alternatively, can I run this entirely on the CPU, without video card support?

No data found. Please verify arguments

Hi, I don't really get why I'm getting this message:

No data found. Please verify arguments / 画像がありません。引数指定を確認してください

I did

.\venv\Scripts\activate

inside the sd-scripts folder, then ran:

accelerate launch --num_cpu_threads_per_process 8 train_network.py --pretrained_model_name_or_path=B:\AIimages\stable-diffusion-webui\models\Stable-diffusion\model.ckpt --train_data_dir=B:\AIimages\training\images\input\ --output_dir=B:\AIimages\sd-scripts\output\ --prior_loss_weight=1.0 --resolution=512,512 --train_batch_size=4 --learning_rate=1e-4 --max_train_steps=200 --use_8bit_adam --xformers --mixed_precision=fp16 --save_every_n_epochs=1 --save_model_as=safetensors --clip_skip=1 --seed=42 --color_aug --network_module=networks.lora --unet_lr=5e-4 --text_encoder_lr=5e-5

There are 61 .pngs with .caption files inside the input folder.

extract_lora does not work because module keys don't match any SD1.x models

Hey again.

Edit: I see it works for SD2.x models so I guess the SD1.x keys are not the same and need to be added. Is it SpatialTransformer that's missing?

I was attempting to try out extract_lora_from_models.py but realized that UNET_TARGET_REPLACE_MODULE = ["Transformer2DModel", "Attention"] never matches any layers in any models I throw at it, so the result is always "create LoRA for U-Net: 0 modules." and an empty output file.

Are these the correct keys for SD1.x models?
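
One way to check is to enumerate the class names that actually occur in a loaded U-Net; a minimal sketch assuming a diffusers SD1.x model (the model id is illustrative):

from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
# Collect every module class name and look for "Transformer2DModel"
# or "Attention" in the output for your diffusers version.
class_names = {m.__class__.__name__ for m in unet.modules()}
print(sorted(class_names))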

Converted v1 checkpoints in conversion script cause error in generation

Some weight shapes are wrong.

RuntimeError: Error(s) in loading state_dict for LatentDiffusion:
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.1.1.proj_out.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
        size mismatch for model.diffusion_model.input_blocks.2.1.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).
...
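
A hypothetical repair sketch, assuming the converter saved proj_in/proj_out as 2-D Linear weights while the v1 inference model expects 1x1 conv weights (file names are illustrative):

import torch

ckpt = torch.load("converted.ckpt", map_location="cpu")
sd = ckpt["state_dict"]
for key, tensor in sd.items():
    if ("proj_in.weight" in key or "proj_out.weight" in key) and tensor.ndim == 2:
        # [320, 320] -> [320, 320, 1, 1] to match the conv weight layout
        sd[key] = tensor.unsqueeze(-1).unsqueeze(-1)
torch.save(ckpt, "converted_fixed.ckpt")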

Save training metadata to model outputs

Hi, thanks so much for your work with the LoRA training; it's been a blast.

I was thinking it would be a great help if the training parameters used for a LoRA model were saved into the resulting .pt file. I might want to remember how I configured a model, and right now I have to remember to write down the parameters myself every time. Adding the data automatically would also help if I receive a model from somewhere else and want to know how they trained it.

Some examples of things I would find useful in this metadata

  • SD model name/hash that was trained on
  • Directory structure/number of images/list of concepts/repeats
  • Epoch count
  • Batches per epoch
  • Regularization image count
  • Total number of optimization steps
  • LR scheduler/warmup rate
  • Training batch size
  • Learning rates

Given that the output files are PyTorch models (.pt), they seem to just be .zip files, so putting the training parameters in a .json file inside might suffice. The .safetensors format also has a JSON header.

From the additional_networks extension this data could later be inspected from a new tab or similar.

It is important that the data is embedded into the .pt file itself so it is retained if the model is distributed later. A sketch of the safetensors metadata approach follows.
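
A minimal sketch of the .safetensors route, using the library's string-to-string metadata header (the ss_* key names are invented for illustration):

import torch
from safetensors.torch import safe_open, save_file

state_dict = {"lora_up.weight": torch.zeros(4, 4)}
metadata = {  # safetensors metadata must map str -> str
    "ss_base_model": "model.ckpt",
    "ss_epochs": "10",
    "ss_learning_rate": "1e-4",
}
save_file(state_dict, "example_lora.safetensors", metadata=metadata)

# The header can be read back later without loading any tensors:
with safe_open("example_lora.safetensors", framework="pt") as f:
    print(f.metadata())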

Folder name system to Config File or JSON

The folder name system sucks.
When we are literally passing so many configs each time, why use the folder system? Also, with the folder system we can't use some characters that are necessary, for example in the concept name. A hypothetical JSON sketch follows.
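
A hypothetical sketch of what a JSON replacement could look like (all field names invented):

import json

config = json.loads("""
[
  {"image_dir": "train/char_a", "caption": "char_a, 1girl", "repeats": 10},
  {"image_dir": "train/style_b", "caption": "in style_b style", "repeats": 3}
]
""")
for concept in config:
    # Arbitrary captions and characters are fine here, unlike folder names.
    print(concept["image_dir"], "x", concept["repeats"], "-", concept["caption"])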
