
Comments (45)

GamingDaveUk commented on September 2, 2024

Happy to see this, not much I can add to the discussion other than a thank you for undertaking this. People like yourself creating and maintaining these tools are the reason so much content exists. Thank you

futureflix87 commented on September 2, 2024

Thank you so much!!

bmaltais commented on September 2, 2024

I have figured it out...

[[datasets]]
resolution = 1024          # training resolution
batch_size = 4
keep_tokens = 1            # number of leading caption tokens kept fixed when shuffling
enable_bucket = true       # bucket images by aspect ratio

  [[datasets.subsets]]
  image_dir = 'd:\kohya_ss\examples\stable_cascade\test_dataset'
  num_repeats = 10         # each image is repeated 10 times per epoch
  class_tokens = 'toy'     # used as the caption when no caption file is found
  caption_extension = '.txt'

bmaltais commented on September 2, 2024

Using the latest updated code in sd-scripts produces better results... still not perfect... kohya is working on allowing stage B training... hoping this will fix the issue with the final look:

image

bmaltais commented on September 2, 2024

I have shared the test dataset in the stable_cascade branch. Look under the examples folder. You can play with it for now.

bmaltais commented on September 2, 2024

I did my test with 8… I don't think the disappointing result is due to that… I tried using other optimisers but I don't have enough VRAM.

bmaltais commented on September 2, 2024

I did provide everything in the stable_cascade branch. Look in the examples folder in that branch. You will find the dataset, the toml file for the dataset, etc. The new way of configuring the images for finetuning in the latest sd-scripts code is to use a .toml file... this is what the new SC Finetuning tab is configured to use...

bmaltais commented on September 2, 2024

Ok, so the SC fine tuning tab always looks at the examples folder for the toml file, got it. I will edit the toml with images path.

Trying this out again today!

Actually it does not. Just make sure to put the path to your toml in the SC Finetuning tab and it should work. It does not need to be in the examples folder.

bmaltais commented on September 2, 2024

No worries, I keep editing mine too :-)

As for a GUI to manage and create the toml dataset file, it might be possible, but I feel it might just be easier to create one by hand. The complexity of building a gradio interface for that is beyond my current knowledge… but I am sure someone could do it.

If someone wants to take a crack at creating a toml dataset gradio GUI class I will gladly add it to the interface.
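For anyone tempted to try: a minimal sketch of what such a gradio tool could look like, assuming only that gradio is installed. The fields and the generated layout mirror the example toml earlier in this thread; nothing here is wired into kohya_ss, and all names are made up for illustration.

import gradio as gr

def make_dataset_toml(image_dir, resolution, batch_size, num_repeats, class_tokens, caption_extension):
    # Render the same [[datasets]] / [[datasets.subsets]] layout used in the example above.
    return f"""[[datasets]]
resolution = {int(resolution)}
batch_size = {int(batch_size)}
keep_tokens = 1
enable_bucket = true

  [[datasets.subsets]]
  image_dir = '{image_dir}'
  num_repeats = {int(num_repeats)}
  class_tokens = '{class_tokens}'
  caption_extension = '{caption_extension}'
"""

with gr.Blocks() as demo:
    image_dir = gr.Textbox(label="Image directory")
    resolution = gr.Number(value=1024, label="Resolution")
    batch_size = gr.Number(value=4, label="Batch size")
    num_repeats = gr.Number(value=10, label="Num repeats")
    class_tokens = gr.Textbox(value="toy", label="Class tokens")
    caption_extension = gr.Textbox(value=".txt", label="Caption extension")
    toml_out = gr.Textbox(label="Generated dataset toml", lines=12)
    # Clicking the button renders the form fields into a toml the user can save by hand.
    gr.Button("Generate").click(
        make_dataset_toml,
        inputs=[image_dir, resolution, batch_size, num_repeats, class_tokens, caption_extension],
        outputs=toml_out,
    )

demo.launch()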

bmaltais commented on September 2, 2024

Thank you for sharing this. I will test it out later tonight after work and family stuff ;-) I will update the content of the branch with your updates so it can help others who want to cut their teeth on this ;-)

The sample you provided is actually pretty great. Probably a combination of your parameters, source data and training the text encoder.

311-code commented on September 2, 2024

Ok, I spent a couple more days testing. Tried a few things: no captions, changing classes, general tokens, training myself with 60 photos like I would do on SDXL.

Overall it's very difficult to figure out the right combination of unet model and text encoder model to use in comfyui, or what number of steps is best for 60 photos in Cascade. Maybe this will change with future diffusers updates? To complicate things, the 13 Ted photos look good at 800 steps in the samples, then fall off, but then get decent again at 1800 steps. It makes me wonder if the Ted likeness would improve if I did more epochs.

It takes a long time to fully finetune Cascade at the time of writing, it seems, and I'm struggling to figure it out. It didn't look like me overall and looked pretty undertrained at 3400 steps at batch size 3 with 60 photos. I'm thinking this is going to need a lot more steps, which doesn't seem in line with it "training faster than SDXL". I could increase the learning rate on everything here again, but in SDXL that always seemed to make the results worse.
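As a rough sanity check (a back-of-the-envelope sketch that ignores num_repeats and gradient accumulation), those step counts translate into passes over the dataset like this:

steps, batch_size, dataset_size = 3400, 3, 60
images_seen = steps * batch_size     # 10,200 images drawn over the run
passes = images_seen / dataset_size  # = 170 full passes over the 60 photos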

This guy is getting pretty decent results on his cat though at 8000 steps (but overfitting), using a very large batch size of 7, with the kohya scripts directly: https://www.reddit.com/r/StableDiffusion/comments/1azmhte/my_cat_in_different_stylesstablecascade_stagec/

I ran out of disk space though, because I fell asleep and it was saving too often, at every 300 steps.

vgaggia commented on September 2, 2024

I'm actually busy trying to train Stable Cascade with a dataset of around 180k images, although I am using OneTrainer because it seems to be less memory intensive for some reason.

I have also noticed that the training gets better and worse constantly as it trains. It's gonna be a while for my training to finish on a single GPU, so no clue when I can actually show some results.

segalinc commented on September 2, 2024

bmaltais commented on September 2, 2024

Can someone share a toml config file for a simple one concept finetuning? I never do finetuning and apparently using .toml is the way to go now... and I have no clue how to configure it ;-)

My 1st quest in making the GUI is getting to properly finetune a Stable Cascade model... and I need a proper .toml to run this example command:

& accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 2 stable_cascade_train_stage_c.py `
  --mixed_precision bf16 --save_precision bf16 --max_data_loader_n_workers 0 --persistent_data_loader_workers `
  --gradient_checkpointing --learning_rate 1e-4 `
  --optimizer_type adafactor --optimizer_args "scale_parameter=False" "relative_step=False" "warmup_init=False" `
  --max_train_epochs 10 --save_every_n_epochs 1 `
  --output_dir e:\model\test --output_name sc_test `
  --stage_c_checkpoint_path "E:\models\stable_cascade\stage_c_bf16.safetensors" `
  --effnet_checkpoint_path "E:\models\stable_cascade\effnet_encoder.safetensors" `
  --previewer_checkpoint_path "E:\models\stable_cascade\previewer.safetensors" `
  --dataset_config "D:\kohya_ss\examples\stable_cascade\test_dataset.toml" `
  --sample_every_n_epochs 1 --sample_prompts "D:\kohya_ss\examples\stable_cascade\prompt.txt" `
  --adaptive_loss_weight

Once I am successful I will be in a better place to judge how to put the GUI together... At first I thought I would just extend the finetuning tab to support Stable Cascade... but I think it might just be better to create a dedicated tab for it... still unsure...

bmaltais commented on September 2, 2024

Looks like I am successful in finetuning...

image

Finetuning as zxc class toy and prompting with "zxc toy posing at the beach --W 800 --H 1200"... so there is hope

Looks like the best epoch was 7... after that it went downhill

bmaltais commented on September 2, 2024

I tested the results of the model in ComfyUI and they are not great... sort of washed out... Most certainly bad training parameters... It will take a while to figure out proper SC finetuning parameters...

image

bmaltais commented on September 2, 2024

If you find better parameters for better results please share. Training SC is hugely VRAM intensive.

311-code commented on September 2, 2024

I have a 4090 24GB. I'll dive into this today and report back.

How many photos do you recommend I use for ideal use in Cascade to test this?

gesen2egee commented on September 2, 2024

image
Maybe it's because of this.

311-code commented on September 2, 2024

Ok, I feel like I'm close, but I'm not familiar with this new code. Is there any basic info you can provide on where to put the training images and the format of the sample .json for Cascade? It's very different.

311-code commented on September 2, 2024

Thank you, I completely missed your examples folder.

I just read everything here also: https://github.com/kohya-ss/sd-scripts/tree/stable-cascade as per your main page link. Went and read the stable cascade branch (good info, plus a docs folder for general fine-tuning) but had to translate the Japanese.

After replacing the examples folder with the additions/images/toml, where do I place all of the files? I am assuming either leave them there or in your empty "dataset" folder. Edit/Update: The .toml file in the examples folder controls the dataset location.

bmaltais commented on September 2, 2024

The dataset can be anywhere. Simply edit the toml file to point to it and specify the repeats, resolution, etc.
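For example, a variant of the toml from earlier in this thread with only the path changed (the directory below is a made-up example):

[[datasets]]
resolution = 1024
batch_size = 4
keep_tokens = 1
enable_bucket = true

  [[datasets.subsets]]
  image_dir = 'd:\my_training\my_dataset'   # hypothetical path; point this at your own folder
  num_repeats = 10
  class_tokens = 'toy'
  caption_extension = '.txt'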

311-code commented on September 2, 2024

Ok, so the SC fine tuning tab always looks at the examples folder for the toml file. I will edit the toml with the images path.

Trying this out again today!

311-code commented on September 2, 2024

Ok, got it. I see it under SC Finetuning tab > Folders > Dataset toml path (it looks for .json by default, so I selected all filetypes), then chose the model files for each field, then the .toml in the \examples\stable_cascade folder. It's training now.

Not sure if you have plans to map the .toml file to gradio interface input fields with some instructions in there, but I think it would help a lot for novices like me getting into this.

Thanks again for working on this btw, it's pretty huge for the community having easy to use training like this imo.

Edit: I keep editing my posts because my brain can't think straight the last few days and want the info to be as clear as possible for users.

311-code commented on September 2, 2024

I had some luck with slightly higher quality outputs and a few issues, here are some samples:

[four sample images]

The first one has a prompt censorship joke if you look closely haha.

Problems: likeness is not completely there; samples during training look good at 800 steps (I should be quoting epochs to make this easier, apologies) but not as good at 800 steps in ComfyUI, so I used the 1800-step checkpoint. Another issue: samples are stuck at 192x192.
[image: samples during training]

For some reason I have to use an overtrained checkpoint with a text_model for the clip node from fewer steps than the checkpoint to get decent results; somehow, even mixing in a text_model from another training of Ted gets better results.

bmaltais commented on September 2, 2024

Interesting results...

Unet and TE:

image

TE only:

image

UNet only:

image

Conclusion... TE has the most importance as far as likeness goes... but without the trained UNet the result is quite fuzzy...

311-code commented on September 2, 2024

That first one is much better likeness than I ever got.

I really had to fight with text encoder and unet model combinations. Maybe increasing the text encoder learning rate a bit could help, as it has the biggest impact?

bmaltais commented on September 2, 2024

Looks like the TE is overfitting while the UNet is way underfitted. Maybe increasing the UNet LR to 0.0001 might help balance learning between the two and prevent overfitting.

vgaggia commented on September 2, 2024

I sure will find out if it's a massive fail!

Have you considered trying a very high learning rate? Maybe it trains differently than we're used to; it is supposed to be easier to train, if I remember right.

betterftr commented on September 2, 2024

For me it generates samples at 192x192 during training. Trying to figure out why, since I set --w and --h to 1024.

bmaltais commented on September 2, 2024

The small samples are related to how the sd-scripts code is actually being used. Nothing I can do; this is something only Kohya can address… but given how heavy creating samples is, I suspect this was by design.

betterftr commented on September 2, 2024

Well, as a temporary solution one can increase the --w and --h to 4096 for 4x the size :D
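With the prompt format seen earlier in the thread, that workaround looks like:

zxc toy posing at the beach --W 4096 --H 4096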

jordoh commented on September 2, 2024

Someone posted a workflow for converting the unet models here to work with the official comfyui workflow (to get rid of that error). Simple enough. I've been out of town but will try it when I get back.

comfyanonymous/ComfyUI#2893 (comment)

Note that that only loads the unet, not the clip, so you aren't able to utilize the (more effective) text encoder training.

311-code commented on September 2, 2024

Thanks for the info. Can we convert the clip model also and just drag it into the positive and negative prompts then (with a load clip node for the official comfyui workflow)?

And I'm wondering if there's any point in doing this over just using the unet; I was hoping it might give better results.

jordoh commented on September 2, 2024

Thanks for the info. Can we convert the clip model also and just drag it into the positive and negative prompts then (with a load clip node for the official comfyui workflow)?

Maybe? I've been trying this with a model trained by the original Stable Cascade repo code and I get errors, as the model it produces isn't loadable as a clip model (I don't have a separate text encoder model from that process). It might work for kohya-ss trained models though - I'd be very interested to know if it does.

And I'm wondering if there's any point in doing this over just using the unet; I was hoping it might give better results.

Yes, there's definitely a point, see this comment upthread for comparisons. For person-trained models, I'm unable to achieve any likeness with just the UNET (vs. generating with the Stable Cascade repo, which uses the trained CLIP).

311-code commented on September 2, 2024

Yup, saw that before. Sorry for the confusion; I meant is there "any point" to using the official comfyui workflow vs the unet workflow for this. I wonder if there would be a difference.

jordoh commented on September 2, 2024

Yup, saw that before. Sorry for the confusion; I meant "any point" to the comfyui workflow vs the unet workflow if we got the unet and clip working in both workflows. I wonder if there would be a difference.

Oh, thanks for clarifying; I think I understand what you meant now: is there any difference between saving off a checkpoint with the trained unet and then using that saved checkpoint, vs. loading the trained unet directly? It seems unlikely that would affect the output, as it's the same model, clip, and VAE either way, but it might save some memory or load time to use the saved-off checkpoint.

311-code commented on September 2, 2024

Yes, thanks for the info.

Something I just discovered that I never knew about the kohya GUI: you can edit the prompt.txt in the samples folder while it's training to change the samples.

This was pretty helpful. It's useful if you are saving a lot of checkpoints every however many epochs/steps and want to see something different.
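For example, a prompt.txt in the format used earlier in this thread (one sample per line; the second line is an invented variation):

zxc toy posing at the beach --W 800 --H 1200
zxc toy sitting on a shelf --W 1024 --H 1024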

segalinc commented on September 2, 2024

Will this feature work on multiple GPUs?

sapkun commented on September 2, 2024

When will the controlnet training script be released for Stable Cascade?

paboum commented on September 2, 2024

Looks like the TE is overfitting while the UNet is way underfitted. Maybe increasing the UNet LR to 0.0001 might help balance learning between the two and prevent overfitting.

Please try adaptive optimizers, e.g. Prodigy. I'm a newbie here; I've never even used those LR parameters. Also, I hope this new feature will work fine with those, so at least one test is in order.

311-code commented on September 2, 2024

I will need to look into prodigy also. I've heard good things.

Just want to give an update though: I tried to train a 60s celebrity with 74 photos on Cascade, and tried a ton of settings and text encoder model/unet model combinations, LR settings, and step counts.

I can't get SDXL dreambooth or full-finetuning level results with a trained human. Tried a ton of stuff over about 8 hours. Now that SD3 is coming out, I may just wait it out.

311-code commented on September 2, 2024

I was thinking of trying that out but heard it may not train the text encoder like this does. Edit: Nm, I believe it can.

I will give it a go though just to see how it compares, thanks!

311-code commented on September 2, 2024

Some info from the Kohya Cascade branch since things have stagnated here, if anyone wants to try:

The official default learning rate for Cascade is 1e-4 (0.0001), and the official settings use bf16 for training.

The first time, specify --text_model_checkpoint_path and --save_text_model to save
the Text Encoder weights. From the next time, specify --text_model_checkpoint_path to load the saved weights.
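In the command style used earlier in the thread, that would mean appending something like this to the stage C training command (the path is a hypothetical example):

# first run: also saves the Text Encoder weights to the given path
--text_model_checkpoint_path "E:\models\stable_cascade\text_model.safetensors" --save_text_model

# subsequent runs: the same flag without --save_text_model loads the saved weights
--text_model_checkpoint_path "E:\models\stable_cascade\text_model.safetensors"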

Note:

A quick clarification, Stable Cascade uses Stage A & B to compress images and Stage C is used for the text-conditional
learning. Therefore, it makes sense to train a LoRA or ControlNet only for Stage C. You also don't train a LoRA or
ControlNet for the Stable Diffusion VAE right?

If your GPU allows for it, you should definitely go for the large Stage C, which has 3.6 billion parameters.
It is a lot better and was finetuned a lot more. Also, the ControlNet and Lora examples are only for the large Stage C at the moment.
For Stage B the difference is not so big. The large Stage B is better at reconstructing small details,
but if your GPU is not so powerful, just go for the smaller one.

I finally got OneTrainer working to compare, will report back.

Edit: Comparing to the Kohya GUI, but I had a side issue. OneTrainer seems to have a custom-made diffusers-to-.safetensors converter that runs after training, and it's not great imo. If comparing, I would recommend doing a manual conversion of a backup in ComfyUI, going from the diffusers loader node into a checkpoint save node.

mhaines94108 commented on September 2, 2024

I tested the results of the model in ComfyUI and they are not great... sort of washed out... Most certainly bad training parameters... It will take a while to figure out proper SC finetuning parameters...

image

I have spent several weeks trying to fine-tune Stable Cascade on a dataset of ~50K photos, and my results have a very similar finger-painted look. I've been using the sample code straight from the Stable Cascade repo. I guess I'll try Kohya's scripts.
