Comments (10)
You say you selected the Multi-GPU checkbox? Just unselect it, and it should launch with the given GPU IDs set in the CUDA_VISIBLE_DEVICES environment variable for you. If it's already unselected, it sounds like the option might be turned on elsewhere, such as in the default config for accelerate.
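For anyone running the training command from a shell instead of the GUI, here is a minimal Python sketch of what pinning one run to one GPU might look like. It only builds the environment and the command line. The helper name `build_single_gpu_launch` and the script name `train_network.py` are placeholders for illustration, not something taken from this thread. `--num_processes=1` is the flag named in accelerate's own warning message.

```python
import os
import shlex


def build_single_gpu_launch(gpu_id, script, script_args=()):
    """Return (env, argv) for one training process pinned to one GPU.

    Hypothetical helper for illustration; it does not start anything itself.
    """
    env = dict(os.environ)
    # Only the listed GPU is visible to the child process.
    # Inside that process it shows up as CUDA device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # Ask accelerate for a single process so it does not enable multi-GPU.
    argv = ["accelerate", "launch", "--num_processes=1", script, *script_args]
    return env, argv


if __name__ == "__main__":
    # "train_network.py" is a placeholder script name.
    env, argv = build_single_gpu_launch(1, "train_network.py")
    print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"], shlex.join(argv))
    # To actually start the run: subprocess.run(argv, env=env, check=True)
```

Running one such command per GPU, each in its own terminal, keeps the two runs independent.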
from kohya_ss.
Makes sense. Will try now. BTW, do you know a command to follow each GPU's individual process, the way I can monitor the steps when I launch the training process myself in an activated env?
I don't know of any easy way to track multiple training processes.
Maybe you could launch several instances of kohya gui in separate terminal shells and note which port numbers they use (7860, 7861, etc), then launch training from each gui session, and observe from each terminal.
But if you're manually running the training commands from a shell instead of the web gui, have you tried using multiplexers like screen or tmux? It'd allow you to have split-screen terminals without requiring a full desktop session or window manager.
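This won't show training steps, but as a rough way to see whether each GPU is busy, something like the following sketch could be polled from a spare terminal. It assumes `nvidia-smi` is on the PATH and uses its CSV query output. The function names are made up for illustration.

```python
import subprocess

# Per-GPU index, utilization (%), and memory used/total (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(text):
    """Convert a CSV field to int; return None for values like '[N/A]'."""
    try:
        return int(text)
    except ValueError:
        return None


def parse_gpu_stats(csv_text):
    """Parse the CSV text produced by QUERY into one dict per GPU."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = [field.strip() for field in line.split(",")]
        stats.append(
            {
                "index": _to_int(index),
                "util_pct": _to_int(util),
                "mem_used_mib": _to_int(used),
                "mem_total_mib": _to_int(total),
            }
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed per-GPU stats."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_stats(result.stdout)
```

Calling `read_gpu_stats()` in a loop with a short sleep gives a crude per-GPU view. The step counters themselves still only appear in each training process's own terminal output.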
I unchecked Multi GPU and set GPU IDs to 1 (I currently have two GPUs on RunPod, so 0 and 1).
But I get the following message, which doesn't make sense:
The following values were not passed to `accelerate launch` and had defaults used instead:
More than one GPU was found, enabling multi-GPU training.
If this was unintended please pass in `--num_processes=1`.
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
This worked correctly. It automatically assigned GPU 1 instead of 0.
But to follow the exposed port, I suppose I would enter the port under "Main process port".
My only question is how I would then open that port's terminal.
> But I get the following message, which doesn't make sense:
> The following values were not passed to `accelerate launch` and had defaults used instead: More than one GPU was found, enabling multi-GPU training. If this was unintended please pass in `--num_processes=1`. To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.

Did you set the "Number of processes" option to 1?

> But to follow the exposed port, I suppose I would enter the port under "Main process port".
> My only question is how I would then open that port's terminal.

What are you using to get a terminal? Is it SSH? You should be able to open more SSH sessions and start additional kohya gui instances (without needing to change any server configs). That's essentially what I suggested earlier with multiple ports and terminal shells.
I think you've got the right idea. Each kohya instance serves its gui on a new port number (and URL), which is listed in the log output.
I don't know how it behaves on runpod (haven't used it), but if I run a local instance of the gui, the gradio port number starts at 7860. Running more instances bumps it to 7861 and 7862, each with its own URL, all of which I can access directly in my web browser.
Using shell=True when running external commands...
IMPORTANT: You are using gradio version 4.26.0, however version 4.29.0 is available, please upgrade.
--------
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://_______.gradio.live
Now if you're talking about what to do with the "Main process port" option, you can ignore that. It's used for training on multiple GPUs across different machines, which is not what we're doing here.
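If you lose track of which gui instances are up, a quick check like this sketch can list the ports that currently accept connections. It assumes the instances run on the same machine and listen on 127.0.0.1 starting at 7860, as in the local log above. On a hosted service the ports may be proxied differently, so treat the defaults as assumptions. The function name is made up for illustration.

```python
import socket


def listening_ports(host="127.0.0.1", start=7860, count=5, timeout=0.2):
    """Return the ports in [start, start + count) that accept a TCP connection."""
    found = []
    for port in range(start, start + count):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            if sock.connect_ex((host, port)) == 0:
                found.append(port)
    return found


if __name__ == "__main__":
    for port in listening_ports():
        print(f"something is listening on http://127.0.0.1:{port}")
```

This only tells you that something is listening on a port, not that it is a kohya gui, so the log output of each instance remains the reliable source for its URL.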