Comments (5)
My first error was that training got killed immediately after it started. I kept retrying for a while until this error popped up.
from dreambooth-stable-diffusion.
Having similar issues
This might just be a coincidence, but I got the error
RuntimeError: No CUDA GPUs are available
when running on an RTX 3090.
I switched to an RTX A5000 and that error went away.
But now I am hitting the issue dadiwonton described, where the process is killed right after training starts:
Epoch 0: 0%| | 0/2020 [00:00<?, ?it/s]/venv/lib/python3.8/site-packages/pytorch_lightning/utilities/data.py:72: UserWarning: Trying to infer the `batch_size` from an ambiguous collection. The batch size we found is 1. To avoid any miscalculations, use `self.log(..., batch_size=batch_size)`.
warning_cache.warn(
/venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/logger_connector/result.py:229: UserWarning: You called `self.log('global_step', ...)` in your `training_step` but the value needs to be floating point. Converting it to torch.float32.
warning_cache.warn(
Epoch 0: 0%| | 1/2020 [00:02<1:40:08, 2.98s/it, loss=0.0382, v_num=0, train/lHere comes the checkpoint...
Killed
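For the "No CUDA GPUs are available" error mentioned above, a quick check of what PyTorch can actually see may help narrow things down before switching hardware. This is only a minimal diagnostic sketch: the function name `cuda_report` is made up for illustration, and it assumes the training environment's Python is the one running it.

```python
import os


def cuda_report():
    """Collect a few facts about GPU visibility (hypothetical helper)."""
    # An empty or wrong CUDA_VISIBLE_DEVICES can hide every GPU from PyTorch.
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this interpreter.
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back False, the problem is likely in the driver, container, or environment setup rather than in the training script itself.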
Saw a YouTube comment about your issue:
If you get the error about it being killed after 1 step, open the terminal, type "ps aux" and look for the pid for both python relauncher and webui, then type "kill (the id for either)" and kill both of them. Was stuck on that error for a while with an A5000 but this fixed my problem.
I'm having the same issue on an A5000 as @dadiwonton, where it doesn't even start an iteration. Same error.
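The "ps aux" and "kill" steps from that comment can be sketched in Python roughly as follows. This is an illustration, not a tested fix: the patterns "relauncher" and "webui" are taken from the comment's wording, the exact command lines on your machine may differ, and the helper names `find_pids` and `kill_stale` are made up here.

```python
import os
import signal
import subprocess

# Assumed substrings, based on the comment; adjust to what `ps aux` shows.
DEFAULT_PATTERNS = ("relauncher", "webui")


def find_pids(ps_output, patterns=DEFAULT_PATTERNS):
    """Return PIDs from `ps aux` text whose command matches any pattern."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        # `ps aux` has 11 columns; the last one is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        if any(p in fields[10] for p in patterns):
            pids.append(int(fields[1]))  # second column is the PID
    return pids


def kill_stale(patterns=DEFAULT_PATTERNS, dry_run=True):
    """List matching processes; send SIGTERM only when dry_run is False."""
    out = subprocess.run(
        ["ps", "aux"], capture_output=True, text=True, check=True
    ).stdout
    pids = [pid for pid in find_pids(out, patterns) if pid != os.getpid()]
    if not dry_run:
        for pid in pids:
            os.kill(pid, signal.SIGTERM)  # same default signal as plain `kill`
    return pids
```

Calling `kill_stale()` only reports the matching PIDs; `kill_stale(dry_run=False)` actually terminates them, so check the list first.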
Ahh thanks mate! Killing those processes seemed to clear it up, and it is training now.
Will see if it finishes 🤞
Lots of good help on Discord.