Comments (13)
That just means you don't have enough memory on your GPU to run this. Try reducing batch_size and max_len in the config.
But my batch_size is already 2 and batch_percentage is 0.5. I am sharing my config file here:
```yaml
log_dir: "/hdd2/Sandipan/SDhar-Projects/StyleTTS2/Models/New_Hindi_Speech_2nd"
first_stage_path: "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Log_files/epoch_1st_00037.pth"
save_freq: 2
log_interval: 10
device: "cuda"
#epochs_1st: 50
epochs_1st: 200 # number of epochs for first stage training (pre-training)
#epochs_2nd: 30
epochs_2nd: 100 # number of epochs for second stage training (joint training)
batch_size: 2
max_len: 100 # maximum number of frames
pretrained_model: ""
second_stage_load_pretrained: true # set to true if the pre-trained model is for 2nd stage
load_only_params: false # set to true if you do not want to load epoch numbers and optimizer parameters

F0_path: "Utils/JDC/bst.t7"
ASR_config: "Utils/ASR/config.yml"
ASR_path: "Utils/ASR/epoch_00080.pth"
#/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Utils/PLBERT_all_languages
PLBERT_dir: 'Utils/PLBERT_all_languages/'
#"/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Hindi_Data_Phoneme/val_list.txt"

data_params:
  train_data: "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Hindi_Data_Phoneme/train.txt"
  val_data: "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Hindi_Data_Phoneme/valid.txt"
  root_path: "/hdd2/Sandipan/database/Hindi_ASR_200/Hindi_Clean/"
  OOD_data: "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/Hindi_Data_Phoneme/odd.txt"
  min_length: 50 # sample until texts with this size are obtained for OOD texts

data_params:
  train_data: "Data/train_list_new.txt"
  val_data: "Data/valid_list_new.txt"
  root_path: "/hdd5/Sandipan/SDhar-Projects/Grad-TTS-Libri/Speech-Backbones/Grad-TTS/LJSpeech-1.1/wavs"
  OOD_data: "Data/OOD_texts.txt"
  min_length: 50 # sample until texts with this size are obtained for OOD texts

preprocess_params:
  sr: 24000
  spect_params:
    n_fft: 2048
    win_length: 1200
    hop_length: 300

model_params:
  multispeaker: true
  dim_in: 64
  hidden_dim: 512
  max_conv_dim: 512
  n_layer: 3
  n_mels: 80
  n_token: 178 # number of phoneme tokens
  max_dur: 50 # maximum duration of a single phoneme
  style_dim: 128 # style vector size
  dropout: 0.2

  ######### config for decoder
  decoder:
    type: 'istftnet' # either hifigan or istftnet
    resblock_kernel_sizes: [3, 7, 11]
    upsample_rates: [10, 6]
    upsample_initial_channel: 512
    resblock_dilation_sizes: [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
    upsample_kernel_sizes: [20, 12]
    gen_istft_n_fft: 20
    gen_istft_hop_size: 5
  ##############################
  decoder:
    type: 'hifigan' # either hifigan or istftnet
    resblock_kernel_sizes: [3, 7, 11]
    upsample_rates: [10, 5, 3, 2]
    upsample_initial_channel: 512
    resblock_dilation_sizes: [[1, 3, 5], [1, 3, 5], [1, 3, 5]]
    upsample_kernel_sizes: [20, 10, 6, 4]

  # speech language model config
  slm:
    model: 'microsoft/wavlm-base-plus'
    sr: 16000 # sampling rate of SLM
    hidden: 768 # hidden size of SLM
    nlayers: 13 # number of layers of SLM
    initial_channel: 64 # initial channels of SLM discriminator head

  # style diffusion model config
  diffusion:
    embedding_mask_proba: 0.1
    # transformer config
    transformer:
      num_layers: 3
      num_heads: 8
      head_features: 64
      multiplier: 2
    # diffusion distribution config
    dist:
      sigma_data: 0.2 # placeholder if estimate_sigma_data is set to false
      estimate_sigma_data: true # estimate sigma_data from the current batch if set to true
      mean: -3.0
      std: 1.0

loss_params:
  lambda_mel: 5. # mel reconstruction loss
  lambda_gen: 1. # generator loss
  lambda_slm: 1. # slm feature matching loss
  lambda_mono: 1. # monotonic alignment loss (1st stage, TMA)
  lambda_s2s: 1. # sequence-to-sequence loss (1st stage, TMA)
  TMA_epoch: 50 # TMA starting epoch (1st stage)
  lambda_F0: 1. # F0 reconstruction loss (2nd stage)
  lambda_norm: 1. # norm reconstruction loss (2nd stage)
  lambda_dur: 1. # duration loss (2nd stage)
  lambda_ce: 20. # duration predictor probability output CE loss (2nd stage)
  lambda_sty: 1. # style reconstruction loss (2nd stage)
  lambda_diff: 1. # score matching loss (2nd stage)
  diff_epoch: 20 # style diffusion starting epoch (2nd stage)
  joint_epoch: 50 # joint training starting epoch (2nd stage)

optimizer_params:
  lr: 0.0001 # general learning rate
  bert_lr: 0.00001 # learning rate for PLBERT
  ft_lr: 0.00001 # learning rate for acoustic modules

slmadv_params:
  min_len: 100 # minimum length of samples
  #min_len: 400
  max_len: 200 # maximum length of samples
  #max_len: 500
  batch_percentage: 0.5 # to prevent out of memory, only use half of the original batch size
  iter: 10 # update the discriminator every this many generator updates
  thresh: 5 # gradient norm above which the gradient is scaled
  scale: 0.01 # gradient scaling factor for predictors from SLM discriminators
  sig: 1.5 # sigma for differentiable duration modeling
```
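For scale: going by the sr and hop_length above, and assuming max_len counts mel frames spaced hop_length samples apart (my own back-of-the-envelope arithmetic, not anything from the StyleTTS2 docs), max_len: 100 already limits training clips to about 1.25 seconds:

```python
# Rough sanity check on what max_len: 100 means in audio time.
# Assumption: max_len counts mel frames, one frame per hop_length samples.
sr = 24000        # preprocess_params.sr
hop_length = 300  # preprocess_params.spect_params.hop_length
max_len = 100     # maximum number of frames, from the config

frames_per_second = sr / hop_length  # 24000 / 300 = 80 frames per second
print(max_len / frames_per_second)   # 100 / 80 = 1.25 seconds per clip
```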
I assume this happens right at the beginning. It says here:

```
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 7; 79.15 GiB total capacity; 2.32 GiB already allocated; 3.19 MiB free; 2.37 GiB reserved in total by PyTorch)
```

that only 2.37 GiB is reserved by torch, so is there anything else running on your GPU?
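You can double-check from Python with something like this (a minimal sketch; it assumes a recent enough torch to have torch.cuda.mem_get_info, and GPU index 7 as in your setup):

```python
import torch

# Free and total memory on the card, as CUDA reports it (in bytes).
free, total = torch.cuda.mem_get_info(7)
print(f"free: {free / 1024**3:.2f} GiB, total: {total / 1024**3:.2f} GiB")

# What this process itself has allocated/reserved through PyTorch.
print(f"allocated: {torch.cuda.memory_allocated(7) / 1024**3:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved(7) / 1024**3:.2f} GiB")
```

If free is far below total while allocated/reserved are small, some other process is holding the memory.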
Actually, I am running my code on our lab server. There are 8 GPUs, of which 4-5 are already in use for other people's jobs. I am running my code on a specific GPU id (7), which is not used by anyone else as of now.
Output of the nvidia-smi command for GPU id 7, which I am using:

```
|   7  NVIDIA L40S             Off | 00000000:24:00.0 Off |                    0 |
| N/A   36C    P8    23W / 350W |      3MiB / 46068MiB  |      0%      Default |
|                               |                       |                  N/A |
```
It seems that there is some issue somewhere, but I can't really put my finger on it. GPU 7 seems to be a 48 GB card, yet torch says it's an 80 GB one? What command are you using to run the code? There are also places in the code where it's .to("cuda") instead of .to(device); maybe fixing those would help.
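Roughly the pattern I mean, as a sketch (not the actual StyleTTS2 code):

```python
import torch

device = torch.device("cuda:7" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(10, 10).to(device)  # model lives on GPU 7

x = torch.randn(4, 10)
# Bug: a bare "cuda" means cuda:0, so x ends up on a different GPU
# than the model and the forward pass fails or allocates on GPU 0.
# x = x.to("cuda")
x = x.to(device)  # correct: always route through the device variable
print(model(x).device)
```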
As I made changes to the specific lines of code where the issue was raised, the same error appeared again at different lines. For example:

```
  File "train_second.py", line 827, in <module>
    main()
  File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "train_second.py", line 417, in main
    s = model.style_encoder(st.unsqueeze(1) if multispeaker else gt.unsqueeze(1))
  File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/models.py", line 167, in forward
    h = self.shared(x)
  File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/hdd5/Sandipan/envs/styletts1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/hdd5/Sandipan/SDhar-Projects/StyleTTS2/models.py", line 143, in forward
    x = self._shortcut(x) + self._residual(x)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 7; 79.15 GiB total capacity; 2.36 GiB already allocated; 9.19 MiB free; 2.37 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
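The error message itself suggests setting max_split_size_mb to avoid fragmentation. As I understand it, that is done via an environment variable that must be in place before CUDA is first used, e.g. (128 here is only an example value):

```python
import os

# Allocator option suggested by the OOM message; must be set before the
# first CUDA allocation, e.g. at the very top of train_second.py.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # imported after the allocator config is in place
```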
I am simply using this command:

```
python train_second.py
```
This is how I am setting the device id, and then using .to(device) in the required parts of the code:

```python
device_id = 7
device = torch.device(device_id if torch.cuda.is_available() else "cpu")
```
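As I understand it, torch.cuda.set_device would additionally make any bare "cuda" (with no index) resolve to the same card:

```python
import torch

device_id = 7
device = torch.device(device_id if torch.cuda.is_available() else "cpu")

# Also pin the current CUDA device, so tensors sent to plain "cuda"
# land on GPU 7 instead of the default GPU 0.
if torch.cuda.is_available():
    torch.cuda.set_device(device)
```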
In my code I have already replaced all .to("cuda") calls with .to(device).
This seems to be an issue that is not linked to StyleTTS; I tried to do something similar and it seemed okay. Have you tried changing the device to just "cuda" and using CUDA_VISIBLE_DEVICES=7?
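With CUDA_VISIBLE_DEVICES the process only ever sees the GPUs you list, and inside torch the listed card shows up as cuda:0, so a plain "cuda" cannot land anywhere else:

```python
# Run as: CUDA_VISIBLE_DEVICES=7 python train_second.py
import torch

print(torch.cuda.device_count())      # 1: only the masked-in GPU is visible
print(torch.cuda.get_device_name(0))  # the physical GPU 7
device = torch.device("cuda")         # safe: can only mean that card
```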
No, let me try that then.
Thank you. Actually, it seems the problem was on my end, with the GPU I was specifying. I had used CUDA_VISIBLE_DEVICES before, setting a different GPU id whenever I found an idle GPU on our server, but CUDA_VISIBLE_DEVICES=<gpu id> kept sending my code to other GPUs instead of the specific GPU id I was specifying. That's why I set the GPU id with

```python
device_id = 7
device = torch.device(device_id if torch.cuda.is_available() else "cpu")
```

However, I was still getting the CUDA out of memory error. But this time, when I executed my code with

```
CUDA_VISIBLE_DEVICES=5 python train_second.py
```

it started running. I understand that I will have to do this kind of trial and error. Thanks for your suggestion.