Comments (5)
Should I reduce batch_size? But Jasper uses BatchNorm throughout, and very small batches can hurt BatchNorm statistics.
from nemo.
What is the maximum duration of the audio files in your training set? Make a histogram of the durations and check for outliers (very long files). Try removing all audio files longer than X seconds (start with a large X, then lower it until the CUDA out-of-memory errors stop).
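The filtering step above can be sketched against a NeMo-style JSON-lines manifest, where each line carries an "audio_filepath" and a "duration" field in seconds. This is a minimal sketch, not NeMo's own tooling; the function name and the 16.7-second cap in the usage note are illustrative assumptions.

```python
import json

def filter_manifest(in_path, out_path, max_duration):
    """Copy a NeMo-style JSON-lines manifest, dropping entries
    whose "duration" (in seconds) exceeds max_duration.

    Returns (kept, dropped) counts so you can see how much data
    each candidate cap would cost you.
    """
    kept, dropped = 0, 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            entry = json.loads(line)
            if entry["duration"] <= max_duration:
                dst.write(json.dumps(entry) + "\n")
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Start with a large max_duration, inspect the distribution of durations (e.g. a `collections.Counter` over rounded durations makes a quick text histogram), and lower the cap until training stops running out of memory.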
Like @RobertInjac mentioned above, GPU memory usage depends a lot on the max length of the audio during training. For public datasets such as LibriSpeech and Mozilla Common Voice, we cap it at 16.7 seconds during training. So one option is to cut your audio files into smaller pieces (but don't go too small - you still want several words per audio sample).
Another option is to reduce the batch size per GPU. Note that you can still simulate a larger batch size by setting the batches_per_step parameter to a value greater than 1 (see https://nvidia.github.io/NeMo/api-docs/nemo.html#module-nemo.core.neural_factory). This may also help with GPU utilization during multi-GPU/multi-node training.
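batches_per_step is effectively gradient accumulation: the optimizer steps once after several micro-batches, so the effective batch size is batch_size * batches_per_step. A minimal pure-Python sketch (not the NeMo API; function names here are illustrative) showing why averaging per-micro-batch gradients reproduces the full-batch gradient of a mean loss:

```python
def grad_mse_linear(w, xs, ys):
    """Gradient of the mean squared error of y_hat = w * x
    with respect to w, over one (micro-)batch."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def accumulated_grad(w, xs, ys, batches_per_step):
    """Accumulate gradients over equal-sized micro-batches, as
    gradient accumulation does: each micro-batch contributes its
    mean gradient, and the sum is scaled by 1 / batches_per_step."""
    micro = len(xs) // batches_per_step
    total = 0.0
    for i in range(batches_per_step):
        lo, hi = i * micro, (i + 1) * micro
        total += grad_mse_linear(w, xs[lo:hi], ys[lo:hi])
    return total / batches_per_step
```

With equal-sized micro-batches, `accumulated_grad(w, xs, ys, k)` matches `grad_mse_linear(w, xs, ys)` exactly, which is why accumulation trades GPU memory for wall-clock time without changing the optimization.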
Thank you. The max length of the audio in my dataset is only 15 seconds.
Thank you. I solved the problem by enabling Apex O1 and reducing my batch_size to 14. I will also try setting batches_per_step to 2.
Related Issues (20)
- Getting an error called ModuleNotFoundError: No module named 'packaging'
- ASR TypeError: '<' not supported between instances of 'NoneType' and 'int' HOT 3
- TransformerLayer MLP parameters are not being set during model initialization
- Converting Mistral/Mixtral to Nemo format throws error: Received both `precision=bf16-mixed` and `plugins=<...nlp_overrides.PipelineMixedPrecisionPlugin>` HOT 1
- RuntimeError: start (2) + length (1) exceeds dimension size (2) when trying to run Online_ASR_Microphone_Demo_Cache_Aware_Streaming.ipynb HOT 1
- vad finetune on my dataset HOT 3
- Forced Aligner with long audios HOT 6
- How to search for pretrained model HOT 1
- WER is very low. HOT 5
- TypeError: EncDecCTCModel.transcribe() got an unexpected keyword argument 'logprobs' HOT 2
- NeMo License Discrepancy?
- Consider refactoring CTC greedy decoding HOT 5
- Memory is fully eaten and training quit with errors for 40k hours ASR training HOT 4
- The error in loading Llama pretrain checkpoint for NeVa(LLAVA) HOT 1
- File not found in the github repo
- Streaming example provided for Hinglish doesn't work. HOT 3
- eval_trainer.predict() gives AttributeError: 'PipelineMixedPrecisionPlugin' object has no attribute '_desired_input_dtype' error
- Error when using packed sequence and gradient checkpoint: save_for_backward can only save variables, but argument 5 is of type PackedSeqParams
- Warning: nvfuser is no longer supported in torch script
- How to specify the rank,delta and dropout values while LORA finetuning