Comments (10)
When resuming training from your pretrained G and D checkpoints, I get a CUDA out-of-memory error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.71 GiB (GPU 0; 23.69 GiB total capacity; 5.86 GiB already allocated; 2.71 GiB free; 19.53 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
from freevc.
Works fine with num_workers=4. It's a minor issue, but could be useful for somebody.
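For anyone who wants to try the allocator hint from the error message as well, here is a minimal sketch. The value 128 and the helper name `configure_allocator` are placeholders for illustration, not something tested in this thread, and the setting may or may not help with this particular OOM.

```python
import os


def configure_allocator(max_split_size_mb=128):
    """Set PYTORCH_CUDA_ALLOC_CONF as suggested by the OOM message.

    The default of 128 is only an illustrative value (an assumption).
    The variable has to be set before the first CUDA allocation; the
    simplest way is to set it before importing torch.
    """
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


# Call this at the very top of the training script, before `import torch`.
configure_allocator()
```

The same variable can instead be exported in the shell before launching training.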
Hmm, there is also a crash during the eval step with num_workers=4:
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
./logs/freevc/G_195000.pth
INFO:freevc:Loaded checkpoint './logs/freevc/G_195000.pth' (iteration 2053)
./logs/freevc/D_195000.pth
INFO:freevc:Loaded checkpoint './logs/freevc/D_195000.pth' (iteration 2053)
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration.
INFO:freevc:Train Epoch: 2053 [97%]
INFO:freevc:[2.2948668003082275, 2.467487096786499, 9.334207534790039, 16.157079696655273, 1.7016690969467163, 379800, 0.00015467115812058983]
INFO:freevc:====> Epoch: 2053
INFO:freevc:====> Epoch: 2054
INFO:freevc:Train Epoch: 2055 [5%]
INFO:freevc:[2.3948028087615967, 2.7103328704833984, 10.981183052062988, 18.12336540222168, 1.8913426399230957, 380000, 0.0001546324927477965]
terminate called without an active exception
terminate called without an active exception
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7fd1f94cc430>
Traceback (most recent call last):
  File "/home/sk/anaconda3/envs/freevc/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1466, in __del__
    self._shutdown_workers()
  File "/home/sk/anaconda3/envs/freevc/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1430, in _shutdown_workers
    w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
  File "/home/sk/anaconda3/envs/freevc/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/sk/anaconda3/envs/freevc/lib/python3.8/multiprocessing/popen_fork.py", line 44, in wait
    if not wait([self.sentinel], timeout):
  File "/home/sk/anaconda3/envs/freevc/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/home/sk/anaconda3/envs/freevc/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
  File "/home/sk/anaconda3/envs/freevc/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 30152) is killed by signal: Aborted.
INFO:freevc:Saving model and optimizer state at iteration 2055 to ./logs/freevc/G_380000.pth
INFO:freevc:Saving model and optimizer state at iteration 2055 to ./logs/freevc/D_380000.pth
I have not encountered this problem, so for now I suspect it is specific to your machine.
Yes, maybe it's some local misconfiguration of the env.
What PyTorch/CUDA versions are you running, please?
torch 1.10.0
cudatoolkit 11.1.1
Cheers, mine has
pytorch 1.13.1 py3.8_cuda11.7_cudnn8.5.0_0
Setting num_workers=0 works well for me.
Setting persistent_workers=True in the train and eval DataLoader works well for me when num_workers>1.
check link
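A rough sketch of how the two workarounds above (num_workers=0, or persistent_workers=True with several workers) could be combined. The helper name `loader_kwargs` and the batch size in the usage comment are made up for illustration and do not come from the freevc code. Note that PyTorch rejects persistent_workers=True when num_workers is 0, so the flag is only added when workers are actually used.

```python
def loader_kwargs(num_workers, batch_size=1, shuffle=False):
    """Build keyword arguments for torch.utils.data.DataLoader.

    Hypothetical helper: persistent_workers is only valid when
    num_workers > 0, so it is added conditionally.
    """
    kwargs = {
        "batch_size": batch_size,
        "shuffle": shuffle,
        "num_workers": num_workers,
    }
    if num_workers > 0:
        # Keep worker processes alive between epochs instead of
        # shutting them down and re-creating them each time.
        kwargs["persistent_workers"] = True
    return kwargs


# Usage (assumes train_dataset and eval_dataset already exist;
# batch_size=64 is a placeholder, not the value from the repo config):
#   from torch.utils.data import DataLoader
#   train_loader = DataLoader(train_dataset, **loader_kwargs(4, batch_size=64, shuffle=True))
#   eval_loader = DataLoader(eval_dataset, **loader_kwargs(4, batch_size=64))
```

With num_workers=0 the data is loaded in the main process, so there are no worker processes to be killed, at the cost of slower loading.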