Comments (4)
This may be caused by data preprocessing: the audio length doesn't match the spectrogram length.
librosa.effects.trim can trim out different segments at different sampling rates. For example, wav, _ = librosa.effects.trim(<24k_wav>, top_db=20) might trim out the first 0.3 seconds, while wav, _ = librosa.effects.trim(<22k_wav>, top_db=20) might trim out the first 0.33 seconds.
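The effect can be reproduced without librosa. The sketch below uses a simplified, hypothetical energy-based trim: simple_trim is not librosa's implementation. It only mimics the idea of dropping frames that are more than top_db below the loudest frame, using the same default frame_length=2048 and hop_length=512 as librosa.effects.trim. Because the hop is fixed in samples, the trim boundary falls on a different time grid at 24 kHz than at 22.05 kHz, so the same clip (make_clip, also hypothetical) loses a different amount of audio at each rate.

```python
import numpy as np

def simple_trim(y, top_db=20.0, frame_length=2048, hop_length=512):
    """Simplified stand-in for an energy-based trim (NOT librosa's exact code).

    Returns (start, end) sample indices of the non-silent region. Both are
    snapped to the hop grid, which is fixed in samples, not in seconds.
    """
    n_frames = 1 + max(0, len(y) - frame_length) // hop_length
    rms = np.array([
        np.sqrt(np.mean(y[i * hop_length : i * hop_length + frame_length] ** 2))
        for i in range(n_frames)
    ])
    db = 20.0 * np.log10(np.maximum(rms, 1e-10))  # avoid log(0) on pure silence
    loud = np.flatnonzero(db > db.max() - top_db)  # frames within top_db of the peak
    start = loud[0] * hop_length
    end = min(len(y), loud[-1] * hop_length + frame_length)
    return start, end

def make_clip(sr, lead_silence=0.3, tone_dur=1.0):
    """Hypothetical test clip: leading silence followed by a 440 Hz tone."""
    silence = np.zeros(int(lead_silence * sr))
    t = np.arange(int(tone_dur * sr)) / sr
    return np.concatenate([silence, 0.5 * np.sin(2 * np.pi * 440 * t)])

# Same content, two sampling rates: the trimmed start time (in seconds) differs.
for sr in (24000, 22050):
    start, _ = simple_trim(make_clip(sr))
    print(f"{sr} Hz: trimmed {start / sr:.4f} s from the start")
```

Both clips contain exactly 0.3 s of silence, yet neither trim lands on 0.3 s and the two results disagree, which is enough to desynchronize audio that is later paired across rates.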
The 24k wav, the 16k spectrogram, and the 16k WavLM features should all come from the same segment.
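One way to keep the versions aligned, sketched here under the assumption that trimming is run only once: convert the trimmed interval to seconds and cut that same window out of every copy of the clip. The helper crop_seconds, the random stand-in signals, and the interval values are hypothetical and for illustration only. If a resampler is available, an alternative is to trim the highest-rate copy once and resample the trimmed segment to 16 kHz.

```python
import numpy as np

def crop_seconds(y, sr, start_s, end_s):
    """Cut the time window [start_s, end_s) out of a signal sampled at sr Hz."""
    return y[int(round(start_s * sr)) : int(round(end_s * sr))]

rng = np.random.default_rng(0)
dur = 2.0
# Hypothetical stand-ins for one utterance stored at 24 kHz and at 16 kHz.
y24 = rng.standard_normal(int(dur * 24000))
y16 = rng.standard_normal(int(dur * 16000))

# Suppose trimming ran ONCE, on the 24 kHz copy, and returned this interval
# (illustrative sample indices, not real trim output).
start_24, end_24 = 5632, 45000
start_s, end_s = start_24 / 24000, end_24 / 24000

# Apply the same interval, in seconds, to every version of the clip.
seg24 = crop_seconds(y24, 24000, start_s, end_s)
seg16 = crop_seconds(y16, 16000, start_s, end_s)
print(f"24 kHz: {len(seg24) / 24000:.4f} s, 16 kHz: {len(seg16) / 16000:.4f} s")
```

The two segments now agree to within one 16 kHz sample, instead of differing by whatever each independent trim happened to remove.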
By the way, I just noticed that I forgot librosa loads audio at 22.05 kHz by default, so the original downsampling code was actually upsampling 22k wav to 24k wav. It's fixed now.
from freevc.
Thanks for your kind answer.
So if there is no leading or trailing silence, will the 16k spectrogram and the 24k wav have the same length?
If you trim neither the 16k wav (which the 16k spectrogram comes from) nor the 24k wav, then yes.
librosa.effects.trim may still trim a little even when there is no apparent leading or trailing silence.
Thanks very much. After matching the lengths of the 16k and 24k data, it works fine.
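Length matching of this kind can be sketched as cropping both inputs to a common number of frames. The hop sizes below are assumptions for illustration, not values taken from the FreeVC configs: a 320-sample hop at 16 kHz is 20 ms, which corresponds to 480 samples at 24 kHz. match_lengths is a hypothetical helper, and the arrays are dummy data.

```python
import numpy as np

# Assumed hop sizes (hypothetical): 320 samples at 16 kHz = 20 ms = 480 samples at 24 kHz.
HOP_16K = 320
HOP_24K = HOP_16K * 24000 // 16000

def match_lengths(wav24, spec16):
    """Crop a 24 kHz waveform and a 16 kHz spectrogram of shape (n_bins, n_frames)
    so both cover the same number of hop-sized frames."""
    n_frames = min(spec16.shape[1], len(wav24) // HOP_24K)
    return wav24[: n_frames * HOP_24K], spec16[:, :n_frames]

# Dummy data: 100 frames' worth of audio plus 100 extra samples,
# and a spectrogram that came out 2 frames longer.
wav24 = np.zeros(100 * HOP_24K + 100)
spec16 = np.zeros((513, 102))
wav_c, spec_c = match_lengths(wav24, spec16)
print(len(wav_c), spec_c.shape)  # 48000 (513, 100)
```

Cropping to the shorter of the two keeps every remaining spectrogram frame paired with exactly one hop of 24 kHz audio; padding the shorter input would also work, but it introduces samples that were never in the recording.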
Related Issues (20)
- Inference or train with WavLM-Base or WavLM-Base+?
- Condition decoder on desired output length to have control over speech rate in inference?
- Training on aishell3 starting from your existing model: roughly how long does it take, and have the authors tried it?
- Unseen Male to Male results in Female output
- Inconsistent degree of timbre conversion
- Epoch duration
- About the type of algorithm
- Trained for 500 epochs with the freevc.json config, but the output timbre is always the same no matter which voice wav_tgt uses?
- Changing batch size to 16 or 32
- poor performance on seen-to-unseen task while finetuning on Hindi language
- 2023.01.10 update: code below can deteriorate model performance
- Vocoder version
- Fine tuning with custom (multilingual) data
- How to start inference example?
- About training issues
- target pitch issue after training (not appearing if using the pretrained checkpoint)
- Config file for the FreeVC-24 checkpoint
- training a model with 44.1k data
- Why is the speaker embedding g used to condition the Posterior Encoder and the Decoder?
- Poor results with: voice_conversion_models--multilingual--vctk--freevc24.zip CoquiTTS