Comments (3)
Thank you for your interest in our work.
- We freeze the WavLM module and use the pre-trained weights.
- The bottleneck extractor consists of a linear projection layer that projects the WavLM feature into a $d$-dim hidden representation, 16 layers of non-causal WaveNet residual blocks, and a linear projection layer that projects the $d$-dim hidden representation into a $2d$-dim hidden representation, which is later split into $d$-dim $\mu_{\theta}$ and $d$-dim $\sigma_{\theta}$.
- Yes.
- We use the same config as the official HiFi-GAN for producing $y'$.
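Read as equations, the bottleneck extractor described above might be sketched as follows. The symbols $x$ (WavLM feature sequence), $D$ (its feature dimension), $T$ (number of frames), $W_{\text{in}}$, $W_{\text{out}}$, and $\mathrm{WN}_{16}$ (the 16 non-causal WaveNet residual blocks) are notation introduced here for illustration only; bias terms are omitted, and none of this is taken from the paper or the released code.

```latex
\begin{aligned}
h_0 &= W_{\text{in}}\, x, & x \in \mathbb{R}^{D \times T},\; h_0 \in \mathbb{R}^{d \times T} \\
h &= \mathrm{WN}_{16}(h_0), & h \in \mathbb{R}^{d \times T} \\
[\mu_{\theta};\, \sigma_{\theta}] &= W_{\text{out}}\, h, & W_{\text{out}}\, h \in \mathbb{R}^{2d \times T},\; \mu_{\theta}, \sigma_{\theta} \in \mathbb{R}^{d \times T}
\end{aligned}
```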
from freevc.
Thank you very much for your clear answer.
So it can be said that
- The mel-spectrograms that are input to the SR-augmentation part are computed from 22050 Hz waveforms, using hop size 256 and window size 1024 (following config v1 of the official HiFi-GAN repository)
- and the linear spectrograms that are input to the posterior encoder are computed from 16000 Hz waveforms, using hop size 320 and window size 1280 (as stated in your paper, section 3.1)
- Perhaps the waveform reconstructed after SR-augmentation by the HiFi-GAN config v1 model (which will have a sampling rate of 22050 Hz) is resampled to 16000 Hz before being fed into WavLM?
Yes.
from freevc.
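To make the two configurations confirmed above easier to compare, here is a small sketch that works out the hop durations, approximate frame counts, and the 22050 Hz to 16000 Hz resampling ratio. The helper names (`hop_ms`, `num_frames`, `resampled_length`) are hypothetical and not part of the FreeVC code, and the frame count assumes padding such that every hop yields one frame, which may differ from the actual STFT settings.

```python
from fractions import Fraction

# Values quoted in the thread above.
MEL_SR, MEL_HOP, MEL_WIN = 22050, 256, 1024  # HiFi-GAN config v1 (SR augmentation)
LIN_SR, LIN_HOP, LIN_WIN = 16000, 320, 1280  # posterior encoder (paper, section 3.1)


def hop_ms(sample_rate: int, hop: int) -> float:
    """Duration of one hop in milliseconds."""
    return 1000.0 * hop / sample_rate


def num_frames(num_samples: int, hop: int) -> int:
    """Approximate frame count, assuming one frame per full hop.

    This is an assumption; the exact count depends on the STFT padding.
    """
    return num_samples // hop


def resampled_length(num_samples: int, src_sr: int, dst_sr: int) -> int:
    """Approximate sample count after resampling from src_sr to dst_sr."""
    return int(num_samples * Fraction(dst_sr, src_sr))


if __name__ == "__main__":
    seconds = 2
    n_22k = seconds * MEL_SR                          # vocoder output length
    n_16k = resampled_length(n_22k, MEL_SR, LIN_SR)   # length fed to WavLM

    print(f"mel hop:    {hop_ms(MEL_SR, MEL_HOP):.2f} ms")
    print(f"linear hop: {hop_ms(LIN_SR, LIN_HOP):.2f} ms")
    print(f"resampling ratio: {Fraction(LIN_SR, MEL_SR)}")
    print(f"{n_22k} samples @ {MEL_SR} Hz -> {n_16k} samples @ {LIN_SR} Hz")
    print(f"mel frames: {num_frames(n_22k, MEL_HOP)}, "
          f"linear frames: {num_frames(n_16k, LIN_HOP)}")
```

The 320-sample hop at 16000 Hz corresponds to 20 ms per frame. As far as I know this matches the frame stride usually reported for WavLM features, which would keep the linear-spectrogram frames and the WavLM frames aligned, but that alignment is an inference here rather than something stated in the thread.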