Comments (2)
- A better speaker encoder structure can bring better results. In our paper, we just want to show that, as long as the extracted content representation is clean enough, the speaker encoder will learn to model the missing speaker information, even with such an extremely simple speaker encoder structure.
- 2(1)A. I think as long as the vocoder is good enough, the quality degradation won't be impactful. I've never seen anyone do an ablation study on data augmentation methods; they just propose them. So I currently have no plans to do this ablation, sorry.
- 2(1)B. That's why we compress the bottleneck. With a naive autoencoder, we can do waveform reconstruction. If we compress the latent dimension of this autoencoder to a proper size, then we can do the VC task.
- 2(2). Yes, it's 192. A bottleneck that is too narrow will lose some content information, while one that is too wide will contain some speaker information. If we use a bottleneck dimension of 4, it will lose a lot of content information. Searching for the best bottleneck dimension is troublesome, so we use the SR-based augmentation to help the model learn to discard residual speaker information in the 192-dim bottleneck. As for quantization, at the very beginning of our experiments we used residual vector quantization after the 192-dim bottleneck and found that it didn't bring any significant improvement, so we removed it.
- I think this may be because of the quality of the source speech. Seen sources, which are from VCTK, generally have less clear pronunciation (like p259_464), while unseen sources, which are from LibriTTS, have more background noise (like 5105_28233_000016_000001). From the demo page we can hear that our model can ignore the noise, but the pronunciation, which is also part of the content, remains the same. Also, some unseen sources are much longer (5105_28233_000016_000001 is 21 seconds long); I don't know whether the wav length can affect the quality judgement.
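
The bottleneck trade-off described in 2(1)B and 2(2) can be illustrated with a minimal sketch. This is not the FreeVC model: it swaps in the best linear autoencoder (PCA via SVD) and uses random data as a hypothetical stand-in for frame-level features. The function name `reconstruction_error`, the 256-dim feature size, and the widths 4, 64 and 192 compared here are assumptions chosen for illustration; only the 4 and 192 figures are mentioned in the reply above. It shows just one side of the trade-off, namely that a narrower bottleneck loses more information.

```python
import numpy as np

def reconstruction_error(features, bottleneck_dim):
    """Mean squared error of the best linear autoencoder (PCA) of the given width."""
    centered = features - features.mean(axis=0)
    # Rows of vt are principal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:bottleneck_dim]        # shape: (bottleneck_dim, feature_dim)
    latent = centered @ basis.T        # encode into the bottleneck
    reconstructed = latent @ basis     # decode back to feature space
    return float(np.mean((centered - reconstructed) ** 2))

# Hypothetical stand-in for frame-level features (500 frames, 256 dims).
rng = np.random.default_rng(0)
frames = rng.standard_normal((500, 256))

# Compare a very narrow, a medium, and a wide bottleneck.
errors = {dim: reconstruction_error(frames, dim) for dim in (4, 64, 192)}
for dim, err in errors.items():
    print(f"bottleneck={dim:3d}  mse={err:.4f}")
```

The error shrinks as the bottleneck widens, which is the "too narrow loses content" half of the argument. The "too wide leaks speaker information" half cannot be seen in a reconstruction error alone and would need labelled speaker data to measure.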
from freevc.
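An explanation regarding reconstruction:
I once played with a voice changer for a galgame character. The protagonist was a little-girl character. While doing data augmentation in Au, I found that after lowering the pitch by 2 keys, the voice sounded surprisingly like the voice actor's real recording, and that the galgame character's timbre was in fact the pitch-shifted one. So, by judging how natural the voice sounded to a human ear, I worked out that the author had raised the pitch by 2 keys after recording to create the galgame character's voice. That is a reconstruction.
A 0-key recording that has been augmented by +/- some keys will certainly be less natural and realistic than the original 0-key recording. If my ears can reverse it, I think the model can also learn to reconstruct it back to 0 key, and then the source timbre we worry about at inference time is restored and leaked again.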
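
The reversibility argument above can be made concrete with a small sketch. It assumes that one "key" means one equal-tempered semitone, so a shift of n keys scales frequencies by 2^(n/12); the helper names `semitone_ratio` and `shift_f0` and the 220 Hz example value are hypothetical. It only tracks the fundamental frequency and says nothing about whether a trained model actually learns the inverse mapping.

```python
import math

def semitone_ratio(keys):
    """Frequency ratio for a shift of `keys` semitones (assumed meaning of one 'key')."""
    return 2.0 ** (keys / 12.0)

def shift_f0(f0_hz, keys):
    """Fundamental frequency after shifting by `keys` semitones."""
    return f0_hz * semitone_ratio(keys)

original_f0 = 220.0                           # hypothetical 0-key recording
augmented_f0 = shift_f0(original_f0, 2)       # raised by 2 keys
restored_f0 = shift_f0(augmented_f0, -2)      # lowered by 2 keys again

print(f"+2 keys ratio: {semitone_ratio(2):.4f}")
print(f"augmented: {augmented_f0:.2f} Hz, restored: {restored_f0:.2f} Hz")
```

Because the shift is a fixed multiplicative factor, applying the opposite shift recovers the original pitch, which is the kind of inversion the comment argues a listener (and possibly a model) can perform.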