hhguo / ea-svc

An implementation of "Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training"

License: MIT License

Python 100.00%

ea-svc's Issues

Pretrained model

Hello, could you release a pretrained model?

What is "--pitch_checkpoint_path" in inference.py? The pitch model is not mentioned in the paper

Great job!
Can you give some introduction to the pitch model?

How to make conversion with pitch and timbre control?

Hi! Thank you for your great work! I have a small question about voice conversion using inference.py or the model itself in a Python shell.

In the demo (https://hhguo.github.io/DemoEASVC/), the best conversions are made with pitch shifting. Assume I have a trained model checkpoint. What should I do to produce different converted audio for each value of the alpha parameter (as in the Pitch Control section at the end of the demo)?

Also, the same question applies to the Timbre Transfer section.

I appreciate your help very much!

dataset

Hi, I want to reproduce the results of this paper. Will the dataset or model be released?

Unused parameters bug in configs

Each config file contains a stage parameter with value 0, 1, or 2. It is not used in the main train function of train.py, and its presence makes any training command fail, for example:

CUDA_VISIBLE_DEVICES=0 python train.py -c configs/stage1.json

with traceback:

Traceback (most recent call last):
  File "train.py", line 283, in <module>
    train(num_gpus, args.rank, args.group_name, **train_config)
TypeError: train() got an unexpected keyword argument 'stage'
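
One possible workaround, sketched under assumptions: drop any config keys that train() does not accept before unpacking the dictionary. Only the `stage` key and the `**train_config` call pattern come from the traceback above; the `train` signature and the other config keys below are hypothetical stand-ins, not the repository's actual code. Removing `stage` from the JSON files by hand would have the same effect.

```python
import inspect


def filter_kwargs(func, config):
    """Return only the entries of `config` that `func` accepts as parameters."""
    params = inspect.signature(func).parameters
    # If func itself takes **kwargs, every key is acceptable.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(config)
    return {k: v for k, v in config.items() if k in params}


# Hypothetical stand-in for train() in train.py; the real signature may differ.
def train(num_gpus, rank, group_name, epochs=1, learning_rate=1e-4):
    return {"epochs": epochs, "learning_rate": learning_rate}


# Hypothetical train_config section that still contains the unused `stage` key.
train_config = {"stage": 1, "epochs": 10, "learning_rate": 2e-4}

# `stage` is filtered out, so the call no longer raises TypeError.
result = train(1, 0, "", **filter_kwargs(train, train_config))
```

Filtering silently hides misspelled keys as well, so logging the dropped keys may be worthwhile in practice.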

Bad quality of generated speech after training

Hello! I did some preprocessing to extract features from the wavs in the dataset for training EA-SVC. Specifically, I use the following features:

PPG from the hidden state of a model trained on the TIMIT dataset (768 dim)
F0 extracted with WORLD by using pyworld directly (1 dim; zeros in F0 are left unprocessed)
speaker embeddings from pyannote.audio
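
Since the zeros in F0 are left unprocessed, one thing that may be worth trying is filling unvoiced frames before training. The sketch below is a generic preprocessing step (linear interpolation in the log-F0 domain), not something confirmed by the paper or the repository; whether EA-SVC expects raw or interpolated F0 is an assumption to verify. The function name `interpolate_unvoiced` is made up for illustration.

```python
import math


def interpolate_unvoiced(f0):
    """Fill zero (unvoiced) frames by linear interpolation in the log-F0 domain.

    Leading and trailing unvoiced frames copy the nearest voiced value.
    Returns the filled contour and a voiced flag per frame.
    """
    voiced = [v > 0 for v in f0]
    idx = [i for i, v in enumerate(voiced) if v]
    if not idx:
        return list(f0), voiced  # no voiced frame to interpolate from
    filled = []
    for i, value in enumerate(f0):
        if voiced[i]:
            filled.append(value)
            continue
        left = max((j for j in idx if j < i), default=None)
        right = min((j for j in idx if j > i), default=None)
        if left is None:
            filled.append(f0[right])
        elif right is None:
            filled.append(f0[left])
        else:
            w = (i - left) / (right - left)
            log_value = (1 - w) * math.log(f0[left]) + w * math.log(f0[right])
            filled.append(math.exp(log_value))
    return filled, voiced


# Toy contour in Hz; a real one would come from the WORLD analysis step.
filled, voiced = interpolate_unvoiced([0.0, 100.0, 0.0, 400.0, 0.0])
```

The voiced flags are returned separately so they can still be passed to the model as an extra feature if needed.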

I trained the first 2 stages (i.e. first without adversarial generator training, then with it) on both LibriSpeech dev-clean and the NUS48E singing corpus. The disentanglement loss wasn't used in these experiments. In the 1st stage, loss_g (g_mag + g_sc) is about 1.0; in the 2nd stage, loss_g (g_mag + g_sc + g_adv + g_feat) increased to 5.0 and loss_d (d_real + d_fake) is about 3.0e-01. The model wasn't trained for the 3rd stage. The results are much the same on both datasets.

Because the generated audio is poor at both stages, I wonder whether I made a mistake in the training process. I hope the loss values above give you a better view of the situation.

P.S. The stage numbers refer to these settings in the config:

Stage 1: "adv_ag": false, "adv_fd": false
Stage 2: "adv_ag": true, "adv_fd": false
Stage 3: "adv_ag": true, "adv_fd": true
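
The mapping above can be written as a small helper, sketched here for illustration. The flag values come from the list above; the stage numbering 1 to 3, the helper name `apply_stage`, the `batch_size` key, and the assumption that both flags sit at the top level of the config are all hypothetical.

```python
import json

# Adversarial flags per training stage, as listed in the issue.
STAGE_FLAGS = {
    1: {"adv_ag": False, "adv_fd": False},
    2: {"adv_ag": True, "adv_fd": False},
    3: {"adv_ag": True, "adv_fd": True},
}


def apply_stage(config, stage):
    """Return a copy of `config` with the adversarial flags set for `stage`."""
    if stage not in STAGE_FLAGS:
        raise ValueError(f"unknown stage: {stage}")
    updated = dict(config)
    updated.update(STAGE_FLAGS[stage])
    return updated


# Hypothetical base config; the real JSON files contain other keys.
base = {"batch_size": 8}
stage2 = apply_stage(base, 2)
print(json.dumps(stage2, sort_keys=True))
```

Because the helper returns a copy, the base config can be reused to build the settings for each stage in turn.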

Great work! How to make PPG features?

Great work! How do I make the PPG features? The speaker embedding? The F0 features?
