AdaVocoder: Adaptive Vocoder for Custom Voice

In our paper, we propose AdaVocoder: Adaptive Vocoder for Custom Voice.
This repository provides our open-source implementation and pretrained models for AdaHiFi-GAN.

Abstract:

Custom voice aims to build a personalized speech synthesis system by adapting a source speech synthesis model to a target speaker using only a few recordings of that speaker. A common way to construct a custom voice is to combine an adaptive acoustic model with a robust vocoder. However, training a robust vocoder usually requires a multi-speaker dataset covering various age groups and timbres, so that the trained vocoder generalizes to unseen speakers. Collecting such a dataset is difficult, and its distribution always mismatches that of the target speaker's data.

This paper proposes an adaptive vocoder for custom voice that addresses the above problems from a novel perspective. The adaptive vocoder mainly uses a cross-domain consistency loss to solve the overfitting that GAN-based neural vocoders suffer from during few-shot transfer learning. We construct two adaptive vocoders, AdaMelGAN and AdaHiFi-GAN. First, we pre-train the source vocoder models on the AISHELL3 and CSMSC datasets, respectively. Then, we fine-tune them on the internal VXI-children dataset using only a small amount of adaptation data. The empirical results show that a high-quality custom voice system can be built by combining an adaptive acoustic model with an adaptive vocoder.

Pre-requisites

  1. Python >= 3.6
  2. Clone this repository.
  3. Install the Python requirements. Please refer to requirements.txt.
  4. Download and extract the AISHELL3 dataset, then rename or create a link to the dataset folder: ln -s /path/to/AISHELL-3/wavs DUMMY1. Move all wav files into AISHELL-3/wavs, and resample all audio files to 22050 Hz.
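Step 4 asks for every wav file to sit in the single folder AISHELL-3/wavs. A minimal sketch of that move using only the standard library is below; it assumes the extracted dataset keeps its audio in nested per-speaker folders (for example train/wav/&lt;speaker&gt;/*.wav) with globally unique file names, and `flatten_wavs` is a hypothetical helper, not part of this repository.

```python
import shutil
from pathlib import Path


def flatten_wavs(src_root: str, dst_dir: str, move: bool = True) -> int:
    """Collect every .wav file under src_root (at any depth) into the flat folder dst_dir.

    Returns the number of files transferred. Refuses to run if two source files
    share a name or a target file already exists, since a flat folder cannot
    hold both.
    """
    src = Path(src_root).resolve()
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    dst = dst.resolve()
    # Build the full list first so moving files does not disturb the directory walk;
    # skip files that already sit directly in dst.
    wavs = sorted(p for p in src.rglob("*.wav") if p.parent != dst)
    names = [p.name for p in wavs]
    if len(names) != len(set(names)):
        raise ValueError("duplicate .wav file names under src_root; cannot flatten safely")
    clashes = [n for n in names if (dst / n).exists()]
    if clashes:
        raise FileExistsError(f"already present in {dst}: {clashes[:5]}")
    for p in wavs:
        if move:
            shutil.move(str(p), str(dst / p.name))
        else:
            shutil.copy2(p, dst / p.name)
    return len(wavs)
```

For example, `flatten_wavs("AISHELL-3", "AISHELL-3/wavs")`. Resampling is a separate step; one option is SoX, e.g. `sox in.wav -r 22050 out.wav` per file.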

Training HiFi-GAN

python train_hifi_gan.py --config config_v1.json
  • Tensorboard
tensorboard --logdir cp_hifigan/logs/ --bind_all
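Because the training audio must be 22050 Hz, it can be worth sanity-checking the config before a long run. This sketch assumes config_v1.json uses the key names of the original HiFi-GAN configs (`sampling_rate`, `upsample_rates`, `hop_size`, `segment_size`); `load_config` and `check_hifigan_config` are hypothetical helpers. The generator expands each mel frame into `hop_size` waveform samples, so the product of `upsample_rates` should equal `hop_size`.

```python
import json
import operator
from functools import reduce


def load_config(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


def check_hifigan_config(cfg: dict, expected_sr: int = 22050) -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if cfg.get("sampling_rate") != expected_sr:
        problems.append(f"sampling_rate is {cfg.get('sampling_rate')}, expected {expected_sr}")
    # Total upsampling factor of the generator (product of the per-layer rates).
    total_upsample = reduce(operator.mul, cfg.get("upsample_rates", []), 1)
    if total_upsample != cfg.get("hop_size"):
        problems.append(
            f"product of upsample_rates ({total_upsample}) != hop_size ({cfg.get('hop_size')})"
        )
    # Training segments should cover a whole number of mel frames.
    if cfg.get("segment_size", 0) % cfg.get("hop_size", 1) != 0:
        problems.append("segment_size is not a multiple of hop_size")
    return problems
```

Usage: `print(check_hifigan_config(load_config("config_v1.json")))`.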

Checkpoints and a copy of the configuration file are saved in the cp_hifigan directory by default.
You can change the path with the --checkpoint_path option.
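To resume training or choose a generator for inference, you need the newest checkpoint in the checkpoint directory. This sketch assumes the original HiFi-GAN naming convention: a `g_` (generator) or `do_` (discriminators and optimizer state) prefix followed by a zero-padded 8-digit step count. `latest_checkpoint` is a hypothetical helper.

```python
import glob
import os
from typing import Optional


def latest_checkpoint(cp_dir: str, prefix: str = "g_") -> Optional[str]:
    """Return the newest checkpoint path such as cp_dir/g_00120000, or None if there is none."""
    pattern = os.path.join(cp_dir, prefix + "?" * 8)
    # Keep only names whose 8-character suffix is a step number.
    paths = [p for p in glob.glob(pattern) if os.path.basename(p)[len(prefix):].isdigit()]
    if not paths:
        return None
    # Zero-padded steps sort lexicographically in numeric order.
    return sorted(paths)[-1]
```

For example, `latest_checkpoint("cp_hifigan")` returns a path that can be passed to inference.py as --checkpoint_file.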

Pretrained Model

You can also use the pretrained models we provide.
Download AISHELL3 pretrained models

Training AdaHiFi-GAN

First, save the pretrained AISHELL-3 model to the cp_ada_hifigan directory.
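One way to do this is to copy the newest checkpoints of the source run. This sketch assumes that train_ada_hifi_gan.py picks up HiFi-GAN-style `g_`/`do_` checkpoints from cp_ada_hifigan and that the source run used the default cp_hifigan directory (if you downloaded the pretrained model instead, point `src_dir` at wherever you saved it). `seed_adaptation_dir` is a hypothetical helper.

```python
import glob
import os
import shutil
from typing import List


def seed_adaptation_dir(src_dir: str = "cp_hifigan",
                        dst_dir: str = "cp_ada_hifigan") -> List[str]:
    """Copy the newest generator (g_*) and discriminator/optimizer (do_*) checkpoints
    from src_dir into dst_dir; return the copied destination paths."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for prefix in ("g_", "do_"):
        candidates = sorted(
            p for p in glob.glob(os.path.join(src_dir, prefix + "?" * 8))
            if os.path.basename(p)[len(prefix):].isdigit()
        )
        if not candidates:
            raise FileNotFoundError(f"no {prefix}<8-digit step> checkpoint in {src_dir}")
        # Zero-padded step numbers, so the last name in sorted order is the newest.
        copied.append(shutil.copy2(candidates[-1], dst_dir))
    return copied
```

If the adaptation script only needs the generator, copying the `g_` file alone may be enough.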

Because VXI-children is confidential, it is not used here. Instead, I tested the code using the child speech sample shared by Data-Baker.

python train_ada_hifi_gan.py --config config_v1.json
  • Tensorboard
tensorboard --logdir cp_ada_hifigan/logs/ --bind_all

Checkpoints and a copy of the configuration file are saved in the cp_ada_hifigan directory by default.
You can change the path with the --checkpoint_path option.

Inference from wav file

  1. Create a test_files directory and copy the wav files into it.
  2. Run the following command.
    python inference.py --checkpoint_file [generator checkpoint file path] --model_name [hifi-gan or adahifi-gan]
    

Generated wav files are saved in the generated_files directory by default.
You can change the path with the --output_dir option.
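The two inference steps can be scripted. This sketch only wraps the flags documented in this README (--checkpoint_file, --model_name, --output_dir) and assumes inference.py is run from the repository root; `stage_test_files` and `build_inference_cmd` are hypothetical helpers.

```python
import shutil
import sys
from pathlib import Path
from typing import Iterable, List, Optional

MODEL_NAMES = ("hifi-gan", "adahifi-gan")  # values documented for --model_name


def stage_test_files(wav_paths: Iterable[str], test_dir: str = "test_files") -> List[str]:
    """Step 1: create test_dir if needed and copy the given wav files into it."""
    Path(test_dir).mkdir(parents=True, exist_ok=True)
    return [shutil.copy2(p, test_dir) for p in wav_paths]


def build_inference_cmd(checkpoint_file: str, model_name: str,
                        output_dir: Optional[str] = None) -> List[str]:
    """Step 2: build the argument list for inference.py."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"model_name must be one of {MODEL_NAMES}, got {model_name!r}")
    cmd = [sys.executable, "inference.py",
           "--checkpoint_file", checkpoint_file,
           "--model_name", model_name]
    if output_dir is not None:  # without it, output goes to generated_files
        cmd += ["--output_dir", output_dir]
    return cmd
```

After staging the wav files, run for example `subprocess.run(build_inference_cmd("cp_ada_hifigan/g_00010000", "adahifi-gan"), check=True)`; the checkpoint name there is a placeholder.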

Acknowledgements

This implementation is based on HiFi-GAN.