transfertts's Introduction

TransferTTS (Zero-shot VITS) - PyTorch Implementation (-Ongoing-)

Note!!(09.23.)

Currently, this is only an implementation of the zero-shot system, not of the paper's first contribution: the transfer learning framework using wav2vec 2.0. As future work, a model with complete implementations of both contributions (zero-shot and transfer learning) will be implemented in the following repository. Congratulations on being awarded the best paper in INTERSPEECH 2022.

Overview

Unofficial PyTorch Implementation of Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus. Most of the code is based on VITS.

  1. MelStyleEncoder from StyleSpeech is used instead of the reference encoder.
  2. Implementation of untranscribed data training is omitted.
  3. The LibriTTS dataset (train-clean-100 and train-clean-360) is used. The sampling rate is set to 22050 Hz.

Pre-requisites (from VITS)

  1. Python >= 3.6
  2. Clone this repository
  3. Install Python requirements. Please refer to requirements.txt
    1. You may need to install espeak first: apt-get install espeak
  4. Build Monotonic Alignment Search and run preprocessing if you use your own datasets.
# Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace
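
For readers unfamiliar with what the compiled extension computes, the following is a minimal pure-Python sketch of the Monotonic Alignment Search dynamic program as described for Glow-TTS and VITS. It is not this repository's Cython code; the function name `monotonic_alignment_search` and the list-of-lists input are assumptions made for illustration. Given a score matrix `log_p[y][x]` (spectrogram frame `y` against text token `x`), it finds the best path that starts at the first token, ends at the last token, and never moves backwards or skips a token.

```python
from typing import List

NEG_INF = -1e9  # stands in for "unreachable" scores


def monotonic_alignment_search(log_p: List[List[float]]) -> List[List[int]]:
    """Return a 0/1 alignment matrix with the same shape as log_p.

    log_p[y][x] is the score of aligning frame y with text token x.
    Hypothetical reference sketch, not the repository's implementation.
    """
    t_y = len(log_p)       # number of frames
    t_x = len(log_p[0])    # number of text tokens
    if t_y < t_x:
        raise ValueError("need at least as many frames as text tokens")

    # q[y][x]: best cumulative score of a monotonic path ending at (y, x).
    q = [[NEG_INF] * t_x for _ in range(t_y)]
    q[0][0] = log_p[0][0]
    for y in range(1, t_y):
        # Token x can only be reached at frame y if x <= y.
        for x in range(min(t_x, y + 1)):
            stay = q[y - 1][x]                                # same token as previous frame
            advance = q[y - 1][x - 1] if x > 0 else NEG_INF   # move on to the next token
            q[y][x] = log_p[y][x] + max(stay, advance)

    # Backtrack from the last frame and last token.
    path = [[0] * t_x for _ in range(t_y)]
    x = t_x - 1
    for y in range(t_y - 1, -1, -1):
        path[y][x] = 1
        if y > 0 and x > 0 and (x == y or q[y - 1][x] < q[y - 1][x - 1]):
            x -= 1
    return path


# Toy example: 4 frames, 2 tokens.
scores = [[-0.1, -3.0], [-0.2, -2.0], [-2.5, -0.3], [-3.0, -0.1]]
alignment = monotonic_alignment_search(scores)
```

Each row of `alignment` contains exactly one 1, so summing a column gives the number of frames assigned to that token. The nested Python loops are slow for real utterances, which is presumably why the search is compiled with Cython.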

Preprocessing

Run

python prepare_wav.py --data_path [LibriTTS DATAPATH]

for some preparations.

Training

Train your model with

python train_ms.py -c configs/libritts.json -m libritts_base
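
In upstream VITS, `-c` names the JSON config file and `-m` names the run, with checkpoints and logs written under `./logs/<model name>`. Assuming this repository keeps that convention (it is not confirmed here), the argument handling can be sketched as follows. The helper names `parse_train_args` and `load_run` are hypothetical and are not functions from this codebase.

```python
import argparse
import json
import os


def parse_train_args(argv):
    """Parse the two flags used in the training command above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--config", type=str, required=True,
                        help="path to a JSON config file")
    parser.add_argument("-m", "--model", type=str, required=True,
                        help="run name, used as the log directory name")
    return parser.parse_args(argv)


def load_run(argv, log_root="./logs"):
    """Return (config dict, model directory) for a training run.

    Assumption: logs live under log_root/<model>, as in upstream VITS.
    """
    args = parse_train_args(argv)
    with open(args.config, "r", encoding="utf-8") as f:
        config = json.load(f)
    model_dir = os.path.join(log_root, args.model)
    return config, model_dir
```

Under this assumption, the command above would read `configs/libritts.json` and write its outputs to `./logs/libritts_base`.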

Inference

python inference.py --ref_audio [REF AUDIO PATH] --text [INPUT TEXT]

transfertts's People

Contributors

hcy71o

transfertts's Issues

Speech synthesis results

Hello @hcy71o ,

I liked your work on Transfer TTS and SC VITS. I have trained a model for 350000 steps using only the LibriTTS train-clean-100 dataset, but when I synthesize results using some random audio file, the speech is not clear.

So, my questions are:

  1. For how many steps did you train your model?

  2. What should the length (duration) of the audio files passed to inference.py be?

  3. Also should the reference audio be a part of the training data speaker, or can it be unseen?

  4. Do you have any demo page where we can see the comparison of Transfer TTS generated audio with VITS?

Thanks
