Giter Club home page Giter Club logo

gpt-sovits's Introduction

GPT-SoVITS - Voice Conversion and Text-to-Speech WebUI

Demo Video and Features

Check out our demo video in Chinese: Bilibili Demo

few.shot.fine.tuning.demo.mp4

Features:

  1. Zero-shot TTS: Input a 5-second vocal sample and experience instant text-to-speech conversion.

  2. Few-shot TTS: Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.

  3. Cross-lingual Support: Inference in languages different from the training dataset, currently supporting English, Japanese, and Chinese.

  4. WebUI Tools: Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.

Todo List

  1. High Priority:

    • Localization in Japanese and English.
    • User guide.
  2. Features:

    • Zero-shot voice conversion (5s) / few-shot voice conversion (1min).
    • TTS speaking speed control.
    • Enhanced TTS emotion control.
    • Experiment with changing SoVITS token inputs to probability distribution of vocabs.
    • Improve English and Japanese text frontend.
    • Develop tiny and larger-sized TTS models.
    • Colab scripts.
    • Expand training dataset (2k -> 10k).

Requirements (How to Install)

Python and PyTorch Version

Tested with Python 3.9, PyTorch 2.0.1, and CUDA 11.

Pip Packages

pip install torch numpy scipy tensorboard librosa==0.9.2 numba==0.56.4 pytorch-lightning gradio==3.14.0 ffmpeg-python onnxruntime tqdm==4.59.0 cn2an pypinyin pyopenjtalk g2p_en

Additional Requirements

If you need Chinese ASR (supported by FunASR), install:

pip install modelscope torchaudio sentencepiece funasr

FFmpeg

Ubuntu/Debian Users

sudo apt install ffmpeg

MacOS Users

brew install ffmpeg

Windows Users

Download and place ffmpeg.exe and ffprobe.exe in the GPT-SoVITS root.

Pretrained Models

Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS\pretrained_models.

For Chinese ASR, download models from Damo ASR Models and place them in tools/damo_asr/models.

For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal), download models from UVR5 Weights and place them in tools/uvr5/uvr5_weights.

Dataset Format

The TTS annotation .list file format:

vocal_path|speaker_name|language|text

Example:

D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.

Language dictionary:

  • 'zh': Chinese
  • 'ja': Japanese
  • 'en': English

Credits

Special thanks to the following projects and contributors:

gpt-sovits's People

Contributors

rvc-boss avatar ricecakey06 avatar blaise-tk avatar pengoosedev avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.