vits's Introduction

VITS

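Overview

A PyTorch implementation of VITS.
Trained on the Japanese speech dataset "JVS corpus", it can perform

  • text-to-speech
  • voice conversion

A detailed explanation of the model and generated audio samples is available here.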

Expected environment

  • Ubuntu 20.04
  • Python 3.10.0
  • torch==1.13.1+cu117
  • torchaudio==0.13.1+cu117

See requirements.txt for the full list of libraries.
Installing the libraries with pip is recommended.
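
Before training, it can help to compare the local setup against the versions listed above. The following is a minimal sketch, not part of the repository: the names `EXPECTED`, `base_version`, and `check_environment` are hypothetical, and the comparison deliberately ignores local build suffixes such as `+cu117`.

```python
import sys
from importlib import metadata

# Versions taken from the list above; build suffixes such as "+cu117" are ignored.
EXPECTED = {"torch": "1.13.1", "torchaudio": "0.13.1"}
EXPECTED_PYTHON = (3, 10)


def base_version(version):
    """Drop a local build suffix, e.g. '1.13.1+cu117' -> '1.13.1'."""
    return version.split("+", 1)[0]


def check_environment(expected=EXPECTED):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        problems.append(f"Python {found} found, 3.10 expected")
    for name, wanted in expected.items():
        try:
            found = base_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if found != wanted:
            problems.append(f"{name} {found} found, {wanted} expected")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment matches the expected versions"]:
        print(line)
```

A mismatch is not necessarily fatal; the list only reports where the local environment differs from the one this README assumes.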

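Programs

  • jvs_preprocessor.py preprocesses the JVS corpus.
  • vits_train.py loads the preprocessed dataset, runs training, and outputs the training progress and the trained parameters.
  • vits_text_to_speech.py loads the trained parameters output by vits_train.py, runs inference (text-to-speech), and writes the result as a .wav file.
  • vits_voice_converter.py loads the trained parameters output by vits_train.py, runs inference (voice conversion), and writes the result as a .wav file.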

Usage

Preparing the dataset

  1. Download and extract the JVS corpus.
  2. Set the variable jvs_dataset_path, near line 16 of jvs_preprocessor.py, to the path of the extracted JVS corpus.
  3. Run python jvs_preprocessor.py to preprocess the dataset.
    • Each .wav file in the dataset is converted to a sampling rate of 22050 [Hz] and written under ./dataset/jvs_preprocessed/jvs_wav_preprocessed/.
    • A file listing the path of each preprocessed .wav file together with its corresponding label is written to ./dataset/jvs_preprocessed/jvs_preprocessed_for_train.txt.
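
To confirm that preprocessing produced files at the stated 22050 Hz, something like the following could be run on the output directory. This is a sketch under assumptions: `find_wrong_sample_rates` is a hypothetical helper, not part of the repository, and it assumes the preprocessed files are PCM .wav files that Python's standard `wave` module can open.

```python
import wave
from pathlib import Path

EXPECTED_RATE = 22050  # Hz, the sampling rate stated above


def find_wrong_sample_rates(root, expected_rate=EXPECTED_RATE):
    """Return (path, rate) pairs for .wav files under root whose rate differs."""
    mismatches = []
    for path in sorted(Path(root).rglob("*.wav")):
        # The wave module only reads the header here; no audio data is decoded.
        with wave.open(str(path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected_rate:
            mismatches.append((path, rate))
    return mismatches


if __name__ == "__main__":
    output_dir = "./dataset/jvs_preprocessed/jvs_wav_preprocessed/"
    for path, rate in find_wrong_sample_rates(output_dir):
        print(f"{path}: {rate} Hz")
```

An empty result means every file found under the directory reports the expected rate.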

Compiling the Cython module

The module monotonic_align is implemented in Cython for speed. Compile it as follows.

  1. Run cd ./module/model_component/monotonic_align/.
  2. Run mkdir monotonic_align.
  3. Run python setup.py build_ext --inplace to compile the module written in Cython.

Training

  1. Run python vits_train.py to train VITS.
    • The training progress is written under ./output/vits/train/.
    • The trained parameters are written every 5000 iterations, in the form ./output/vits/train/iteration295000/netG_cpu.pth.
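
Because a new parameter file appears every 5000 iterations, picking the most recent one by hand gets tedious. The sketch below selects the highest-numbered checkpoint; `latest_checkpoint` is a hypothetical helper, and it assumes only the `iteration<N>/netG_cpu.pth` layout shown in the example path above.

```python
import re
from pathlib import Path

# Matches directory names such as "iteration295000".
ITERATION_DIR = re.compile(r"^iteration(\d+)$")


def latest_checkpoint(train_dir="./output/vits/train/", filename="netG_cpu.pth"):
    """Return the checkpoint path with the highest iteration number, or None."""
    root = Path(train_dir)
    if not root.is_dir():
        return None
    best = None  # (iteration, path) of the newest checkpoint seen so far
    for child in root.iterdir():
        match = ITERATION_DIR.match(child.name)
        candidate = child / filename
        if match and candidate.is_file():
            iteration = int(match.group(1))
            # Compare numerically, so iteration10000 ranks above iteration5000.
            if best is None or iteration > best[0]:
                best = (iteration, candidate)
    return None if best is None else best[1]


if __name__ == "__main__":
    print(latest_checkpoint())
```

The returned path can then be used as the value of trained_weight_path in the inference scripts.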

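Inference (text-to-speech)

  1. Set the variable trained_weight_path, near line 39 of vits_text_to_speech.py, to the path of the trained parameters output by vits_train.py.
  2. Set the variable source_text, near line 41 of vits_text_to_speech.py, to the text you want spoken.
  3. Set the variable target_speaker_id, near line 43 of vits_text_to_speech.py, to the id of the speaker whose voice should be used.
    • The speaker id is (the speaker number defined in the JVS corpus) - 1. For example, to select the speaker "jvs010", the speaker id is 9.
  4. Run python vits_text_to_speech.py to perform text-to-speech.
    • The generated audio is written to ./output/vits/inference/text_to_speech/output.wav.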
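
The speaker id rule above (JVS speaker number minus 1) is easy to get wrong by one, so a small conversion helper can be useful. This is an illustrative sketch: `jvs_name_to_speaker_id` is a hypothetical function, not part of the repository, and the accepted range assumes the corpus speakers are named jvs001 through jvs100.

```python
import re


def jvs_name_to_speaker_id(name):
    """Convert a JVS speaker name such as 'jvs010' to the zero-based speaker id."""
    match = re.fullmatch(r"jvs(\d{3})", name)
    if match is None:
        raise ValueError(f"not a JVS speaker name: {name!r}")
    number = int(match.group(1))
    # Assumption: speakers are numbered jvs001 to jvs100.
    if not 1 <= number <= 100:
        raise ValueError(f"speaker number out of range: {number}")
    return number - 1


if __name__ == "__main__":
    print(jvs_name_to_speaker_id("jvs010"))  # prints 9, as in the example above
```

The same conversion applies to source_speaker_id and target_speaker_id in the voice conversion script.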

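Inference (voice conversion)

  1. Set the variable trained_weight_path, near line 37 of vits_voice_converter.py, to the path of the trained parameters output by vits_train.py.
  2. Set the variable source_wav_path, near line 39 of vits_voice_converter.py, to the path of the wav file to convert.
  3. Set the variable source_speaker_id, near line 41 of vits_voice_converter.py, to the id of the source speaker.
  4. Set the variable target_speaker_id, near line 43 of vits_voice_converter.py, to the id of the target speaker.
  5. Run python vits_voice_converter.py to perform inference (voice conversion).
    • The converted audio is written to ./output/vits/inference/voice_conversion/output.wav.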

References

https://arxiv.org/abs/2106.06103
https://github.com/jaywalnut310/vits

vits's People

Contributors

zassou65535


vits's Issues

ModuleNotFoundError: No module named 'cmake'

When I tried to build this repository, the following error occurred.

$ pip install -r requirements.txt
Collecting appdirs==1.4.4
  Using cached appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting audioread==2.1.9
  Using cached audioread-2.1.9.tar.gz (377 kB)
Collecting certifi==2021.10.8
  Using cached certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
Collecting cffi==1.15.0
  Using cached cffi-1.15.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (446 kB)
Collecting charset-normalizer==2.0.10
  Using cached charset_normalizer-2.0.10-py3-none-any.whl (39 kB)
Collecting cycler==0.11.0
  Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting Cython==0.29.26
  Using cached Cython-0.29.26-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Collecting decorator==5.1.0
  Using cached decorator-5.1.0-py3-none-any.whl (9.1 kB)
Collecting fonttools==4.28.5
  Using cached fonttools-4.28.5-py3-none-any.whl (890 kB)
Collecting idna==3.3
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting joblib
  Using cached joblib-1.2.0-py3-none-any.whl (297 kB)
Collecting kiwisolver==1.3.2
  Using cached kiwisolver-1.3.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
Collecting librosa==0.8.1
  Using cached librosa-0.8.1-py3-none-any.whl (203 kB)
Collecting llvmlite==0.37.0
  Using cached llvmlite-0.37.0-cp38-cp38-manylinux2014_x86_64.whl (26.3 MB)
Collecting matplotlib==3.5.1
  Using cached matplotlib-3.5.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
Collecting numba==0.54.1
  Using cached numba-0.54.1-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.3 MB)
Collecting numpy==1.22
  Using cached numpy-1.22.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
Collecting packaging==21.3
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting Pillow>=9.0.0
  Using cached Pillow-9.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
Collecting pooch==1.5.2
  Using cached pooch-1.5.2-py3-none-any.whl (57 kB)
Collecting pycparser==2.21
  Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
Collecting pyopenjtalk==0.1.5
  Using cached pyopenjtalk-0.1.5.tar.gz (1.5 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /home/ishihara/VITS/Vits/bin/python /tmp/tmpwy3_2sff get_requires_for_build_wheel /tmp/tmplh_9p_po
       cwd: /tmp/pip-install-o3a7e24s/pyopenjtalk
  Complete output (24 lines):
  Traceback (most recent call last):
    File "/home/ishihara/VITS/Vits/bin/cmake", line 5, in <module>
      from cmake import cmake
  ModuleNotFoundError: No module named 'cmake'
  <string>:26: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  Traceback (most recent call last):
    File "/tmp/tmpwy3_2sff", line 280, in <module>
      main()
    File "/tmp/tmpwy3_2sff", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/tmp/tmpwy3_2sff", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-4y0bjojz/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
    File "/tmp/pip-build-env-4y0bjojz/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 320, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-4y0bjojz/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 484, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/tmp/pip-build-env-4y0bjojz/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 335, in run_setup
      exec(code, locals())
    File "<string>", line 154, in <module>
    File "/usr/lib/python3.8/subprocess.py", line 448, in check_returncode
      raise CalledProcessError(self.returncode, self.args, self.stdout,
  subprocess.CalledProcessError: Command '['cmake', '..', '-DHTS_ENGINE_INCLUDE_DIR=.', '-DHTS_ENGINE_LIB=dummy']' returned non-zero exit status 1.
  ----------------------------------------
ERROR: Command errored out with exit status 1: /home/ishihara/VITS/Vits/bin/python /tmp/tmpwy3_2sff get_requires_for_build_wheel /tmp/tmplh_9p_po Check the logs for full command output.
(Vits) ishihara@ishihara-OptiPlex-7010:~/VITS/VITS$ python
Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cmake
>>> from cmake import cmake
>>> exit()
(Vits) ishihara@ishihara-OptiPlex-7010:~/VITS/VITS$ cmake--version
cmake--version: command not found
(Vits) ishihara@ishihara-OptiPlex-7010:~/VITS/VITS$ cmake --version
cmake version 3.25.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).

It says "ModuleNotFoundError: No module named 'cmake'",
but I have already run "sudo apt install cmake" and "pip install cmake".
Please let me know how to fix it.

How can I train from a checkpoint?

I read through the code and found that only vits_generator can be loaded from a checkpoint, while vits_discriminator cannot.

How can I handle this?

Updated repository raises ImportError

Today I cloned the updated repository, but jvs_preprocessor.py raised the following error.

$ python3 jvs_preprocessor.py
Traceback (most recent call last):
  File "__init__.pxd", line 945, in numpy.import_array
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf . Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem .

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "jvs_preprocessor.py", line 10, in <module>
    import pyopenjtalk
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pyopenjtalk/__init__.py", line 20, in <module>
    from .htsengine import HTSEngine
  File "pyopenjtalk/htsengine.pyx", line 8, in init pyopenjtalk.htsengine
  File "__init__.pxd", line 947, in numpy.import_array
ImportError: numpy.core.multiarray failed to import

The environment is as follows:
ubuntu:20.04.1
python:3.8.5
torch:1.10.1+cu113
torchaudio:0.10.1+cu113
numpy:1.22.0
