zassou65535 / vits

A text-to-speech reader and voice changer built on VITS

License: MIT License

Python 98.78% Cython 1.22%

vits voice-conversion text-to-speech pytorch voice-style-transfer voice-converter

vits's Introduction

VITS

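Overview

A PyTorch implementation of VITS.
Trained on the Japanese speech dataset "JVS corpus", it can perform

text-to-speech synthesis
voice conversion between speakers

For a detailed explanation of the model and generated audio samples, see here.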

Target environment

Ubuntu 20.04
Python 3.10.0
torch==1.13.1+cu117
torchaudio==0.13.1+cu117

See requirements.txt for the full list of libraries.
Installing the libraries with pip is recommended.

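Programs

jvs_preprocessor.py preprocesses the JVS corpus.
vits_train.py loads the preprocessed dataset, runs training, and outputs the training progress and the trained parameters.
vits_text_to_speech.py loads the trained parameters output by vits_train.py, runs inference (generating speech from text), and writes the result as a .wav file.
vits_voice_converter.py loads the trained parameters output by vits_train.py, runs inference (voice-to-voice conversion), and writes the result as a .wav file.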

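Usage

Preparing the dataset

Download and extract the JVS corpus.
Set the variable jvs_dataset_path, around line 16 of jvs_preprocessor.py, to the path of the extracted JVS corpus.
Run python jvs_preprocessor.py to perform the preprocessing.
- Each .wav file in the dataset is converted to a sampling rate of 22050 Hz and written under ./dataset/jvs_preprocessed/jvs_wav_preprocessed/.
- A file listing the path to each preprocessed .wav file together with its corresponding label is written to ./dataset/jvs_preprocessed/jvs_preprocessed_for_train.txt.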
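
After preprocessing, it can be useful to confirm that every output file really has the 22050 Hz sampling rate described above. The following is a minimal sketch using only the standard library, not part of the repository. The helper name find_wrong_sample_rates is hypothetical, and the sketch assumes the preprocessed files are plain PCM WAV files that Python's wave module can open.

```python
import wave
from pathlib import Path

# Target sampling rate stated in the README.
EXPECTED_SAMPLE_RATE = 22050


def find_wrong_sample_rates(wav_dir, expected=EXPECTED_SAMPLE_RATE):
    """Return (path, rate) pairs for .wav files under wav_dir whose rate differs.

    Hypothetical helper: assumes PCM WAV files readable by the wave module.
    """
    mismatches = []
    for wav_path in sorted(Path(wav_dir).rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav_file:
            rate = wav_file.getframerate()
        if rate != expected:
            mismatches.append((wav_path, rate))
    return mismatches
```

Calling find_wrong_sample_rates("./dataset/jvs_preprocessed/jvs_wav_preprocessed/") should return an empty list if every file was converted as expected.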

Compiling the Cython module

The monotonic_align module is implemented in Cython for speed. Compile it as follows.

Run cd ./module/model_component/monotonic_align/.
Run mkdir monotonic_align.
Run python setup.py build_ext --inplace to compile the module written in Cython.
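
To check that the build step produced a compiled extension, one option is to search the module directory for shared-library files. This is a rough sketch rather than anything shipped with the repository. The helper name find_compiled_extensions is hypothetical, and the exact location and file name of the build output are assumptions, so the search is recursive.

```python
from pathlib import Path

# Suffixes of compiled extension modules: .so on Linux/macOS, .pyd on Windows.
EXTENSION_SUFFIXES = (".so", ".pyd")


def find_compiled_extensions(module_dir):
    """Return all compiled extension files found anywhere under module_dir."""
    root = Path(module_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix in EXTENSION_SUFFIXES
    )
```

If find_compiled_extensions("./module/model_component/monotonic_align/") returns an empty list, the build most likely did not complete.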

Training

Run python vits_train.py to train VITS.
- Training progress is written under ./output/vits/train/.
- Trained parameters are written every 5000 iterations, to paths such as ./output/vits/train/iteration295000/netG_cpu.pth.
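
Because a new checkpoint directory appears every 5000 iterations, picking the most recent one by hand gets tedious. The sketch below looks for the highest-numbered iteration directory that contains a generator checkpoint. It assumes the directory layout shown in the example path above (iteration<N>/netG_cpu.pth); the function name latest_generator_checkpoint is hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Matches directory names such as "iteration295000".
ITERATION_DIR = re.compile(r"^iteration(\d+)$")


def latest_generator_checkpoint(train_dir="./output/vits/train/",
                                filename="netG_cpu.pth"):
    """Return the checkpoint path with the largest iteration number, or None."""
    root = Path(train_dir)
    if not root.is_dir():
        return None
    candidates = []
    for child in root.iterdir():
        match = ITERATION_DIR.match(child.name)
        checkpoint = child / filename
        if match and checkpoint.is_file():
            candidates.append((int(match.group(1)), checkpoint))
    if not candidates:
        return None
    # Compare iteration counts numerically; sorting the names as strings
    # would rank "iteration5000" above "iteration295000".
    return max(candidates, key=lambda item: item[0])[1]
```

The returned path can then be used as the value of trained_weight_path in the inference scripts.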

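Inference (text-to-speech)

Set the variable trained_weight_path, around line 39 of vits_text_to_speech.py, to the path of the trained parameters output by vits_train.py.
Set the variable source_text, around line 41 of vits_text_to_speech.py, to the text to be spoken.
Set the variable target_speaker_id, around line 43 of vits_text_to_speech.py, to the id of the speaker who should speak.
- The speaker id is (the speaker number defined in the JVS corpus - 1). For example, to select speaker "jvs010", the speaker id is 9.
Run python vits_text_to_speech.py to synthesize the speech.
- The result is written to ./output/vits/inference/text_to_speech/output.wav.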
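
The off-by-one rule for speaker ids is easy to get wrong, so a small conversion helper may help. This is an illustrative sketch only; the function names are hypothetical, and it assumes JVS speaker names have the form "jvs" followed by a three-digit number, as in the "jvs010" example above.

```python
import re

# Matches JVS speaker names such as "jvs010".
JVS_SPEAKER_NAME = re.compile(r"^jvs(\d{3})$")


def jvs_name_to_speaker_id(name):
    """Convert a JVS speaker name to the 0-based speaker id (number - 1)."""
    match = JVS_SPEAKER_NAME.match(name)
    if match is None:
        raise ValueError(f"not a JVS speaker name: {name!r}")
    number = int(match.group(1))
    if number < 1:
        raise ValueError(f"JVS speaker numbers start at 1: {name!r}")
    return number - 1


def speaker_id_to_jvs_name(speaker_id):
    """Convert a 0-based speaker id back to the JVS speaker name."""
    if speaker_id < 0:
        raise ValueError(f"speaker id must be non-negative: {speaker_id}")
    return f"jvs{speaker_id + 1:03d}"
```

For example, jvs_name_to_speaker_id("jvs010") gives 9, matching the rule stated above.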

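Inference (voice conversion)

Set the variable trained_weight_path, around line 37 of vits_voice_converter.py, to the path of the trained parameters output by vits_train.py.
Set the variable source_wav_path, around line 39 of vits_voice_converter.py, to the path of the wav file to convert.
Set the variable source_speaker_id, around line 41 of vits_voice_converter.py, to the speaker id of the source speaker.
Set the variable target_speaker_id, around line 43 of vits_voice_converter.py, to the speaker id of the target speaker.
Run python vits_voice_converter.py to perform inference (voice conversion).
- The converted result is written to ./output/vits/inference/voice_conversion/output.wav.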
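
Since voice conversion depends on four hand-edited variables, a quick sanity check before running the script can catch typos early. The following sketch is not taken from the repository. The function check_conversion_settings is hypothetical, and the valid id range 0 to 99 is an assumption based on the JVS corpus containing 100 speakers (jvs001 to jvs100).

```python
from pathlib import Path

# Assumption: the model was trained on all 100 JVS speakers (ids 0..99).
NUM_JVS_SPEAKERS = 100


def check_conversion_settings(trained_weight_path, source_wav_path,
                              source_speaker_id, target_speaker_id,
                              num_speakers=NUM_JVS_SPEAKERS):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not Path(trained_weight_path).is_file():
        problems.append(f"trained_weight_path not found: {trained_weight_path}")
    if not Path(source_wav_path).is_file():
        problems.append(f"source_wav_path not found: {source_wav_path}")
    ids = (("source_speaker_id", source_speaker_id),
           ("target_speaker_id", target_speaker_id))
    for name, value in ids:
        if not isinstance(value, int) or not 0 <= value < num_speakers:
            problems.append(f"{name} must be an int in [0, {num_speakers - 1}]: {value!r}")
    return problems
```

An empty return value only means the settings look plausible; it does not guarantee that the checkpoint matches the model code.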

References

https://arxiv.org/abs/2106.06103
https://github.com/jaywalnut310/vits

vits's Issues

ModuleNotFoundError: No module named 'cmake'

When I tried to build this repository, the following error occurred.

$ pip install -r requirements.txt
Collecting appdirs==1.4.4
  Using cached appdirs-1.4.4-py2.py3-none-any.whl (9.6 kB)
Collecting audioread==2.1.9
  Using cached audioread-2.1.9.tar.gz (377 kB)
Collecting certifi==2021.10.8
  Using cached certifi-2021.10.8-py2.py3-none-any.whl (149 kB)
Collecting cffi==1.15.0
  Using cached cffi-1.15.0-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (446 kB)
Collecting charset-normalizer==2.0.10
  Using cached charset_normalizer-2.0.10-py3-none-any.whl (39 kB)
Collecting cycler==0.11.0
  Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting Cython==0.29.26
  Using cached Cython-0.29.26-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Collecting decorator==5.1.0
  Using cached decorator-5.1.0-py3-none-any.whl (9.1 kB)
Collecting fonttools==4.28.5
  Using cached fonttools-4.28.5-py3-none-any.whl (890 kB)
Collecting idna==3.3
  Using cached idna-3.3-py3-none-any.whl (61 kB)
Collecting joblib
  Using cached joblib-1.2.0-py3-none-any.whl (297 kB)
Collecting kiwisolver==1.3.2
  Using cached kiwisolver-1.3.2-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.2 MB)
Collecting librosa==0.8.1
  Using cached librosa-0.8.1-py3-none-any.whl (203 kB)
Collecting llvmlite==0.37.0
  Using cached llvmlite-0.37.0-cp38-cp38-manylinux2014_x86_64.whl (26.3 MB)
Collecting matplotlib==3.5.1
  Using cached matplotlib-3.5.1-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.3 MB)
Collecting numba==0.54.1
  Using cached numba-0.54.1-cp38-cp38-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (3.3 MB)
Collecting numpy==1.22
  Using cached numpy-1.22.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
Collecting packaging==21.3
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting Pillow>=9.0.0
  Using cached Pillow-9.4.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.3 MB)
Collecting pooch==1.5.2
  Using cached pooch-1.5.2-py3-none-any.whl (57 kB)
Collecting pycparser==2.21
  Using cached pycparser-2.21-py2.py3-none-any.whl (118 kB)
Collecting pyopenjtalk==0.1.5
  Using cached pyopenjtalk-0.1.5.tar.gz (1.5 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /home/ishihara/VITS/Vits/bin/python /tmp/tmpwy3_2sff get_requires_for_build_wheel /tmp/tmplh_9p_po
       cwd: /tmp/pip-install-o3a7e24s/pyopenjtalk
  Complete output (24 lines):
  Traceback (most recent call last):
    File "/home/ishihara/VITS/Vits/bin/cmake", line 5, in <module>
      from cmake import cmake
  ModuleNotFoundError: No module named 'cmake'
  <string>:26: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  Traceback (most recent call last):
    File "/tmp/tmpwy3_2sff", line 280, in <module>
      main()
    File "/tmp/tmpwy3_2sff", line 263, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/tmp/tmpwy3_2sff", line 114, in get_requires_for_build_wheel
      return hook(config_settings)
    File "/tmp/pip-build-env-4y0bjojz/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
    File "/tmp/pip-build-env-4y0bjojz/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 320, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-4y0bjojz/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 484, in run_setup
      super(_BuildMetaLegacyBackend,
    File "/tmp/pip-build-env-4y0bjojz/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 335, in run_setup
      exec(code, locals())
    File "<string>", line 154, in <module>
    File "/usr/lib/python3.8/subprocess.py", line 448, in check_returncode
      raise CalledProcessError(self.returncode, self.args, self.stdout,
  subprocess.CalledProcessError: Command '['cmake', '..', '-DHTS_ENGINE_INCLUDE_DIR=.', '-DHTS_ENGINE_LIB=dummy']' returned non-zero exit status 1.
  ----------------------------------------
ERROR: Command errored out with exit status 1: /home/ishihara/VITS/Vits/bin/python /tmp/tmpwy3_2sff get_requires_for_build_wheel /tmp/tmplh_9p_po Check the logs for full command output.
(Vits) ishihara@ishihara-OptiPlex-7010:~/VITS/VITS$ python
Python 3.8.10 (default, Nov 14 2022, 12:59:47) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cmake
>>> from cmake import cmake
>>> exit()
(Vits) ishihara@ishihara-OptiPlex-7010:~/VITS/VITS$ cmake--version
cmake--version: コマンドが見つかりません
(Vits) ishihara@ishihara-OptiPlex-7010:~/VITS/VITS$ cmake --version
cmake version 3.25.0

CMake suite maintained and supported by Kitware (kitware.com/cmake).

The error says "ModuleNotFoundError: No module named 'cmake'",
but I have already run "sudo apt install cmake" and "pip install cmake".
Please let me know how to fix it.

How can I train from a checkpoint?

I read through the code and found that only vits_generator can be loaded from a checkpoint; vits_discriminator cannot.

How should I handle this?

Renewal repository has ImportError

Today I cloned the renewed repository, but jvs_preprocessor.py raised the following error.

$ python3 jvs_preprocessor.py
Traceback (most recent call last):
  File "__init__.pxd", line 945, in numpy.import_array
RuntimeError: module compiled against API version 0x10 but this version of numpy is 0xf . Check the section C-API incompatibility at the Troubleshooting ImportError section at https://numpy.org/devdocs/user/troubleshooting-importerror.html#c-api-incompatibility for indications on how to solve this problem .

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "jvs_preprocessor.py", line 10, in <module>
    import pyopenjtalk
  File "/home/ubuntu/.local/lib/python3.8/site-packages/pyopenjtalk/__init__.py", line 20, in <module>
    from .htsengine import HTSEngine
  File "pyopenjtalk/htsengine.pyx", line 8, in init pyopenjtalk.htsengine
  File "__init__.pxd", line 947, in numpy.import_array
ImportError: numpy.core.multiarray failed to import

The environment is as follows:
ubuntu:20.04.1
python:3.8.5
torch:1.10.1+cu113
torchaudio:0.10.1+cu113
numpy:1.22.0

Must the source wav and the target wav be from seen speakers?

Hello!
Regarding the VITS voice conversion code:
must the source wav and the target wav both come from speakers seen during training?