
nnsvs's Introduction

NNSVS

PyPI License arXiv Python CI codecov

Neural network-based singing voice synthesis library for research

Documentation can be found at https://nnsvs.github.io.

Citation

@article{yamamoto2022nnsvs,
  title={NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit},
  author={Yamamoto, Ryuichi and Yoneyama, Reo and Toda, Tomoki},
  journal={arXiv preprint arXiv:2210.15987},
  year={2022}
}

Acknowledgements

nnsvs's People

Contributors

nicolalandro, oatsu-gh, r9y9, taroushirani


nnsvs's Issues

Create xml file

How do I create the XML file? The Streamlit demo asks for an XML file, but I'm not sure how to create one. Do I just need to type my song into an XML file?

Train_resF0 Script Error (Dev2)

Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/nnsvs-0.0.3-py3.8.egg/nnsvs/bin/train_resf0.py", line 374, in my_app
last_dev_loss = train_loop(
File "/usr/local/lib/python3.8/dist-packages/nnsvs-0.0.3-py3.8.egg/nnsvs/bin/train_resf0.py", line 269, in train_loop
loss, log_metrics = train_step(
File "/usr/local/lib/python3.8/dist-packages/nnsvs-0.0.3-py3.8.egg/nnsvs/bin/train_resf0.py", line 87, in train_step
pred_out_feats.masked_select(mask), out_feats.masked_select(mask)
AttributeError: 'list' object has no attribute 'masked_select'

What is this error?
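For what it's worth, the traceback suggests the model returned a list of predictions while the loss code expects a single tensor (only tensors have masked_select). A minimal sketch of one possible guard; the helper name unwrap_prediction is hypothetical, and whether the last element is the right prediction depends on the model:

```python
# Sketch (assumption): the model may return a list/tuple of per-layer
# predictions rather than one tensor; pick a single concrete prediction
# (here: the last) before calling masked_select on it.
def unwrap_prediction(pred_out_feats):
    """Return a single prediction from a possibly-list model output."""
    if isinstance(pred_out_feats, (list, tuple)):
        return pred_out_feats[-1]  # final (or only) prediction
    return pred_out_feats
```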

Rename egs to recipes

The name egs is used for the recipe directory for historical reasons; however, I think recipes is more descriptive and better. I am going to rename it if there are no strong objections.

Python3.8 and OpenSuse fail to install

System:

  • OS: OpenSuse Linux
  • python3.8
  • gcc 11.2.1 20220103 [revision d4a1d3c4b377f1d4acb34fe1b55b5088a3f293f6]

I try to do the following command:

python3.8 -m pip install git+https://github.com/r9y9/nnsvs

and it fails with an error while installing pysinsy.
The full log is:

Defaulting to user installation because normal site-packages is not writeable
Collecting git+https://github.com/r9y9/nnsvs
  Cloning https://github.com/r9y9/nnsvs to /tmp/pip-req-build-tx7joph0
  Running command git clone --filter=blob:none --quiet https://github.com/r9y9/nnsvs /tmp/pip-req-build-tx7joph0
  Resolved https://github.com/r9y9/nnsvs to commit 45da00218dd0a445c8483f11ac891c6ef00d3925
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: pyworld in ./.local/lib/python3.8/site-packages (from nnsvs==0.0.1) (0.3.0)
Collecting pysinsy
  Using cached pysinsy-0.0.4.tar.gz (1.4 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting nnmnkwii
  Using cached nnmnkwii-0.1.1-cp38-cp38-linux_x86_64.whl
Collecting hydra-core<1.2.0,>=1.1.0
  Using cached hydra_core-1.1.1-py3-none-any.whl (145 kB)
Collecting hydra-colorlog>=1.1.0
  Using cached hydra_colorlog-1.1.0-py3-none-any.whl (3.6 kB)
Collecting pysptk
  Using cached pysptk-0.1.20-cp38-cp38-linux_x86_64.whl
Requirement already satisfied: torch>=1.1.0 in ./.local/lib/python3.8/site-packages (from nnsvs==0.0.1) (1.10.1)
Collecting tensorboard
  Using cached tensorboard-2.8.0-py3-none-any.whl (5.8 MB)
Requirement already satisfied: numpy in ./.local/lib/python3.8/site-packages (from nnsvs==0.0.1) (1.19.5)
Requirement already satisfied: torchaudio in ./.local/lib/python3.8/site-packages (from nnsvs==0.0.1) (0.10.1)
Requirement already satisfied: librosa>=0.7.0 in ./.local/lib/python3.8/site-packages (from nnsvs==0.0.1) (0.8.0)
Requirement already satisfied: cython in ./.local/lib/python3.8/site-packages (from nnsvs==0.0.1) (0.29.26)
Collecting colorlog
  Using cached colorlog-6.6.0-py2.py3-none-any.whl (11 kB)
Collecting omegaconf==2.1.*
  Using cached omegaconf-2.1.1-py3-none-any.whl (74 kB)
Requirement already satisfied: importlib-resources in ./.local/lib/python3.8/site-packages (from hydra-core<1.2.0,>=1.1.0->nnsvs==0.0.1) (5.4.0)
Collecting antlr4-python3-runtime==4.8
  Using cached antlr4-python3-runtime-4.8.tar.gz (112 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: PyYAML>=5.1.0 in ./.local/lib/python3.8/site-packages (from omegaconf==2.1.*->hydra-core<1.2.0,>=1.1.0->nnsvs==0.0.1) (5.4.1)
Requirement already satisfied: audioread>=2.0.0 in ./.local/lib/python3.8/site-packages (from librosa>=0.7.0->nnsvs==0.0.1) (2.1.9)
Requirement already satisfied: decorator>=3.0.0 in /usr/lib/python3.8/site-packages (from librosa>=0.7.0->nnsvs==0.0.1) (5.1.1)
Requirement already satisfied: joblib>=0.14 in ./.local/lib/python3.8/site-packages (from librosa>=0.7.0->nnsvs==0.0.1) (1.0.1)
Requirement already satisfied: numba>=0.43.0 in ./.local/lib/python3.8/site-packages (from librosa>=0.7.0->nnsvs==0.0.1) (0.53.0)
Requirement already satisfied: pooch>=1.0 in ./.local/lib/python3.8/site-packages (from librosa>=0.7.0->nnsvs==0.0.1) (1.5.1)
Requirement already satisfied: resampy>=0.2.2 in ./.local/lib/python3.8/site-packages (from librosa>=0.7.0->nnsvs==0.0.1) (0.2.2)
Requirement already satisfied: scikit-learn!=0.19.0,>=0.14.0 in ./.local/lib/python3.8/site-packages (from librosa>=0.7.0->nnsvs==0.0.1) (0.24.2)
Requirement already satisfied: scipy>=1.0.0 in ./.local/lib/python3.8/site-packages (from librosa>=0.7.0->nnsvs==0.0.1) (1.7.1)
Requirement already satisfied: soundfile>=0.9.0 in ./.local/lib/python3.8/site-packages (from librosa>=0.7.0->nnsvs==0.0.1) (0.10.3.post1)
Requirement already satisfied: typing-extensions in ./.local/lib/python3.8/site-packages (from torch>=1.1.0->nnsvs==0.0.1) (3.10.0.2)
Requirement already satisfied: tqdm in ./.local/lib/python3.8/site-packages (from nnmnkwii->nnsvs==0.0.1) (4.62.3)
Collecting fastdtw
  Using cached fastdtw-0.3.4.tar.gz (133 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: six in /usr/lib/python3.8/site-packages (from pysptk->nnsvs==0.0.1) (1.16.0)
Requirement already satisfied: werkzeug>=0.11.15 in ./.local/lib/python3.8/site-packages (from tensorboard->nnsvs==0.0.1) (2.0.2)
Collecting markdown>=2.6.8
  Using cached Markdown-3.3.6-py3-none-any.whl (97 kB)
Requirement already satisfied: protobuf>=3.6.0 in /usr/lib/python3.8/site-packages (from tensorboard->nnsvs==0.0.1) (3.19.4)
Collecting wheel>=0.26
  Using cached wheel-0.37.1-py2.py3-none-any.whl (35 kB)
Requirement already satisfied: setuptools>=41.0.0 in /usr/lib/python3.8/site-packages (from tensorboard->nnsvs==0.0.1) (58.3.0)
Collecting tensorboard-plugin-wit>=1.6.0
  Using cached tensorboard_plugin_wit-1.8.1-py3-none-any.whl (781 kB)
Requirement already satisfied: requests<3,>=2.21.0 in /usr/lib/python3.8/site-packages (from tensorboard->nnsvs==0.0.1) (2.27.1)
Requirement already satisfied: grpcio>=1.24.3 in ./.local/lib/python3.8/site-packages (from tensorboard->nnsvs==0.0.1) (1.43.0)
Collecting tensorboard-data-server<0.7.0,>=0.6.0
  Using cached tensorboard_data_server-0.6.1-py3-none-manylinux2010_x86_64.whl (4.9 MB)
Collecting google-auth-oauthlib<0.5,>=0.4.1
  Using cached google_auth_oauthlib-0.4.6-py2.py3-none-any.whl (18 kB)
Requirement already satisfied: google-auth<3,>=1.6.3 in ./.local/lib/python3.8/site-packages (from tensorboard->nnsvs==0.0.1) (2.6.0)
Requirement already satisfied: absl-py>=0.4 in ./.local/lib/python3.8/site-packages (from tensorboard->nnsvs==0.0.1) (1.0.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in ./.local/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->nnsvs==0.0.1) (0.2.8)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in ./.local/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->nnsvs==0.0.1) (4.2.4)
Requirement already satisfied: rsa<5,>=3.1.4 in ./.local/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard->nnsvs==0.0.1) (4.8)
Collecting requests-oauthlib>=0.7.0
  Using cached requests_oauthlib-1.3.1-py2.py3-none-any.whl (23 kB)
Requirement already satisfied: importlib-metadata>=4.4 in /usr/lib/python3.8/site-packages (from markdown>=2.6.8->tensorboard->nnsvs==0.0.1) (4.8.2)
Requirement already satisfied: llvmlite<0.37,>=0.36.0rc1 in ./.local/lib/python3.8/site-packages (from numba>=0.43.0->librosa>=0.7.0->nnsvs==0.0.1) (0.36.0)
Requirement already satisfied: packaging in /usr/lib/python3.8/site-packages (from pooch>=1.0->librosa>=0.7.0->nnsvs==0.0.1) (21.3)
Requirement already satisfied: appdirs in /usr/lib/python3.8/site-packages (from pooch>=1.0->librosa>=0.7.0->nnsvs==0.0.1) (1.4.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->nnsvs==0.0.1) (2021.10.8)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->nnsvs==0.0.1) (1.26.7)
Requirement already satisfied: charset_normalizer~=2.0.0 in /usr/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->nnsvs==0.0.1) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard->nnsvs==0.0.1) (3.3)
Requirement already satisfied: threadpoolctl>=2.0.0 in ./.local/lib/python3.8/site-packages (from scikit-learn!=0.19.0,>=0.14.0->librosa>=0.7.0->nnsvs==0.0.1) (2.2.0)
Requirement already satisfied: cffi>=1.0 in /usr/lib64/python3.8/site-packages (from soundfile>=0.9.0->librosa>=0.7.0->nnsvs==0.0.1) (1.15.0)
Requirement already satisfied: zipp>=3.1.0 in ./.local/lib/python3.8/site-packages (from importlib-resources->hydra-core<1.2.0,>=1.1.0->nnsvs==0.0.1) (3.6.0)
Requirement already satisfied: pycparser in /usr/lib/python3.8/site-packages (from cffi>=1.0->soundfile>=0.9.0->librosa>=0.7.0->nnsvs==0.0.1) (2.21)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /usr/lib/python3.8/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard->nnsvs==0.0.1) (0.4.8)
Collecting oauthlib>=3.0.0
  Using cached oauthlib-3.2.0-py3-none-any.whl (151 kB)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/lib/python3.8/site-packages (from packaging->pooch>=1.0->librosa>=0.7.0->nnsvs==0.0.1) (3.0.7)
Using legacy 'setup.py install' for antlr4-python3-runtime, since package 'wheel' is not installed.
Using legacy 'setup.py install' for fastdtw, since package 'wheel' is not installed.
Building wheels for collected packages: nnsvs, pysinsy
  Building wheel for nnsvs (pyproject.toml) ... done
  Created wheel for nnsvs: filename=nnsvs-0.0.1-py3-none-any.whl size=61665 sha256=8c1a8989cc532852e80fbe6d85313186d6e64313be5a15fab093921ab7c8160c
  Stored in directory: /tmp/pip-ephem-wheel-cache-colbtxkx/wheels/44/5a/0c/bf20bdcedc52be8c565ed2f7fe9be8b7564c4ecb212adf42e3
  Building wheel for pysinsy (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for pysinsy (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [141 lines of output]
      fatal: not a git repository (or any parent up to mount point /)
      Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
      running bdist_wheel
      running build
      running build_py
      -- Building version 0.0.4
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/pysinsy
      copying pysinsy/version.py -> build/lib.linux-x86_64-3.8/pysinsy
      copying pysinsy/__init__.py -> build/lib.linux-x86_64-3.8/pysinsy
      creating build/lib.linux-x86_64-3.8/pysinsy/htsvoice
      copying pysinsy/htsvoice/nitech_jp_song070_f001.htsvoice -> build/lib.linux-x86_64-3.8/pysinsy/htsvoice
      copying pysinsy/htsvoice/COPYING -> build/lib.linux-x86_64-3.8/pysinsy/htsvoice
      creating build/lib.linux-x86_64-3.8/pysinsy/_dic
      copying pysinsy/_dic/japanese.utf_8.table -> build/lib.linux-x86_64-3.8/pysinsy/_dic
      copying pysinsy/_dic/japanese.utf_8.conf -> build/lib.linux-x86_64-3.8/pysinsy/_dic
      copying pysinsy/_dic/japanese.shift_jis.table -> build/lib.linux-x86_64-3.8/pysinsy/_dic
      copying pysinsy/_dic/japanese.shift_jis.conf -> build/lib.linux-x86_64-3.8/pysinsy/_dic
      copying pysinsy/_dic/japanese.macron -> build/lib.linux-x86_64-3.8/pysinsy/_dic
      copying pysinsy/_dic/japanese.euc_jp.table -> build/lib.linux-x86_64-3.8/pysinsy/_dic
      copying pysinsy/_dic/japanese.euc_jp.conf -> build/lib.linux-x86_64-3.8/pysinsy/_dic
      copying pysinsy/_dic/COPYING -> build/lib.linux-x86_64-3.8/pysinsy/_dic
      running build_ext
      skipping 'pysinsy/sinsy.cpp' Cython extension (up-to-date)
      building 'pysinsy.sinsy' extension
      creating build/temp.linux-x86_64-3.8
      creating build/temp.linux-x86_64-3.8/lib
      creating build/temp.linux-x86_64-3.8/lib/sinsy
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/converter
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/hts_engine_API
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/hts_engine_API/hts_engine
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/hts_engine_API/hts_engine/src
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/hts_engine_API/hts_engine/src/lib
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/japanese
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/label
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/score
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/temporary
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/util
      creating build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/xml
      creating build/temp.linux-x86_64-3.8/pysinsy
      gcc -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -g -DOPENSSL_LOAD_CONF -fwrapv -fno-semantic-interposition -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -g -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -g -fPIC -I/tmp/pip-build-env-m_isy4_t/overlay/lib64/python3.8/site-packages/numpy/core/include -I/tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib -I/tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/converter -I/tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/japanese -I/tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/label -I/tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/score -I/tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/temporary -I/tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util -I/tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/xml -I/tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/hts_engine_API -I/tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/hts_engine_API/hts_engine/src/lib -Ilib/sinsy/src/include/sinsy -Ilib/sinsy/src/lib/hts_engine_API/hts_engine/src/include -I/usr/include/python3.8 -c lib/sinsy/src/lib/Sinsy.cpp -o build/temp.linux-x86_64-3.8/lib/sinsy/src/lib/Sinsy.o
      In file included from /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/xml/XmlReader.h:50,
                       from lib/sinsy/src/lib/Sinsy.cpp:47:
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/xml/XmlParser.h:62:66: error: ISO C++17 does not allow dynamic exception specifications
         62 |    XmlData* read(IReadableStream& stream, std::string& encoding) throw (StreamException);
            |                                                                  ^~~~~
      In file included from /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/InputFile.h:47,
                       from lib/sinsy/src/lib/Sinsy.cpp:49:
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:64:51: error: ISO C++17 does not allow dynamic exception specifications
         64 |    virtual size_t read(void* buffer, size_t byte) throw (StreamException) = 0;
            |                                                   ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:71:62: error: ISO C++17 does not allow dynamic exception specifications
         71 | IReadableStream& fromStream(IReadableStream& stream, T& buf) throw (StreamException)
            |                                                              ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:88:72: error: ISO C++17 does not allow dynamic exception specifications
         88 | inline IReadableStream& operator>>(IReadableStream& stream, char& buf) throw (StreamException)
            |                                                                        ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:96:81: error: ISO C++17 does not allow dynamic exception specifications
         96 | inline IReadableStream& operator>>(IReadableStream& stream, unsigned char& buf) throw (StreamException)
            |                                                                                 ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:104:73: error: ISO C++17 does not allow dynamic exception specifications
        104 | inline IReadableStream& operator>>(IReadableStream& stream, INT16& buf) throw (StreamException)
            |                                                                         ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:112:74: error: ISO C++17 does not allow dynamic exception specifications
        112 | inline IReadableStream& operator>>(IReadableStream& stream, UINT16& buf) throw (StreamException)
            |                                                                          ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:120:73: error: ISO C++17 does not allow dynamic exception specifications
        120 | inline IReadableStream& operator>>(IReadableStream& stream, INT32& buf) throw (StreamException)
            |                                                                         ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:128:74: error: ISO C++17 does not allow dynamic exception specifications
        128 | inline IReadableStream& operator>>(IReadableStream& stream, UINT32& buf) throw (StreamException)
            |                                                                          ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:136:73: error: ISO C++17 does not allow dynamic exception specifications
        136 | inline IReadableStream& operator>>(IReadableStream& stream, INT64& buf) throw (StreamException)
            |                                                                         ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:144:74: error: ISO C++17 does not allow dynamic exception specifications
        144 | inline IReadableStream& operator>>(IReadableStream& stream, UINT64& buf) throw (StreamException)
            |                                                                          ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:152:73: error: ISO C++17 does not allow dynamic exception specifications
        152 | inline IReadableStream& operator>>(IReadableStream& stream, float& buf) throw (StreamException)
            |                                                                         ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:160:74: error: ISO C++17 does not allow dynamic exception specifications
        160 | inline IReadableStream& operator>>(IReadableStream& stream, double& buf) throw (StreamException)
            |                                                                          ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IReadableStream.h:168:79: error: ISO C++17 does not allow dynamic exception specifications
        168 | inline IReadableStream& operator>>(IReadableStream& stream, long double& buf) throw (StreamException)
            |                                                                               ^~~~~
      In file included from lib/sinsy/src/lib/Sinsy.cpp:49:
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/InputFile.h:65:43: error: ISO C++17 does not allow dynamic exception specifications
         65 |    size_t read(void* buffer, size_t size) throw (StreamException);
            |                                           ^~~~~
      In file included from /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/OutputFile.h:47,
                       from lib/sinsy/src/lib/Sinsy.cpp:50:
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:63:58: error: ISO C++17 does not allow dynamic exception specifications
         63 |    virtual size_t write(const void* buffer, size_t byte) throw (StreamException) = 0;
            |                                                          ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:87:71: error: ISO C++17 does not allow dynamic exception specifications
         87 | inline IWritableStream& operator<<(IWritableStream& stream, char buf) throw (StreamException)
            |                                                                       ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:95:80: error: ISO C++17 does not allow dynamic exception specifications
         95 | inline IWritableStream& operator<<(IWritableStream& stream, unsigned char buf) throw (StreamException)
            |                                                                                ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:103:72: error: ISO C++17 does not allow dynamic exception specifications
        103 | inline IWritableStream& operator<<(IWritableStream& stream, INT16 buf) throw (StreamException)
            |                                                                        ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:111:73: error: ISO C++17 does not allow dynamic exception specifications
        111 | inline IWritableStream& operator<<(IWritableStream& stream, UINT16 buf) throw (StreamException)
            |                                                                         ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:119:72: error: ISO C++17 does not allow dynamic exception specifications
        119 | inline IWritableStream& operator<<(IWritableStream& stream, INT32 buf) throw (StreamException)
            |                                                                        ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:127:73: error: ISO C++17 does not allow dynamic exception specifications
        127 | inline IWritableStream& operator<<(IWritableStream& stream, UINT32 buf) throw (StreamException)
            |                                                                         ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:135:72: error: ISO C++17 does not allow dynamic exception specifications
        135 | inline IWritableStream& operator<<(IWritableStream& stream, INT64 buf) throw (StreamException)
            |                                                                        ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:143:73: error: ISO C++17 does not allow dynamic exception specifications
        143 | inline IWritableStream& operator<<(IWritableStream& stream, UINT64 buf) throw (StreamException)
            |                                                                         ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:151:72: error: ISO C++17 does not allow dynamic exception specifications
        151 | inline IWritableStream& operator<<(IWritableStream& stream, float buf) throw (StreamException)
            |                                                                        ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:159:73: error: ISO C++17 does not allow dynamic exception specifications
        159 | inline IWritableStream& operator<<(IWritableStream& stream, double buf) throw (StreamException)
            |                                                                         ^~~~~
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/IWritableStream.h:167:78: error: ISO C++17 does not allow dynamic exception specifications
        167 | inline IWritableStream& operator<<(IWritableStream& stream, long double buf) throw (StreamException)
            |                                                                              ^~~~~
      In file included from lib/sinsy/src/lib/Sinsy.cpp:50:
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/OutputFile.h:65:50: error: ISO C++17 does not allow dynamic exception specifications
         65 |    size_t write(const void* buffer, size_t size) throw (StreamException);
            |                                                  ^~~~~
      In file included from lib/sinsy/src/lib/Sinsy.cpp:51:
      /tmp/pip-install-h73vdvwf/pysinsy_1d0e7929da384a8f85231a0741ad4f7a/lib/sinsy/src/lib/util/WritableStrStream.h:67:48: error: ISO C++17 does not allow dynamic exception specifications
         67 |    WritableStrStream& operator<<(const T& buf) throw (StreamException) {
            |                                                ^~~~~
      error: command 'gcc' failed with exit status 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pysinsy
Successfully built nnsvs
Failed to build pysinsy
ERROR: Could not build wheels for pysinsy, which is required to install pyproject.toml-based projects

For now I am documenting it here; in the future I will try to install pysinsy manually to work around it.

How do I train NNSVS?

Could you add instructions for how to train NNSVS on a new data set? I have 50 English language wav/lab files and would like to train the model afresh.

__init__.py under nnsvs/bin/conf missing after installation.

When I tried to run stage 1 of the nit-song070/00-svs-world recipe on Google Colaboratory, I got the following error:

stage 1: Feature generation
/usr/local/lib/python3.6/dist-packages/hydra/core/utils.py:204: UserWarning: 
Using config_path to specify the config name is deprecated, specify the config name via config_name
See https://hydra.cc/docs/next/upgrades/0.11_to_1.0/config_path_changes
  warnings.warn(category=UserWarning, message=msg)
Primary config module 'nnsvs.bin.conf.prepare_features' not found.
Check that it's correct and contains an __init__.py file

I found __init__.py files under /usr/local/lib/python3.6/dist-packages/nnsvs/bin/conf are missing.

I uploaded the failed ipynb[1]; please see it for details.

  1. https://gist.github.com/taroushirani/d35aa92d78bebf6559ab6e5e4306c8f8
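As a stopgap until the packaging is fixed, the missing __init__.py files can be created by hand. A minimal sketch, assuming empty files are sufficient for Hydra to find the config package (the function name is hypothetical):

```python
from pathlib import Path

def add_init_files(base_dir):
    """Create an empty __init__.py in base_dir and every subdirectory."""
    base = Path(base_dir)
    for d in [base, *(p for p in base.rglob("*") if p.is_dir())]:
        init = d / "__init__.py"
        if not init.exists():
            init.touch()
```

e.g. add_init_files("/usr/local/lib/python3.6/dist-packages/nnsvs/bin/conf") against the install path from the log above.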

Objective evaluation

  • F0 RMSE
  • MCD or LSD

It would be great if we have objective evaluations in our recipes. In particular, F0 RMSE will be very useful to check the F0 modeling accuracy.
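For reference, both metrics are straightforward to compute from time-aligned features. A minimal numpy sketch; the function names and the convention of excluding unvoiced frames and the 0th mel-cepstral (energy) coefficient are assumptions, not nnsvs API:

```python
import numpy as np

def f0_rmse(f0_ref, f0_pred):
    """F0 RMSE (Hz) over frames voiced in both reference and prediction."""
    f0_ref, f0_pred = np.asarray(f0_ref, float), np.asarray(f0_pred, float)
    voiced = (f0_ref > 0) & (f0_pred > 0)
    return float(np.sqrt(np.mean((f0_ref[voiced] - f0_pred[voiced]) ** 2)))

def mcd(mgc_ref, mgc_pred):
    """Mel-cepstral distortion in dB, excluding the 0th coefficient."""
    diff = np.asarray(mgc_ref, float)[:, 1:] - np.asarray(mgc_pred, float)[:, 1:]
    return float((10.0 / np.log(10)) * np.mean(np.sqrt(2.0 * np.sum(diff ** 2, axis=1))))
```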

Stuck trying out your guide

First of all thank you for providing such work, it is immensely helpful.
I was trying to follow the guide you provided, but I am encountering issues along the way. Any help would be greatly appreciated.

1

# NOTE: 01.xml and 02.xml were not included in the training data
# 03.xml - 37.xml were used for training.
labels = xml2lab("kiritan_singing/musicxml/01.xml").round_()

Rounding the labels results in an error.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-24-26976f3a1f25> in <module>
      1 # NOTE: 01.xml and 02.xml were not included in the training data
      2 # 03.xml - 37.xml were used for training.
----> 3 labels = xml2lab("kiritan_singing/musicxml/01.xml").round_()

AttributeError: 'HTSLabelFile' object has no attribute 'round_'

I tried removing .round_() to move on.
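For completeness, what round_() presumably does can be approximated by hand. A sketch assuming HTS-style label times in 100 ns units and a 5 ms frame shift (both assumptions about what the missing method did):

```python
def round_times(times, frame_shift=50000):
    """Round HTS-style times (100 ns units) to the nearest frame boundary."""
    return [int(round(t / frame_shift)) * frame_shift for t in times]
```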

2

question_path = join(model_dir, "jp_qst001_nnsvs.hed")
binary_dict, continuous_dict = hts.load_question_set(question_path, append_hat_for_LL=False)

hts.load_question_set does not take the append_hat_for_LL argument.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-25-bd0c3bcd3958> in <module>
      1 question_path = join(model_dir, "jp_qst001_nnsvs.hed")
----> 2 binary_dict, continuous_dict = hts.load_question_set(question_path, append_hat_for_LL=False)

TypeError: load_question_set() got an unexpected keyword argument 'append_hat_for_LL'

I tried removing , append_hat_for_LL=False to move on.

3

lag = predict_timelag(device, labels, timelag_model, timelag_in_scaler,
    timelag_out_scaler, binary_dict, continuous_dict, pitch_indices,
    log_f0_conditioning)
lag.shape

Labels cannot resolve indices fed in as a list.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-29-6cad232e7ca5> in <module>
      1 lag = predict_timelag(device, labels, timelag_model, timelag_in_scaler,
      2     timelag_out_scaler, binary_dict, continuous_dict, pitch_indices,
----> 3     log_f0_conditioning)
      4 lag.shape

~/dreamtonics/nnsvs/nnsvs/gen.py in predict_timelag(device, labels, timelag_model, timelag_in_scaler, timelag_out_scaler, binary_dict, continuous_dict, pitch_indices, log_f0_conditioning, allowed_range)
     46     # Extract note-level labels
     47     note_indices = get_note_indices(labels)
---> 48     note_labels = labels[note_indices]
     49 
     50     # Extract musical/linguistic context

~/anaconda3/lib/python3.6/site-packages/nnmnkwii/io/hts.py in __getitem__(self, idx)
    103 
    104     def __getitem__(self, idx):
--> 105         return self.start_times[idx], self.end_times[idx], self.contexts[idx]
    106 
    107     def __str__(self):

TypeError: list indices must be integers or slices, not list

4

In gen.py there is another use of round_(). Here I commented it out.

def predict_timelag(device, labels, timelag_model, timelag_in_scaler, timelag_out_scaler,
        binary_dict, continuous_dict,
        pitch_indices=None, log_f0_conditioning=True, allowed_range=[-30, 30]):
    # round start/end times just in case.
    # labels.round_()

Numerical instabilities of mdn_loss

Hello, I found some numerical instabilities of mdn_loss in mdn.py.

(1) When I tested Conv1dResnetMDN (Conv1dResnet + MDN)[1], the back propagation of pow in torch.distributions.Normal returned nan[2]. The mechanism, I guess, is:

i. Because we clipped the minimum end of log_prob, log_sigma grew larger to keep the probability from getting smaller.
ii. scale = exp(log_sigma) then went to +inf, and the back propagation of var = (self.scale ** 2) returned nan.

This is fixed by using the centered target instead of the raw target and clipping it within +/-5 SD[3], as you recommended in the PR of MDN[4].

  1. https://github.com/taroushirani/nnsvs/blob/f6b04a4a2e1a059b96dc156c1fcbbcc225618f3c/nnsvs/model.py#L216
  2. https://gist.github.com/taroushirani/e6f91ae272b90ca1dcd1e261044a14eb
  3. https://github.com/taroushirani/nnsvs/blob/83e85f030c68c703b151f97fd4da9c1fc31fc854/nnsvs/mdn.py#L81
  4. #20 (comment)

(2) After changing as above, logsumexp in mdn_loss still returns nan occasionally.

20%|##        | 10/50 [01:40<06:32,  9.81s/it][W ..\torch\csrc\autograd\python_anomaly_mode.cpp:104] Warning: Error detected in LogsumexpBackward. Traceback of
 forward call that caused the error:
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\cygwin64\opt\miniconda3\envs\nnsvs\Scripts\nnsvs-train.exe\__main__.py", line 7, in <module>
    sys.exit(entry())
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\nnsvs\bin\train.py", line 275, in entry
    my_app()
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\hydra\main.py",line 37, in decorated_main
    strict=strict,
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\hydra\_internal\utils.py", line 356, in _run_hydra
    lambda: hydra.run(
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\hydra\_internal\utils.py", line 207, in run_and_report
    return func()
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\hydra\_internal\utils.py", line 359, in <lambda>
    overrides=args.overrides,
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\hydra\_internal\hydra.py", line 112, in run
    configure_logging=with_log_configuration,
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\hydra\core\utils.py", line 125, in run_job
    ret.return_value = task_function(task_cfg)
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\nnsvs\bin\train.py", line 271, in my_app
    train_loop(config, device, model, optimizer, lr_scheduler, data_loaders)
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\nnsvs\bin\train.py", line 175, in train_loop
    loss = mdn_loss(pi, sigma, mu, y, reduce=False).masked_select(mask).mean()
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\nnsvs\mdn.py", line 104, in mdn_loss
    loss = -torch.logsumexp(loss, dim=2)
 (function _print_stack)
 20%|##        | 10/50 [01:47<07:09, 10.74s/it]
Traceback (most recent call last):
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\nnsvs\bin\train.py", line 271, in my_app
    train_loop(config, device, model, optimizer, lr_scheduler, data_loaders)
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\nnsvs\bin\train.py", line 199, in train_loop
    loss.backward()
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\torch\tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "d:\cygwin64\opt\miniconda3\envs\nnsvs\lib\site-packages\torch\autograd\__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Function 'LogsumexpBackward' returned nan values in its 0th output.

I assumed diagonal covariance for convenience, and mdn_loss contains code to sum log_prob along the axis of the target variables (D_out)[5]. Even though each log_prob is small, their sum can be large in the negative direction. The exponential inside logsumexp may then underflow to 0, and logsumexp may return nan.

  5. https://github.com/r9y9/nnsvs/blob/6f783d289b8d11d69d954211b8cea83c79b2a49f/nnsvs/mdn.py#L96

I struggled to solve case (2) but could not find any good and mathematically correct solution. Please advise me.
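For the record, the standard remedy for case (2) is to stay in the log domain and never exponentiate the per-component log-likelihoods before the logsumexp. A self-contained NumPy sketch of a diagonal-covariance MDN negative log-likelihood (an illustration of the technique, not the actual nnsvs mdn.py code):

```python
import numpy as np

def mdn_nll(log_pi, mu, log_sigma, target):
    """Diagonal-covariance MDN negative log-likelihood, computed
    entirely in the log domain for numerical stability.

    log_pi:    (G,)    log mixture weights (already normalized)
    mu:        (G, D)  component means
    log_sigma: (G, D)  per-dimension log standard deviations
    target:    (D,)    observed vector
    """
    # Per-dimension Gaussian log-density, summed over D (diagonal cov.)
    z = (target[None, :] - mu) / np.exp(log_sigma)
    log_prob = -0.5 * (z ** 2) - log_sigma - 0.5 * np.log(2 * np.pi)
    comp = log_pi + log_prob.sum(axis=1)  # (G,) can be hugely negative
    # Stable logsumexp: subtract the max so at least one exp() equals 1
    m = comp.max()
    return float(-(m + np.log(np.exp(comp - m).sum())))

# Example: even a far-off target gives a finite loss
val = mdn_nll(np.log(np.array([0.5, 0.5])), np.zeros((2, 3)),
              np.zeros((2, 3)), np.full(3, 100.0))
print(np.isfinite(val))  # → True
```

Because the max is subtracted before exponentiation, the sum inside the log is at least 1, so the result stays finite even when every component log-likelihood is very negative.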

unnecessary process in gen.py

https://github.com/r9y9/nnsvs/blob/c69566f32c3799f9c0c0a76cc1f48175be9c653a/nnsvs/gen.py#L285-L292

It seems the get_windows call at L285 of nnsvs/gen.py is unnecessary; the code can be changed from

 windows = get_windows(num_windows) 
  
 # Apply MLPG if necessary 
 if np.any(has_dynamic_features): 
     static_stream_sizes = get_static_stream_sizes( 
         stream_sizes, has_dynamic_features, len(windows)) 
 else: 
     static_stream_sizes = stream_sizes

to

 # Apply MLPG if necessary 
 if np.any(has_dynamic_features): 
     static_stream_sizes = get_static_stream_sizes( 
         stream_sizes, has_dynamic_features, num_windows) 
 else: 
     static_stream_sizes = stream_sizes

Multi-singer models

I guess people have already done several experiments with ENUNU, but I'd like to try multi-singer modeling myself and make reproducible recipes in nnsvs. It should be conceptually pretty easy to implement.
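To sketch what "conceptually easy" could mean here (all names are hypothetical, not nnsvs APIs): the simplest multi-singer conditioning appends a one-hot singer code to the per-frame linguistic features.

```python
import numpy as np

def add_singer_id(linguistic_feats, singer_id, num_singers):
    """Append a one-hot singer code to each frame of (T, D) features.

    Returns a (T, D + num_singers) array; the model can then learn
    singer-dependent variation from the extra dimensions.
    """
    T = linguistic_feats.shape[0]
    onehot = np.zeros((T, num_singers), dtype=linguistic_feats.dtype)
    onehot[:, singer_id] = 1.0
    return np.concatenate([linguistic_feats, onehot], axis=1)

feats = np.random.rand(100, 331).astype(np.float32)
out = add_singer_id(feats, singer_id=2, num_singers=4)
print(out.shape)  # → (100, 335)
```

A learned speaker embedding would likely work better than a raw one-hot code, but the plumbing is the same: widen the model input and condition every frame.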

Cannot generate KIRITAN labels

musicxml/01.xml
musicxml/02.xml
[WARN] The last note is not rest
[WARN] The last note is not rest
musicxml/03.xml
musicxml/04.xml
musicxml/05.xml
Traceback (most recent call last):
  File "perf_segmentation.py", line 84, in <module>
    np.min(flatten_lengths), np.max(flatten_lengths), np.mean(flatten_lengths)))
  File "<__array_function__ internals>", line 6, in amin
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 2793, in amin
    keepdims=keepdims, initial=initial, where=where)
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
Prepare data for time-lag models
  0%| | 0/5 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "finalize_lab.py", line 49, in <module>
    lab_align = trim_sil_and_pau(hts.load(lab_align_path))
  File "/usr/local/lib/python3.6/dist-packages/nnmnkwii-0.0.21+1cfaeca-py3.6-linux-x86_64.egg/nnmnkwii/io/hts.py", line 325, in load
    return labels.load(path, lines)
  File "/usr/local/lib/python3.6/dist-packages/nnmnkwii-0.0.21+1cfaeca-py3.6-linux-x86_64.egg/nnmnkwii/io/hts.py", line 198, in load
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'kiritan_singing_extra/full_dtw/01.lab'

numpy.linalg.LinAlgError: 75-th leading minor not positive definite

Stack trace:

Traceback (most recent call last):
  File "_\ENUNU-0.3.1\.\enunu-openutau.py", line 248, in <module>
    acoustic(sys.argv[2])
  File "_\ENUNU-0.3.1\.\enunu-openutau.py", line 208, in acoustic
    enulib.acoustic.timing2acoustic(
  File "_\ENUNU-0.3.1\.\enulib\acoustic.py", line 107, in timing2acoustic
    acoustic_features = predict_acoustic(
  File "_\ENUNU-0.3.1\python-3.8.10-embed-amd64\lib\site-packages\nnsvs\gen.py", line 282, in predict_acoustic
    pred_acoustic = multi_stream_mlpg(max_mu, max_sigma_sq, get_windows(acoustic_config.num_windows),
  File "_\ENUNU-0.3.1\python-3.8.10-embed-amd64\lib\site-packages\nnsvs\multistream.py", line 120, in multi_stream_mlpg
    y = paramgen.mlpg(x, var_, windows) if v else x
  File "_\ENUNU-0.3.1\python-3.8.10-embed-amd64\lib\site-packages\nnmnkwii\paramgen\_mlpg.py", line 200, in mlpg
    y[:, d] = bla.solveh(P, b)
  File "nnmnkwii/paramgen/_bandmat/linalg.pyx", line 302, in nnmnkwii.paramgen._bandmat.linalg.solveh
  File "nnmnkwii/paramgen/_bandmat/linalg.pyx", line 224, in nnmnkwii.paramgen._bandmat.linalg.cholesky
  File "nnmnkwii/paramgen/_bandmat/linalg.pyx", line 80, in nnmnkwii.paramgen._bandmat.linalg._cholesky_banded
numpy.linalg.LinAlgError: 75-th leading minor not positive definite

Note that this is the nnsvs packed with ENUNU 0.3.1. It seems to have minor differences from GitHub tag 0.0.1.

Before denormalization

max_mu: torch.Size([1, 365, 199])
max_sigma: torch.Size([1, 365, 199])

After denormalization

max_mu: (365, 199)
max_sigma_sq: (365, 199)
acoustic_config.num_windows: 3
acoustic_config.stream_sizes: [180, 3, 1, 15]
acoustic_config.has_dynamic_features: [True, True, False, True]

Features are attached: features.zip

Could not finish step 0 on macOS

Hello, I am facing an issue when trying to run recipes on my local device (macOS):

(nnsvs-dev) XXXXX:svs-world-conv sdercolin$ bash run.sh --stage 0 --stop-stage 0
stage 0: Data preparation
Convert musicxml to label files.
100%|███████████████████████████████████████████████████████████████████████████████████████| 51/51 [00:01<00:00, 46.24it/s]
Copy original label files.
100%|██████████████████████████████████████████████████████████████████████████████████████| 51/51 [00:00<00:00, 923.08it/s]
Round label files.
100%|█████████████████████████████████████████████████████████████████████████████████████| 51/51 [00:00<00:00, 1006.12it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████| 51/51 [00:00<00:00, 541.17it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████| 51/51 [00:00<00:00, 745.91it/s]
0it [00:00, ?it/s]end_time 65050000 of the phoneme ch and start_time 66900000 of the phoneme u is not the same. There seems to be a missing phoneme in sinsy_mono_round.
end_time 68750000 of the phoneme u and start_time 66900000 of the phoneme u is not the same. There seems to be a missing phoneme in sinsy_mono_round.
0it [00:02, ?it/s]
Traceback (most recent call last):
  File "/Users/sdercolin/dev/nnsvs-dev/nnsvs/egs/_common/no2/utils/align_lab.py", line 57, in <module>
    lab_sinsy = fix_mono_lab_after_align(lab_sinsy, config["spk"])
  File "/Users/sdercolin/dev/nnsvs-dev/nnsvs/egs/_common/no2/utils/util.py", line 213, in fix_mono_lab_after_align
    return _fix_mono_lab_after_align_natsume_singing(lab)
  File "/Users/sdercolin/dev/nnsvs-dev/nnsvs/egs/_common/no2/utils/util.py", line 241, in _fix_mono_lab_after_align_natsume_singing
    f.append((f.end_times[-1], lab.end_times[i], lab.contexts[i]))
  File "/Users/sdercolin/dev/nnsvs-dev/lib/python3.8/site-packages/nnmnkwii/io/hts.py", line 159, in append
    raise ValueError(
ValueError: end_time (68750000) must be larger than start_time (68750000).
51it [00:00, 435.75it/s]
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/sdercolin/dev/nnsvs-dev/nnsvs/egs/_common/no2/utils/perf_segmentation.py", line 50, in <module>
    base_segments, start_indices, end_indices = segment_labels(
  File "/Users/sdercolin/dev/nnsvs-dev/nnsvs/egs/_common/no2/utils/util.py", line 117, in segment_labels
    seg.append((s, e, l), strict)
  File "/Users/sdercolin/dev/nnsvs-dev/lib/python3.8/site-packages/nnmnkwii/io/hts.py", line 159, in append
    raise ValueError(
ValueError: end_time (68750000) must be larger than start_time (68750000).
Prepare data for time-lag models
  0%|                                                                                                | 0/51 [00:00<?, ?it/s]1: Global offset (in sec): -0.04
1_seg0.lab offset (in sec): -0.034999999999999996
1.lab: 1/64 time-lags are excluded.
1_seg1.lab offset (in sec): -0.034999999999999996
1.lab: 5/115 time-lags are excluded.
1_seg2.lab offset (in sec): -0.049999999999999996
1_seg3.lab offset (in sec): -0.245
1.lab: 5/11 time-lags are excluded.
1_seg4.lab offset (in sec): -0.034999999999999996
1.lab: 5/114 time-lags are excluded.
1_seg5.lab offset (in sec): -0.06
1.lab: 1/54 time-lags are excluded.
1_seg6.lab offset (in sec): -0.06999999999999999
1_seg7.lab offset (in sec): -0.045
1.lab: 3/49 time-lags are excluded.
1_seg8.lab offset (in sec): -0.06
1.lab: 4/66 time-lags are excluded.
10: Global offset (in sec): -0.055
10_seg0.lab offset (in sec): -0.08499999999999999
10_seg1.lab offset (in sec): -0.095
10.lab: 1/11 time-lags are excluded.
10_seg2.lab offset (in sec): -0.095
10.lab: 5/27 time-lags are excluded.
10_seg3.lab offset (in sec): -0.075
10.lab: 1/11 time-lags are excluded.
10_seg4.lab offset (in sec): -0.13999999999999999
10.lab: 1/10 time-lags are excluded.
10_seg5.lab offset (in sec): -0.075
10.lab: 2/15 time-lags are excluded.
10_seg6.lab offset (in sec): -0.09
10.lab: 3/10 time-lags are excluded.
10_seg7.lab offset (in sec): -0.13999999999999999
10.lab: 2/10 time-lags are excluded.
10_seg8.lab offset (in sec): -0.11
10.lab: 3/10 time-lags are excluded.
10_seg9.lab offset (in sec): -0.06999999999999999
10.lab: 3/15 time-lags are excluded.
10_seg10.lab offset (in sec): -0.38999999999999996
10.lab: 9/10 time-lags are excluded.
11: Global offset (in sec): -2.17
11_seg0.lab offset (in sec): -2.17
11.lab: 6/39 time-lags are excluded.
12: Global offset (in sec): -3.8249999999999997
  6%|█████▏                                                                                  | 3/51 [00:00<00:01, 32.91it/s]
Traceback (most recent call last):
  File "/Users/sdercolin/dev/nnsvs-dev/nnsvs/egs/_common/no2/utils/finalize_lab.py", line 89, in <module>
    assert seg_idx > 0 or exists(lab_align_path)
AssertionError
(nnsvs-dev) XXXXX:svs-world-conv sdercolin$

Some information:

  1. The above is the result for recipe natsume_singing, but similar errors also occur when I try another recipe, ofuton_p_utagoe_db.
  2. I have succeeded with everything on colab (https://colab.research.google.com/gist/taroushirani/3e54d01e9e85674dbb8eaa7e0e457acd/nnsvs_ofuton_p_utagoe_db_official_recipe.ipynb) and followed the same installation steps on macOS, except:
  • Installed everything in /usr/local/ instead of /usr/
  • Using python virtual env
  • Probably important: I did export ARCHFLAGS="-arch x86_64" before installing pysinsy, because without it I got a lot of unknown type name '__int64_t' errors. Ref: giampaolo/psutil#1832 (comment)

My Environment:

  • OS: macOS Catalina 10.15.7 (Intel)
  • Python: 3.8.2

Thanks in advance!

Implementation status and planned TODOs

This is an umbrella issue to track progress and discuss priority items. Comments and requests are always welcome.

Milestones

  • ~ 4/26 (Sun): Refactor my jupyter-based code to python scripts and push them to the repo
  • Achieve comparable quality to sinsy
  • Achieve comparable quality to NEUTRINO

Fundamental components

  • Music context extraction (by sinsy)
  • Acoustic model (music context to vocoder parameter prediction)
  • Relative pitch modeling
  • Time-lag & duration model
  • Multi-stream modeling
  • Quantized F0 modeling
  • Autoregressive modeling [3] #31
  • Mixture density networks #20
  • Explicit vibrato modeling (low priority, as I believe autoregressive models implicitly model vibrato)
  • HMM (or similar)-based unsupervised phone-level alignment. https://github.com/DYVAUX/SHIRO-Models-Japanese

Demo

  • Add a Jupyter notebook to demonstrate how to use pretrained models
  • Add demo page

Dataset

Frontend

MusicXML -> context features

DSP

  • Implement Nakano's vibrato parameter estimation (I have a C++ implementation locally; will port it to Python) [2]

Acoustic model

Context features -> acoustic features

  • Net + MLPG
  • (Fixed width) autoregressive models [3]
  • WaveNet-like model

Timing model & duration model

  • Time-lag model [1]
  • Phoneme duration prediction [1]

Vocoder

Acoustic features -> raw waveform

  • WORLD vocoder
  • Parallel WaveGAN
  • LPCNet
  • NSF

Command-line tools

  • Feature extraction
  • Mean-var/ min-max stats calculation
  • Mean-var / min-max normalization
  • Training
  • Prediction
  • Inference

Data loader

  • Phrase-based mini-batch creation

Design TODOs

  • Think and write software design
  • Think about the recipe design

Software quality

  • Add tests
  • Enable Github actions
  • Write documents

Recipes

Misc

  • Waiting for facebookresearch/hydra#386 to provide more flexible control on configs
  • ~~Write a paper for this perhaps?~~ Will do it as a part of my PhD research

References

  • [1] Y. Hono et al, "Recent Development of the DNN-based Singing Voice Synthesis System — Sinsy," Proc. of APSIPA, 2017. PDF
  • [2] Vibrato estimation in Sinsy: HMMに基づく歌声合成のためのビブラートモデル化, MUS80, 2009.
  • [3] Wang, Xin, Shinji Takaki, and Junichi Yamagishi. "Autoregressive neural f0 model for statistical parametric speech synthesis." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26.8 (2018): 1406-1419.

Can't advance past DTW alignment: run script error with singing voice DB

sh run.sh
musicxml/01.xml
Traceback (most recent call last):
  File "gen_lab.py", line 24, in <module>
    lab.append(l.split(), strict=False)
TypeError: append() got an unexpected keyword argument 'strict'
Traceback (most recent call last):
  File "perf_segmentation.py", line 84, in <module>
    np.min(flatten_lengths), np.max(flatten_lengths), np.mean(flatten_lengths)))
  File "<__array_function__ internals>", line 6, in amin
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 2793, in amin
    keepdims=keepdims, initial=initial, where=where)
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
Prepare data for time-lag models
  0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "finalize_lab.py", line 49, in <module>
    lab_align = trim_sil_and_pau(hts.load(lab_align_path))
  File "/usr/local/lib/python3.6/dist-packages/nnmnkwii/io/hts.py", line 302, in load
    return labels.load(path, lines)
  File "/usr/local/lib/python3.6/dist-packages/nnmnkwii/io/hts.py", line 175, in load
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'kiritan_singing_extra/full_dtw/01.lab'

Flipping the wrong dimension in TimeInvFIRFilter

Hello,

I was reading your dsp.py code yesterday, and came across a small bug at line 33:

self.weight.data[:, :, :] = filt_coef.flip(0)

The flip should be performed on the last dimension:

self.weight.data[:, :, :] = filt_coef.flip(-1)

As the shape of weight is (1, 1, kernel_size).

The same bug is repeated in the trainable variant TrTimeInvFIRFilter, but there it is inconsequential, since you randomly initialize those weights anyway (flipping or not changes nothing).
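A quick NumPy check of the claim, independent of PyTorch: nn.Conv1d computes cross-correlation, so cross-correlating with the coefficients reversed along the last (time) axis reproduces a true FIR convolution.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 0.0, 0.0])  # input signal
h = np.array([1.0, 0.5, 0.25])           # FIR coefficients

# Cross-correlation with the time-reversed kernel == true convolution,
# which is why the (1, 1, kernel_size) weight must be flipped on its
# last axis before being used by Conv1d.
corr_flipped = np.correlate(x, h[::-1], mode="full")
conv = np.convolve(x, h, mode="full")
print(np.allclose(corr_flipped, conv))  # → True
```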

I wanted to take this occasion to thank you from the bottom of my heart for all the code you share online, you truly are a god of open-source neural speech synthesis.

Cheers,
Guillaume

Ndarray shape mismatch between lf0 and lf0_score in data/data_source.py with the PJS dataset on Google Colaboratory

When I use NNSVS with the PJS corpus on Google Colaboratory, I get a ValueError at stage 1 and fail to extract acoustic features. I did some print debugging in data/data_source.py and found that the shape of lf0_score in collect_features differed before and after interp1d at line 191, so this may be a bug in nnmnkwii rather than nnsvs, but I could not track the cause down any further.

I uploaded the failed ipynb to gist so please see it for detail.

https://gist.github.com/taroushirani/a44491c94c924d38a87433fe581b2b05
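For anyone chasing similar shape issues, the intended behavior of interpolating unvoiced frames is shape-preserving. A minimal NumPy stand-in for nnmnkwii's interp1d (this is my own sketch, not the library code, and it assumes 0 marks unvoiced frames):

```python
import numpy as np

def interp_lf0(lf0):
    """Linearly interpolate unvoiced (zero) frames of a (T, 1) lf0 array.

    The output shape equals the input shape by construction; if a real
    pipeline sees a shape change across interpolation, something
    upstream (e.g. mismatched frame counts) is the likelier culprit.
    """
    lf0 = np.asarray(lf0, dtype=np.float64)
    flat = lf0.copy().reshape(-1)
    voiced = np.nonzero(flat > 0)[0]
    if len(voiced) > 0:
        flat = np.interp(np.arange(len(flat)), voiced, flat[voiced])
    return flat.reshape(lf0.shape)

lf0 = np.array([[0.0], [5.0], [0.0], [6.0], [0.0]])
filled = interp_lf0(lf0)
print(filled.shape == lf0.shape)  # → True
```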

Setting data_parallel: true results in an error

When I set data_parallel to "true" in the config, I end up with the following.
I run this under a Jupyter notebook instance based on the ENUNU-Training-Kit, with the dev2 branch of NNSVS.

Error executing job with overrides: ['model=acoustic_custom', 'train=myconfig', 'data=myconfig', 'data.train_no_dev.in_dir=dump/multi-gpu-test/norm/train_no_dev/in_acoustic/', 'data.train_no_dev.out_dir=dump/multi-gpu-test/norm/train_no_dev/out_acoustic/', 'data.dev.in_dir=dump/multi-gpu-test/norm/dev/in_acoustic/', 'data.dev.out_dir=dump/multi-gpu-test/norm/dev/out_acoustic/', 'train.out_dir=exp/multi-gpu-test_dynamivox_notebook/acoustic', 'train.resume.checkpoint=']
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.8/site-packages/nnsvs/bin/train.py", line 158, in <module>
    my_app()
  File "/opt/conda/lib/python3.8/site-packages/hydra/main.py", line 48, in decorated_main
    _run_hydra(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 377, in _run_hydra
    run_and_report(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 214, in run_and_report
    raise ex
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/utils.py", line 378, in <lambda>
    lambda: hydra.run(
  File "/opt/conda/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 111, in run
    _ = ret.return_value
  File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/opt/conda/lib/python3.8/site-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "/opt/conda/lib/python3.8/site-packages/nnsvs/bin/train.py", line 141, in my_app
    train_loop(
  File "/opt/conda/lib/python3.8/site-packages/nnsvs/bin/train.py", line 100, in train_loop
    loss = train_step(
  File "/opt/conda/lib/python3.8/site-packages/nnsvs/bin/train.py", line 32, in train_step
    out_feats = model.preprocess_target(out_feats)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DataParallel' object has no attribute 'preprocess_target'
++ set +x
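The usual workaround for this class of error is to call custom methods through the .module attribute of the wrapper, since DataParallel forwards only the forward pass. A pure-Python illustration of the pattern (DummyParallel is a stand-in for torch.nn.DataParallel; real code would test isinstance(model, nn.DataParallel)):

```python
class DummyParallel:
    """Stand-in for DataParallel: holds the model in .module and would
    forward __call__, but does NOT expose the model's custom methods."""
    def __init__(self, module):
        self.module = module

class Model:
    def preprocess_target(self, y):
        return [v * 2 for v in y]

def unwrap(model):
    # Works whether or not the model is wrapped
    return model.module if hasattr(model, "module") else model

wrapped = DummyParallel(Model())
print(unwrap(wrapped).preprocess_target([1, 2, 3]))  # → [2, 4, 6]
```

In the training loop this means replacing model.preprocess_target(...) with unwrap(model).preprocess_target(...), or simply keeping a reference to the unwrapped model before wrapping.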

Unused argument at function "nnsvs.gen.predict_duration"

The argument lag looks unused in the function nnsvs.gen.predict_duration.

Should it be removed, or kept for future development (for example, duration prediction using time-lag values)?
https://github.com/r9y9/nnsvs/blob/c69566f32c3799f9c0c0a76cc1f48175be9c653a/nnsvs/gen.py#L166-L167

The original code follows:

def predict_duration(device, labels, duration_model, duration_config, duration_in_scaler, duration_out_scaler,
        lag, binary_dict, continuous_dict, pitch_indices=None, log_f0_conditioning=True):
    # Extract musical/linguistic features
    duration_linguistic_features = fe.linguistic_features(
        labels, binary_dict, continuous_dict,
        add_frame_features=False, subphone_features=None).astype(np.float32)

    if log_f0_conditioning:
        for idx in pitch_indices:
            duration_linguistic_features[:, idx] = interp1d(
                _midi_to_hz(duration_linguistic_features, idx, log_f0_conditioning),
                    kind="slinear")

    # Apply normalization
    duration_linguistic_features = duration_in_scaler.transform(duration_linguistic_features)
    if isinstance(duration_in_scaler, MinMaxScaler):
        # clip to feature range
        duration_linguistic_features = np.clip(
            duration_linguistic_features, duration_in_scaler.feature_range[0],
            duration_in_scaler.feature_range[1])

    # Apply model
    x = torch.from_numpy(duration_linguistic_features).float().to(device)
    x = x.view(1, -1, x.size(-1))

    if duration_model.prediction_type() == PredictionType.PROBABILISTIC:
        # (B, T, D_out)
        max_mu, max_sigma = duration_model.inference(x, [x.shape[1]])
        if np.any(duration_config.has_dynamic_features):
            # Apply denormalization
            # (B, T, D_out) -> (T, D_out)
            max_sigma_sq = max_sigma.squeeze(0).cpu().data.numpy() ** 2 * duration_out_scaler.var_
            max_mu = duration_out_scaler.inverse_transform(max_mu.squeeze(0).cpu().data.numpy())

            # (T, D_out) -> (T, static_dim)
            pred_durations = multi_stream_mlpg(max_mu, max_sigma_sq, get_windows(duration_config.num_windows),
                                              duration_config.stream_sizes, duration_config.has_dynamic_features)
        else:
            # Apply denormalization
            pred_durations = duration_out_scaler.inverse_transform(max_mu.squeeze(0).cpu().data.numpy())
    else:
        # (T, D_out)
        pred_durations = duration_model.inference(x, [x.shape[1]]).squeeze(0).cpu().data.numpy()
        # Apply denormalization
        pred_durations = duration_out_scaler.inverse_transform(pred_durations)
        if np.any(duration_config.has_dynamic_features):
            # (T, D_out) -> (T, static_dim)
            pred_durations = multi_stream_mlpg(
                pred_durations, duration_out_scaler.var_, get_windows(duration_config.num_windows),
                duration_config.stream_sizes, duration_config.has_dynamic_features)

    pred_durations[pred_durations <= 0] = 1
    pred_durations = np.round(pred_durations)

    return pred_durations

Error when training PJS database and a custom database

I'm training on Google Colab.
I tried to train the PJS database with this config and database.
When I use the run.sh included in nnsvs/egs/pjs to train, I get this:

stage 0: Data preparation
[ERR] Cannot open phoneme table file : /usr/local/lib/sinsy/dic/japanese.utf_8.table
[ERR] Cannot read phoneme table file : /usr/local/lib/sinsy/dic/japanese.utf_8.table
[ERR] Cannot read Japanese table or config or macron file : /usr/local/lib/sinsy/dic/japanese.utf_8.table, /usr/local/lib/sinsy/dic/japanese.utf_8.conf
Traceback (most recent call last):
  File "utils/data_prep.py", line 66, in <module>
    assert sinsy.setLanguages("j", "/usr/local/lib/sinsy/dic")
AssertionError

And if I use the dic included in the natsume recipe, I get this:

stage 0: Data preparation
[WARN] Cannot open macron table file : /content/nnsvs/egs/pjs/00-svs-world/dic/japanese.macron
[WARN] Cannot open macron table file : /content/nnsvs/egs/pjs/00-svs-world/dic/japanese.macron
[WARN] Cannot open macron table file : /content/nnsvs/egs/pjs/00-svs-world/dic/japanese.macron
Traceback (most recent call last):
  File "utils/data_prep.py", line 82, in <module>
    assert len(align_mono_lab) == len(sinsy_mono_lab)
AssertionError

And if I use the run.sh from the natsume recipe, I get this (ignore "The last note is not rest"):

stage 0: Data preparation
[WARN] Duration of notes in a measure is too long
[WARN] Number of notes in a measure is too large
[WARN] Duration of notes in a measure is too long
[WARN] Lyric in unknown language :  in measure 3
[WARN] Lyric in unknown language :  in measure 3
[WARN] Number of notes in a measure is too large
[WARN] Number of notes in a measure is too large
pjs001.lab 0.0
end_time 648800000000000 of the phoneme u and start_time 603700000000000 of the phoneme w is not the same. There seems to be a missing phoneme in sinsy_mono_round.
Traceback (most recent call last):
  File "/content/gdrive/Training_data/PJS/pjs/00-svs-world/utils/align_lab.py", line 41, in <module>
    lab_sinsy = fix_mono_lab_after_align(lab_sinsy)
  File "/content/drive/Shared drives/台U/NNSVS/Training_data/PJS/pjs/00-svs-world/utils/util.py", line 256, in fix_mono_lab_after_align
    f.append((f.end_times[-1], lab.end_times[i], lab.contexts[i]))
  File "/usr/local/lib/python3.6/dist-packages/nnmnkwii/io/hts.py", line 161, in append
    end_time, start_time))
ValueError: end_time (648800000000000) must be larger than start_time (648800000000000).
Traceback (most recent call last):
  File "/content/gdrive/Training_data/PJS/pjs/00-svs-world/utils/perf_segmentation.py", line 71, in <module>
    k, np.min(v), np.max(v), np.mean(v)))
  File "<__array_function__ internals>", line 6, in amin
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 2793, in amin
    keepdims=keepdims, initial=initial, where=where)
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation minimum which has no identity
Prepare data for time-lag models
  0% 0/1 [00:00<?, ?it/s]pjs001: Global offset (in sec): 91114357.91
  0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/content/gdrive/Training_data/PJS/pjs/00-svs-world/utils/finalize_lab.py", line 79, in <module>
    assert seg_idx > 0 or exists(lab_align_path)
AssertionError

And I think this is a bug, because it also told me end_time of the phoneme e and start_time of the phoneme u is not the same when I was training my small custom database, but when I checked the lab files there is no place where an e and a u are adjacent. By the way, I run ! cd $RECIPE_ROOT && rm -r data/ before training to make sure it forces a rewrite.

Then I deleted files randomly from my custom database; that got past the error, but a new one seems to appear.
console log on pastebin
my database
Sorry, this report is very long and hard to read; if anything is unclear, please let me know.

Discussion: NNSVS vs. NEUTRINO

Samples: https://soundcloud.com/r9y9/sets/nnsvs-and-neutrino-comparison

While I was looking into the differences between the nnsvs and neutrino samples, I noticed that there is MUCH room for improvement in the acoustic model. I will put some analysis results here for the record.

Global variance

[image: global variance plot]

Spectrogram

Upper: nnsvs, lower: neutrino

[image: spectrogram comparison]

Looks like neutrino puts emphasis on the <8000 Hz frequency bands.

Aperiodicity

Upper: nnsvs, lower: neutrino

[image: aperiodicity comparison]

It seems that neutrino performs phrase-level synthesis (separated by rests, I guess?). Aperiodicity components are filled with constant values during pauses.

F0

[image: F0 comparison]

MGC

[image: mgc comparison]

  • mgc 0th: ours is shifted. This is not important, because the gains of the signals differ in the training data.
  • mgc higher dims: ours are clearly smoothed. Temporal fluctuations are clearly observed for neutrino, but not for nnsvs.

BAP

[image: bap comparison]

  • As with mgc, ours are over-smoothed

So what can we do?

So far I am thinking of the following ideas

  • Try autoregressive models to alleviate over-smoothing issues for mgc/bap modeling #15
  • Design a post-filter to alleviate the over-smoothing issues. I guess modulation spectrum based post-filter would work to some extent.
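To make the second idea concrete, here is a toy variance-scaling sketch (NOT a modulation-spectrum postfilter, and the function name is hypothetical): rescale each feature dimension so its utterance-level variance matches a reference global variance.

```python
import numpy as np

def gv_postfilter(feats, gv_ref):
    """Scale each dimension of (T, D) features so its variance matches
    the reference global variance gv_ref (D,), keeping the mean fixed.
    A crude remedy for over-smoothed trajectories.
    """
    mean = feats.mean(axis=0, keepdims=True)
    var = feats.var(axis=0)
    scale = np.sqrt(gv_ref / np.maximum(var, 1e-10))
    return (feats - mean) * scale + mean

rng = np.random.default_rng(0)
feats = rng.normal(0.0, 0.5, size=(200, 4))  # over-smoothed: low variance
gv_ref = np.ones(4)                           # target global variance
out = gv_postfilter(feats, gv_ref)
print(np.allclose(out.var(axis=0), gv_ref))  # → True
```

Real GV postfilters interpolate between the original and scaled trajectories (and skip silent frames), but the core operation is this per-dimension variance correction.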

Synthesis error: how can I resolve this error?

[2022-03-07 06:21:16,448][nnsvs][INFO] - Processes 1 utterances...
0%| | 0/1 [00:01<?, ?it/s]
Error executing job with overrides: ['question_path=jp.hed', 'timelag=defaults', 'duration=defaults', 'acoustic=defaults', 'timelag.checkpoint=exp/sho/timelag/best_loss.pth', 'timelag.in_scaler_path=dump/sho/norm/in_timelag_scaler.joblib', 'timelag.out_scaler_path=dump/sho/norm/out_timelag_scaler.joblib', 'timelag.model_yaml=exp/sho/timelag/model.yaml', 'duration.checkpoint=exp/sho/duration/best_loss.pth', 'duration.in_scaler_path=dump/sho/norm/in_duration_scaler.joblib', 'duration.out_scaler_path=dump/sho/norm/out_duration_scaler.joblib', 'duration.model_yaml=exp/sho/duration/model.yaml', 'acoustic.checkpoint=exp/sho/acoustic/best_loss.pth', 'acoustic.in_scaler_path=dump/sho/norm/in_acoustic_scaler.joblib', 'acoustic.out_scaler_path=dump/sho/norm/out_acoustic_scaler.joblib', 'acoustic.model_yaml=exp/sho/acoustic/model.yaml', 'utt_list=./data/list/dataset.list', 'in_dir=./data/synthes/', 'out_dir=exp/sho/synthesis/utt_list/best_loss/label_phone_score', 'ground_truth_duration=false']
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/nnsvs-0.0.1-py3.8.egg/nnsvs/bin/synthesis.py", line 224, in my_app
wav = synthesis(
File "/usr/local/lib/python3.8/dist-packages/nnsvs-0.0.1-py3.8.egg/nnsvs/bin/synthesis.py", line 122, in synthesis
acoustic_features = predict_acoustic(
File "/usr/local/lib/python3.8/dist-packages/nnsvs-0.0.1-py3.8.egg/nnsvs/gen.py", line 406, in predict_acoustic
pred_acoustic = multi_stream_mlpg(
File "/usr/local/lib/python3.8/dist-packages/nnsvs-0.0.1-py3.8.egg/nnsvs/multistream.py", line 114, in multi_stream_mlpg
raise RuntimeError("You probably have specified wrong dimension params.")
RuntimeError: You probably have specified wrong dimension params.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
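For context, the "wrong dimension params" error above is raised when the configured stream sizes do not add up to the dimensionality of the predicted feature vector. A minimal sketch of that consistency check follows; the stream names and sizes (mgc/lf0/vuv/bap with 3 delta windows, 199 dims total) are typical examples, not necessarily this user's configuration.

```python
# Typical WORLD feature streams with static + delta + delta-delta (3 windows):
# mgc (60-dim), lf0 (1), vuv (1, no dynamic features), bap (5).
num_windows = 3
stream_sizes = [60 * num_windows, 1 * num_windows, 1, 5 * num_windows]

def check_dims(feature_dim, stream_sizes):
    """Raise if the configured streams do not cover the feature vector."""
    if sum(stream_sizes) != feature_dim:
        raise RuntimeError("You probably have specified wrong dimension params.")

check_dims(199, stream_sizes)   # 180 + 3 + 1 + 15 = 199: consistent
```

In other words, if the acoustic model was trained with one feature layout and the synthesis config declares another, this is the check that fails.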

Speech-to-singing model

I don't have any ideas yet, but I'd like to explore some speech-to-singing methods as a research project (IF I HAVE TIME). It would be very nice if we could build a singing voice synthesis system from speech data only. I think it's doable to some extent.

Improved acoustic model support: introducing autoregressive structure

As in the shallow AR model proposed by Xin Wang.

The issue was part of #1, but I raised a new one since this is one of the most important action items for improving singing voice synthesis quality. Specific discussion and progress can be tracked in this thread. Any comments and suggestions are welcome.

  • Shallow AR
  • Standard AR
  • MDN + AR
  • Stream-wise modeling? #21
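To make the shallow-AR idea concrete: the core of the model is a trainable autoregressive filter on the output features, so frame t depends on the previous K predicted frames. Below is a minimal NumPy sketch of the inference-time recursion only; the filter order, the zero initialization, and the example coefficients are illustrative assumptions (see Xin Wang's shallow AR papers for the actual trainable model).

```python
import numpy as np

def shallow_ar_decode(net_out, ar_coefs):
    """Apply a per-dimension autoregressive filter to network outputs.

    net_out: (T, D) frame-wise network predictions
    ar_coefs: (K, D) AR coefficients (learned jointly in the real model)
    Returns y with y[t] = net_out[t] + sum_k ar_coefs[k] * y[t-1-k].
    """
    T, D = net_out.shape
    K = ar_coefs.shape[0]
    y = np.zeros((T, D))
    for t in range(T):
        y[t] = net_out[t].copy()
        for k in range(K):
            if t - 1 - k >= 0:
                y[t] += ar_coefs[k] * y[t - 1 - k]
    return y

rng = np.random.default_rng(0)
out = rng.normal(size=(100, 60))
coefs = np.full((1, 60), 0.5)     # single-tap smoothing filter, for illustration
y = shallow_ar_decode(out, coefs)
```

The point is that the recursion injects frame-to-frame dependency that a purely frame-wise regression model lacks, which is exactly the kind of temporal fluctuation that goes missing in over-smoothed mgc/bap trajectories.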

Support for neural vocoders

I've posted this on the oatsu-gh/ENUNU repository too (oatsu-gh/ENUNU#11).

Just as the title says, I'm wondering if the code can be modified to use other vocoders, such as the neural WaveNet vocoder. If so, it would really help me if the steps could be provided: even with a dataset of good quality and quantity, the final result always sounds robotic, which I believe is due to the vocoder, so a neural vocoder might produce better results.

Thank you so much!

Loss function of acoustic model goes to nan with nit-song070 data-set on Google Colaboratory

When I try to use NNSVS with the NIT-SONG070 data-set on Google Colaboratory, I get a RuntimeWarning from NumPy in Stage 1 (acoustic feature generation). In Stage 4, the loss function of the acoustic model goes to nan and training of the acoustic model fails.

When I disable the use_harvest feature in nnsvs/bin/conf/prepare_features/config.yaml [1], NNSVS seems to work fine (and there is a minor bug in nnsvs/data/data_source.py).

I uploaded the failing ipynb to gist.github.com [2], so please see it for details.

  1. https://gist.github.com/taroushirani/dbcd4a69bc6372e173854846918c9906
  2. https://gist.github.com/taroushirani/ed14e3e0228f52f4eecac68402bf34b8
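A common cause of nan losses in pipelines like this is non-finite values leaking into extracted features (e.g. log-F0 of unvoiced frames, or extreme values from the F0 extractor). Independently of the use_harvest workaround, a defensive check before training can localize the problem. This is a generic sketch, not part of NNSVS; the clipping threshold is an arbitrary assumption.

```python
import numpy as np

def sanitize_features(x, clip=1e6):
    """Replace non-finite values and clip extreme magnitudes in (T, D) features."""
    bad = ~np.isfinite(x)
    if bad.any():
        print(f"warning: replacing {bad.sum()} non-finite values")
    x = np.nan_to_num(x, nan=0.0, posinf=clip, neginf=-clip)
    return np.clip(x, -clip, clip)

feats = np.array([[1.0, np.nan],
                  [np.inf, -2.0]])
clean = sanitize_features(feats)
```

Running such a check over the dumped feature files (before normalization) quickly tells you whether the nan originates in feature extraction or in training itself.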
