maum-ai / univnet

257.0 12.0 46.0 23.24 MB

Unofficial PyTorch Implementation of UnivNet Vocoder (https://arxiv.org/abs/2106.07889)

Home Page: https://mindslab-ai.github.io/univnet/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

text-to-speech vocoder gan deep-learning pytorch tts speech-synthesis

univnet's Issues

Pitch too high

Hello,

I have trained a voice using your framework. I wanted to use it as a Vocoder for Grad-TTS.
Unfortunately the voice that is created as a result is way too high in its pitch.

Could you provide me with a hint or advice how this can happen?
Do I need to change some configs or can this happen in the inference? Do I need to pre-process the input wav?
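
One possible cause (an assumption, nothing in this thread confirms it) is a sampling-rate mismatch between the acoustic model and the vocoder. Audio intended for one rate but played or saved at a higher rate sounds faster and higher. The sketch below estimates that shift in semitones. The 22050 Hz and 24000 Hz values are illustrative only and should be checked against your own Grad-TTS and UnivNet configs.

```python
import math

def pitch_shift_semitones(source_sr: int, playback_sr: int) -> float:
    """Pitch shift heard when audio meant for source_sr is played at playback_sr."""
    return 12.0 * math.log2(playback_sr / source_sr)

# Illustrative values only (assumptions): a TTS front end working at 22050 Hz
# feeding a vocoder whose output is written at 24000 Hz.
shift = pitch_shift_semitones(22050, 24000)
print(f"{shift:+.2f} semitones")
```

If the rates in both configs already agree, the mel filterbank settings (number of channels, fmin, fmax) would be the next thing to compare.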

Testing from .wav failed

Hi, I ran the following test code to convert .wav to mel with librosa and then used UnivNet with a pretrained checkpoint to invert it, but the results were extremely bad. Can you point out what I'm doing wrong? The input file is clean US English speech.

Arguments: -p ./chkpt/univ_c16_0292.pt -c config/default_c16.yaml -i /Users/kelseyd/Documents/train/TF -o ./out

for filename in tqdm.tqdm(glob.glob(os.path.join(args.input_folder, '*.wav'))):
    y, sr = librosa.load(filename, sr=24000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, n_mels=100, fmin=0, fmax=12000)
    mel = torch.from_numpy(mel)

    if len(mel.shape) == 2:
        mel = mel.unsqueeze(0)

    audio = model.inference(mel)
    audio = audio.cpu().detach().numpy()
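
A likely source of the mismatch (an assumption, not confirmed in this thread) is that `librosa.feature.melspectrogram` defaults to `hop_length=512`, returns a power spectrogram, and applies no log compression, while a vocoder expects features computed exactly as in its training pipeline. The sketch below illustrates only the hop-length part. The vocoder hop of 256 is an assumed value that should be read from `config/default_c16.yaml`.

```python
def mel_frames(num_samples: int, hop_length: int) -> int:
    """Frame count of a centered STFT, as librosa computes it by default."""
    return 1 + num_samples // hop_length

def vocoder_samples(frames: int, vocoder_hop: int) -> int:
    """Samples produced by a vocoder that upsamples each frame by vocoder_hop."""
    return frames * vocoder_hop

num_samples = 24000 * 3  # three seconds at 24 kHz
VOCODER_HOP = 256        # assumed; read the real value from the YAML config

for hop in (512, 256):   # 512 is librosa's default hop_length
    frames = mel_frames(num_samples, hop)
    out = vocoder_samples(frames, VOCODER_HOP)
    print(f"hop={hop}: {frames} frames -> {out / 24000:.2f} s of audio")
```

When the analysis hop is twice the vocoder hop, the output is about half as long as the input, which would sound sped up or garbled. Passing `hop_length` explicitly and matching the magnitude and log scaling used in training are the usual things to check.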

Model Inference

Is it possible to deploy it on CPU? The unfold operation takes a lot of time and is not supported by ONNX.
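
One workaround sometimes used when an exporter lacks `unfold` is to express the sliding window as an explicit index gather. The plain-Python sketch below shows which indices a window of a given size and step selects along one dimension, following the window layout of `torch.Tensor.unfold`. Whether this is faster on CPU or exports cleanly for this model is untested here. The helper names are hypothetical.

```python
def unfold_indices(length: int, size: int, step: int) -> list:
    """Indices selected by a sliding window of `size` moved by `step`.

    Window i covers positions [i*step, i*step + size), which is the layout
    torch.Tensor.unfold uses along the unfolded dimension.
    """
    n_windows = (length - size) // step + 1
    return [[i * step + j for j in range(size)] for i in range(n_windows)]

def unfold_1d(x: list, size: int, step: int) -> list:
    """Apply the gather to a 1-D sequence."""
    return [[x[k] for k in row] for row in unfold_indices(len(x), size, step)]

signal = list(range(10))
windows = unfold_1d(signal, size=4, step=2)
print(windows)
```

In PyTorch the same index table could be built once as a `LongTensor` and applied with advanced indexing (`x[..., idx]`), which avoids calling `unfold` in the exported graph.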

GAN loss for the first 200k steps

The paper says

We trained the generator with only auxiliary loss without discriminators in the first 200k steps.

but I don't think your training code reflects that: it combines stft_loss and score_loss right from step 0. Is there any reason behind this modification?
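
The schedule quoted from the paper could be sketched as a step-gated loss. This is a minimal illustration, not the repository's code: `generator_loss`, `warmup_steps`, and `stft_lamb` (including its value) are assumed names and values.

```python
def generator_loss(stft_loss: float, score_loss: float, step: int,
                   warmup_steps: int = 200_000, stft_lamb: float = 2.5) -> float:
    """Auxiliary (STFT) loss only during warm-up, then add the adversarial term."""
    loss = stft_lamb * stft_loss
    if step >= warmup_steps:
        # Discriminators take part only after the warm-up period.
        loss += score_loss
    return loss

for step in (0, 199_999, 200_000):
    print(step, generator_loss(stft_loss=1.0, score_loss=4.0, step=step))
```

Under this scheme the discriminator update would be skipped for the same steps, since its feedback is not used by the generator during warm-up.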

Config structure mismatch when running trainer.py

A simple run with the default config (all data is placed correctly) fails:
$ python trainer.py -c config/default.yaml -n test_run

Traceback (most recent call last):
  File "trainer.py", line 30, in <module>
    assert hp.data.train != '' and hp.data.validation != '', \
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 353, in __getattr__
    self._format_and_raise(
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/base.py", line 190, in _format_and_raise
    format_and_raise(
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/_utils.py", line 821, in format_and_raise
    _raise(ex, cause)
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/_utils.py", line 719, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set end OC_CAUSE=1 for full backtrace
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__
    return self._get_impl(key=key, default_value=_DEFAULT_MARKER_)
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 438, in _get_impl
    node = self._get_node(key=key, throw_on_missing_key=True)
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 470, in _get_node
    raise ConfigKeyError(f"Missing key {key}")
omegaconf.errors.ConfigAttributeError: Missing key train
    full_key: data.train
    object_type=dict

I guess the assert on line 30 should be changed.
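
A check like this fails with an opaque error when the YAML file does not contain the keys the script expects. As a sketch of one way to report the problem up front, the helper below lists required dotted keys that are absent or empty in a nested dict. The function name and the key names in the example are hypothetical and are not taken from the repository's config.

```python
def missing_keys(config: dict, required: list) -> list:
    """Return the dotted keys from `required` that are absent or empty in `config`."""
    missing = []
    for dotted in required:
        node = config
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                node = None
                break
            node = node[part]
        if node is None or node == "":
            missing.append(dotted)
    return missing

# Hypothetical config and key names, for illustration only.
config = {"data": {"train_dir": "datasets/", "val_dir": ""}}
problems = missing_keys(config, ["data.train_dir", "data.val_dir", "data.train"])
if problems:
    print("Missing or empty config keys:", ", ".join(problems))
```

With OmegaConf, `OmegaConf.to_container(hp)` should give a plain dict suitable for this kind of check.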

Training fail: EOFError: Ran out of input

Hi there,
I am getting an error after one iteration of training and cannot figure out the reason.

Do you have any idea what is causing the error EOFError: Ran out of input?
Thanks in advance!

The error looks like this:
Loading train data: 0%| | 0/2732 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "trainer.py", line 44, in <module>
    train(0, args, args.checkpoint_path, hp, hp_str)
  File "/opt/3tbdrive1/products/voicesurfer/02.VoiceTraining/univnet/utils/train.py", line 125, in train
    for mel, audio in loader:
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
EOFError: Caught EOFError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/3tbdrive1/products/voicesurfer/02.VoiceTraining/univnet/datasets/dataloader.py", line 61, in __getitem__
    return self.my_getitem(idx)
  File "/opt/3tbdrive1/products/voicesurfer/02.VoiceTraining/univnet/datasets/dataloader.py", line 78, in my_getitem
    mel = self.get_mel(wavpath)
  File "/opt/3tbdrive1/products/voicesurfer/02.VoiceTraining/univnet/datasets/dataloader.py", line 96, in get_mel
    mel = torch.load(melpath, map_location='cpu')
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/serialization.py", line 777, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
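
Unpickling raises `EOFError: Ran out of input` when the file it reads is empty, and the traceback shows the failure inside `torch.load(melpath, ...)`, so an empty or truncated cached mel file is a plausible cause. The sketch below looks for zero-byte cache files so they can be deleted and regenerated. The `.mel` suffix and the helper name are assumptions, not taken from the thread.

```python
import tempfile
from pathlib import Path

def find_empty_files(root: str, suffix: str = ".mel") -> list:
    """Return paths under `root` ending in `suffix` whose size is zero bytes."""
    return sorted(p for p in Path(root).rglob(f"*{suffix}")
                  if p.is_file() and p.stat().st_size == 0)

# Self-contained demo on a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ok.mel").write_bytes(b"data")
    (Path(tmp) / "broken.mel").write_bytes(b"")
    empty = find_empty_files(tmp)
    print([p.name for p in empty])
```

This only catches completely empty files. A file that was cut off partway through writing would need an actual load attempt to detect.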

mel channels

Hello, how can I change the mel channels from 80 to 100 to use your model?

The cause of mismatch for model params

Regarding your note:

Our UnivNet generator has smaller number of parameters (c32: 5.11M, c16: 1.42M) than the paper (c32: 14.89M, c16: 4.00M). So far, we have not encountered any issues from using a smaller model size. If run into any problem, please report it as an issue.

This looks like a mistake in your code:
https://github.com/mindslab-ai/univnet/blob/df77c9a37f71e3d6be1b504e16abaf99ce131de3/model/lvcnet.py#L110

The default value should be 3, but you set it to 1 in the code.
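
Kernel size multiplies the weight count of a 1-D convolution, so a kernel size of 1 instead of 3 shrinks the affected layers roughly threefold. The arithmetic sketch below uses made-up channel sizes purely to show the scaling and does not reproduce the model's actual layers.

```python
def conv1d_params(in_channels: int, out_channels: int, kernel_size: int,
                  bias: bool = True) -> int:
    """Parameter count of a standard (ungrouped) 1-D convolution."""
    return in_channels * out_channels * kernel_size + (out_channels if bias else 0)

# Hypothetical channel sizes, chosen only to show the scaling.
for k in (1, 3):
    print(f"kernel_size={k}: {conv1d_params(64, 64, k)} parameters")
```

A factor close to three is in the same range as the gap between the quoted counts (5.11M vs 14.89M, 1.42M vs 4.00M), though other layers may also contribute.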

Causal Style UnivNet

Hi.

The current implementation of UnivNet is non-causal. Could we get a causal version of UnivNet?
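
For context, a causal convolution pads only on the left by (kernel_size - 1) * dilation, so the output at time t depends only on inputs at or before t. The plain-Python sketch below shows that idea on a 1-D signal. It is a generic illustration with a hypothetical helper name, not a description of how this repository would implement it.

```python
def causal_conv1d(x: list, kernel: list, dilation: int = 1) -> list:
    """1-D causal convolution (cross-correlation form) with left zero-padding.

    y[t] = sum_j kernel[j] * x[t - (K - 1 - j) * dilation], so y[t] never
    reads samples after position t.
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    return [sum(kernel[j] * padded[t + j * dilation] for j in range(k))
            for t in range(len(x))]

y = causal_conv1d([1, 2, 3, 4], kernel=[1, 1])
y_future_changed = causal_conv1d([1, 2, 3, 100], kernel=[1, 1])
print(y, y_future_changed)
```

Changing the last input sample leaves every earlier output unchanged, which is the property a streaming vocoder needs.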
