univnet's People

Contributors

azraelkuan, wonbin-jung, wookladin

univnet's Issues

Pitch too high

Hello,

I have trained a voice using your framework. I wanted to use it as a Vocoder for Grad-TTS.
Unfortunately the voice that is created as a result is way too high in its pitch.

Could you give me a hint about what might cause this?
Do I need to change some configs, or can this happen at inference time? Do I need to pre-process the input wav?
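
One possible cause, offered as an assumption rather than a diagnosis, is a sample-rate mismatch somewhere between the acoustic model, the vocoder, and the file writer. Samples generated at one rate but written or played back at a higher rate sound higher by the ratio of the two rates. The sketch below only quantifies that effect. The function name and the two example rates are illustrative and do not come from this issue.

```python
import math


def pitch_shift_semitones(produced_sr: int, playback_sr: int) -> float:
    """Pitch shift heard when audio generated at produced_sr is played at playback_sr."""
    return 12.0 * math.log2(playback_sr / produced_sr)


# Illustrative rates only: both numbers are assumptions, not values from the issue.
shift = pitch_shift_semitones(22050, 24000)
print(f"{shift:+.2f} semitones")
```

If the shift heard is in this range, comparing the sampling rate in the vocoder config with the one used by the acoustic model and by the code that writes the wav file is a cheap first check.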

Testing from .wav failed

Hi, I ran the following test code to convert .wav to mel with librosa and then used UnivNet with a pretrained checkpoint to do the inverse, but the results were extremely bad. Can you point out what I'm doing wrong? The input file is clean US English speech.

Arguments: -p ./chkpt/univ_c16_0292.pt -c config/default_c16.yaml -i /Users/kelseyd/Documents/train/TF -o ./out

for filename in tqdm.tqdm(glob.glob(os.path.join(args.input_folder, '*.wav'))):
    y, sr = librosa.load(filename, sr=24000)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, n_mels=100, fmin=0, fmax=12000)
    mel = torch.from_numpy(mel)

    if len(mel.shape) == 2:
        mel = mel.unsqueeze(0)

    audio = model.inference(mel)
    audio = audio.cpu().detach().numpy()
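
One thing worth checking, stated as an assumption about the checkpoint rather than a confirmed diagnosis: neural vocoders are commonly trained on log-compressed magnitude mel spectrograms, whereas librosa.feature.melspectrogram returns a linear power spectrogram by default (power=2.0) and uses hop_length=512 unless told otherwise. The sketch below shows only the log-compression step, on plain nested lists. The function name and the floor value of 1e-5 are hypothetical.

```python
import math


def log_compress(mel, floor=1e-5):
    """Apply log(max(x, floor)) to every bin of a mel spectrogram given as nested lists."""
    return [[math.log(max(value, floor)) for value in frame] for frame in mel]


# Tiny stand-in for a linear-scale mel spectrogram (2 frames x 2 bins).
linear_mel = [[0.0, 1.0], [0.5, 2.0]]
log_mel = log_compress(linear_mel)
print(log_mel)
```

If this is the cause, the features fed to model.inference would need the same scale, hop length, and window settings as those used when the checkpoint was trained.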

Model Inference

Is it possible to deploy it on CPU? The unfold operation takes a lot of time and is unsupported in ONNX.

GAN loss for the first 200k steps

The paper says

We trained the generator with only auxiliary loss without discriminators in the first 200k steps.

but I don't think your training code reflects that: it starts combining stft_loss and score_loss right off the bat at step 0. Is there any reason behind this modification?
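
For reference, the schedule quoted from the paper could be sketched as follows. This is a minimal illustration, not the repository's training code: the function name, the aux_weight parameter, and its default of 1.0 are hypothetical, and the losses are plain floats here rather than tensors.

```python
def generator_loss(step, stft_loss, score_loss, warmup_steps=200_000, aux_weight=1.0):
    """Return the generator loss, adding the adversarial term only after warm-up."""
    loss = aux_weight * stft_loss
    if step >= warmup_steps:
        # Discriminator feedback is used only once the warm-up phase is over.
        loss = loss + score_loss
    return loss


print(generator_loss(0, stft_loss=2.0, score_loss=5.0))        # auxiliary loss only
print(generator_loss(200_000, stft_loss=2.0, score_loss=5.0))  # auxiliary + adversarial
```

With such a gate the discriminator update would normally be skipped during warm-up as well. That part is omitted here.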

Config structure mismatch when trainer.py

Simple run with default config (all data is placed right) fails:
$ python trainer.py -c config/default.yaml -n test_run

Traceback (most recent call last):
  File "trainer.py", line 30, in <module>
    assert hp.data.train != '' and hp.data.validation != '', \
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 353, in __getattr__
    self._format_and_raise(
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/base.py", line 190, in _format_and_raise
    format_and_raise(
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/_utils.py", line 821, in format_and_raise
    _raise(ex, cause)
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/_utils.py", line 719, in _raise
    raise ex.with_traceback(sys.exc_info()[2])  # set end OC_CAUSE=1 for full backtrace
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 351, in __getattr__
    return self._get_impl(key=key, default_value=_DEFAULT_MARKER_)
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 438, in _get_impl
    node = self._get_node(key=key, throw_on_missing_key=True)
  File "/opt/conda/lib/python3.8/site-packages/omegaconf/dictconfig.py", line 470, in _get_node
    raise ConfigKeyError(f"Missing key {key}")
omegaconf.errors.ConfigAttributeError: Missing key train
    full_key: data.train
    object_type=dict

I guess the assert on line 30 should be changed.
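
A check that reports every missing key at once, instead of failing on the first attribute access, could look like the sketch below. It works on a plain nested dict. The helper name and the train_dir and val_dir keys are placeholders, not the repository's actual config schema.

```python
def missing_keys(config, required):
    """Return the dotted paths from `required` that are absent from a nested dict."""
    missing = []
    for path in required:
        node = config
        for part in path.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(path)
                break
            node = node[part]
    return missing


# Hypothetical config: these key names are placeholders for illustration.
config = {"data": {"train_dir": "datasets/train", "val_dir": "datasets/val"}}
print(missing_keys(config, ["data.train", "data.validation", "data.train_dir"]))
```

Running such a check before the assert would make a mismatch between the YAML file and the keys that trainer.py expects visible in a single message.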

Training fail: EOFError: Ran out of input

Hi there,
I am getting an error after 1 iteration of training and I cannot figure out the reason.

Do you have any idea what is causing the error EOFError: Ran out of input ?
Thanks in advance!

The error looks like this:
Loading train data: 0%| | 0/2732 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "trainer.py", line 44, in <module>
    train(0, args, args.checkpoint_path, hp, hp_str)
  File "/opt/3tbdrive1/products/voicesurfer/02.VoiceTraining/univnet/utils/train.py", line 125, in train
    for mel, audio in loader:
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/tqdm/std.py", line 1185, in __iter__
    for obj in iterable:
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
EOFError: Caught EOFError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/3tbdrive1/products/voicesurfer/02.VoiceTraining/univnet/datasets/dataloader.py", line 61, in __getitem__
    return self.my_getitem(idx)
  File "/opt/3tbdrive1/products/voicesurfer/02.VoiceTraining/univnet/datasets/dataloader.py", line 78, in my_getitem
    mel = self.get_mel(wavpath)
  File "/opt/3tbdrive1/products/voicesurfer/02.VoiceTraining/univnet/datasets/dataloader.py", line 96, in get_mel
    mel = torch.load(melpath, map_location='cpu')
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/serialization.py", line 608, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/opt/3tbdrive1/products/voicesurfer/venv/lib/python3.6/site-packages/torch/serialization.py", line 777, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
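
The last frames show torch.load failing inside pickle's load on a cached mel file. Python raises "EOFError: Ran out of input" when it unpickles an empty file, so a plausible (unconfirmed) cause is a cache file left at zero bytes by an interrupted earlier run. The sketch below lists such files so they can be deleted and regenerated. The helper name and the .mel suffix are assumptions about how the cache is stored.

```python
from pathlib import Path


def find_empty_files(root, suffix=".mel"):
    """List files under root with the given suffix whose size is zero bytes."""
    return sorted(
        path
        for path in Path(root).rglob(f"*{suffix}")
        if path.is_file() and path.stat().st_size == 0
    )


# Usage (path is a placeholder): print every empty cache file below a dataset folder.
for path in find_empty_files("."):
    print(path)
```

This catches only zero-byte files. A file truncated partway through would need to be found by attempting to load it.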

mel channels

Hello, how can I change the mel channels from 80 to 100 to use your model?

The cause of mismatch for model params

Regarding your note:

Our UnivNet generator has smaller number of parameters (c32: 5.11M, c16: 1.42M) than the paper (c32: 14.89M, c16: 4.00M). So far, we have not encountered any issues from using a smaller model size. If run into any problem, please report it as an issue.

I think this is caused by a mistake in your code:
https://github.com/mindslab-ai/univnet/blob/df77c9a37f71e3d6be1b504e16abaf99ce131de3/model/lvcnet.py#L110

The default value should be 3, but you set it to 1 in the code.
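
As a rough sanity check on this explanation, the weight count of a 1D convolution grows linearly with its kernel size, so going from kernel size 1 to 3 roughly triples the parameters of the affected layers. The ratios in the quoted note (14.89M / 5.11M and 4.00M / 1.42M, roughly 2.9x and 2.8x) are consistent with that, although this alone does not prove the cause. The channel widths below are hypothetical.

```python
def conv1d_params(in_channels, out_channels, kernel_size, bias=True):
    """Number of learnable parameters in a standard (non-grouped) 1D convolution."""
    weights = in_channels * out_channels * kernel_size
    return weights + (out_channels if bias else 0)


# Hypothetical layer widths, chosen only to show the scaling with kernel size.
small = conv1d_params(64, 64, kernel_size=1)
large = conv1d_params(64, 64, kernel_size=3)
print(small, large, round(large / small, 2))
```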

Causal Style UnivNet

Hi.

The current implementation of UnivNet is non-causal. Can we get a causal version of UnivNet?
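
For context, the difference between the two styles comes down to padding: a causal convolution pads only on the left, so each output sample depends on current and past inputs only, while a non-causal one pads on both sides and also looks ahead. The sketch below is a pure-Python illustration of that idea, not UnivNet code. The function name is hypothetical.

```python
def conv1d(signal, kernel, causal):
    """Same-length 1D convolution (cross-correlation) with zero padding.

    causal=True pads only on the left, so output[t] uses signal[t-k+1..t].
    causal=False pads on both sides, so output[t] also uses future samples.
    """
    k = len(kernel)
    left = k - 1 if causal else (k - 1) // 2
    right = (k - 1) - left
    padded = [0.0] * left + list(signal) + [0.0] * right
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(signal))
    ]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 1.0, 1.0]
print(conv1d(signal, kernel, causal=True))
print(conv1d(signal, kernel, causal=False))
```

A causal vocoder would need this property in every layer that mixes information across time, which is why it is usually a design-time choice rather than an inference-time switch.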
