
speech-resynthesis's People

Contributors

adampolyak, adiyoss, tuanh208


speech-resynthesis's Issues

When I train with train_f0_vq.py, I run into an error

Epoch: 1
Traceback (most recent call last):
  File "train_f0_vq.py", line 217, in <module>
    main()
  File "train_f0_vq.py", line 213, in main
    train(a.local_rank, a, h)
  File "train_f0_vq.py", line 101, in train
    for i, batch in enumerate(train_loader):
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
    data = self._next_data()
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
    return self._process_data(data)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
    data.reraise()
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
    raise self.exc_type(msg)
soundfile.LibsndfileError: <exception str() failed>
Killing subprocess 2908
Traceback (most recent call last):
  File "/root/miniconda3/envs/test/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/test/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/test/bin/python', '-u', 'train_f0_vq.py', '--local_rank=0', '--checkpoint_path', 'checkpoints/lj_f0_vq', '--config', 'configs/LJSpeech/f0_vqvae.json']' returned non-zero exit status 1.
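Since <exception str() failed> hides which file libsndfile choked on, one way to narrow it down is to try opening every file in the training filelist directly with soundfile. A minimal sketch; the filelist path and its format here are assumptions, so adapt them to whatever your config actually points at:

    # Hypothetical debugging script: soundfile.LibsndfileError with
    # "<exception str() failed>" does not name the file, so try opening
    # every training file directly and report the unreadable ones.
    import soundfile as sf

    # Assumption: a plain-text filelist, one entry per line; adjust the
    # path and the parsing to match your config's training filelist.
    with open("LJSpeech-1.1/training.txt") as f:
        paths = [line.strip().split('|')[0] for line in f if line.strip()]

    bad = []
    for p in paths:
        try:
            sf.read(p)
        except Exception as e:  # catch broadly; the exact type varies by soundfile version
            bad.append((p, repr(e)))

    for p, err in bad:
        print(p, err)
    print(f"{len(bad)} unreadable files out of {len(paths)}")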

Zero-shot voice conversion

Can this framework perform zero-shot / few-shot voice conversion? If so, could you give some instructions on how to do it?

speaker information

Will the code used in the paper for extracting speaker information be released?

Bigger speech2unit HuBERT versions

Hi,

I was just wondering if you have tried hubert-large or hubert-xtralarge as alternatives to hubert-base for the speech2unit stage.
First, I tried to train a k-means model for hubert-base and retrain the vocoder part, to see whether I could replicate the results obtained with the pretrained k-means, but the results I get are worse.
I would very much appreciate it if you either released the pretrained k-means for hubert-large or hubert-xtralarge (if you have them), or gave me some guidelines for replicating your results.
Specifically, I want to know the number of k-means iterations, the number of centroids, the HuBERT layer, and the batch_size used. Currently, I'm training the k-means with 150 iterations, 100 centroids, the 6th-layer outputs, and a batch_size of 10000 (a sketch of this setup is given below), but I don't know whether these parameters are correct.

Thank you in advance
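For concreteness, here is a minimal sketch of the k-means setup described above, using HuggingFace transformers for the HuBERT features and scikit-learn's MiniBatchKMeans. The model name, layer index, and hyperparameters simply mirror the numbers in the question; they are not confirmed values from the authors, whose own pipeline uses fairseq:

    # Hypothetical sketch: cluster 6th-layer hubert-base features with
    # k-means using the hyperparameters from the question above
    # (100 centroids, 150 iterations, batch_size 10000).
    import numpy as np
    import torch
    import soundfile as sf
    from sklearn.cluster import MiniBatchKMeans
    from transformers import HubertModel

    model = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

    def layer6_features(path):
        wav, sr = sf.read(path)  # assumed 16 kHz mono audio
        x = torch.tensor(wav, dtype=torch.float32).unsqueeze(0)
        with torch.no_grad():
            out = model(x, output_hidden_states=True)
        # hidden_states[0] is the CNN/embedding output; index 6 is the
        # output of the 6th transformer layer.
        return out.hidden_states[6].squeeze(0).numpy()

    # "a.wav"/"b.wav" are placeholders for your feature-extraction corpus.
    feats = np.concatenate([layer6_features(p) for p in ["a.wav", "b.wav"]])
    km = MiniBatchKMeans(n_clusters=100, max_iter=150, batch_size=10000,
                         n_init=1, verbose=1).fit(feats)
    units = km.predict(layer6_features("a.wav"))  # discrete speech units

In practice the corpus for fitting k-means should be large enough that 10000-frame mini-batches are representative; the layer choice matters as much as the centroid count, so it is worth sweeping layers if results stay worse than the pretrained clusters.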

Any pretrained models available?

Is there any chance you could release your pretrained models for evaluation purposes? I'd like to make a few comparisons before training. Thank you!

Coding new dataset for training

Hello!

I read the corresponding section in the README and understand that I need to download the LibriLight dataset to train the VQ-VAE. I downloaded the small.tar file, but upon unzipping I don't see files with paths like /checkpoint/pem/morgane/LibriBig/3717/9/3717_3120_9_0021.wav, only ones like LibriLight/100/sea_fairies_0812_librivox_64kb_mp3/01_baum_sea_fairies_64kb.flac. Am I downloading the correct LibriLight? If not, what can I do?

Thank you!
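Those /checkpoint/... entries look like internal cluster paths baked into the bundled filelists, so they won't match any public download. One workaround is to regenerate the filelist from the layout you actually have. A minimal sketch, assuming the repo accepts a plain one-path-per-line filelist; verify this against the repo's actual filelist format before using it:

    # Hypothetical helper: rebuild a training filelist from the local
    # LibriLight layout by globbing the audio files actually present.
    from pathlib import Path

    root = Path("LibriLight")  # assumed local dataset root
    paths = sorted(str(p) for p in root.rglob("*.flac"))

    # Assumption: one audio path per line; check the config's
    # input_training_file for the real format before relying on this.
    Path("librilight_train.txt").write_text("\n".join(paths) + "\n")
    print(f"wrote {len(paths)} paths")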

I had a problem during the first step of training. How can I solve it?

Initializing Training Process..
Initializing Training Process..
Initializing Training Process..
Initializing Training Process..
Batch size per GPU : 2
Batch size per GPU : 2
Batch size per GPU : 2
Batch size per GPU : 2
Initializing Training Process..
Batch size per GPU : 2
Initializing Training Process..
Batch size per GPU : 2
Initializing Training Process..
Initializing Training Process..
Batch size per GPU : 2
Batch size per GPU : 2
Traceback (most recent call last):
  File "train_f0_vq.py", line 217, in <module>
    main()
  File "train_f0_vq.py", line 213, in main
    train(a.local_rank, a, h)
  File "train_f0_vq.py", line 37, in train
    generator = Quantizer(h).to(device)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 673, in to
    return self._apply(convert)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 387, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 409, in _apply
    param_applied = fn(param)
  File "/root/miniconda3/envs/test/lib/python3.8/site-packages/torch/nn/modules/module.py", line 671, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
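"invalid device ordinal" usually means the launcher started more processes than there are visible GPUs, so some process's local_rank refers to a device that doesn't exist. A quick check, assuming the torch.distributed.launch invocation from the README:

    # Quick sanity check: the process count per node must not exceed the
    # number of GPUs PyTorch can see.
    import torch

    print("visible GPUs:", torch.cuda.device_count())
    # If this prints e.g. 4 but the job was launched with --nproc_per_node 8,
    # ranks 4..7 will fail with "invalid device ordinal". Either lower
    # --nproc_per_node or set CUDA_VISIBLE_DEVICES accordingly.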

Running yaapt on-the-fly slows training dramatically

Hi, thanks for kindly releasing the code for the paper. (Also, congratulations on the acceptance at INTERSPEECH!)

While I was running the code, I encountered a significant issue: pYAAPT.yaapt slows training down dramatically.
Here's how I found this speed bottleneck:

  • I tried to run train_f0_vq.py as specified in the README.
  • However, training was too slow; it looks like the f0 VQ model needs to train for 400000 steps, but a single epoch (about 700 steps) took 2657 seconds. GPU utilization was really low while the CPUs were running like crazy. (My server has a 3080 Ti and 64 CPU cores.)
  • I suspected pYAAPT.yaapt to be the cause. To test that, I forked the repository and added caching: https://github.com/seungwonpark/speech-resynthesis
  • After that, each epoch after the first (which does the initial caching) took only 36 seconds.

So my question is: how did you manage to run yaapt on-the-fly without caching? Though I succeeded in training the model fast enough, I will need to disable caching again, since it requires the _sample_interval method to sample the same interval for every audio file (i.e., it disables the data augmentation of randomly choosing the interval). A sketch of a possible workaround follows below.
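One way to keep the random-interval augmentation while still caching is to compute and cache the f0 track of the whole file once, then slice the cached track to match whatever interval _sample_interval picks. A minimal sketch using amfm_decompy; the frame spacing, sample rate, and cache layout are assumptions, not the repo's exact settings:

    # Hypothetical full-track f0 cache: run yaapt once per file, then
    # slice the cached track for any randomly sampled audio interval.
    import os
    import numpy as np
    import amfm_decompy.basic_tools as basic
    import amfm_decompy.pYAAPT as pYAAPT

    FRAME_SPACE_MS = 10.0  # assumed yaapt hop; must match the extraction settings

    def cached_f0(wav_path, cache_dir="f0_cache"):
        os.makedirs(cache_dir, exist_ok=True)
        cache = os.path.join(cache_dir, os.path.basename(wav_path) + ".npy")
        if os.path.exists(cache):
            return np.load(cache)
        signal = basic.SignalObj(wav_path)
        pitch = pYAAPT.yaapt(signal, frame_space=FRAME_SPACE_MS)
        np.save(cache, pitch.samp_values)
        return pitch.samp_values

    def f0_for_interval(wav_path, start_sample, num_samples, sr=16000):
        # Convert the sampled audio interval to f0 frame indices.
        f0 = cached_f0(wav_path)
        hop = int(sr * FRAME_SPACE_MS / 1000)
        return f0[start_sample // hop : (start_sample + num_samples) // hop]

Because the cache holds the full-length track, _sample_interval can keep choosing random intervals and the cache stays valid; only the slicing changes per batch.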

License question

Since this code has also been added to fairseq, is the license Creative Commons, or is it fairseq's MIT? Thanks!
