
vectorquantizedcpc's People

Contributors

bshall, kamperh


vectorquantizedcpc's Issues

Info on training

Dear team,
Can you describe the training process? How many GPUs did you use, and how long did training take?
I see that you have 22000 epochs in the configuration file; is that correct? I am only at 400 epochs, and that took a whole day.

Best,

About the training curve: my training VQ loss is increasing. Should I judge progress from the accuracy alone?

Hello, may I ask you some questions about the training process?

I have modified the SR to 24kHz and HOP_SIZE to 300, which results in an 80Hz feature rate for the input (see the frame-rate check below). I used my own dataset for training, and the training curve looks as follows:

[Figure: training curves from the modified setup, showing the VQ loss increasing while accuracy plateaus]

The VQ loss is increasing, but the accuracy is at around 75%. Is this a normal situation?
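As a quick sanity check on the numbers quoted above (a minimal sketch; the constants are just the values from this post):

    # Frame-rate check: one feature frame is produced per hop, so the
    # feature rate is the sampling rate divided by the hop size.
    SR = 24000        # modified sampling rate (Hz)
    HOP_SIZE = 300    # modified hop size (samples)
    print(SR / HOP_SIZE)  # 80.0, i.e. an 80Hz feature rate, as stated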

In fact, I want to use this model as an unsupervised phone loss, but the input size is fixed during training. So I also want to know: will the phonetic discrimination performance still be good for inputs of arbitrary length?
Thank you.

CPC loss

I have read the related papers, but I still do not understand the CPC loss computation.

        # One target per utterance and per time step. The labels are all
        # zeros because, for each (utterance, step), the score of the true
        # positive pair occupies index 0 of the prediction tensor f.
        labels = torch.zeros(
            self.n_speakers_per_batch * self.n_utterances_per_speaker, length,
            dtype=torch.long, device=z.device
        )

        # Cross-entropy over the candidate axis of f then maximizes the
        # softmax probability of the positive against the negatives.
        loss = F.cross_entropy(f, labels)

Can someone explain this to me? Why are the labels all zeros, and why is cross_entropy used here?
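For anyone else puzzling over this, here is a minimal, self-contained InfoNCE sketch. It assumes, as the all-zero labels suggest, that the positive pair's score is placed at index 0 along the candidate axis of f, with the negative scores after it; the shapes and variable names below are illustrative, not the repository's exact ones.

    import torch
    import torch.nn.functional as F

    batch, n_negatives, dim = 8, 16, 64

    c = torch.randn(batch, dim)                   # prediction from the context, e.g. W_k c_t
    z_pos = torch.randn(batch, dim)               # the true future latent z_{t+k}
    z_neg = torch.randn(batch, n_negatives, dim)  # negatives sampled from other positions

    pos_score = torch.sum(c * z_pos, dim=-1, keepdim=True)  # (batch, 1)
    neg_score = torch.sum(c.unsqueeze(1) * z_neg, dim=-1)   # (batch, n_negatives)
    f = torch.cat([pos_score, neg_score], dim=1)            # positive sits at index 0

    # Cross-entropy with all-zero labels says "class 0, the positive, is
    # the correct answer". Minimizing it maximizes the positive's softmax
    # score against the negatives, which is exactly the InfoNCE / CPC
    # objective: identify the true future latent among the distractors.
    labels = torch.zeros(batch, dtype=torch.long)
    loss = F.cross_entropy(f, labels)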

tensorboard needs to be installed separately

I am using PyTorch 1.5 and ran into the following issue with train_cpc.py:

ImportError: TensorBoard logging requires TensorBoard with Python summary writer installed. This should be available in 1.14 or above

The following solved the issue for me (installing the stable tensorboard package should work as well):

    pip install -U tb-nightly
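For reference, the ImportError is raised when PyTorch tries to import its TensorBoard writer. A quick way to confirm that logging works after installing (the log directory name here is just an example):

    # Smoke test: this import is exactly what fails with the ImportError
    # above when no compatible TensorBoard installation is found.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter("runs/smoke_test")
    writer.add_scalar("check", 1.0, 0)
    writer.close()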

ABX task files are missing from the provided dataset

Hi,
I have successfully trained the CPC model and vocoder following the instructions, but I ran into a problem while evaluating the model for ABX scores.

(zerospeech2020) cad-1@cad1-SYS-7048GR-TR:~/zhangying/VectorQuantizedCPC$ zerospeech2020-evaluate 2019 -j10 -d levenshtein baseline/ -o 2019_levenshtein.json
[INFO] evaluating 2019 track for english test
Traceback (most recent call last):
  File "/home/cad-1/anaconda3/envs/zerospeech2020/bin/zerospeech2020-evaluate", line 33, in <module>
    sys.exit(load_entry_point('zerospeech2020==0.2', 'console_scripts', 'zerospeech2020-evaluate')())
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/main.py", line 197, in main
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/evaluation_2019.py", line 60, in evaluate
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/evaluation_2019.py", line 60, in <dictcomp>
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/evaluation_2019.py", line 108, in _evaluate_single
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/abx.py", line 298, in abx
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/abx.py", line 231, in _abx
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/ABXpy/distances/distances.py", line 300, in compute_distances
    jobs = create_distance_jobs(pair_file, distance_file, n_cpu)
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/ABXpy/distances/distances.py", line 54, in create_distance_jobs
    with h5py.File(pair_file, 'r') as fh:
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/h5py/_hl/files.py", line 406, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'datasets/2019/ABXTasks/byCtxt_acSpkr.abx', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I haven't found ABXTasks in datasets.zip. Can you give me some help?

Thanks very much!

What is the approach for generating new target voice samples (voice conversion) using the pretrained models?

Given a new input and a target speaker sample, is it possible to use the pretrained models to do voice conversion?

Since the speaker embedding in the vocoder has to be learnt, I was considering training the vocoder to learn an embedding for the new target speaker and then using convert.py to get the voice-converted output. Can it be done this way? If not, please suggest other ways of doing it.
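To make the idea concrete, here is a toy sketch of learning only a new speaker embedding while keeping the rest of a pretrained vocoder frozen. ToyVocoder and its attribute names are made up for illustration; they are not the repository's actual classes or the authors' recommended procedure.

    import torch
    import torch.nn as nn

    # Toy stand-in for a pretrained vocoder: a real one would condition
    # waveform generation on the selected speaker embedding.
    class ToyVocoder(nn.Module):
        def __init__(self, n_speakers=103, speaker_dim=64):
            super().__init__()
            self.speaker_embedding = nn.Embedding(n_speakers, speaker_dim)
            self.net = nn.Linear(speaker_dim, 1)

        def forward(self, speaker_id):
            return self.net(self.speaker_embedding(speaker_id))

    vocoder = ToyVocoder()  # in practice, load pretrained weights here

    # Freeze all pretrained weights, then unfreeze only the speaker table
    # so data from the new speaker updates just that embedding.
    for p in vocoder.parameters():
        p.requires_grad = False
    vocoder.speaker_embedding.weight.requires_grad = True

    optimizer = torch.optim.Adam([vocoder.speaker_embedding.weight], lr=1e-4)

Whether this is enough in practice depends on how strongly the rest of the vocoder is tied to the original training speakers.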

Thanks,
