
vectorquantizedcpc's People

Contributors

bshall, kamperh


vectorquantizedcpc's Issues

Info on training

Dear team,
Can you describe the training process? How many GPUs did you use, and how long did training take?
I see that you have 22000 epochs in the configuration file; is that correct? I am only at 400 epochs, and that took a whole day.

Best,

About the training curve: my training VQ loss is increasing. Should I judge progress from the accuracy alone?

Hello, may I ask you some questions about the training process?

I have modified the SR to 24kHz and HOP_SIZE to 300, which results in an 80Hz feature rate for the input (see the frame-rate check below). I used my own dataset for training, and the training curve looks as follows:

[Figure: training curves from the modified setup, showing the VQ loss increasing while accuracy plateaus]

The VQ loss is increasing, but the accuracy is at around 75%. Is this a normal situation?
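As a quick sanity check on the numbers quoted above (a minimal sketch; the constants are just the values from this post):

    # Frame-rate check: one feature frame is produced per hop, so the
    # feature rate is the sampling rate divided by the hop size.
    SR = 24000        # modified sampling rate (Hz)
    HOP_SIZE = 300    # modified hop size (samples)
    print(SR / HOP_SIZE)  # 80.0, i.e. an 80Hz feature rate, as stated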

In fact, I want to use this model as an unsupervised phone loss, but the input size is fixed during training. So I also want to know: will the phonetic discrimination performance still be good for inputs of arbitrary length?
Thank you.

CPC loss

I have read the related papers, but I still do not understand the CPC loss computation.

        # One target per utterance and per time step. The labels are all
        # zeros because, for each (utterance, step), the score of the true
        # positive pair occupies index 0 of the prediction tensor f.
        labels = torch.zeros(
            self.n_speakers_per_batch * self.n_utterances_per_speaker, length,
            dtype=torch.long, device=z.device
        )

        # Cross-entropy over the candidate axis of f then maximizes the
        # softmax probability of the positive against the negatives.
        loss = F.cross_entropy(f, labels)

Can someone explain this to me? Why are the labels all zeros, and why is cross_entropy used here?
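For anyone else puzzling over this, here is a minimal, self-contained InfoNCE sketch. It assumes, as the all-zero labels suggest, that the positive pair's score is placed at index 0 along the candidate axis of f, with the negative scores after it; the shapes and variable names below are illustrative, not the repository's exact ones.

    import torch
    import torch.nn.functional as F

    batch, n_negatives, dim = 8, 16, 64

    c = torch.randn(batch, dim)                   # prediction from the context, e.g. W_k c_t
    z_pos = torch.randn(batch, dim)               # the true future latent z_{t+k}
    z_neg = torch.randn(batch, n_negatives, dim)  # negatives sampled from other positions

    pos_score = torch.sum(c * z_pos, dim=-1, keepdim=True)  # (batch, 1)
    neg_score = torch.sum(c.unsqueeze(1) * z_neg, dim=-1)   # (batch, n_negatives)
    f = torch.cat([pos_score, neg_score], dim=1)            # positive sits at index 0

    # Cross-entropy with all-zero labels says "class 0, the positive, is
    # the correct answer". Minimizing it maximizes the positive's softmax
    # score against the negatives, which is exactly the InfoNCE / CPC
    # objective: identify the true future latent among the distractors.
    labels = torch.zeros(batch, dtype=torch.long)
    loss = F.cross_entropy(f, labels)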

tensorboard needs to be installed separately

I am using PyTorch 1.5 and ran into the following issue with train_cpc.py:

ImportError: TensorBoard logging requires TensorBoard with Python summary writer installed. This should be available in 1.14 or above

The following solved the issue for me (installing the stable tensorboard package should work as well):

    pip install -U tb-nightly
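For reference, the ImportError is raised when PyTorch tries to import its TensorBoard writer. A quick way to confirm that logging works after installing (the log directory name here is just an example):

    # Smoke test: this import is exactly what fails with the ImportError
    # above when no compatible TensorBoard installation is found.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter("runs/smoke_test")
    writer.add_scalar("check", 1.0, 0)
    writer.close()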

ABX task files are missing from the provided dataset

Hi,
I have successfully trained the CPC model and vocoder following the instructions, but I ran into a problem while evaluating the model for ABX scores.

(zerospeech2020) cad-1@cad1-SYS-7048GR-TR:~/zhangying/VectorQuantizedCPC$ zerospeech2020-evaluate 2019 -j10 -d levenshtein baseline/ -o 2019_levenshtein.json
[INFO] evaluating 2019 track for english test
Traceback (most recent call last):
  File "/home/cad-1/anaconda3/envs/zerospeech2020/bin/zerospeech2020-evaluate", line 33, in <module>
    sys.exit(load_entry_point('zerospeech2020==0.2', 'console_scripts', 'zerospeech2020-evaluate')())
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/main.py", line 197, in main
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/evaluation_2019.py", line 60, in evaluate
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/evaluation_2019.py", line 60, in <dictcomp>
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/evaluation_2019.py", line 108, in _evaluate_single
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/abx.py", line 298, in abx
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/zerospeech2020-0.2-py3.8.egg/zerospeech2020/evaluation/abx.py", line 231, in _abx
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/ABXpy/distances/distances.py", line 300, in compute_distances
    jobs = create_distance_jobs(pair_file, distance_file, n_cpu)
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/ABXpy/distances/distances.py", line 54, in create_distance_jobs
    with h5py.File(pair_file, 'r') as fh:
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/h5py/_hl/files.py", line 406, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/cad-1/anaconda3/envs/zerospeech2020/lib/python3.8/site-packages/h5py/_hl/files.py", line 173, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = 'datasets/2019/ABXTasks/byCtxt_acSpkr.abx', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I haven't found ABXTasks in datasets.zip. Can you give me some help?

Thanks very much!

What is the approach for generating new target voice samples (voice conversion) using the pretrained models?

Given a new input and a target speaker sample, is it possible to use the pretrained models to do voice conversion?

Since the speaker embedding in the vocoder has to be learnt, I was considering training the vocoder to learn an embedding for the new target speaker and then using convert.py to get the voice-converted output. Can it be done this way? If not, please suggest other ways of doing it.
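To make the idea concrete, here is a toy sketch of learning only a new speaker embedding while keeping the rest of a pretrained vocoder frozen. ToyVocoder and its attribute names are made up for illustration; they are not the repository's actual classes or the authors' recommended procedure.

    import torch
    import torch.nn as nn

    # Toy stand-in for a pretrained vocoder: a real one would condition
    # waveform generation on the selected speaker embedding.
    class ToyVocoder(nn.Module):
        def __init__(self, n_speakers=103, speaker_dim=64):
            super().__init__()
            self.speaker_embedding = nn.Embedding(n_speakers, speaker_dim)
            self.net = nn.Linear(speaker_dim, 1)

        def forward(self, speaker_id):
            return self.net(self.speaker_embedding(speaker_id))

    vocoder = ToyVocoder()  # in practice, load pretrained weights here

    # Freeze all pretrained weights, then unfreeze only the speaker table
    # so data from the new speaker updates just that embedding.
    for p in vocoder.parameters():
        p.requires_grad = False
    vocoder.speaker_embedding.weight.requires_grad = True

    optimizer = torch.optim.Adam([vocoder.speaker_embedding.weight], lr=1e-4)

Whether this is enough in practice depends on how strongly the rest of the vocoder is tied to the original training speakers.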

Thanks,
