3d-convolutional-speaker-recognition's Introduction

TensorFlow implementation of 3D Convolutional Neural Networks for Speaker Verification - Official Project Page - Pytorch Implementation

This repository contains the code release for our paper, "Text-Independent Speaker Verification Using 3D Convolutional Neural Networks". The link to the paper is provided as well.

The code has been developed using TensorFlow. The input pipeline must be prepared by the user. This code provides an implementation of Speaker Verification (SV) using 3D convolutional neural networks, following the SV protocol.

Citation

If you use this code, please consider citing the following paper:

@article{torfi2017text,
  title={Text-independent speaker verification using 3d convolutional neural networks},
  author={Torfi, Amirsina and Nasrabadi, Nasser M and Dawson, Jeremy},
  journal={arXiv preprint arXiv:1705.09422},
  year={2017}
}

DEMO

To run a demo, after forking the repository, run the following script:

./run.sh

General View

We leveraged a 3D convolutional architecture to create the speaker model, in order to simultaneously capture the speech-related and temporal information from the speakers' utterances.

Speaker Verification Protocol (SVP)

In this work, a 3D Convolutional Neural Network (3D-CNN) architecture has been utilized for text-independent speaker verification in three phases.

1. At the development phase, a CNN is trained to classify speakers at the utterance-level.

2. In the enrollment stage, the trained network is utilized to directly create a speaker model for each speaker based on the extracted features.

3. Finally, in the evaluation phase, the extracted features from the test utterance will be compared to the stored speaker model to verify the claimed identity.

These three phases are usually referred to as the SV protocol. One of the main challenges is the creation of the speaker models. Previously reported approaches create speaker models by averaging the features extracted from the speaker's utterances, which is known as the d-vector system.
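The enrollment and evaluation phases of a d-vector style baseline can be sketched as follows. This is a minimal illustration, not the code of this repository: the function names, the 128-dimensional embeddings, the cosine-similarity score, and the threshold value are all assumptions made for the example, and random vectors stand in for the features a trained network would extract.

```python
import numpy as np


def enroll_speaker(utterance_embeddings):
    """Average per-utterance embeddings into one speaker model (d-vector style)."""
    embeddings = np.asarray(utterance_embeddings, dtype=np.float64)
    return embeddings.mean(axis=0)


def cosine_score(speaker_model, test_embedding):
    """Cosine similarity between a stored speaker model and a test embedding.

    Cosine similarity is an assumed scoring function for this sketch.
    """
    a = np.asarray(speaker_model, dtype=np.float64)
    b = np.asarray(test_embedding, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(speaker_model, test_embedding, threshold=0.5):
    """Accept the claimed identity when the score reaches the (hypothetical) threshold."""
    return cosine_score(speaker_model, test_embedding) >= threshold


# Synthetic stand-ins for network outputs: one "true" speaker direction plus noise.
rng = np.random.default_rng(0)
speaker_direction = rng.normal(size=128)
enrollment_embeddings = speaker_direction + 0.1 * rng.normal(size=(5, 128))

model = enroll_speaker(enrollment_embeddings)
same_speaker = speaker_direction + 0.1 * rng.normal(size=128)
other_speaker = rng.normal(size=128)

print("same speaker score: ", cosine_score(model, same_speaker))
print("other speaker score:", cosine_score(model, other_speaker))
```

In practice the threshold would be tuned on held-out trials rather than fixed in advance.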

How to leverage 3D Convolutional Neural Networks?

In our paper, we propose the implementation of 3D-CNNs for direct speaker model creation in which, for both development and enrollment phases, an identical number of speaker utterances is fed to the network for representing the spoken utterances and creation of the speaker model. This leads to simultaneously capturing the speaker-related information and building a more robust system to cope with within-speaker variation. We demonstrate that the proposed method significantly outperforms the d-vector verification system.

Code Implementation

The input pipeline must be provided by the user. Please refer to code/0-input/input_feature.py to get an idea of how the input pipeline works.

Input Pipeline for this work

MFCC features can be used as the data representation of the spoken utterances at the frame level. However, a drawback is their non-local characteristics, caused by the final DCT operation used to generate MFCCs. This operation disturbs the locality property and is at odds with the local characteristics of convolutional operations. The approach employed in this work is to use the log-energies, which we call MFECs. The extraction of MFECs is the same as for MFCCs, except that the DCT operation is discarded. The temporal features are overlapping 20 ms windows with a stride of 10 ms, which are used to generate the spectrum features. From a 0.8-second sound sample, 80 temporal feature sets (each consisting of 40 MFEC features) can be obtained, which form the input speech feature map. Each input feature map has the dimensionality ζ × 80 × 40, formed from 80 input frames and their corresponding spectral features, where ζ is the number of utterances used to model the speaker during the development and enrollment stages.
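As a rough sketch of how such a ζ × 80 × 40 feature map could be assembled once per-utterance MFEC matrices are available, consider the following. The function name `build_speaker_cube`, the cropping strategy (keep the first 80 frames), and the random placeholder features are assumptions for illustration; the repository's own pipeline may select frames differently.

```python
import numpy as np

NUM_FRAMES = 80     # temporal feature sets per utterance
NUM_FEATURES = 40   # MFEC features per frame


def build_speaker_cube(utterance_features, num_utterances):
    """Stack `num_utterances` MFEC matrices into a (zeta, 80, 40) cube.

    Each entry of `utterance_features` is expected to have shape
    (n_frames, 40) with n_frames >= 80; only the first 80 frames are kept.
    """
    if len(utterance_features) < num_utterances:
        raise ValueError("not enough utterances to build the cube")

    cube = np.zeros((num_utterances, NUM_FRAMES, NUM_FEATURES), dtype=np.float32)
    for i, feats in enumerate(utterance_features[:num_utterances]):
        feats = np.asarray(feats, dtype=np.float32)
        if feats.ndim != 2 or feats.shape[0] < NUM_FRAMES or feats.shape[1] != NUM_FEATURES:
            raise ValueError("utterance %d has unsupported shape %s" % (i, feats.shape))
        cube[i] = feats[:NUM_FRAMES]
    return cube


# Placeholder features standing in for real MFEC output of varying length.
rng = np.random.default_rng(0)
utterances = [rng.normal(size=(80 + i, NUM_FEATURES)) for i in range(20)]

cube = build_speaker_cube(utterances, num_utterances=20)
print(cube.shape)  # (20, 80, 40), i.e. zeta = 20
```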

The speech features have been extracted using the [SpeechPy] package.

Implementation of 3D Convolutional Operation

The Slim high-level API made our life very easy. The following script has been used for our implementation:

############ Conv-1 ###############
net = slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')
net = PReLU(net, 'conv11_activation')
net = slim.conv2d(net, 16, [3, 9, 1], stride=[1, 2, 1], scope='conv12')
net = PReLU(net, 'conv12_activation')
net = tf.nn.max_pool3d(net, strides=[1, 1, 1, 2, 1], ksize=[1, 1, 1, 2, 1], padding='VALID', name='pool1')

############ Conv-2 ###############
net = slim.conv2d(net, 32, [3, 1, 4], stride=[1, 1, 1], scope='conv21')
net = PReLU(net, 'conv21_activation')
net = slim.conv2d(net, 32, [3, 8, 1], stride=[1, 2, 1], scope='conv22')
net = PReLU(net, 'conv22_activation')
net = tf.nn.max_pool3d(net, strides=[1, 1, 1, 2, 1], ksize=[1, 1, 1, 2, 1], padding='VALID', name='pool2')

############ Conv-3 ###############
net = slim.conv2d(net, 64, [3, 1, 3], stride=[1, 1, 1], scope='conv31')
net = PReLU(net, 'conv31_activation')
net = slim.conv2d(net, 64, [3, 7, 1], stride=[1, 1, 1], scope='conv32')
net = PReLU(net, 'conv32_activation')
# net = slim.max_pool2d(net, [1, 1], stride=[4, 1], scope='pool1')

############ Conv-4 ###############
net = slim.conv2d(net, 128, [3, 1, 3], stride=[1, 1, 1], scope='conv41')
net = PReLU(net, 'conv41_activation')
net = slim.conv2d(net, 128, [3, 7, 1], stride=[1, 1, 1], scope='conv42')
net = PReLU(net, 'conv42_activation')
# net = slim.max_pool2d(net, [1, 1], stride=[4, 1], scope='pool1')

############ Conv-5 ###############
net = slim.conv2d(net, 128, [4, 3, 3], stride=[1, 1, 1], normalizer_fn=None, scope='conv51')
net = PReLU(net, 'conv51_activation')

# net = slim.conv2d(net, 256, [1, 1], stride=[1, 1], scope='conv52')
# net = PReLU(net, 'conv52_activation')

# Last layer which is the logits for classes
logits = tf.contrib.layers.conv2d(net, num_classes, [1, 1, 1], activation_fn=None, scope='fc')

As can be seen, slim.conv2d has been used. However, simply by passing 3D kernels as [k_x, k_y, k_z] and stride=[a, b, c], it is turned into a 3D convolution operation. slim.conv2d is based on tf.contrib.layers.conv2d. Please refer to the official documentation for further details.
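To make the effect of a 3D kernel [k_x, k_y, k_z] with stride [a, b, c] concrete, here is a naive single-channel NumPy sketch. It is not how TensorFlow implements the operation: it assumes no padding ("valid"), a single input and output channel, and the ζ × 80 × 40 axis order used in the text, and the function name `conv3d_valid` is hypothetical. It only shows how the kernel slides over all three axes and how the output shape follows from the kernel size and stride.

```python
import numpy as np


def conv3d_valid(volume, kernel, stride=(1, 1, 1)):
    """Naive single-channel 3D convolution (cross-correlation), no padding."""
    volume = np.asarray(volume, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    kx, ky, kz = kernel.shape
    sx, sy, sz = stride

    # Output size per axis: floor((input - kernel) / stride) + 1
    out_shape = (
        (volume.shape[0] - kx) // sx + 1,
        (volume.shape[1] - ky) // sy + 1,
        (volume.shape[2] - kz) // sz + 1,
    )
    out = np.zeros(out_shape)
    for i in range(out_shape[0]):
        for j in range(out_shape[1]):
            for k in range(out_shape[2]):
                patch = volume[i * sx:i * sx + kx,
                               j * sy:j * sy + ky,
                               k * sz:k * sz + kz]
                out[i, j, k] = np.sum(patch * kernel)
    return out


# A cube of ones shaped like one input feature map (zeta=20, 80 frames, 40 features).
cube = np.ones((20, 80, 40))

# Kernel and stride shapes borrowed from the first two layers of the listing.
out1 = conv3d_valid(cube, np.ones((3, 1, 5)), stride=(1, 1, 1))
out2 = conv3d_valid(cube, np.ones((3, 9, 1)), stride=(1, 2, 1))

print(out1.shape)  # (18, 80, 36)
print(out2.shape)  # (18, 36, 40)
```

Both calls are applied to the same cube for illustration only; in the network the second layer operates on the multi-channel output of the first.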

Disclaimer

The code architecture part has been heavily inspired by Slim and Slim image classification library. Please refer to this link for further details.

License

The license is as follows:

APPENDIX: How to apply the Apache License to your work.

   To apply the Apache License to your work, attach the following
   boilerplate notice, with the fields enclosed by brackets "{}"
   replaced with your own identifying information. (Don't include
   the brackets!)  The text should be enclosed in the appropriate
   comment syntax for the file format. We also recommend that a
   file or class name and description of purpose be included on the
   same "printed page" as the copyright notice for easier
   identification within third-party archives.

Copyright {2017} {Amirsina Torfi}

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Please refer to LICENSE file for further detail.

Contribution

We are looking forward to your feedback. Please help us improve the code and make our work better. To contribute, please create a pull request and we will review it promptly. Once again, we appreciate your feedback and code inspections.

References

SpeechPy

Amirsina Torfi. 2017. astorfi/speech_feature_extraction: SpeechPy. Zenodo. doi:10.5281/zenodo.810392.

3d-convolutional-speaker-recognition's Issues

Data pipeline example

I'm trying to figure out your pipeline, including by reading the paper, with no luck so far.
Clearly, based on the open and closed issues, I'm not the only one. It seems a lot of work has been done here, and quality work too.
However, this repository cries out for a solid example going from a WAV file through feature extraction, development, enrollment, and prediction.
I know that each case needs to customize its own pipeline, but from my point of view the example, paper, and documentation don't give enough infrastructure to continue on your own.

Again, it really seems I'm not the only one. Can you please upload a pipeline example, refer me to one, or at least upload a clear description of the steps from WAV file to prediction?

can't create enrollment hdf5 file

Hi astorfi,
I tried to run the program, but I can't find a Python script to create the enrollment hdf5 file.
The code folder only contains the script that creates the development hdf5 file.
How do I create the enrollment hdf5 file?

Mark

Prepare data

Hi Astorfi,
I am trying to train a speaker model with your code.
How can I prepare my speech data (train_data, development_data, evaluation_data) for your model?
Thank you very much!

How to generate data

Hello,

I have a dataset of voices. I want to generate the development and enrollment hdf5 files.
The input_feature.py file seems to generate development files (nx80x40x20). How can I generate the enrollment file?

minibatch loss not change

I have used some utterances to test the code, and the program runs fine, but the console shows that the minibatch loss always stays between 7 and 9 and does not decrease, with accuracy=0. What's wrong here? Thx!

loss = 0,train acc = 0

Hi astorfi,
I'm trying to use your code to train a model with 31 labels and 60 samples for each label. However, when I use train_softmax.py, the last minibatches return loss = 0 while train acc = 0. Do you have any idea how to fix it?

Thank you.

the feature of input data

Hi Astorfi,
First, your work is perfect! I've read your paper and it's really great.
In the sample data of your previous code, the MFEC feature file (feature_mfec.npy) has shape (3209, 40, 3). As I understand it, 3209 is the number of frames, 40 is the number of features, and 3 is the number of channels (static, first-order, and second-order derivative features, obtained using speechpy.feature.extract_derivative_feature(feature)). Is that right?
But in the input_feature.py function (which you provided recently), the output feature file has shape (1, 20, 80, 40): 80 is the number of frames, 40 is the number of features, 20 is the number of utterances, and 1 represents a cube for one speaker, right?
So does input_feature.py use only the first channel of the MFEC features of the audio?
Thanks!

About voice/speaker verification task.

Hi astorfi,
Thanks for such great work. I want to ask you some questions. I am fairly familiar with DL but new to the speech field, so please forgive me if I ask any dumb questions :D

  1. What do you mean by 'input as stacked utterances', utterance here means a sentence, a word or something else?
  2. How large should the dataset be (the number of different speakers, the number of samples) for the model to work well on voice verification task?

Error importing Tables

Hello, first of all thank you for releasing the code. Unfortunately, I'm stuck at step 1 (step 0 works.)
I installed all the requirements, this is my setup:
Win7 64
Python 3.5.4
Tensorflow 1.6 (installed in a separate Anaconda environment but still with pip install)
Tables 3.4.3
I also installed pytables from conda, I thought it was missing, but the result is still the same:

When running train_softmax, at 'import tables', I get:
File "C:\Users...\AppData\Local\Continuum\Anaconda3\envs\tensorflow\lib\site-packages\tables\__init__.py", line 90, in <module>
from .utilsextension import (

ImportError: DLL load failed: The specified procedure could not be found.

The thing is, if I simply import tables with no code preceding it, it's fine. If I import it after tensorflow (as in your code), it gives me the error. If I move 'import tables' before 'import tensorflow', then Python crashes.

I tried to find answers on the net but none was useful...

Thanks

Mean and standard deviation comes NAN

Hi Astorfi,

Your paper is awesome.
I am trying to train speech data using 3D CNN.
I have prepared the data as described in the paper, but during the development phase I am getting a mean and standard deviation of "nan" in each epoch.
I am getting the following output:

Epoch 1, Minibatch 1 of 3 , Minibatch Loss= 2.1972, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 2 of 3 , Minibatch Loss= 2.2215, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 3 of 3 , Minibatch Loss= 2.2637, TRAIN ACCURACY= 0.000
TESTING after finishing the training on: epoch 1
Test Accuracy 1, Mean= nan, std= nan

Can you please help me, why am I getting this problem?

Input dataset

Hi @astorfi
I have some questions about input dataset.

  1. According to the paper, the number of speakers is 511 in the development phase.
    But how long is the input audio file per speaker ??

  2. Although there is the function of CMVN preprocessing in input_feature.py, I'm not sure whether CMVN preprocessing is appropriate for the output of speechpy.feature.lmfe function.
    Did you use CMVN preprocessing in the experiment of the paper??

Thank you for your work!!

prediction difference between batch=1 and batch=16

Any ideas why I'm receiving different prediction values when running with batch_size=1,16?
find code below:
Thanks!

def predict(self, speech_input):
    labels = np.empty(0, int)
    labels = np.append(labels, range(speech_input.shape[0]), axis=0)
    feature, logits, _ = self.session.run(
        [self.features, self.logits, self.end_points_speech],
        feed_dict={self.is_training: False, self.batch_dynamic: labels.shape[0],
                   self.margin_imp_tensor: 50,
                   self.batch_speech: speech_input})
                   # self.batch_labels: labels.reshape([labels.shape[0], 1])})

    # Extracting the associated numpy array.
    # print(feature[0])

    return feature, logits

where is score_vector.npy

fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotROC.py", line 73, in

EER vs i-vector

Hi astorfi,
Thanks for such great work. The pipeline is really great.
But when I try the ai-shell dataset, the Kaldi i-vector system gets around 2% EER, while 3D-convolutional-speaker-recognition with LDA gets 17% EER.
What's wrong? Any help would be much appreciated!

ValueError: Convolution expects input with rank 4, got 5

When I run run.sh, it fails with the following output:

/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Train data shape: (12, 80, 40, 20)
Train label shape: (12,)
Test data shape: (12, 80, 40, 20)
Test label shape: (12,)
Traceback (most recent call last):
  File "./code/1-development/train_softmax.py", line 602, in <module>
    tf.app.run()
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/1-development/train_softmax.py", line 414, in main
    logits, end_points_speech = model_speech_fn(batch_speech[i * step: (i + 1) * step])
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/code/1-development/nets/nets_factory.py", line 59, in network_fn
    return func(images, num_classes, is_training=is_training)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/code/1-development/nets/cnn_speech.py", line 118, in speech_cnn
    net = slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1154, in convolution2d
    conv_dims=2)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1025, in convolution
    (conv_dims + 2, input_rank))
ValueError: Convolution expects input with rank 4, got 5
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
  File "./code/2-enrollment/enrollment.py", line 330, in <module>
    tf.app.run()
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/2-enrollment/enrollment.py", line 201, in main
    for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/enrollment-evaluation_sample_dataset.hdf5...done
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
  File "./code/3-evaluation/evaluation.py", line 380, in <module>
    tf.app.run()
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/3-evaluation/evaluation.py", line 202, in main
    for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/enrollment-evaluation_sample_dataset.hdf5...donedata/development_sample_dataset_speaker.hdf5...done
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/calculate_roc.py", line 23, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/PlotROC.py", line 73, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/PlotPR.py", line 58, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/ensemble/weight_boosting.py:29: DeprecationWarning: numpy.core.umath_tests is an internal NumPy module and should not be imported. It will be removed in a future NumPy release.
  from numpy.core.umath_tests import inner1d
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
  DeprecationWarning)
/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
  DeprecationWarning)
Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/PlotHIST.py", line 53, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "/home/jovyan/Documents/git/3D-convolutional-speaker-recognition/speacker-rec-py35/lib/python3.5/site-packages/numpy/lib/npyio.py", line 384, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'

Run time error in the demo

When I ran the run.sh, the execution terminated saying:
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES/score_vector.npy'

Where do I get this score file from? Do I need to create one? I only ran run.sh for the demo.

Can you please help?

Regards!
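
A note on the error above: the traceback shows the plotting scripts loading `score_vector.npy` from the evaluation directory, so that file has to exist before they run. In other logs in this thread it is missing after the enrollment and evaluation steps crash. The following is a minimal sketch of a guard that could run before the plotting step, assuming the `results/SCORES` directory from the error message. The helper name `find_score_file` is hypothetical and not part of the repository.

```python
import os


def find_score_file(evaluation_dir="results/SCORES", name="score_vector.npy"):
    """Return the score file path, or raise with a hint if it is missing."""
    path = os.path.join(evaluation_dir, name)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "%s not found; check the log of the evaluation step, which the "
            "plotting scripts appear to depend on" % path
        )
    return path


if __name__ == "__main__":
    try:
        print("Scores found at", find_score_file())
    except FileNotFoundError as err:
        print("Cannot plot yet:", err)
```

If the guard fails, the earlier part of the run.sh output is the place to look for the first real error.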

explain how to take a single wav file and extract features

I've read your paper and it's really impressive.

Would like to ask you regarding the input preprocessing:

Assume I've got a 0.8 s wav file:
fs, signal = wav.read(file_name)
Then I use mfec = speechpy.feature.mfe(signal, fs)
The size of mfec is [79,40], so I changed the input file to 0.81 s
and then I got [80,40]...

According to your paper I need [20,80,40] to create one training example. I can create this either by duplicating my original [80,40] 20 times (this is how you did it in the testing phase) or by concatenating 20 different 0.81 s utterances. Is that correct?

Any clarifications would be appreciated!

Alan
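
The 79 versus 80 frame counts above are consistent with simple framing arithmetic. The sketch below assumes a 20 ms frame, a 10 ms stride, no zero padding and a 16 kHz sampling rate. These values are assumptions chosen because they reproduce the reported counts, not settings confirmed from the repository, and `num_frames` is a hypothetical helper.

```python
def num_frames(num_samples, fs, frame_length=0.020, frame_stride=0.010):
    """Number of full frames when no zero padding is applied."""
    frame_len = int(round(fs * frame_length))   # samples per frame
    stride = int(round(fs * frame_stride))      # samples between frame starts
    if num_samples < frame_len:
        return 0
    return (num_samples - frame_len) // stride + 1


fs = 16000  # assumed sampling rate
for seconds in (0.80, 0.81):
    samples = int(round(seconds * fs))
    print(seconds, "s ->", num_frames(samples, fs), "frames")
```

Under these assumptions a clip needs at least 0.81 s of audio to yield the 80 frames that one [80,40] slice requires.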

Retrain on new and own dataset

Hi,

First, this is great work and very well done :)
I am trying to use your source code and maybe contribute to it. I am working on a speaker recognition problem: detecting whether a teacher's tutorial was recorded in his own voice. I have about 10 hours of historical recordings for 6 teachers. First I used speechpy to get 3D npy files from the wav files, then used your create_development.py to create the hdf5 files for training and evaluation. Is that correct? In particular, I got 13 instead of 40 for the feature vector length in the npy files! I also ran the run.bash file and it gave me an error like this: ValueError: Negative dimension size caused by subtracting 2 from 1 for 'MaxPool_7' (op: 'MaxPool') with input shapes: [?,1,112,128].

ValueError: axes don't match array

To make multiple models from multiple wav files, I added the following to input_features.py in order to generate an .hdf5 file for all the wav files I have:

# Count the wav files listed in the path file (one per line).
idx = 0
with open('file_path_test1.txt', 'r') as f:
    for line in f:
        idx = idx + 1

# Collect features and labels; `dataset` comes from the surrounding input_features.py code.
lab = []
feat = []
for i in range(idx):
    feature, label = dataset.__getitem__(i)
    lab.append(label)
    feat.append(feature)
    print(feature.shape)
    print(label)

########################
## creating hdf5 file ##
########################
# Requires `import tables` (PyTables) at the top of the file.
h5file = tables.open_file('/root/3D_CNN/3D-convolutional-speaker-recognition/data/evaluation_test.hdf5', 'w')
label_test = h5file.create_carray(where='/', name='label_enrollment', obj=lab, byteorder='little')
label_array = h5file.create_carray(where='/', name='label_evaluation', obj=lab, byteorder='little')
utterance_test = h5file.create_earray(where='/', name='utterance_enrollment', chunkshape=[1, 20, 80, 40], obj=feat, byteorder='little')
utterance_train = h5file.create_earray(where='/', name='utterance_evaluation', chunkshape=[1, 20, 80, 40], obj=feat, byteorder='little')
h5file.close()

When I ran input_features.py, it gave me the following error:
ValueError: the shape ((0, 1, 20, 80, 40)) and chunkshape ((1, 20, 80, 40)) ranks must be equal.
I realized that lab and feat are lists with 9 elements each (the number of wav files I want to test). Each element of feat holds the features of one wav file in my list. So I changed chunkshape to chunkshape = [9,1,20,80,40], and the evaluation_test.hdf5 file was created with no errors.

When I used the hdf5 file that I created to run run.sh, I got this:


Train data shape: (12, 80, 40, 20)
Train label shape: (12,)
Test data shape: (12, 80, 40, 20)
Test label shape: (12,)
Epoch 1, Minibatch 1 of 4 , Minibatch Loss= 0.0000, TRAIN ACCURACY= 100.000
Epoch 1, Minibatch 2 of 4 , Minibatch Loss= 1.2341, TRAIN ACCURACY= 100.000
Epoch 1, Minibatch 3 of 4 , Minibatch Loss= 1.4641, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 4 of 4 , Minibatch Loss= 1.4434, TRAIN ACCURACY= 0.000
TESTING after finishing the training on: epoch 1
Test Accuracy 1, Mean= 50.0000, std= 50.000
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done
Enrollment data shape: (9, 1, 20, 80, 40)
Enrollment label shape: (9,)
Evaluation data shape: (9, 1, 20, 80, 40)
Evaluation label shape: (9,)
INFO:tensorflow:Scale of 0 disables regularizer.
.
.
Traceback (most recent call last):
  File "./code/2-enrollment/enrollment.py", line 330, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./code/2-enrollment/enrollment.py", line 289, in main
    assert len(speaker_index) >= NumUtterance, "At least %d utterances is needed for each speaker" % NumUtterance
AssertionError: At least 20 utterances is needed for each speaker
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/eval_try.hdf5...done
Enrollment data shape: (9, 1, 20, 80, 40)
Enrollment label shape: (9,)
Evaluation data shape: (9, 1, 20, 80, 40)
Evaluation label shape: (9,)
INFO:tensorflow:Scale of 0 disables regularizer.
.
.
Traceback (most recent call last):
  File "./code/3-evaluation/evaluation.py", line 380, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "./code/3-evaluation/evaluation.py", line 329, in main
    speech_evaluation = np.transpose(speech_evaluation[None, :, :, :, :], axes=(1, 4, 2, 3, 0))
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 598, in transpose
    return _wrapfunc(a, 'transpose', axes)
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 51, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
ValueError: axes don't match array
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/eval_try.hdf5...done
('EER=', 43.75, 0.0)
('AUC=', 48.4375, 0.0)
('EER = ', 0.44)
('AUC = ', 0.48)
('AP = ', 0.33)

I'm not sure how to fix this: ValueError: axes don't match array
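
One reading of the log above: the stored arrays are 5-D, (9, 1, 20, 80, 40), while the sample data printed earlier in the same log is 4-D, (12, 80, 40, 20). Indexing with `None` then produces a 6-D array, and a 5-entry `axes` tuple cannot match it. The sketch below reproduces the error with NumPy and shows one possible re-layout. That the scripts expect (N, 80, 40, 20) is inferred from the log output only, not confirmed from the repository.

```python
import numpy as np

# Shape reported in the log above: (utterances, 1, 20, 80, 40).
stored = np.zeros((9, 1, 20, 80, 40), dtype=np.float32)

# Indexing with None adds a sixth axis, so a 5-entry axes tuple cannot match.
try:
    np.transpose(stored[None, :, :, :, :], axes=(1, 4, 2, 3, 0))
except ValueError as err:
    print("reproduced:", err)

# Drop the singleton axis, then move the 20-slice axis last so the layout
# matches the sample data, e.g. (12, 80, 40, 20).
fixed = np.transpose(np.squeeze(stored, axis=1), axes=(0, 2, 3, 1))
print(fixed.shape)   # (9, 80, 40, 20)

# The transpose from evaluation.py now succeeds on the 4-D array.
batch = np.transpose(fixed[None, :, :, :, :], axes=(1, 4, 2, 3, 0))
print(batch.shape)   # (9, 20, 80, 40, 1)
```

The AssertionError in the enrollment part of the log is a separate problem: it reports that each speaker needs at least 20 utterances, and this file holds only 9 in total.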

How to make Speaker Verification (1:1 recognition) model in keras?

I have an idea of how to build a speaker identification model using a CNN.

image

But my question is how to build a verification model from data that contains only positive examples.
Suppose I have recordings of my voice only, and I want to create a verification model from this data such that it recognizes only me and no one else.

Please provide the code for it.
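
As a starting point, the protocol in the README compares features of the test utterance with a stored speaker model, and the d-vector baseline it mentions builds that model by averaging enrollment features. The sketch below illustrates only that scoring step, with cosine similarity and a fixed threshold. It is not Keras code and not the repository's method. The embeddings are placeholder lists and the 0.7 threshold is arbitrary. In practice, choosing a threshold generally needs some speech from other speakers as well.

```python
import math


def mean_vector(vectors):
    """Average enrollment embeddings into one speaker model (d-vector style)."""
    count = float(len(vectors))
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(speaker_model, test_embedding, threshold=0.7):
    """Accept the claimed identity when the score reaches the threshold."""
    score = cosine_similarity(speaker_model, test_embedding)
    return score >= threshold, score


# Placeholder embeddings; in practice these come from the trained network.
enrollment = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.1]]
model = mean_vector(enrollment)
print(verify(model, [0.9, 0.1, 0.1]))   # similar direction
print(verify(model, [0.0, 1.0, 0.0]))   # different direction
```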

image

Where does input_feature.py store its results?

Hi,

I think train_softmax.py, enrollment.py and evaluation.py get their inputs from the hdf5 files stored in the data folder.
I also think that input_feature.py is supposed to store its results in these hdf5 files.
But I am not able to figure out which part of the input_feature.py code is responsible for writing the results into the hdf5 files.
Can someone please help me out with this?

Thanks

speaker identification input WAV file

Hi, I am using your code for speaker identification.

I am new to machine learning. Could you tell me whether I can use this code for speaker recognition?

I am able to run your code and output some plots, but I do not know how to use those plots to recognize a speaker.

I am also trying to use my own WAV file for training.
However, I get an error:

...3D-convolutional-speaker-recognition-master\code\0-input\create_hdf5\pair_generation.py", line 42, in feed_to_hdf5
    :, 0]
IndexError: too many indices for array
Closing remaining open files:development.hdf5...done

The default feature_mfec.npy file looks like this:

array([[[  1.41430335e+01,   1.38114970e+00,   8.35106419e-02],
        [  1.41430335e+01,   1.36457288e+00,   8.67885390e-03],
        [  1.39772653e+01,   1.11641647e+00,  -2.56141085e-02],

        ...,
        [  9.24432067e+00,   9.21401209e-01,   1.09648406e-01],
        [  9.19465798e+00,   9.45404618e-01,   1.09012088e-01],
        [  9.37358081e+00,   9.68176377e-01,   1.03772331e-01]]])

I converted my WAV file to npy, and its content looks like this:

array([[ 143.,  143.],
       [ 136.,  136.],
       [ 121.,  121.],
       ...,
       [  72.,   72.],
       [  81.,   81.],
       [  90.,   90.]])

My transform function:

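import numpy as np
import scipy.io.wavfile as wav

# wav.read returns a (sample_rate, samples) tuple
temp_npy = wav.read('...\\19-198-0000.wav')

print(temp_npy)

# keep only the raw samples, converted to float
result = np.array(temp_npy[1], dtype=float)

np.save('test_wav_r_values.npy', result)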

Do I need a different type of WAV file, or should I use a different function to convert from WAV to npy?

I also found import scipy.io.wavfile as wav in create_development.py. Can I just feed the WAV file in directly for training?

Thank you for providing this code.

Speaker recognition

How do I record my voice as an enrollment input with an identity and then verify it against another input? Will it recognize the speaker or not, and what is the workflow?

I can't build the code

slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')

throws an exception:
The kernel_size argument must be a tuple of 2 integers. Received: [3, 1, 5]

Testing error

C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Train data shape: (12, 80, 40, 20)
Train label shape: (12,)
Test data shape: (12, 80, 40, 20)
Test label shape: (12,)
WARNING:tensorflow:From ./code/1-development/train_softmax.py:423: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

2018-06-08 08:58:18.940789: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Epoch 1, Minibatch 1 of 4 , Minibatch Loss= 1.3863, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 2 of 4 , Minibatch Loss= 1.2341, TRAIN ACCURACY= 100.000
Epoch 1, Minibatch 3 of 4 , Minibatch Loss= 0.0000, TRAIN ACCURACY= 0.000
Epoch 1, Minibatch 4 of 4 , Minibatch Loss= 1.0951, TRAIN ACCURACY= 100.000
TESTING after finishing the training on: epoch 1
Test Accuracy 1, Mean= 50.0000, std= 50.000
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...done
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
File "./code/2-enrollment/enrollment.py", line 330, in <module>
tf.app.run()
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
sys.exit(main(argv))
File "./code/2-enrollment/enrollment.py", line 201, in main
for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/development_sample_dataset_speaker.hdf5...donedata/enrollment-evaluation_sample_dataset.hdf5...done
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
WARNING:tensorflow:From C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Use the retry module or similar alternatives.
Enrollment data shape: (108, 80, 40, 1)
Enrollment label shape: (108,)
Evaluation data shape: (12, 80, 40, 1)
Evaluation label shape: (12,)
Traceback (most recent call last):
File "./code/3-evaluation/evaluation.py", line 380, in <module>
tf.app.run()
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
sys.exit(main(argv))
File "./code/3-evaluation/evaluation.py", line 202, in main
for i in xrange(FLAGS.num_clones):
NameError: name 'xrange' is not defined
Closing remaining open files:data/enrollment-evaluation_sample_dataset.hdf5...donedata/development_sample_dataset_speaker.hdf5...done
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/calculate_roc.py", line 23, in <module>
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotROC.py", line 73, in <module>
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotPR.py", line 58, in <module>
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\h5py\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from float to np.floating is deprecated. In future, it will be treated as np.float64 == np.dtype(float).type.
from ._conv import register_converters as _register_converters
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20
DeprecationWarning)
Traceback (most recent call last):
File "./code/4-ROC_PR_curve/PlotHIST.py", line 53, in <module>
score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'
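
The first failures in this log are the `NameError: name 'xrange' is not defined` errors in enrollment.py and evaluation.py. `xrange` exists only in Python 2, and the paths in the log show Python 3.6. Those steps stopping early would also explain why the plotting scripts then find no score_vector.npy. A minimal sketch of a compatibility shim that could sit near the top of those scripts follows. `num_clones` is a stand-in for `FLAGS.num_clones`, and whether other Python 2 constructs remain in the scripts is not checked here.

```python
try:
    xrange  # Python 2: the built-in exists
except NameError:
    xrange = range  # Python 3: range is already lazy

num_clones = 2  # stand-in for FLAGS.num_clones in the scripts
for i in xrange(num_clones):
    print("clone", i)
```

Replacing `xrange` with `range` directly in the two scripts has the same effect on Python 3.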

Dataset for evaluation

Hi Astorfi,
I'm trying to train a speaker recognition model with your code.
Since I'm a beginner at programming, I don't understand your code very well.
For the enrollment and evaluation phases, do I just have to prepare data of shape (sample, 1, 80, 40)?
I read the paper, and I don't know whether I have to copy the data of a single utterance to make data of shape (sample, 20, 80, 40).

Also, I prepared the data for development (shape (97, 20, 80, 40)) using input_feature.py, but do I have to prepare data of shape (97, 80, 40, 20) instead?

Thank you very much.

FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\\score_vector.npy'

Hi Astorfi,
Your work is perfect! I've read your paper and it's really great.
I'm new to TensorFlow and I'm trying to learn.
I'm having a problem when executing ./run.sh, and this is the error text:

Traceback (most recent call last):
  File "./code/4-ROC_PR_curve/PlotHIST.py", line 53, in <module>
    score = np.load(os.path.join(FLAGS.evaluation_dir,'score_vector.npy'))
  File "C:\Users\Boulbaba Zitouni\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\lib\npyio.py", line 372, in load
    fid = open(file, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'results/SCORES\score_vector.npy'

I just cloned the project and tried to run it, that's all.
I have already installed all the requirements (from requirements.txt).

Any solutions?

Speaker recognition possible

Is speaker recognition possible with this framework? I want to store the input voice pattern as part of the enrollment process, then verify an input voice and find out who is currently speaking.

Is it possible?

What is the exact meaning of "utterances"?

The term utterances has not been defined anywhere in the paper. I am new to the field of speaker recognition.
Can someone tell me what utterances means in the context of this project?

Thanks in advance

Problem with evaluation.

Hi @astorfi, thanks for your great work. I use all the same settings, but I use hdf5 to store the training data instead of the Audio Dataset. However, my evaluation result is poor: the EER is up to 40%. I think there is something wrong with my setup. Do you have any idea how to fix this?
I use the VoxCeleb dataset for the background model and only 1 sample per speaker.
50 people for enrollment, 50 not enrolled (to be rejected).
4 samples for evaluation.

Thanks for your help.
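
When debugging a result like this, it can help to recompute the EER independently from the raw scores and trial labels, to rule out a problem in the scoring step itself. The sketch below approximates the EER by sweeping the threshold over the observed scores. It is a generic illustration, not the repository's calculate_roc.py, and the scores and labels are made-up placeholders.

```python
def equal_error_rate(scores, labels):
    """Approximate EER by sweeping the threshold over the observed scores.

    labels: 1 for genuine (target) trials, 0 for impostor trials.
    """
    targets = [s for s, l in zip(scores, labels) if l == 1]
    impostors = [s for s, l in zip(scores, labels) if l == 0]
    best_gap, eer = None, None
    for threshold in sorted(set(scores)):
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < threshold for s in targets) / float(len(targets))
        # False acceptance: impostor trials scoring at or above it.
        far = sum(s >= threshold for s in impostors) / float(len(impostors))
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


# Placeholder trials: higher score means "more likely the same speaker".
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
print("EER = %.2f" % equal_error_rate(scores, labels))
```

With few trials the estimate is coarse, since the error rates can only move in steps of one trial.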

validate accuracy on development dataset

Hi astorfi,
Thanks for your great work. I have been running your code on my dataset, but the validation accuracy is low in my experiments. I am not sure whether something is wrong. What validation accuracy do you get in your experiments once the network has converged?

Default training not converging

Running just the train_softmax.py command in the example run.sh script with the sample data doesn't seem to converge, even at 50 epochs.

Command:

python -u ./code/1-development/train_softmax.py --num_epochs=50 --batch_size=3 --development_dataset_path=data/development_sample_dataset_speaker.hdf5 --train_dir=results/TRAIN_CNN_3D/train_logs

Output:

image

Loss:

image

Learning rate:

image

Testing accuracy is not increasing

Hi,

I am getting a training accuracy of 95% on VoxCeleb, but the testing accuracy is only around 10%. What could be the reason for this?
Speakers = 1211
Batch size = 100
Epochs = 50
Data size = 24000

inputting dataset to development

Hi astorfi,

I'm trying to train my own data set on your model. Is there also an update for the development files to feed dataset from input_features.py? It looks like train_softmax.py still takes in an hdf5 file.

Thanks,
Lucas

Regarding input data

Hi @astorfi
I have gone through your code. When I extract MFCC features for a sample audio file, the result has shape (420,40), where 420 is the number of frames and 40 is the number of features. But the MFEC feature file in your sample data has shape (3209,40,3). As I understand it, 3209 is the number of frames, 40 is the number of features, and 3 is the number of channels. I don't understand what the channels are used for. Can you please suggest how to create the Feature_mfec.npy file in your format?
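
A possible explanation for the third axis: it probably holds the static features together with their first and second temporal derivatives, which is the N x M x 3 layout that speechpy's `extract_derivative_feature` documents. The sketch below only illustrates that layout. It uses `np.gradient` as a stand-in, which is an assumption and not the library's own derivative filter, and `add_derivatives` is a hypothetical helper.

```python
import numpy as np


def add_derivatives(static):
    """Stack static, first- and second-order time derivatives on a new axis.

    static: array of shape (num_frames, num_features).
    np.gradient is a stand-in for the library's own derivative filter.
    """
    delta = np.gradient(static, axis=0)
    delta_delta = np.gradient(delta, axis=0)
    return np.stack([static, delta, delta_delta], axis=-1)


features = np.random.rand(420, 40)   # placeholder for a (frames, 40) matrix
cube = add_derivatives(features)
print(cube.shape)   # (420, 40, 3)
```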

About the data pipeline

Hi astorfi:
Can you show an example of how to prepare data for the enrollment stage?
I ran into some problems at this stage. I processed the data the same way as for development,
but it doesn't work. The hdf5 format is somewhat annoying; can you show how to implement it
using some random data as an example? Thanks a lot
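
A rough sketch with random data, under stated assumptions: the node names are taken from the user code posted earlier in this thread, and the shapes (108, 80, 40, 1) and (12, 80, 40, 1) from the enrollment log of the sample dataset. Neither is confirmed against the repository, and `num_speakers=4` and the output path are placeholders. Labels are assigned round-robin so that every speaker has at least the 20 utterances the enrollment script asserts.

```python
import numpy as np


def make_random_enrollment_data(num_enroll=108, num_eval=12, num_speakers=4, seed=0):
    """Random arrays shaped like the sample enrollment/evaluation log output."""
    rng = np.random.RandomState(seed)
    return {
        "utterance_enrollment": rng.rand(num_enroll, 80, 40, 1).astype(np.float32),
        "label_enrollment": np.arange(num_enroll) % num_speakers,
        "utterance_evaluation": rng.rand(num_eval, 80, 40, 1).astype(np.float32),
        "label_evaluation": np.arange(num_eval) % num_speakers,
    }


def write_hdf5(path, arrays):
    """Write each array as a node under the HDF5 root (needs PyTables)."""
    import tables  # imported here so the generator works without PyTables

    h5file = tables.open_file(path, "w")
    try:
        for name, array in arrays.items():
            h5file.create_carray(where="/", name=name, obj=array)
    finally:
        h5file.close()


data = make_random_enrollment_data()
for name, array in sorted(data.items()):
    print(name, array.shape)
# write_hdf5("data/random_enrollment.hdf5", data)  # hypothetical output path
```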

shape mismatch problem in input_feature.py

Hello! We are trying to make our own input pipeline. However, when we follow the __getitem__ method in Audioset (with cube_shape set to (20,80,40)), there is a shape mismatch when the model tries to feed data to batch_speech (a placeholder with shape (20,80,40,1)).

After carefully reviewing the code in train_softmax.py, we found that the input shape conflicts with the transpose operation in the following code:

speech_train = np.transpose(speech_train[None, :, :, :, :], axes=(1, 4, 2, 3, 0))

What is the solution? Could you give us any help?

RuntimeWarning: numpy.dtype size changed

Hello,
Thank you for the wonderful work on speaker verification.
I am trying to execute the code, and it is giving me the following message:
RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88

Is it because of mismatched versions of numpy and scipy? If so, which versions did you use for training?

Thank you for your help !
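
Note that this message is a warning, not an error. It usually appears when a compiled extension was built against a different NumPy version than the one currently installed. A small sketch for listing the installed versions follows, so they can be compared or reported. The package list is an assumption based on the modules that appear in the logs in this thread, and `installed_version` is a hypothetical helper.

```python
import importlib


def installed_version(module_name):
    """Return a module's __version__, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


for name in ("numpy", "scipy", "sklearn", "tensorflow"):
    print(name, installed_version(name))
```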

Convolution expects input with rank 4, got 5

Traceback (most recent call last):
  File "./code/1-development/train_softmax.py", line 602, in
    tf.app.run()
  File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./code/1-development/train_softmax.py", line 414, in main
    logits, end_points_speech = model_speech_fn(batch_speech[i * step: (i + 1) * step])
  File "/opt/speaker-recognition/code/1-development/nets/nets_factory.py", line 59, in network_fn
    return func(images, num_classes, is_training=is_training)
  File "/opt/speaker-recognition/code/1-development/nets/cnn_speech.py", line 118, in speech_cnn
    net = slim.conv2d(inputs, 16, [3, 1, 5], stride=[1, 1, 1], scope='conv11')
  File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1154, in convolution2d
    conv_dims=2)
  File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
    return func(*args, **current_args)
  File "/opt/tensorflow/python2.7/local/lib/python2.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1025, in convolution
    (conv_dims + 2, input_rank))
ValueError: Convolution expects input with rank 4, got 5
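
The last frames of the traceback suggest a rank check of the form `conv_dims + 2` against the input rank. The sketch below mirrors that check in plain Python to show why a three-element kernel on a rank-5 batch fails when `conv_dims` is 2 and passes when it is 3. `check_conv_input` is a hypothetical helper, not TensorFlow's implementation, and the batch shape is an assumed example.

```python
def required_input_rank(conv_dims):
    """Rank an N-D convolution input needs: batch + N spatial dims + channels."""
    return conv_dims + 2


def check_conv_input(input_shape, kernel_size, conv_dims):
    """Raise ValueError when the input rank or kernel length does not fit conv_dims."""
    input_rank = len(input_shape)
    expected = required_input_rank(conv_dims)
    if input_rank != expected:
        raise ValueError(
            "Convolution expects input with rank %d, got %d" % (expected, input_rank)
        )
    if len(kernel_size) != conv_dims:
        raise ValueError(
            "Kernel has %d dims, expected %d" % (len(kernel_size), conv_dims)
        )
    return True


# ASSUMPTION: a rank-5 batch (batch, 20, 80, 40, 1) and the kernel from the traceback.
speech_batch_shape = (1, 20, 80, 40, 1)
kernel = [3, 1, 5]

try:
    check_conv_input(speech_batch_shape, kernel, conv_dims=2)
except ValueError as err:
    message = str(err)
    print(message)

ok_3d = check_conv_input(speech_batch_shape, kernel, conv_dims=3)
print(ok_3d)
```

Under this reading, the kernel and input both describe a 3D convolution, so a layer that enforces two spatial dimensions rejects them. Whether a different TensorFlow version or a 3D convolution layer is the right remedy is not settled in this thread.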
