Hello Hervé,
I am currently trying to train the speaker embedding module using the TristouNet architecture, but the loss becomes NaN from the second epoch onwards. Here is the command I am running:
$ pyannote-speaker-embedding-keras train --database=db.yml --subset=train tutorials/speaker-embedding/2+0.5/TristouNet Etape.SpeakerDiarization.TV
And here are the warnings/log messages:
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/pyannote/generators/indices.py:84: UserWarning: 5 labels (out of 179) have less than 3 training samples.
per_label=per_label))
Epoch 1/1000
2018-01-24 17:20:21.787683: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/autograd/core.py:81: RuntimeWarning: divide by zero encountered in power
result_value = self.fun(*argvals, **kwargs)
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/autograd/numpy/numpy_grads.py:84: RuntimeWarning: invalid value encountered in multiply
anp.sqrt.defvjp( lambda g, ans, vs, gvs, x : g * 0.5 * x**-0.5)
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/autograd/numpy/numpy_grads.py:46: RuntimeWarning: invalid value encountered in multiply
unbroadcast(vs, gvs, g * y * x ** anp.where(y, y - 1, 1.)))
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/numpy/core/_methods.py:29: RuntimeWarning: invalid value encountered in reduce
return umr_minimum(a, axis, None, out, keepdims)
/home/mahu/anaconda3/envs/pyannote/lib/python3.5/site-packages/numpy/core/_methods.py:26: RuntimeWarning: invalid value encountered in reduce
return umr_maximum(a, axis, None, out, keepdims)
1/1 [==============================] - 36s - loss: 0.0535
Epoch 2/1000
1/1 [==============================] - 30s - loss: nan
Epoch 3/1000
1/1 [==============================] - 31s - loss: nan
Some minor details: as you may have guessed, I have slightly changed the options of the train method of pyannote-speaker-embedding-keras so that, like the data command, a path other than ~/.pyannote/db.yml can be specified for the db.yml file.
The various config.yml files (tutorials/speaker-embedding/config.yml and tutorials/speaker-embedding/2+0.5/TristouNet/config.yml) have the same content as what is given in the corresponding tutorial.
That said, another odd thing is that two progress indicators are printed for each subset when running
$ pyannote-speaker-embedding-keras data --database=db.yml --duration=2 --step=0.5 tutorials/speaker-embedding/ Etape.SpeakerDiarization.TV
as shown below:
Training set: 0it [00:00, ?it/s]
Training set: 28it [02:57, 6.32s/it]
100%|████████████████████████████████████| 81433/81433 [00:37<00:00, 2148.18it/s]
Development set: 0it [00:00, ?it/s]
Development set: 9it [00:47, 5.32s/it]
100%|████████████████████████████████████| 23298/23298 [00:11<00:00, 2082.87it/s]
Test set: 0it [00:00, ?it/s]
Test set: 9it [00:50, 5.66s/it]
100%|████████████████████████████████████| 22815/22815 [00:10<00:00, 2132.08it/s]
So I don't really know whether the problem I am facing comes from the training phase or from the data used to train the network. Looking around a bit, the warnings given by autograd may be related to the bug reported here.
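For what it's worth, the divide-by-zero that autograd reports in the sqrt gradient (g * 0.5 * x**-0.5) would produce exactly this NaN pattern whenever a squared distance between two embeddings is exactly zero. Here is a minimal numpy sketch of that suspicion (plain arithmetic, not pyannote's actual code):

```python
import numpy as np

# Sketch of the suspected failure mode: the gradient of sqrt(x) is
# 0.5 * x**-0.5, which diverges when the squared distance x between two
# identical embeddings is exactly 0. The resulting inf then becomes NaN
# as soon as it is multiplied by a zero inner gradient.
with np.errstate(divide="ignore", invalid="ignore"):
    x = np.float64(0.0)        # squared distance between identical samples
    dsqrt = 0.5 * x ** -0.5    # inf  -> "divide by zero encountered in power"
    inner = np.float64(0.0)    # gradient of (a - b)**2 at a == b
    g = dsqrt * inner          # nan  -> "invalid value encountered in multiply"

print(dsqrt, g)  # inf nan
```

That would match both RuntimeWarnings above (divide by zero in power, invalid value in multiply), but I have not confirmed that a zero distance actually occurs in my batches.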
Have you encountered this problem before? And if so, how did you manage to circumvent it?
Cheers,
Mathieu
Edit:
After some printing, it appears that logs = self.loss_and_grad(batch, embedding) in pyannote/audio/embedding/approaches_keras/base.py, l. 333, yields a gradient with NaN values on the first epoch. However, I haven't been able to find the definition of loss_and_grad to narrow down the problem yet.
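In case it helps anyone debugging the same thing, this is the kind of check I used to spot the NaN gradient; the helper below is purely illustrative and not part of pyannote-audio's API:

```python
import numpy as np

# Purely illustrative helper (not pyannote-audio code): scan a sequence of
# gradient arrays and return the index of the first one containing NaN,
# or None if all gradients are finite in that respect.
def first_nan(arrays):
    for i, a in enumerate(arrays):
        if np.isnan(np.asarray(a, dtype=float)).any():
            return i
    return None

# e.g. inspect whatever loss_and_grad returned, gradient by gradient
grads = [np.array([0.1, 0.2]), np.array([np.nan, 1.0])]
print(first_nan(grads))  # 1
```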