This repository includes implementations of speaker verification systems that input raw waveforms.
Currently, it contains three systems, implemented in Python.
Detailed instructions for each system are provided in its individual README file.
- α-feature map scaling for raw waveform speaker verification
  - Code refactoring
  - PyTorch ResNet-like model implementation
  - Deeper architecture
  - Improved feature map scaling method (α-feature map scaling)
  - Paper is written in Korean; only the abstract is in English
  - Angular loss function adopted
  - Performance
    - EER 1.91% (trained on VoxCeleb2, evaluated on the VoxCeleb1 original trial)
  - Will be used as a baseline system for the authors' future works
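The α-feature map scaling idea above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a trainable per-channel scalar `alpha` added to the feature map, combined with a sigmoid gate derived from global average pooling; the function name, shapes, and the exact placement of `alpha` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def alpha_fms(x, w, b, alpha):
    """Illustrative alpha-feature map scaling (hypothetical sketch,
    not the authors' code).

    x     : feature map, shape (channels, time)
    w, b  : weights/bias of a per-channel linear layer, shapes (C, C), (C,)
    alpha : trainable per-channel scalar, shape (C, 1)
    """
    # Squeeze: global average pooling over time -> one value per channel
    s = x.mean(axis=1)                     # (C,)
    # Excitation: linear layer + sigmoid gives a scale in (0, 1) per channel
    scale = sigmoid(w @ s + b)[:, None]    # (C, 1)
    # Add the trainable alpha, then scale the whole feature map
    return (x + alpha) * scale

rng = np.random.default_rng(0)
C, T = 4, 10
x = rng.standard_normal((C, T))
out = alpha_fms(x, np.eye(C), np.zeros(C), alpha=np.ones((C, 1)))
print(out.shape)  # (4, 10): same shape as the input feature map
```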
- Improved RawNet with feature map scaling
  - Improved performance over RawNet: EER reduced from 4.8% to 2.56% on the VoxCeleb1 original trial
  - DNN speaker embedding extraction with raw waveform inputs
  - Cosine similarity back-end
  - Uses a technique named feature map scaling, which scales feature maps similarly to squeeze-and-excitation
  - Implemented in PyTorch
  - Published as a conference paper at Interspeech 2020
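Feature map scaling, as described above, gates each channel of a feature map with a sigmoid scale computed from the map itself, much like squeeze-and-excitation. Below is a minimal NumPy sketch under that assumption (illustrative only; the function name and layer shapes are not from the paper, which also explores additive and combined scaling variants):

```python
import numpy as np

def fms_scale(x, w, b):
    """Sigmoid-gated feature map scaling (illustrative sketch).

    x : feature map, shape (channels, time)
    w : (C, C) weights of the scaling layer; b : (C,) bias
    """
    pooled = x.mean(axis=1)                         # squeeze: (C,)
    gate = 1.0 / (1.0 + np.exp(-(w @ pooled + b)))  # excitation in (0, 1)
    return x * gate[:, None]                        # scale each channel

x = np.arange(6, dtype=float).reshape(2, 3)
y = fms_scale(x, np.zeros((2, 2)), np.zeros(2))
# with zero weights the gate is sigmoid(0) = 0.5, so y == 0.5 * x
```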
```
@article{jung2020improved,
  title={Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms},
  author={Jung, Jee-weon and Kim, Seung-bin and Shim, Hye-jin and Kim, Ju-ho and Yu, Ha-Jin},
  journal={Proc. Interspeech 2020},
  pages={3583--3587},
  year={2020}
}
```
- RawNet
  - DNN-based speaker embedding extractor used with another DNN-based classifier
  - Built on top of the authors' previous works on raw waveform speaker verification
  - EER 4.8% with a cosine similarity back-end, 4.0% with the proposed concat&mul back-end, on the VoxCeleb1 original trial
  - Implemented in Keras and PyTorch
  - Published as a conference paper at Interspeech 2019
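The two back-ends mentioned above can be contrasted with a small sketch. Assumptions not in the original: embeddings are plain 1-D vectors, and the concat&mul back-end consumes the concatenation plus element-wise product of the two embeddings as input to a trained classifier (only the feature construction is shown here; no classifier weights are included):

```python
import numpy as np

def cosine_score(e1, e2):
    """Cosine similarity back-end: score two speaker embeddings directly."""
    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))

def concat_mul_features(e1, e2):
    """Input features for a DNN-based back-end: concatenation plus
    element-wise multiplication of the two embeddings (illustrative)."""
    return np.concatenate([e1, e2, e1 * e2])

rng = np.random.default_rng(0)
enroll, test = rng.standard_normal(8), rng.standard_normal(8)
print(round(cosine_score(enroll, enroll), 3))  # 1.0: identical embeddings
feats = concat_mul_features(enroll, test)
print(feats.shape)  # (24,): 8 + 8 + 8
```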
```
@article{jung2019RawNet,
  title={RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification},
  author={Jung, Jee-weon and Heo, Hee-soo and Kim, Ju-ho and Shim, Hye-jin and Yu, Ha-Jin},
  journal={Proc. Interspeech 2019},
  pages={1268--1272},
  year={2019}
}
```