
ExU-Net

PyTorch code for the following paper:

  • Title: Extended U-Net for Speaker Verification in Noisy Environments (oral presentation at Interspeech 2022)
  • Authors: Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, and Ha-Jin Yu

Abstract

Background noise is a well-known factor that deteriorates the accuracy and reliability of speaker verification (SV) systems by blurring speech intelligibility. Various studies have used separate pretrained enhancement models as the front-end module of the SV system in noisy environments, and these methods effectively remove noise. However, the denoising process of independent enhancement models not tailored to the SV task can also distort the speaker information included in utterances. We argue that the enhancement network and speaker embedding extractor should be fully jointly trained for SV tasks under noisy conditions to alleviate this issue. Therefore, we proposed a U-Net-based integrated framework that simultaneously optimizes speaker identification and feature enhancement losses. Moreover, we analyzed the structural limitations of using U-Net directly for noisy SV tasks and further proposed Extended U-Net to reduce these drawbacks. We evaluated the models on the noise-synthesized VoxCeleb1 test set and VOiCES development set recorded in various noisy scenarios. The experimental results demonstrate that the U-Net-based fully joint training framework is more effective than the baseline, and the extended U-Net exhibited state-of-the-art performance compared with recently proposed compensation systems.
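A minimal sketch of the joint objective described above is given below. It assumes a cross-entropy speaker-identification loss, an MSE feature-enhancement loss, and an illustrative weighting factor; the tensor shapes and speaker count are placeholders, not the repository's actual implementation.

# Hedged sketch of a jointly optimized objective: speaker identification + feature enhancement.
import torch
import torch.nn.functional as F

def joint_loss(logits, speaker_labels, enhanced_feat, clean_feat, alpha=1.0):
    # The identification loss keeps the embedding discriminative, while the
    # enhancement loss pushes the network to reconstruct clean features,
    # so denoising does not discard speaker information.
    id_loss = F.cross_entropy(logits, speaker_labels)
    enh_loss = F.mse_loss(enhanced_feat, clean_feat)
    return id_loss + alpha * enh_loss

# Illustrative usage with random tensors (shapes and speaker count are placeholders):
logits = torch.randn(8, 1211)          # 8 utterances, 1211 training speakers
labels = torch.randint(0, 1211, (8,))
enhanced = torch.randn(8, 64, 200)     # enhanced feature maps
clean = torch.randn(8, 64, 200)        # clean-target feature maps
loss = joint_loss(logits, labels, enhanced, clean)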

Prerequisites

Environment Setting

  • We used the 'nvcr.io/nvidia/pytorch:21.04-py3' image from NVIDIA GPU Cloud for our experiments (a quick environment check is sketched after this list).
  • We used three Titan RTX GPUs for training.
  • Python 3.6.9
  • PyTorch 1.8.1
  • Torchaudio 0.8.1
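The snippet below is an optional sanity check that the installed versions match the list above; the expected values in the comments simply mirror that list.

# Print interpreter and library versions to confirm they match the environment above.
import sys
import torch
import torchaudio

print(sys.version)                  # expected: 3.6.9
print(torch.__version__)            # expected: 1.8.1
print(torchaudio.__version__)       # expected: 0.8.1
print(torch.cuda.device_count())    # the authors trained with three Titan RTX GPUs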

Datasets

We used the VoxCeleb1 dataset for training and testing. For noise synthesis, we used the MUSAN corpus.
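Below is a minimal sketch of the kind of noise synthesis this implies: mixing a MUSAN clip into a VoxCeleb1 utterance at a target SNR. The file paths and SNR value are placeholders, and the repository's own augmentation code may differ in detail.

# Hedged sketch: mix a MUSAN noise clip into a speech utterance at a target SNR.
import torch
import torchaudio

def mix_at_snr(speech, noise, snr_db):
    # Tile or trim the noise to match the speech length.
    if noise.shape[1] < speech.shape[1]:
        noise = noise.repeat(1, speech.shape[1] // noise.shape[1] + 1)
    noise = noise[:, :speech.shape[1]]
    speech_power = speech.pow(2).mean()
    noise_power = noise.pow(2).mean().clamp(min=1e-10)
    # Scale the noise so that 10*log10(speech_power / scaled_noise_power) equals snr_db.
    scale = torch.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

speech, sr = torchaudio.load("VoxCeleb1/id10001/utt.wav")            # placeholder path
noise, _ = torchaudio.load("musan/noise/free-sound/noise-0001.wav")  # placeholder path
noisy = mix_at_snr(speech, noise, snr_db=5)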

Training

Go into the run directory and run the training script:

./train.sh

Citation

@article{kim2022extended,
  title={Extended U-Net for Speaker Verification in Noisy Environments},
  author={Kim, Ju-ho and Heo, Jungwoo and Shim, Hye-jin and Yu, Ha-Jin},
  journal={arXiv preprint arXiv:2206.13044},
  year={2022}
}

ExU-Net's People

Contributors

  • wngh1187


ExU-Net's Issues

about testing other datasets

Hi, I trained the model following your guidance, but is there a dedicated testing script? How can I test the trained model on other datasets, such as VOiCES?

about single GPU

Hello, I recently ran into a problem when reproducing the experimental results. The original setup uses multi-GPU training, but my resources are currently limited, and the model does not run correctly when trained on a single GPU. Have you encountered this problem, and do you have a single-GPU training version? Thank you very much. Here is my error:
Traceback (most recent call last):
  File "/home/wangzh22/anaconda3/envs/my_pytorch/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/wangzh22/pycharm/ExU-light/main.py", line 140, in run
    trainer.run()
  File "/home/wangzh22/pycharm/ExU-light/trainers/train.py", line 31, in run
    self.test(epoch)
  File "/home/wangzh22/pycharm/ExU-light/trainers/train.py", line 125, in test
    self.embeddings = self._enrollment()
  File "/home/wangzh22/pycharm/ExU-light/trainers/train.py", line 201, in _enrollment
    embedding_dict[keys[i]] = embeddings[i]
TypeError: unhashable type: 'list'
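A likely cause, sketched below, is that each utterance key arrives from the DataLoader as a list of strings rather than a plain string, which cannot be used as a dictionary key; unwrapping the list is one possible fix. The variable names mirror the traceback, but this is only a guess, not the authors' fix.

# Hedged illustration of the error above and one possible workaround.
embedding_dict = {}
keys = [['id10001/utt1.wav'], ['id10002/utt2.wav']]   # illustrative collated keys
embeddings = ['emb1', 'emb2']                         # placeholders for embedding tensors

for i in range(len(keys)):
    key = keys[i]
    if isinstance(key, (list, tuple)):                # unwrap a collated key
        key = key[0]
    embedding_dict[key] = embeddings[i]               # now hashable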

About the VOiCES Dataset

Hello, I'm trying to replicate your results on the VOiCES dataset, but I'm running into problems. How do you use VOiCES as an additional verification set? Its trial list is very different from VoxCeleb1's: the VoxCeleb1 list gives two audio paths preceded by 0/1 to indicate whether they belong to the same speaker, whereas the VOiCES list has a column of labels, a column of audio paths, and then imp/tar. How do you use it? Or did you randomly build a trial list in the VoxCeleb1 style? I have never encountered this kind of list before, and this is my first time working with the VOiCES dataset, so I don't know how to handle it. I don't see the relevant code on GitHub; if it's convenient, could you tell me how you processed it? Thank you very much for your help! The formats of the VoxCeleb1 and VOiCES trial lists are as follows:
   (1) VoxCeleb1 trials:
    1 id10270/x6uYqmx31kE/00002.wav id10270/GWXujl-xAVM/00035.wav
    0 id10270/x6uYqmx31kE/00002.wav id10306/uzt36PBzT2w/00001.wav
    1 id10270/x6uYqmx31kE/00002.wav id10270/GWXujl-xAVM/00038.wav

   (2) VOiCES dev-trial-keys.lst:
    Lab41-SRI-VOiCES-rm1-none-sp3446-ch144019-sg0006-mc03-stu-mid-dg080 sid_dev/sp3521/Lab41-SRI-VOiCES-rm2-musi-sp3521-ch012715-sg0017-mc04-lav-mid-dg090.wav imp
    Lab41-SRI-VOiCES-rm1-none-sp3446-ch144019-sg0006-mc03-stu-mid-dg080 sid_dev/sp3521/Lab41-SRI-VOiCES-rm2-musi-sp3521-ch012715-sg0006-mc10-lav-cec-dg120.wav imp
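One plausible way to handle this, sketched below, is to convert the VOiCES trial-keys file into a VoxCeleb1-style list (0/1 followed by two paths) before scoring. The file names and the mapping from the enrollment id to a wav path are assumptions for illustration, not necessarily how the authors processed the set.

# Hedged sketch: convert "<enroll-id> <test-wav> tar|imp" lines into "<0|1> <enroll-wav> <test-wav>".
def voices_to_vox_trials(in_path, out_path):
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if not line.strip():
                continue
            enroll_id, test_wav, key = line.split()
            label = "1" if key == "tar" else "0"      # tar -> same speaker, imp -> different
            spk = enroll_id.split("-sp")[1][:4]       # e.g. "3446" from ...-sp3446-...
            enroll_wav = "sid_dev/sp{}/{}.wav".format(spk, enroll_id)   # assumed path layout
            fout.write("{} {} {}\n".format(label, enroll_wav, test_wav))

# voices_to_vox_trials("dev-trial-keys.lst", "voices_trials_vox_style.txt")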

Trouble reproducing the results

Hi,

Thank you for making the code and pretrained models public. I am trying to reproduce the results; however, the result I get after running your pretrained ExU-Net model on VoxCeleb1-O differs from the paper's. My reproduced result is EER_clean: 0.22067868504771956. To get this result, I simply commented out the training step and loaded the pretrained model in the test step. Have you encountered this problem?
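For reference, an EER value like the one quoted above is normally computed from 0/1 trial labels and similarity scores as in the sketch below; this uses scikit-learn's roc_curve purely to illustrate the metric and is not the repository's evaluation code.

# Hedged sketch of computing EER from trial labels and similarity scores.
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))    # operating point where FAR and FRR cross
    return (fpr[idx] + fnr[idx]) / 2

labels = np.array([1, 0, 1, 0, 1])            # illustrative trial labels
scores = np.array([0.8, 0.3, 0.7, 0.4, 0.9])  # illustrative cosine scores
print(compute_eer(labels, scores))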

Once again, thank you for sharing the code. I am looking forward to hearing from you!

about a lightweight variant of ExU-Net

Hello, I saw that ExU-Net-L is mentioned in the paper, but it does not seem to be included in the released code. Could you tell me what changes were made relative to ExU-Net?
