
ExU-Net

PyTorch code for the following paper:

  • Title: Extended U-Net for Speaker Verification in Noisy Environments (oral presentation at Interspeech 2022)
  • Authors: Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, and Ha-Jin Yu

Abstract

Background noise is a well-known factor that deteriorates the accuracy and reliability of speaker verification (SV) systems by blurring speech intelligibility. Various studies have used separate pretrained enhancement models as the front-end module of the SV system in noisy environments, and these methods effectively remove noise. However, the denoising process of independent enhancement models not tailored to the SV task can also distort the speaker information included in utterances. We argue that the enhancement network and speaker embedding extractor should be fully jointly trained for SV tasks under noisy conditions to alleviate this issue. Therefore, we proposed a U-Net-based integrated framework that simultaneously optimizes speaker identification and feature enhancement losses. Moreover, we analyzed the structural limitations of using U-Net directly for noisy SV tasks and further proposed Extended U-Net to reduce these drawbacks. We evaluated the models on the noise-synthesized VoxCeleb1 test set and VOiCES development set recorded in various noisy scenarios. The experimental results demonstrate that the U-Net-based fully joint training framework is more effective than the baseline, and the extended U-Net exhibited state-of-the-art performance compared with recently proposed compensation systems.
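A minimal sketch of the joint objective described above is given below. It assumes a cross-entropy speaker-identification loss, an MSE feature-enhancement loss, and an illustrative weighting factor; the tensor shapes and speaker count are placeholders, not the repository's actual implementation.

# Hedged sketch of a jointly optimized objective: speaker identification + feature enhancement.
import torch
import torch.nn.functional as F

def joint_loss(logits, speaker_labels, enhanced_feat, clean_feat, alpha=1.0):
    # The identification loss keeps the embedding discriminative, while the
    # enhancement loss pushes the network to reconstruct clean features,
    # so denoising does not discard speaker information.
    id_loss = F.cross_entropy(logits, speaker_labels)
    enh_loss = F.mse_loss(enhanced_feat, clean_feat)
    return id_loss + alpha * enh_loss

# Illustrative usage with random tensors (shapes and speaker count are placeholders):
logits = torch.randn(8, 1211)          # 8 utterances, 1211 training speakers
labels = torch.randint(0, 1211, (8,))
enhanced = torch.randn(8, 64, 200)     # enhanced feature maps
clean = torch.randn(8, 64, 200)        # clean-target feature maps
loss = joint_loss(logits, labels, enhanced, clean)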

Prerequisites

Environment Setting

  • We used the 'nvcr.io/nvidia/pytorch:21.04-py3' image from NVIDIA GPU Cloud for our experiments (a quick environment check is sketched after this list).
  • We used three Titan RTX GPUs for training.
  • Python 3.6.9
  • PyTorch 1.8.1
  • Torchaudio 0.8.1
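The snippet below is an optional sanity check that the installed versions match the list above; the expected values in the comments simply mirror that list.

# Print interpreter and library versions to confirm they match the environment above.
import sys
import torch
import torchaudio

print(sys.version)                  # expected: 3.6.9
print(torch.__version__)            # expected: 1.8.1
print(torchaudio.__version__)       # expected: 0.8.1
print(torch.cuda.device_count())    # the authors trained with three Titan RTX GPUs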

Datasets

We used the VoxCeleb1 dataset for training and testing. For noise synthesis, we used the MUSAN corpus.
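Below is a minimal sketch of the kind of noise synthesis this implies: mixing a MUSAN clip into a VoxCeleb1 utterance at a target SNR. The file paths and SNR value are placeholders, and the repository's own augmentation code may differ in detail.

# Hedged sketch: mix a MUSAN noise clip into a speech utterance at a target SNR.
import torch
import torchaudio

def mix_at_snr(speech, noise, snr_db):
    # Tile or trim the noise to match the speech length.
    if noise.shape[1] < speech.shape[1]:
        noise = noise.repeat(1, speech.shape[1] // noise.shape[1] + 1)
    noise = noise[:, :speech.shape[1]]
    speech_power = speech.pow(2).mean()
    noise_power = noise.pow(2).mean().clamp(min=1e-10)
    # Scale the noise so that 10*log10(speech_power / scaled_noise_power) equals snr_db.
    scale = torch.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

speech, sr = torchaudio.load("VoxCeleb1/id10001/utt.wav")            # placeholder path
noise, _ = torchaudio.load("musan/noise/free-sound/noise-0001.wav")  # placeholder path
noisy = mix_at_snr(speech, noise, snr_db=5)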

Training

Go into the run directory and run the training script:

./train.sh

Citation

@article{kim2022extended,
  title={Extended U-Net for Speaker Verification in Noisy Environments},
  author={Kim, Ju-ho and Heo, Jungwoo and Shim, Hye-jin and Yu, Ha-Jin},
  journal={arXiv preprint arXiv:2206.13044},
  year={2022}
}

ExU-Net's People

Contributors

  • wngh1187


ExU-Net's Issues

about testing other datasets

Hi, I trained the model following your guidance, but is there a dedicated testing script? How can I test the trained model on other datasets, such as VOiCES?

about single GPU

Hello, I recently ran into a problem when reproducing the experimental results. The original setup uses multi-GPU training, but my resources are currently limited, and the model does not run correctly when trained on a single GPU. Have you encountered this problem, and do you have a single-GPU training version? Thank you very much. Here is my error:
Traceback (most recent call last):
  File "/home/wangzh22/anaconda3/envs/my_pytorch/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/wangzh22/pycharm/ExU-light/main.py", line 140, in run
    trainer.run()
  File "/home/wangzh22/pycharm/ExU-light/trainers/train.py", line 31, in run
    self.test(epoch)
  File "/home/wangzh22/pycharm/ExU-light/trainers/train.py", line 125, in test
    self.embeddings = self._enrollment()
  File "/home/wangzh22/pycharm/ExU-light/trainers/train.py", line 201, in _enrollment
    embedding_dict[keys[i]] = embeddings[i]
TypeError: unhashable type: 'list'
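A likely cause, sketched below, is that each utterance key arrives from the DataLoader as a list of strings rather than a plain string, which cannot be used as a dictionary key; unwrapping the list is one possible fix. The variable names mirror the traceback, but this is only a guess, not the authors' fix.

# Hedged illustration of the error above and one possible workaround.
embedding_dict = {}
keys = [['id10001/utt1.wav'], ['id10002/utt2.wav']]   # illustrative collated keys
embeddings = ['emb1', 'emb2']                         # placeholders for embedding tensors

for i in range(len(keys)):
    key = keys[i]
    if isinstance(key, (list, tuple)):                # unwrap a collated key
        key = key[0]
    embedding_dict[key] = embeddings[i]               # now hashable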

About the VOiCES Dataset

Hello, I'm trying to replicate your results on the VOiCES dataset, but I'm running into problems. How do you use VOiCES as an additional verification set? Its trial list is very different from VoxCeleb1's: the VoxCeleb1 list gives two audio paths preceded by 0/1 to indicate whether they belong to the same speaker, whereas the VOiCES list has a column of labels, a column of audio paths, and then imp/tar. How do you use it? Or did you randomly build a trial list in the VoxCeleb1 style? I have never encountered this kind of list before, and this is my first time working with the VOiCES dataset, so I don't know how to handle it. I don't see the relevant code on GitHub; if it's convenient, could you tell me how you processed it? Thank you very much for your help! The formats of the VoxCeleb1 and VOiCES trial lists are as follows:
   (1) VoxCeleb1 trials:
    1 id10270/x6uYqmx31kE/00002.wav id10270/GWXujl-xAVM/00035.wav
    0 id10270/x6uYqmx31kE/00002.wav id10306/uzt36PBzT2w/00001.wav
    1 id10270/x6uYqmx31kE/00002.wav id10270/GWXujl-xAVM/00038.wav

   (2) VOiCES dev-trial-keys.lst:
    Lab41-SRI-VOiCES-rm1-none-sp3446-ch144019-sg0006-mc03-stu-mid-dg080 sid_dev/sp3521/Lab41-SRI-VOiCES-rm2-musi-sp3521-ch012715-sg0017-mc04-lav-mid-dg090.wav imp
    Lab41-SRI-VOiCES-rm1-none-sp3446-ch144019-sg0006-mc03-stu-mid-dg080 sid_dev/sp3521/Lab41-SRI-VOiCES-rm2-musi-sp3521-ch012715-sg0006-mc10-lav-cec-dg120.wav imp
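One plausible way to handle this, sketched below, is to convert the VOiCES trial-keys file into a VoxCeleb1-style list (0/1 followed by two paths) before scoring. The file names and the mapping from the enrollment id to a wav path are assumptions for illustration, not necessarily how the authors processed the set.

# Hedged sketch: convert "<enroll-id> <test-wav> tar|imp" lines into "<0|1> <enroll-wav> <test-wav>".
def voices_to_vox_trials(in_path, out_path):
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if not line.strip():
                continue
            enroll_id, test_wav, key = line.split()
            label = "1" if key == "tar" else "0"      # tar -> same speaker, imp -> different
            spk = enroll_id.split("-sp")[1][:4]       # e.g. "3446" from ...-sp3446-...
            enroll_wav = "sid_dev/sp{}/{}.wav".format(spk, enroll_id)   # assumed path layout
            fout.write("{} {} {}\n".format(label, enroll_wav, test_wav))

# voices_to_vox_trials("dev-trial-keys.lst", "voices_trials_vox_style.txt")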

Trouble reproducing the results

Hi,

Thank you for making the code and pretrained models public. I am trying to reproduce the results; however, the result I get after running your pretrained ExU-Net model on VoxCeleb1-O differs from the paper's. My reproduced result is EER_clean: 0.22067868504771956. To get this result, I simply commented out the training step and loaded the pretrained model in the test step. Have you encountered this problem?
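For reference, an EER value like the one quoted above is normally computed from 0/1 trial labels and similarity scores as in the sketch below; this uses scikit-learn's roc_curve purely to illustrate the metric and is not the repository's evaluation code.

# Hedged sketch of computing EER from trial labels and similarity scores.
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels, scores):
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))    # operating point where FAR and FRR cross
    return (fpr[idx] + fnr[idx]) / 2

labels = np.array([1, 0, 1, 0, 1])            # illustrative trial labels
scores = np.array([0.8, 0.3, 0.7, 0.4, 0.9])  # illustrative cosine scores
print(compute_eer(labels, scores))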

Once again, thank you for sharing the code. I am looking forward to hearing from you!

about a lightweight variant of ExU-Net

Hello, I saw that ExU-Net-L is mentioned in the paper, but it does not seem to be included in the released code. Could you tell me what changes were made relative to ExU-Net?
