RPNSD

PyTorch implementation of RPNSD, from the paper "Speaker Diarization with Region Proposal Network". Our code is largely based on faster-rcnn.pytorch, a Faster R-CNN implementation by jwyang.

Install

Clone this project

git clone https://github.com/HuangZiliAndy/RPNSD.git
cd RPNSD

Add your Python path to the PATH variable in path.sh; the current default is ~/anaconda3/bin.
Install PyTorch (0.4.0) and torchvision according to your CUDA version:

conda install pytorch==0.4.0 cuda91 torchvision pillow"<7" -c pytorch

Install the packages in requirements.txt

pip install -r requirements.txt

Prepare Kaldi and the Faster R-CNN library (you can specify a Kaldi root if you already have a compiled copy):

cd tools
make KALDI=<path/to/a/compiled/kaldi/directory>

Set your computing backend in cmd.sh:

# Select the backend used by run.sh from "local", "sge", "slurm", or "ssh"
cmd_backend='local'

Data preparation

This step does the following:

Prepare a large diarization dataset from Mixer6, SRE and SWBD. Most of the data consists of two-channel telephone conversations between two people. We sum the channels to create diarization-style training data.
Prepare the test set from the CALLHOME dataset. Since CALLHOME does not define train/dev/test splits, we use 5-fold cross-validation.

./run_prepare_shared.sh
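The channel-summing idea can be sketched in a few lines. This is only an illustration, not the code run by run_prepare_shared.sh: plain Python lists stand in for audio sample arrays, and the function name `mix_channels` and the rescaling rule are assumptions made for this example.

```python
from typing import List, Sequence


def mix_channels(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Sum two equally long channels sample by sample into one mono signal."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    mono = [l + r for l, r in zip(left, right)]
    # Assumption for this sketch: rescale only when the sum would clip (|x| > 1).
    peak = max((abs(x) for x in mono), default=0.0)
    if peak > 1.0:
        mono = [x / peak for x in mono]
    return mono


# Toy conversation: speaker A is on the left channel, speaker B on the right.
left = [0.5, 0.5, 0.0, 0.0]
right = [0.0, 0.0, -0.25, -0.25]
mixed = mix_channels(left, right)
print(mixed)
```

Because each telephone channel carries one speaker, the summed signal contains both speakers in a single stream, which is the input a diarization system sees.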

Train

Train on the Mixer6 + SRE + SWBD dataset. The default setting uses a single GPU and takes about 4 days.

./train.sh

A pretrained model is available at pretrain-model.

Adapt

Adapt the model on in-domain data. Since we use 5-fold cross-validation, each fold adapts on 400 utterances from the CALLHOME dataset and tests on the held-out 100.

./adapt.sh
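The 400/100 numbers follow from dividing 500 recordings into five equal folds. A minimal sketch of such a split is given here; the recording IDs, the fixed shuffle seed and the helper name `k_fold_splits` are hypothetical, and the fold assignment actually used by adapt.sh may differ.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(items: List[str], k: int = 5, seed: int = 0
                  ) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (adapt, test) lists; every item is used for testing exactly once."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled items into k folds of (nearly) equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        adapt = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield adapt, test


# Hypothetical recording IDs standing in for the CALLHOME utterances.
recordings = [f"rec{n:03d}" for n in range(500)]
for fold_id, (adapt, test) in enumerate(k_fold_splits(recordings)):
    print(fold_id, len(adapt), len(test))
```

Each recording is scored exactly once, by a model that never saw it during adaptation.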

Inference

The inference stage has three steps:

Run a forward pass of the network to get speech region proposals, speaker embeddings and background probabilities.
Post-process the proposals with clustering and non-maximum suppression (NMS).
Compute the Diarization Error Rate (DER).
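For readers unfamiliar with NMS, the sketch below applies the usual greedy procedure to 1-D time segments instead of the 2-D boxes used in object detection. It is a generic illustration, not the repository's post-processing code: the `(start, end, score)` tuple layout, the function names and the 0.3 overlap threshold are all assumptions.

```python
from typing import List, Tuple

Segment = Tuple[float, float, float]  # (start_sec, end_sec, score)


def iou_1d(a: Segment, b: Segment) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def nms_1d(segments: List[Segment], iou_threshold: float = 0.3) -> List[Segment]:
    """Greedy NMS: visit segments by descending score and keep a segment
    only if it does not overlap an already kept one too strongly."""
    kept: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if all(iou_1d(seg, k) <= iou_threshold for k in kept):
            kept.append(seg)
    return kept


proposals = [(0.0, 2.0, 0.9), (0.2, 2.1, 0.8), (3.0, 4.0, 0.7)]
kept = nms_1d(proposals)
print(kept)
```

In this toy input the second proposal overlaps the first almost entirely and has a lower score, so it is suppressed, while the third proposal does not overlap anything and survives.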

./inference.sh
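DER is conventionally the sum of missed speech, false alarm and speaker confusion time, divided by the total reference speech time. The sketch below computes a simplified frame-level version to make the three terms concrete. It assumes at most one speaker per frame (no overlapped speech) and no forgiveness collar, and it finds the speaker mapping by brute force, so it is not a substitute for the scoring done by inference.sh; the function name `frame_der` and the label format are hypothetical.

```python
from itertools import permutations
from typing import List, Optional


def frame_der(ref: List[Optional[str]], hyp: List[Optional[str]]) -> float:
    """DER = (missed + false alarm + confusion) / reference speech frames.
    A label of None marks a non-speech frame."""
    if len(ref) != len(hyp):
        raise ValueError("reference and hypothesis must have equal length")
    total = sum(r is not None for r in ref)
    if total == 0:
        raise ValueError("reference contains no speech")
    missed = sum(r is not None and h is None for r, h in zip(ref, hyp))
    false_alarm = sum(r is None and h is not None for r, h in zip(ref, hyp))
    both = [(r, h) for r, h in zip(ref, hyp) if r is not None and h is not None]

    # Hypothesis labels are arbitrary, so try every one-to-one mapping onto
    # reference speakers (padding with None) and keep the best match count.
    ref_spk = sorted({r for r in ref if r is not None})
    hyp_spk = sorted({h for h in hyp if h is not None})
    slots = ref_spk + [None] * max(0, len(hyp_spk) - len(ref_spk))
    best_correct = 0
    for perm in permutations(slots, len(hyp_spk)):
        mapping = dict(zip(hyp_spk, perm))
        correct = sum(mapping[h] == r for r, h in both)
        best_correct = max(best_correct, correct)
    confusion = len(both) - best_correct
    return (missed + false_alarm + confusion) / total


ref = ["A", "A", "A", None, "B", "B", "B", "B"]
hyp = ["x", "x", None, "y", "y", "y", "y", "x"]
der = frame_der(ref, hyp)
print(f"DER = {der:.3f}")
```

Here one frame is missed, one is a false alarm and one is attributed to the wrong speaker, out of 7 reference speech frames, so the result is 3/7.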

Demo

An example from the CALLHOME dataset. The first stream is the ground-truth label, the second stream is the output of the x-vector system, and the third stream is the output of RPNSD.

Citation

@inproceedings{huang2020speaker,
    Title={Speaker Diarization with Region Proposal Network},
    Author={Huang, Zili and Watanabe, Shinji and Fujita, Yusuke and Garcia, Paola and Shao, Yiwen and Povey, Daniel and Khudanpur, Sanjeev},
    Booktitle={Accepted to ICASSP 2020},
    Year={2020}
}

@article{jjfaster2rcnn,
    Author = {Jianwei Yang and Jiasen Lu and Dhruv Batra and Devi Parikh},
    Title = {A Faster Pytorch Implementation of Faster R-CNN},
    Journal = {https://github.com/jwyang/faster-rcnn.pytorch},
    Year = {2017}
}
