DCCRN with various loss functions

DCCRN (Deep Complex Convolutional Recurrent Network) is a deep neural network proposed in [1]. This repository applies DCCRN with various loss functions. Our original paper can be found here, and you can check test samples here. The test samples were chosen at random; we uploaded samples for SI-SNR and SI-SNR+LMS.

[Figure: DCCRN]

Source of the figure: paper


Loss functions

We use two base loss functions and two perceptual loss functions.

Base loss

  1. MSE: Mean Squared Error

  2. SI-SNR: Scale-Invariant Source-to-Noise Ratio
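For reference, the two base losses can be written in their commonly used time-domain form. This is a sketch of the standard definitions, not a transcription of the repository's equation images, which may use a different representation or notation. Here $s$ is the clean target, $\hat{s}$ is the network estimate, and $N$ is the number of samples; all three symbols are assumptions introduced for illustration.

```latex
% Mean squared error between the estimate and the target
\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{n=1}^{N} \bigl( s[n] - \hat{s}[n] \bigr)^{2}

% SI-SNR: project the estimate onto the target, then compare energies
s_{\mathrm{target}} = \frac{\langle \hat{s}, s \rangle \, s}{\lVert s \rVert^{2}}, \qquad
e_{\mathrm{noise}} = \hat{s} - s_{\mathrm{target}}

\mathrm{SI\text{-}SNR} = 10 \log_{10} \frac{\lVert s_{\mathrm{target}} \rVert^{2}}{\lVert e_{\mathrm{noise}} \rVert^{2}}, \qquad
\mathcal{L}_{\mathrm{SI\text{-}SNR}} = -\,\mathrm{SI\text{-}SNR}
```

Because SI-SNR is a quality measure (higher is better), it is typically negated when used as a training loss.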

Perceptual loss

  1. LMS: Log Mel Spectra

  2. PMSQE: Perceptual Metric for Speech Quality Evaluation

We combined each of the two base loss functions with each of the two perceptual loss functions. The coupling coefficient between the two terms was determined experimentally. For example, the initial value of MSE, a base loss, is about 0.001 to 0.002, whereas LMS starts at about 0.1 to 0.2 and PMSQE at about 0.8 to 1.3. To bring the two terms to a similar magnitude, we therefore applied a smaller coefficient to the perceptual loss term. The coefficient thus reflects the difference in dynamic range between the two terms rather than their sensitivity. During the experiments we also judged the base loss to be the more important term, so we varied the coefficient so that the ratio of the base term to the weighted perceptual term ranged from 1:1 to 10:1.
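The scaling argument above can be illustrated with a small sketch. It assumes the combined loss has the form base + alpha * perceptual, which the text implies but does not state explicitly. The function names `coupling_coefficient` and `combined_loss` are hypothetical, and the initial values are midpoints of the ranges quoted above, not the coefficients used in the paper.

```python
def coupling_coefficient(base_init, perceptual_init, ratio=1.0):
    """Return alpha such that base_init : (alpha * perceptual_init) equals ratio : 1.

    Hypothetical helper: it only matches initial magnitudes and does not
    reproduce the experimentally tuned coefficients.
    """
    if base_init <= 0 or perceptual_init <= 0 or ratio <= 0:
        raise ValueError("all arguments must be positive")
    return base_init / (ratio * perceptual_init)


def combined_loss(base, perceptual, alpha):
    """Assumed form of the combined objective: base term plus weighted perceptual term."""
    return base + alpha * perceptual


# Illustrative midpoints of the initial ranges quoted in the text.
mse_init = 0.0015   # MSE starts at about 0.001 to 0.002
lms_init = 0.15     # LMS starts at about 0.1 to 0.2

alpha_equal = coupling_coefficient(mse_init, lms_init, ratio=1.0)    # both terms similar in size
alpha_base10 = coupling_coefficient(mse_init, lms_init, ratio=10.0)  # base term ten times larger

initial_total = combined_loss(mse_init, lms_init, alpha_equal)
```

With these illustrative numbers, a 1:1 ratio needs a coefficient of roughly 0.01 on the LMS term, and a 10:1 ratio needs roughly 0.001.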

Requirements

This repository is tested on Ubuntu 20.04.

  • Python 3.7+
  • CUDA 10.1+
  • cuDNN 7+
  • PyTorch 1.7+

Library

  • tqdm
  • asteroid
  • scipy
  • matplotlib
  • tensorboardX
  • pesq
  • pystoi

Prepare data

The training and validation data have the following three dimensions.
[Batch size, 2 (input & target), wav length]


The test data has the following five dimensions.
[noise type, dB classes, Batch size, 2 (input & target), wav length]
We use 2 types of noise (seen and unseen) and 7 dB classes from -10 dB to 20 dB.
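As a rough sketch of the test-set layout described above: the snippet assumes the 7 dB classes are evenly spaced, which gives 5 dB steps, and that "seen" comes before "unseen" on the first axis. Neither detail is stated here, and the names `NOISE_TYPES`, `SNR_CLASSES_DB`, `expected_test_shape`, and `index_of` are hypothetical.

```python
# Assumption: "seen" is index 0 and "unseen" is index 1 on the noise-type axis.
NOISE_TYPES = ("seen", "unseen")

# Assumption: 7 evenly spaced classes from -10 dB to 20 dB, i.e. 5 dB steps.
SNR_CLASSES_DB = tuple(range(-10, 21, 5))


def expected_test_shape(batch_size, wav_length):
    """Shape of the test array: [noise type, dB classes, batch, 2, wav length]."""
    return (len(NOISE_TYPES), len(SNR_CLASSES_DB), batch_size, 2, wav_length)


def index_of(noise_type, snr_db):
    """Map a (noise type, SNR in dB) pair to indices on the first two axes."""
    return NOISE_TYPES.index(noise_type), SNR_CLASSES_DB.index(snr_db)
```

For example, `index_of("unseen", 0)` selects the slice holding unseen-noise utterances mixed at 0 dB under these assumptions.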


Wav files longer than 3 seconds are cut to 3 seconds, and wav files shorter than 3 seconds are zero-padded to 3 seconds. The sampling frequency is 16 kHz.
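The length normalization can be sketched as follows. This is a minimal standard-library illustration, not the repository's preprocessing code. It assumes that truncation keeps the beginning of the file and that zeros are appended at the end, neither of which is specified above. The name `fix_length` is hypothetical.

```python
SAMPLE_RATE = 16_000                        # 16 kHz sampling frequency
CLIP_SECONDS = 3                            # target duration in seconds
CLIP_SAMPLES = SAMPLE_RATE * CLIP_SECONDS   # 48000 samples per clip


def fix_length(samples, target_len=CLIP_SAMPLES):
    """Return a list of exactly target_len samples.

    Assumption: longer inputs keep their first target_len samples,
    shorter inputs are zero-padded at the end.
    """
    samples = list(samples)
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [0.0] * (target_len - len(samples))


long_clip = fix_length([0.1] * 50_000)   # truncated to 3 seconds
short_clip = fix_length([0.1] * 20_000)  # zero-padded to 3 seconds
```

After this step every utterance has the same length, so input and target pairs can be stacked into the fixed-size arrays described above.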

Performance comparative evaluation

Objective evaluation


We evaluate the outputs with PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility measure).

Spectrogram


Source of the figure: paper

Spectrograms of (a) clean speech, (b) noisy speech at 0 dB SNR, and enhanced speech estimated using (c) MSE and PMSQE, (d) SI-SNR, (e) SI-SNR and PMSQE, and (f) SI-SNR and LMS.

References

[1] DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement
Yanxin Hu, Yun Liu, Shubo Lv, Mengtao Xing, Shimin Zhang, Yihui Fu, Jian Wu, Bihong Zhang, Lei Xie
[arXiv] [code]

Paper

Performance comparison evaluation of speech enhancement using various loss function.
Seo-Rim Hwang, Joon Byun, Young-Cheul Park
[paper]


dccrn-with-various-loss-functions's Issues

Format of .npy files

Hi, the datasets in the code are loaded with .npy files. What type of information do they contain? Is it just the path to each individual file?

ModuleNotFoundError: No module named 'run'

Hi, I am new to deep learning.

In the trainer.py, line 17:
from run import model_train, model_validate, model_test

I can't find run.py in the project.
Are those 3 models in train.py?

Thank you!

Missing function

Hi,
the log_data_we_want function is missing in write_on_tensorboard.py
