CITISEN

CITISEN video

Introduction

In this work, we present a deep-learning-based speech signal processing mobile application, termed CITISEN, which supports three functions: speech enhancement (SE), acoustic scene conversion (ASC), and model adaptation (MA). For the SE function, CITISEN reduces noise components in speech signals and thereby enhances their clarity and intelligibility. For the ASC function, CITISEN converts the current background sound so that it sounds like a different background. Finally, the MA function adapts SE models with a few audio files when encountering unknown speakers or unseen noise types; the adapted SE model is then used to enhance upcoming noisy utterances. Experimental results have confirmed the effectiveness of these three functions in terms of objective evaluations and subjective listening tests. The promising results suggest that the CITISEN mobile application can potentially be used as a front-end processor for various speech-related services, such as voice communication, assistive hearing devices, and virtual reality headsets.

User interface and usage

Four main pages in CITISEN

The CITISEN application has four pages: "Speech Enhancement", "Acoustic Scene Conversion", "Model Adaptation", and "Recording". On each page, the page name is shown at the top left and the navigation buttons are shown at the bottom.

Speech Enhancement page

On the "Speech Enhancement" page, the user first specifies his/her gender identity. Then, by pressing the "SE Model Switch" button, the user can select a suitable SE model from a list of saved models. CITISEN provides several default SE models trained on our own collected speech data sets. Users can also run MA to prepare adapted SE models and save them as new SE models. Finally, pressing the SE button transforms the noisy speech into clean speech online.

Acoustic Scene Conversion page

On the "Acoustic Scene Conversion" page, CITISEN mixes an acoustic scene into the enhanced speech to generate new speech signals with a converted acoustic scene. The page has a "Record Noise" button, with which users can record and save noise signals for ASC. It also has a volume bar that allows users to adjust the volume of the background noise and thereby set the SNR level of the converted speech. To change the acoustic scene, users first press the "SE Model Switch" button to select an SE model. Then, when they press the "Background Noise Switch" button (shown on the left side), an acoustic scene selection window pops up and lists all acoustic scene options (shown on the right side). Users select the target scene for ASC, and speech with the converted scene is generated accordingly.
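The relation between the background volume and the SNR level can be illustrated with a small sketch. It is not taken from the CITISEN code base: the function names (`power`, `mix_at_snr`, `measured_snr_db`), the looping of short noise recordings, and the toy signals are all assumptions made for illustration. It scales a noise recording so that adding it to enhanced speech yields a chosen SNR in dB.

```python
import math
import random

def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)

def mix_at_snr(speech, noise, target_snr_db):
    """Add noise to speech, scaled so the speech-to-noise ratio equals target_snr_db."""
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    # Assumption: loop the noise recording if it is shorter than the speech, then trim.
    repeats = len(speech) // len(noise) + 1
    noise = (list(noise) * repeats)[:len(speech)]
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both be non-silent")
    # SNR_dB = 10 * log10(P_speech / (gain**2 * P_noise)); solve for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

def measured_snr_db(speech, mixture):
    """SNR of a mixture, given the speech that went into it."""
    residual = [m - s for m, s in zip(mixture, speech)]
    return 10 * math.log10(power(speech) / power(residual))

# Toy signals: a 440 Hz tone stands in for enhanced speech, uniform noise for the scene.
rng = random.Random(0)
enhanced = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
scene = [rng.uniform(-1.0, 1.0) for _ in range(800)]
converted = mix_at_snr(enhanced, scene, target_snr_db=5.0)
print(round(measured_snr_db(enhanced, converted), 3))
```

In this sketch, a lower target SNR corresponds to a louder background scene, which is the effect a volume bar would have under these assumptions.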

Model Adaptation page

There are two file upload buttons, "Record Noise" and "Record Speech", as shown on the left side. With these buttons, users can record pure noise or speaker speech signals and upload the recorded audio to our server. To start recording, users simply press one of the buttons. When recording is finished, pressing the button again makes CITISEN pop up a submission window, as shown on the right side. The submission window asks the user to name the audio file, and the audio is then sent to the server. After receiving the audio file, the server estimates an adapted SE model by fine-tuning the original SE model on the recorded audio data. The name of the audio file is also used to name the adapted SE model, which is later sent from the server to the mobile device and appears on the "Speech Enhancement" and "Acoustic Scene Conversion" pages. Users can then run the SE and ASC functions using the adapted SE model.
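To make "fine-tuning the original SE model" concrete, here is a deliberately tiny sketch. This README does not describe the real network, the loss, or how the server builds training pairs from the uploaded audio, so everything below is an assumption: a 3-tap linear filter stands in for the SE model, paired (noisy, clean) samples are assumed to be available, and the names `fir`, `mse`, and `fine_tune` are hypothetical. The only point it shows is that adaptation starts from the existing weights and takes a few gradient steps on the new data.

```python
import math
import random

def fir(weights, x):
    """Apply a causal FIR filter: y[n] = sum_k weights[k] * x[n - k]."""
    return [sum(w * x[n - k] for k, w in enumerate(weights) if n - k >= 0)
            for n in range(len(x))]

def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def fine_tune(weights, noisy, clean, lr=0.05, steps=50):
    """Run a few gradient-descent steps on the MSE, starting from the given weights."""
    w = list(weights)  # copy, so the original model is left untouched
    n_samples = len(noisy)
    for _ in range(steps):
        pred = fir(w, noisy)
        err = [p - c for p, c in zip(pred, clean)]
        for k in range(len(w)):
            # d(MSE)/d(w[k]) = (2 / N) * sum_n err[n] * noisy[n - k]
            grad = 2 / n_samples * sum(err[n] * noisy[n - k]
                                       for n in range(k, n_samples))
            w[k] -= lr * grad
    return w

# Toy adaptation data (assumption): a slow sine as clean speech, plus uniform noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 5 * n / 200) for n in range(200)]
noisy = [c + 0.3 * rng.uniform(-1.0, 1.0) for c in clean]

original_model = [1.0, 0.0, 0.0]  # stand-in for the pre-trained model (identity filter)
adapted_model = fine_tune(original_model, noisy, clean)

print("loss before:", mse(fir(original_model, noisy), clean))
print("loss after: ", mse(fir(adapted_model, noisy), clean))
```

The original weights are kept alongside the adapted ones, mirroring the way an adapted SE model appears as a new, separately named entry in the model list.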

Recording page

A. recording or loading saved audio files

B. selecting a model to perform SE

C. demonstrating the processed speech by spectrogram plots

The "Recording" page lets users record speech and noise in the current environment and save the enhanced or converted audio files. On the "Speech Enhancement" and "Acoustic Scene Conversion" pages, users can listen to enhanced or converted speech immediately online. The "Recording" page, in contrast, allows users to save the processed audio files and play them back later. Users first record (upper path in Fig. A) or load an existing (bottom path in Fig. A) audio file and then press the "SE Model Switch" button. An SE model selection window then pops up, as shown on the right of Fig. B. Once a suitable SE model is selected and the run button is pressed (shown on the left side of Fig. B), the enhanced speech is generated. CITISEN can display two spectrogram plots, one for the noisy speech and one for the enhanced speech (shown on the right side of Fig. C), so that users can visually check the SE results. In addition, users can press the "Play" and "Stop" buttons above the spectrogram plots to play and listen to the original and processed audio files.
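For readers unfamiliar with spectrogram plots, the following sketch shows one common way to compute the data behind such a plot: a short-time Fourier transform with a Hann window, converted to log-magnitude. It is not the app's implementation. The frame length, hop size, and the names `hann` and `spectrogram_db` are assumptions, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def spectrogram_db(signal, frame_len=64, hop=32):
    """Return one list per frame with log-magnitudes (dB) for bins 0..frame_len // 2."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT for bin k; a real application would use an FFT routine.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # Small offset avoids log10(0) on silent bins.
            bins.append(20 * math.log10(abs(acc) + 1e-10))
        frames.append(bins)
    return frames

# Toy input: a pure tone placed exactly on DFT bin 8 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram_db(tone)
print(len(spec), "frames x", len(spec[0]), "frequency bins")
```

Computing this once for the noisy recording and once for the enhanced output gives the two time-frequency images that can be compared side by side.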

Download

  • Download apk for Android.

                Google drive    Dropbox
    URL links   links           links
    QR code     (image)         (image)

  • [Download] CITISEN for iOS.

Paper

  • See Paper for more detail.

Results and demo

Citations

@misc{alex2020citisen,
title={CITISEN: A Deep Learning-Based Speech Signal-Processing Mobile Application},
author={Alexander Chao-Fu Kang and Kuo-Hsuan Hung and Yu-Wen Chen and You-Jin Li and Ya-Hsin Lai and Kai-Chun Liu and Sze-Wei Fu and Syu-Siang Wang and Yu Tsao},
year={2020},
eprint={2008.09264},
archivePrefix={arXiv},
primaryClass={eess.AS}
}

License

  • The CITISEN work is released under the MIT License. See LICENSE for more details.

Acknowledgments
