Language Detector

Detects whether the language spoken in an audio clip is Chinese or English. For more information, check out the blog post here.

Requirements

  • Python 2.7
  • FFmpeg (converts audio files to WAV format): How to install
  • Freetype and libpng (needed for preprocessing): sudo apt-get install libfreetype6-dev; sudo apt-get install libpng-dev
  • Spark (preprocessing: converts WAV audio to spectrogram images): How to install
  • TensorFlow (trains the neural network models): managed by uranium, no need to install manually
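
As an illustration of the FFmpeg conversion step listed above, here is a minimal sketch that builds an FFmpeg command line for converting one audio file to WAV. The function name, the example file paths, and the choice of mono audio at 16000 Hz are assumptions made for this example and are not taken from the project; only the standard FFmpeg options -y, -i, -ac, and -ar are used.

```python
def build_ffmpeg_command(src_path, dst_path, sample_rate=16000, channels=1):
    """Return the argument list for converting one audio file to WAV.

    sample_rate and channels are illustrative defaults, not project settings.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst_path if it already exists
        "-i", src_path,           # input file (any format FFmpeg can decode)
        "-ac", str(channels),     # number of output audio channels
        "-ar", str(sample_rate),  # output sample rate in Hz
        dst_path,                 # container is inferred from the .wav extension
    ]


# Hypothetical file names, used only to show the resulting command.
cmd = build_ffmpeg_command("./data/raw/example.m4a", "./data/raw/example.wav")
print(" ".join(cmd))
```

The list can be passed to subprocess.check_call(cmd) to run the conversion once FFmpeg is installed.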

Data & Results

  • Raw data: 635 minutes of Chinese interviews from Luyu Official (i.e., Lu Yu You Yue) and 534 minutes of English interviews from the Ellen Show, both on YouTube
  • Processed data: 38122 spectrogram images for the Chinese interviews and 32079 spectrogram images for the English interviews (one image per second of speech)
  • Train/test data split: the processed data are labelled, mixed, shuffled, and split 80%/20% into train and test sets
  • Evaluation accuracy: 92.7% on the test set, achieved by the Berlinnet neural network model after 19300 training iterations
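
The label, mix, shuffle, and split procedure described above can be sketched as follows. This is a minimal illustration and not the project's own code: the function name, the file names, the label values (0 for Chinese, 1 for English), and the fixed random seed are all assumptions. Only the image counts and the 80%/20% ratio come from the figures above.

```python
import random


def label_shuffle_split(chinese_images, english_images, train_fraction=0.8, seed=0):
    """Label both image lists, mix and shuffle them, then split into train/test."""
    # Assumed label convention for this sketch: 0 = Chinese, 1 = English.
    labelled = [(path, 0) for path in chinese_images]
    labelled += [(path, 1) for path in english_images]

    # A seeded generator keeps the split reproducible between runs.
    rng = random.Random(seed)
    rng.shuffle(labelled)

    n_train = int(len(labelled) * train_fraction)
    return labelled[:n_train], labelled[n_train:]


# Hypothetical file names; the counts match the processed data described above.
chinese = ["zh_%05d.png" % i for i in range(38122)]
english = ["en_%05d.png" % i for i in range(32079)]

train, test = label_shuffle_split(chinese, english)
print("train: %d, test: %d" % (len(train), len(test)))
```

With 70201 images in total, an 80%/20% split gives 56160 training images and 14041 test images.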

How to Use

  1. Download raw data from YouTube; the downloaded data will be under ./data/raw/

    ./uranium download

    You can customize your download list in ./language_detection/data_acquisition/sources.yml

  2. Preprocess the raw data (using Spark) and label it; the processed data (spectrogram images) will be under ./data/rst/, and the labelled spectrogram image indices will be under ./data/labelled/

    ./uranium preprocess

  3. Train the neural network model (using TensorFlow); the trained model will be under ./snapshots/

    ./uranium train

    The neural network model Berlinnet (a shallow network model adopted from here) is used by default; tweak the configuration in ./language_detector/modeling/config.yaml if necessary

    Depending on the desired number of training iterations and your hardware, training could take hours to days

    To visualize the model and training progress via TensorBoard, run ./uranium visualize and go to localhost:6006

  4. Evaluate the trained model on the test data set

    ./uranium evaluate

    You can do this whether or not training is complete; while training is still in progress, the evaluation is performed on the checkpoint at the current point in training

    Set up the checkpoint properly by making the modification here
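
Step 2 produces one spectrogram image per second of speech. The segmentation part of that idea can be sketched with the Python standard library alone, as below. This is an illustration under stated assumptions and not the project's Spark pipeline: the helper names, the synthetic 440 Hz test tone, the 8000 Hz sample rate, and the choice to drop a trailing partial second are all hypothetical. Rendering each chunk as a spectrogram image is left out.

```python
import io
import math
import struct
import wave


def make_test_wav(seconds, sample_rate=8000):
    """Build an in-memory mono 16-bit WAV holding a 440 Hz tone (demo input only)."""
    n_frames = int(seconds * sample_rate)
    frames = b"".join(
        struct.pack("<h", int(10000 * math.sin(2 * math.pi * 440 * i / sample_rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    writer = wave.open(buf, "wb")
    writer.setnchannels(1)
    writer.setsampwidth(2)
    writer.setframerate(sample_rate)
    writer.writeframes(frames)
    writer.close()  # finalizes the header; the BytesIO itself stays open
    buf.seek(0)
    return buf


def one_second_chunks(wav_file):
    """Yield the raw frame bytes of each full second of audio.

    A trailing partial second is dropped (an assumption of this sketch).
    """
    reader = wave.open(wav_file, "rb")
    try:
        rate = reader.getframerate()
        bytes_per_second = rate * reader.getsampwidth() * reader.getnchannels()
        while True:
            chunk = reader.readframes(rate)  # one second's worth of frames
            if len(chunk) < bytes_per_second:
                break
            yield chunk
    finally:
        reader.close()


chunks = list(one_second_chunks(make_test_wav(3.5)))
print(len(chunks))
```

Each yielded chunk would then be the input for one spectrogram image.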

Acknowledgment

This project is inspired by, and a large portion of its code comes from, the great work here.
