Language Detector
Detects which language is being spoken (Chinese or English) from speech audio. For more information, check out the blog post here.
Requirements
- Python 2.7
- FFmpeg (converts audio files to WAV format): How to install
- FreeType and libpng (needed for preprocessing):
sudo apt-get install libfreetype6-dev; sudo apt-get install libpng-dev
- Spark (preprocessing; converts WAV audio to spectrogram images): How to install
- TensorFlow (trains the neural network models): managed by uranium, no need to install manually
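The FFmpeg conversion step listed above can be sketched roughly as follows. This is an illustration only, not the project's actual code: the helper names `build_wav_command` and `convert_to_wav`, the 16 kHz sample rate, and the mono downmix are all assumptions. The flags `-y`, `-i`, `-ac`, and `-ar` are standard FFmpeg options.

```python
import subprocess


def build_wav_command(src_path, dst_path, sample_rate=16000):
    """Build an FFmpeg command that converts src_path to a mono WAV file.

    The helper name, the 16 kHz default, and the mono downmix are
    assumptions for illustration, not settings taken from this project.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite dst_path if it already exists
        "-i", src_path,           # input file (any format FFmpeg can decode)
        "-ac", "1",               # downmix to one audio channel
        "-ar", str(sample_rate),  # resample to the requested rate in Hz
        dst_path,                 # a .wav extension selects WAV output
    ]


def convert_to_wav(src_path, dst_path, sample_rate=16000):
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.check_call(build_wav_command(src_path, dst_path, sample_rate))
```

Building the argument list separately from running it makes the command easy to inspect or log before any audio is touched.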
Data & Results
- Raw data: 635 minutes of Chinese interviews from Luyu Official (i.e., Lu Yu You Yue), and 534 minutes of English interviews from Ellen Show, both on YouTube
- Processed data: 38122 spectrogram images for Chinese interviews, and 32079 spectrogram images for English interviews (one image for one second of speech)
- Train/test data split: the processed data are labelled, mixed, shuffled, and split 80%/20% into train and test sets
- Evaluation accuracy: 92.7% (on the test set), achieved by the Berlinnet neural network model trained for 19300 iterations
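The label, mix, shuffle, and split procedure described above might look something like the sketch below. It is not the project's implementation: the function name `label_shuffle_split`, the label values (0 for Chinese, 1 for English), and the fixed seed are assumptions chosen for illustration.

```python
import random


def label_shuffle_split(chinese_images, english_images,
                        train_fraction=0.8, seed=0):
    """Label, mix, shuffle, and split image paths into train/test sets.

    Label values (0 = Chinese, 1 = English) and the fixed seed are
    assumptions for illustration.
    """
    # Attach a class label to every spectrogram image path.
    labelled = [(path, 0) for path in chinese_images]
    labelled += [(path, 1) for path in english_images]

    # Shuffle with a private, seeded generator so the split is repeatable.
    random.Random(seed).shuffle(labelled)

    # The first train_fraction of the shuffled list becomes the train set.
    cut = int(len(labelled) * train_fraction)
    return labelled[:cut], labelled[cut:]
```

Under this sketch, the 38122 + 32079 = 70201 images reported above would give 56160 training examples and 14041 test examples; the project's own split may round differently.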
How to Use
- Download raw data from YouTube; the downloaded data will be under ./data/raw/
  ./uranium download
  You can customize your download list in ./language_detection/data_acquisition/sources.yml
- Preprocess (using Spark) and label the raw data; the processed data (spectrogram images) will be under ./data/rst/, and the labelled spectrogram image indices will be under ./data/labelled/
  ./uranium preprocess
- Train (using TensorFlow) the neural network model; the trained model will be under ./snapshots/
  ./uranium train
  The neural network model, Berlinnet (a shallow network model adopted from here), is used by default; tweak the configuration in ./language_detector/modeling/config.yaml if necessary. Depending on the desired number of training iterations and your hardware, training could take hours to days.
  To visualize the model and training progress via TensorBoard, run
  ./uranium visualize
  and go to localhost:6006
- Evaluate the trained model on the test data set
  ./uranium evaluate
  You can do this whether or not training is complete; while training is still in progress, the evaluation is performed on the checkpoint saved at the current point in training.
  Set up the checkpoint properly by making modifications here
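For unattended runs, the steps above could be chained from a small script. The following is a sketch under assumptions: the helper names `build_pipeline_commands` and `run_pipeline` are hypothetical and not part of this project, and the script is assumed to be started from the repository root so that `./uranium` resolves. The task names are the ones listed in the steps above.

```python
import subprocess

# Task names as listed in the steps above, in the order they should run.
PIPELINE_TASKS = ["download", "preprocess", "train", "evaluate"]


def build_pipeline_commands(tasks=PIPELINE_TASKS, runner="./uranium"):
    """Return one argument list per task, preserving the task order."""
    return [[runner, task] for task in tasks]


def run_pipeline(tasks=PIPELINE_TASKS):
    """Run each task in turn, stopping at the first one that fails.

    Hypothetical helper; assumes the current directory is the repo root.
    """
    for command in build_pipeline_commands(tasks):
        # check_call raises CalledProcessError on a non-zero exit status.
        subprocess.check_call(command)
```

Because training can take hours to days, `run_pipeline()` blocks for that long; passing a shorter task list such as `["download", "preprocess"]` runs only the data preparation steps.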
Acknowledgment
This project is inspired by the great work here, and a large portion of the code comes from it.