Flask Microservice implementing https://arxiv.org/abs/2203.11774 on-demand
Fork of SpeakerProfiling. Download the weights (checkpoint file) and place the file in a model_checkpoint folder to prepare the model for deployment.
This service provides an API endpoint to predict age, height, and gender from raw PCM audio data. It's built using a Flask server and utilizes a machine learning model trained on the TIMIT dataset.
To start the server, run the Python script containing the Flask application.
python flask_server.py
This starts the server on http://127.0.0.1:5000/ by default.
- URL: /predict
- Method: POST
- Data Params
  Required: audio=[raw PCM audio data]
- Success Response:
  - Code: 200
    Content: { "age": "predicted_age", "height": "predicted_height", "gender_is_female": "predicted_gender" }
- Error Response:
  - Code: 400 BAD REQUEST
    Content: { error : "Error message" }
- Sample Call:
curl -X POST -H "Content-Type: application/octet-stream" --data-binary @your_audio_file.raw http://127.0.0.1:5000/predict
See flask_test.py for an example client.
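For readers who prefer Python over curl, the sample call can be sketched with the standard library alone. This is a minimal sketch and not the contents of flask_test.py: the helper names build_predict_request and predict are hypothetical, and the sketch assumes the server accepts the raw bytes as the request body, as the curl command does.

```python
import json
import urllib.request

# Default address the server reports on startup.
PREDICT_URL = "http://127.0.0.1:5000/predict"


def build_predict_request(pcm_bytes, url=PREDICT_URL):
    """Build a POST request carrying raw PCM bytes as the body (hypothetical helper)."""
    if not pcm_bytes:
        raise ValueError("pcm_bytes is empty")
    return urllib.request.Request(
        url,
        data=pcm_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def predict(pcm_bytes, url=PREDICT_URL, timeout=30.0):
    """Send the audio to the service and return the decoded JSON reply."""
    request = build_predict_request(pcm_bytes, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a running server):
#   with open("your_audio_file.raw", "rb") as f:
#       print(predict(f.read()))
```

A 400 response surfaces as urllib.error.HTTPError from urlopen, so callers that want the error message should catch that exception and read its body.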
- The endpoint expects raw PCM audio data with a sample rate of 16000 Hz.
- Ensure that your audio file is in the correct format before sending it to the server.
- The predictions (age, height, and gender) are based on the model trained on the TIMIT dataset and may vary based on the quality and characteristics of the input audio.
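If the audio is stored as a WAV file, one way to obtain raw PCM is to strip the header with Python's wave module. The sketch below checks only the 16000 Hz sample rate stated above. The function name wav_to_raw_pcm is hypothetical, and the channel count and sample width the model expects are not stated here, so the sketch does not check them.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # Hz, as required by the endpoint


def wav_to_raw_pcm(wav_path):
    """Return the PCM frames of a WAV file without the header (hypothetical helper)."""
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
        if rate != EXPECTED_SAMPLE_RATE:
            raise ValueError(
                f"expected {EXPECTED_SAMPLE_RATE} Hz, got {rate} Hz"
            )
        # Channel count and sample width are passed through unchanged;
        # the format the model expects for these is an open assumption.
        return wav.readframes(wav.getnframes())
```

The returned bytes can be written to a .raw file or sent directly as the request body.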
MIT License
Original README follows
This repository contains the code for estimating the age and height of a speaker from their speech signal. It uses the s3prl library to load various upstream models such as wav2vec2, CPC, and TERA, and it uses the TIMIT dataset.
NOTE: If you want to run the single encoder model, check out the singleEncoder branch and follow the README in that branch.
Use the package manager pip to install the required packages for preparing the dataset, training and testing the model.
pip install -r requirements.txt
wget https://data.deepai.org/timit.zip
unzip timit.zip -d 'path to timit data folder'
python TIMIT/prepare_timit_data.py --path='path to timit data folder'
Edit the config.json file to set the upstream model, batch_size, gpus, lr, etc., and change the preferred logger in the train_.py files. Create a folder named 'checkpoints' to save the best models. To run the narrow-band experiment, set narrow_band to true in config.json.
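Editing config.json by hand is enough, but the same change can be scripted. The sketch below assumes config.json is a flat JSON object whose top-level keys include the names mentioned above (batch_size, gpus, lr, narrow_band). The actual layout of the file may differ, and update_config is a hypothetical helper, not part of this repository.

```python
import json


def update_config(config_path, **overrides):
    """Load a JSON config, apply top-level overrides, and write it back."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Assumes a flat object; nested settings would need a different merge.
    config.update(overrides)
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return config


# Example usage (values are illustrative only):
#   update_config("config.json", narrow_band=True, batch_size=16)
```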
python train_timit.py --data_path='path to final data folder' --speaker_csv_path='path to this repo/SpeakerProfiling/Dataset/data_info_height_age.csv'
Example:
python train_timit.py --data_path=/notebooks/SpeakerProfiling/TIMIT_Dataset/wav_data/ --speaker_csv_path=/notebooks/SpeakerProfiling/Dataset/data_info_height_age.csv
python test_timit.py --data_path='path to final data folder' --model_checkpoint='path to saved model checkpoint'
Example:
python test_timit.py --data_path=/notebooks/SpeakerProfiling/TIMIT_Dataset/wav_data/ --model_checkpoint=checkpoints/epoch=1-step=245-v3.ckpt
We have uploaded a pretrained model from our experiments. You can download it from Dropbox and put it into the model_checkpoint folder.
- [1] S3prl: The self-supervised speech pre-training and representation learning toolkit. AT Liu, Y Shu-wen