SpeakerProfiling_FlaskMicroservice

A Flask microservice that serves on-demand predictions from an implementation of https://arxiv.org/abs/2203.11774.

Fork of SpeakerProfiling. Download the weights (checkpoint file) and place it in a model_checkpoint folder to prepare the model for deployment.
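Before starting the server, it can help to confirm that the checkpoint is where the service expects it. The sketch below is only an illustration: `find_checkpoint` is a hypothetical helper that is not part of this repository, and the `.ckpt` extension is an assumption based on the checkpoint name used in the testing example of the original README.

```python
from pathlib import Path


def find_checkpoint(folder="model_checkpoint"):
    """Return the first *.ckpt file in the folder, or raise if none exists.

    The folder name comes from the README; the .ckpt extension is an assumption.
    """
    checkpoints = sorted(Path(folder).glob("*.ckpt"))
    if not checkpoints:
        raise FileNotFoundError(f"no .ckpt file found in {folder!r}")
    return checkpoints[0]
```

Calling `find_checkpoint()` from the repository root returns the path of the first matching file, or raises `FileNotFoundError` if the download step was skipped.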

API Docs

This service provides an API endpoint to predict age, height, and gender from raw PCM audio data. It's built using a Flask server and utilizes a machine learning model trained on the TIMIT dataset.

Running the Server

To start the server, run the Python script containing the Flask application.

python flask_server.py

This will start the server on http://127.0.0.1:5000/ by default.

API Endpoint

Predict Age, Height, and Gender

  • URL

    /predict

  • Method:

    POST

  • Data Params

    Required:

    audio=[raw PCM audio data]

  • Success Response:

    • Code: 200
      Content:
      {
          "age": "predicted_age",
          "height": "predicted_height",
          "gender_is_female": "predicted_gender"
      }
  • Error Response:

    • Code: 400 BAD REQUEST
      Content: { "error": "Error message" }
  • Sample Call:

    curl -X POST -H "Content-Type: application/octet-stream" --data-binary @your_audio_file.raw http://127.0.0.1:5000/predict
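The sample call above can also be issued from Python. What follows is a minimal sketch using only the standard library; it is not the repository's flask_test.py client. The URL and the `Content-Type` header are taken from the curl command, while the helper names `build_predict_request` and `predict` and the timeout value are assumptions made for illustration.

```python
import json
import urllib.request

# Default URL taken from the sample curl call.
PREDICT_URL = "http://127.0.0.1:5000/predict"


def build_predict_request(pcm_bytes, url=PREDICT_URL):
    """Build a POST request that carries raw PCM bytes as the request body."""
    return urllib.request.Request(
        url,
        data=pcm_bytes,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def predict(pcm_bytes, url=PREDICT_URL, timeout=60):
    """Send the audio and return the decoded JSON response as a dict."""
    request = build_predict_request(pcm_bytes, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (requires a running server and a raw PCM file):
#     with open("your_audio_file.raw", "rb") as f:
#         result = predict(f.read())
#     print(result["age"], result["height"], result["gender_is_female"])
```

With `urllib`, a 400 BAD REQUEST response is raised as `urllib.error.HTTPError`; the error body can be read from the exception with `err.read()`.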

Sample client

See flask_test.py for an example client.

Notes

  • The endpoint expects raw PCM audio data with a sample rate of 16000 Hz.
  • Ensure that your audio file is in the correct format before sending it to the server.
  • The predictions (age, height, and gender) are based on the model trained on the TIMIT dataset and may vary based on the quality and characteristics of the input audio.
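If the audio is stored as a WAV file, one way to obtain the raw PCM payload is to strip the WAV header and check the sample rate first. This is a hedged sketch using Python's standard `wave` module; `wav_to_raw_pcm` is a hypothetical helper, and because the notes above specify only the sample rate, the sketch does not check the sample width or channel count the server expects.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # Hz, as stated in the notes


def wav_to_raw_pcm(wav_path):
    """Return the raw PCM frames of a WAV file after checking its sample rate."""
    with wave.open(wav_path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        if sample_rate != EXPECTED_SAMPLE_RATE:
            raise ValueError(
                f"expected {EXPECTED_SAMPLE_RATE} Hz audio, got {sample_rate} Hz"
            )
        # readframes returns the sample data without the WAV header.
        return wav_file.readframes(wav_file.getnframes())
```

The returned bytes can be written to a `.raw` file or sent directly as the request body. Audio at another sample rate has to be resampled with a separate tool first; this sketch only rejects it.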

License

MIT License

Original README follows

Speaker Profiling

This repository contains the code for estimating the age and height of a speaker from their speech signal. It uses the s3prl library to load various upstream models such as wav2vec2, CPC, and TERA, and it uses the TIMIT dataset.

NOTE: If you want to run the single encoder model, check out the singleEncoder branch and follow the README in that branch.

Installation

Use the package manager pip to install the required packages for preparing the dataset, training and testing the model.

pip install -r requirements.txt

Usage

Download the TIMIT dataset

wget https://data.deepai.org/timit.zip
unzip timit.zip -d 'path to timit data folder'

Prepare the dataset for training and testing

python TIMIT/prepare_timit_data.py --path='path to timit data folder'

Update Config and Logger

Edit the config.json file to set the upstream model, batch_size, gpus, lr, etc., and change the preferred logger in the train_.py files. Create a folder named 'checkpoints' to save the best models. To run a narrow-band experiment, set narrow_band to true in config.json.
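The config file can be edited by hand, but a small script makes the change repeatable. The sketch below is an illustration only: `update_config` is a hypothetical helper, and it assumes config.json is a flat JSON object whose top-level keys include names such as narrow_band and batch_size. The actual layout of the file should be checked before relying on it.

```python
import json


def update_config(path, **updates):
    """Load a JSON config file, apply top-level key updates, and save it.

    Assumes the file holds a flat JSON object; nested settings are not handled.
    """
    with open(path, "r", encoding="utf-8") as f:
        config = json.load(f)
    config.update(updates)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return config
```

For example, `update_config("config.json", narrow_band=True)` would switch on the narrow-band setting while leaving every other key unchanged.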

Training

python train_timit.py --data_path='path to final data folder' --speaker_csv_path='path to this repo/SpeakerProfiling/Dataset/data_info_height_age.csv'

Example:

python train_timit.py --data_path=/notebooks/SpeakerProfiling/TIMIT_Dataset/wav_data/ --speaker_csv_path=/notebooks/SpeakerProfiling/Dataset/data_info_height_age.csv

Testing

python test_timit.py --data_path='path to final data folder' --model_checkpoint='path to saved model checkpoint'

Example:

python test_timit.py --data_path=/notebooks/SpeakerProfiling/TIMIT_Dataset/wav_data/ --model_checkpoint=checkpoints/epoch=1-step=245-v3.ckpt

Pretrained Model

We have uploaded a pretrained model from our experiments. You can download it from Dropbox.

Download it and put it into the model_checkpoint folder.

License

MIT

Reference

  • [1] S3prl: The self-supervised speech pre-training and representation learning toolkit. AT Liu, Y Shu-wen
