
audio2text-download

This project packages OpenAI's Whisper speech-recognition models in an easily accessible, simple-to-use containerized web app, along with a Terraform deployment.

No need to mess with config files or pip dependencies: everything comes packaged in a single Docker container, ready to use.

The infrastructure is deployed with Terraform on the AWS cloud provider, using an EC2 instance of type t2.large.

Check out the project on Docker Hub (image asharshith/audio2txtdowload).

Repository Overview:

This repository hosts a Flask web application that uses OpenAI Whisper for speech-to-text (Audio2Text) transcription, plus a feature for downloading the resulting text. The README explains how to run the application.
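The download feature turns a transcription into a text file. As a minimal sketch, assuming the app calls openai-whisper's model.transcribe(), whose result is a dict with a "text" field and a "segments" list (each segment carrying "start", "end" and "text"), the hypothetical helpers below render that result as plain text with optional timestamps. The function names and output layout are illustrative only, not the project's actual code.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def transcript_to_text(result: dict, with_timestamps: bool = True) -> str:
    """Convert a Whisper-style transcription result into downloadable text.

    Assumes `result` has the shape returned by whisper's model.transcribe():
    {"text": ..., "segments": [{"start": ..., "end": ..., "text": ...}, ...]}.
    """
    segments = result.get("segments") or []
    if not with_timestamps or not segments:
        # Fall back to the single full-text field.
        return result.get("text", "").strip() + "\n"
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hand-written sample in the shape Whisper returns; no model is loaded here.
    sample = {
        "text": " Hello and welcome. This is a test.",
        "segments": [
            {"start": 0.0, "end": 1.5, "text": " Hello and welcome."},
            {"start": 1.5, "end": 3.2, "text": " This is a test."},
        ],
    }
    print(transcript_to_text(sample), end="")
```

In a Flask view, the returned string could be sent back with a Content-Disposition: attachment header so the browser saves it as a .txt file.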

Docker Hub Overview:

This Docker image contains the Flask app configured with Whisper-based Audio2Text transcription. It simplifies deployment and runs the same way across different environments.

Why This Image: it provides a convenient, ready-made Audio2Text setup (a Python Flask app built on Whisper), so you can deploy audio-to-text transcription and text download without any manual setup.

Overview: this image packages a Flask app with Whisper-based Audio2Text transcription, streamlining the setup of a speech-to-text web application. It can also serve as a foundation for building your own speech-to-text applications.

Getting Started

These instructions will get a copy of the project up and running on your local machine or development server for development and testing purposes.

Pre-requisites:

  • Docker installed on the host machine or server

Deployment Options:

  • Docker container only

How to use this image

The container listens on port 5000. Launch it with the following command:

docker run -d --name=audio2txt -p 5000:5000 asharshith/audio2txtdowload:v1.0.0

Browse to http://your-host-ip:5000 to access the web UI
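The Flask app may need a few seconds after docker run before it accepts connections. The sketch below is a hypothetical helper, using only the Python standard library, that polls the UI URL until it responds; the default URL assumes the -p 5000:5000 port mapping shown above and a browser on the same host.

```python
import time
import urllib.error
import urllib.request


def wait_for_ui(url: str = "http://localhost:5000", timeout: float = 60.0,
                interval: float = 1.0) -> bool:
    """Poll `url` until the web UI answers, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Any successful response means the Flask app is accepting requests.
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            # The server replied with an error status; a 4xx still means it is up.
            if err.code < 500:
                return True
        except OSError:
            # Connection refused, reset or timed out: the app is probably still starting.
            pass
        time.sleep(interval)
    return False
```

For example, wait_for_ui("http://your-host-ip:5000") returns True once the page loads, or False if the container never comes up within a minute.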

Building it yourself

You can also build the image locally. First, clone this repository:

git clone https://github.com/hrshith/audio2text-download.git 

Then change into the directory:

cd audio2text-download

Build the image (the trailing . sets the build context to the current directory):

docker build -t audio2txt-download .

Finally, once the image is built, launch it with:

docker run -d --name=audio2txt -p 5000:5000 audio2txt-download

Deploying Using Terraform

The Terraform configuration is provided in the infrastructure.tf file.

The infrastructure.tf file specifies the resources the application needs, including the Docker image and its Docker Hub details, and deploys an AWS EC2 instance with the appropriate configuration. The steps below describe how to apply it.

Initialize Terraform: before applying any changes, initialize Terraform in the directory containing your configuration files. terraform init prepares the working directory by installing the necessary provider plugins, configuring the backend, and downloading any referenced modules:

terraform init

Check whether the configuration is valid:

terraform validate

Preview the changes Terraform will make to your AWS infrastructure:

terraform plan

Apply the changes to your AWS infrastructure:

terraform apply

Confirm changes: Terraform shows the planned changes and asks for confirmation before applying them. Review them carefully, then type yes to proceed.
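The four Terraform commands always run in the same order, and each should only run if the previous one succeeded. A minimal sketch of that sequencing in Python, assuming the terraform CLI is on your PATH and infrastructure.tf sits in the working directory; run_terraform_workflow is a hypothetical helper, not part of the repository.

```python
import subprocess
from typing import Callable, List, Sequence

# The commands from this section, in the order they are meant to run.
TERRAFORM_STEPS: List[List[str]] = [
    ["terraform", "init"],
    ["terraform", "validate"],
    ["terraform", "plan"],
    ["terraform", "apply"],  # still prompts for "yes" on the terminal
]


def run_terraform_workflow(
    workdir: str = ".",
    steps: Sequence[Sequence[str]] = TERRAFORM_STEPS,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> List[str]:
    """Run each Terraform step in `workdir`, stopping at the first failure.

    Returns the commands that completed successfully. `runner` is injectable
    so the sequencing can be tested without Terraform installed.
    """
    completed = []
    for step in steps:
        # stdin/stdout are inherited, so the apply confirmation prompt still works.
        result = runner(list(step), cwd=workdir)
        if result.returncode != 0:
            raise RuntimeError(f"{' '.join(step)} failed with exit code {result.returncode}")
        completed.append(" ".join(step))
    return completed
```

Calling run_terraform_workflow("path/to/repo") would run init, validate, plan and apply in turn and stop at the first command that fails.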

Destroy the Infrastructure

When you are done and want to clean up the resources, you can destroy the Terraform-managed infrastructure with the following command:

terraform destroy

Additional Notes

By default, the container runs the Whisper base model. To change it, follow the instructions below. (In future builds I hope to make this configurable from the docker run command.)

  1. Once the container is running, open a shell inside it (the container is named audio2txt by the docker run command above):

    docker exec -it audio2txt bash
    
  2. Find the text.py file and open it. (You can install and use any editor you like; I am using nano.)

    nano text.py
    
  3. You should see the following lines:

    # Load the Whisper model
    model = whisper.load_model("base")
    
  4. Change the model name to any of the models listed in the table below (the .en models are English-only).

  5. For example, to run the medium model, the code should look like this:

    # Load the Whisper model
    model = whisper.load_model("medium")
    
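  6. Restart the container (docker restart audio2txt) and upload your audio; the new model is downloaded automatically the first time it is used. Edits made this way survive a restart but are lost if the container is removed and re-created.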
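Choosing the model from the docker run command is planned for future builds. One possible approach, sketched here under stated assumptions: text.py would read a hypothetical WHISPER_MODEL environment variable instead of hard-coding "base", and check it against the model names in the table. Neither the variable nor these helper functions exist in the current image; you would have to rebuild it with this change.

```python
import os

# Model names from the table in this README; ".en" variants are English-only.
# (Hypothetical allow-list; newer openai-whisper releases accept more names.)
VALID_MODELS = {
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large",
}


def resolve_model_name(env=None, default: str = "base") -> str:
    """Return the model named by the hypothetical WHISPER_MODEL variable, or `default`."""
    env = os.environ if env is None else env
    name = env.get("WHISPER_MODEL", default).strip()
    if name not in VALID_MODELS:
        raise ValueError(f"Unknown Whisper model {name!r}; choose one of {sorted(VALID_MODELS)}")
    return name


def load_model():
    """Would replace the hard-coded whisper.load_model("base") line in text.py."""
    # Imported lazily so the name check above works without openai-whisper installed.
    import whisper
    return whisper.load_model(resolve_model_name())
```

With such a rebuilt image, the model could be picked at launch, for example docker run -d --name=audio2txt -p 5000:5000 -e WHISPER_MODEL=medium audio2txt-download. The allow-list is limited to the table's names for simplicity; whisper.available_models() lists every name the installed openai-whisper version accepts.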

Warning: larger models require a reasonably powerful CPU; otherwise they can take a very long time to load and run.

Size    Parameters  English-only model  Multilingual model  Required VRAM  Relative speed
tiny    39 M        tiny.en             tiny                ~1 GB          ~32x
base    74 M        base.en             base                ~1 GB          ~16x
small   244 M       small.en            small               ~2 GB          ~6x
medium  769 M       medium.en           medium              ~5 GB          ~2x
large   1550 M      N/A                 large               ~10 GB         1x
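The table can also be read as a simple rule: use the largest model that fits in the memory you have. The sketch below encodes the table's figures and applies that rule. Treating the VRAM column as a general memory budget is an assumption, since on a CPU-only host such as a t2.large the model is loaded into system RAM instead; pick_model is a hypothetical helper.

```python
# (name, parameters in millions, approx. required VRAM in GB, has English-only variant)
# Figures copied from the table above; they are approximate.
WHISPER_MODELS = [
    ("tiny", 39, 1, True),
    ("base", 74, 1, True),
    ("small", 244, 2, True),
    ("medium", 769, 5, True),
    ("large", 1550, 10, False),
]


def pick_model(available_gb: float, english_only: bool = False) -> str:
    """Return the largest model whose approximate memory need fits `available_gb`.

    Models are listed smallest to largest, so the last one that fits wins.
    With `english_only`, the ".en" variant is used when one exists; large has
    none, so it falls back to the multilingual model.
    """
    chosen = None
    for name, _params_m, vram_gb, has_en in WHISPER_MODELS:
        if vram_gb <= available_gb:
            chosen = f"{name}.en" if english_only and has_en else name
    if chosen is None:
        raise ValueError(f"No model fits in {available_gb} GB; the smallest needs ~1 GB")
    return chosen
```

For example, pick_model(6) suggests "medium", and pick_model(8, english_only=True) suggests "medium.en".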
Access the application at http://localhost:5000 after running the Docker container.

Credits

Special credits go to the OpenAI Whisper project, which made this project possible! Be sure to check it out.


Contributors

hrshith
