Epinio speech-to-command POC

Project Description

This project was born out of SUSE Hack Week 23, during which employees of this nice company are given a week to learn, develop, and grow.

My goal is to build a basic speech-to-text app that executes basic Epinio CLI commands by voice alone, without touching a keyboard. Examples:

epinio app create sampleapp
epinio app list

I believe this could be useful not only to save typing at times, but also as a feature that adds value to Epinio if successful.

Goal for this Hackweek

The goals for this project are:

  • Find existing tools that provide a speech-to-text library.
  • Put together a basic script that recognizes voice.
  • Fine-tune the speech recognition as much as possible.
  • Transcribe the speech to text.
  • Turn a successful transcription into a working command and execute it.

If all of the above works, the next step would be to attempt a more human-friendly interpretation of a command. For instance, instead of transcribing epinio app create sampleapp word for word, a sentence like "Epinio, please create an app named sampleapp" would execute that command and produce the same result.
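One way to picture the human-friendly step is a small table of phrase patterns, each mapped to a command template. The sketch below is illustrative only: the patterns, the `PATTERNS` table, and the `sentence_to_command` helper are hypothetical and are not taken from this project's code.

```python
import re

# Hypothetical phrase patterns paired with command templates.
# The project's real parsing rules are not shown in this README.
PATTERNS = [
    (
        re.compile(r"create an? app (?:named|called) (?P<name>[\w-]+)"),
        ["epinio", "app", "create", "{name}"],
    ),
    (
        re.compile(r"list (?:all |the |my )*apps"),
        ["epinio", "app", "list"],
    ),
]


def sentence_to_command(sentence):
    """Return the argument list for a recognized sentence, or None."""
    text = sentence.lower().strip()
    for pattern, template in PATTERNS:
        match = pattern.search(text)
        if match:
            # Fill placeholders such as {name} from the named groups.
            return [part.format(**match.groupdict()) for part in template]
    return None


print(sentence_to_command("Epinio, please create an app named sampleapp"))
# → ['epinio', 'app', 'create', 'sampleapp']
```

Returning a list of arguments rather than a single string means the result can be handed to a process runner without going through a shell. Sentences that match no pattern return None, so the caller can ask the user to repeat the request.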

Demo results

Note: turn on the volume on both videos to hear the spoken commands.

Demo 1: Transcription of Epinio commands

ep_simple_commands.mov

Demo 2: Transformation of "human-friendly" speech into a complex Epinio command

ep_chatpoc_push_complex_command.mov

Requirements

Note: for the time being, these instructions are aimed at Linux users. To be expanded.

Main installs:

  • Install Epinio. If you are unsure how to install it, see: https://docs.epinio.io/installation/install_epinio

  • Python (preferably > 3.10.0). If it is not installed, download it here: https://www.python.org/downloads/

  • PIP to install the required packages. If it is not installed, check here: https://pip.pypa.io/en/stable/installation/

    For Linux users, you can use:

    wget https://bootstrap.pypa.io/get-pip.py
    python3 ./get-pip.py
    
  • Other installs:

    Python Speech Recognition module:

    pip3 install speechrecognition
    

    PyAudio:

    pip3 install pyaudio
    

    Alternatively, Ubuntu 22.04 users may install it with:

    sudo apt-get install python3-pyaudio
    

    PyAutoGUI

    pip3 install PyAutoGUI
    

    tkinter (If not installed already)

    sudo apt-get install python3-tk python3-dev
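Before running the program, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of the project: the `missing_modules` helper is hypothetical, and the import names are the usual ones for these packages (`speech_recognition` for SpeechRecognition, `pyaudio` for PyAudio, `pyautogui` for PyAutoGUI).

```python
import importlib.util

# Import names assumed for the packages listed in the requirements.
REQUIRED_MODULES = ["speech_recognition", "pyaudio", "pyautogui", "tkinter"]


def missing_modules(names):
    """Return the module names that cannot be located by this interpreter."""
    # find_spec looks a module up without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found.")
```

This only checks that the modules can be found. It does not verify that a microphone is available or that Epinio itself is installed.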
    

How to run it

Run on terminal:

python3 main.py

Alternatively, if ALSA-related errors appear in the terminal, you can hide them by redirecting stderr:

 python3 main.py 2>/dev/null
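The shell redirect hides everything written to stderr, including Python tracebacks. A narrower option is to silence stderr only around the audio setup from inside Python. The sketch below assumes the ALSA messages come from native code writing directly to file descriptor 2, which is why it redirects the descriptor rather than `sys.stderr`. The `suppress_stderr` helper is hypothetical and is not part of this project.

```python
import contextlib
import os


@contextlib.contextmanager
def suppress_stderr():
    """Temporarily point file descriptor 2 (stderr) at the null device."""
    saved_fd = os.dup(2)  # keep a copy of the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)  # anything written to fd 2 is now discarded
        yield
    finally:
        os.dup2(saved_fd, 2)  # restore the real stderr
        os.close(devnull_fd)
        os.close(saved_fd)


with suppress_stderr():
    # In a real script this block would wrap the microphone setup.
    os.write(2, b"this line is discarded\n")
```

Errors raised after the `with` block are printed as usual, so genuine failures in the rest of the program stay visible.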

Once the program is running, it prompts you to say a command with the following text: Please say the Epinio command you wish to be executed

Speak as clearly as possible so the program has a better chance of recognizing the words. The program will then try to interpret your speech, parse the words, and transform them into a working command.
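The listen, transcribe, and execute flow described above can be sketched as follows. This is not the project's main.py. It assumes the `recognize_google` engine from the SpeechRecognition library and runs the command with `subprocess.run`, whereas the requirements suggest the project may use PyAutoGUI instead. The helper names `transcript_to_argv`, `listen_once`, and `main` are hypothetical.

```python
import subprocess

PROMPT = "Please say the Epinio command you wish to be executed"


def transcript_to_argv(transcript):
    """Split a transcript into arguments; return None unless it starts with 'epinio'."""
    # Lowercasing is a simplification; it would break case-sensitive names.
    argv = transcript.lower().split()
    if not argv or argv[0] != "epinio":
        return None
    return argv


def listen_once():
    """Capture one phrase from the default microphone and return its text, or None."""
    # Imported lazily so the parsing helper works without audio packages installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        print(PROMPT)
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        # Speech was unintelligible or the recognition service was unreachable.
        return None


def main():
    transcript = listen_once()
    argv = transcript_to_argv(transcript) if transcript else None
    if argv is None:
        print("No Epinio command recognized")
        return
    subprocess.run(argv, check=False)
```

Calling `main()` requires a working microphone and the epinio binary on the PATH. Checking that the first word is "epinio" before running anything is a simple guard against executing arbitrary misheard text.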

Resources used

  • https://geekscoders.com/python-speech-recognition-tutorial-for-beginners/
  • https://people.csail.mit.edu/hubert/pyaudio/#downloads
  • https://github.com/Uberi/speech_recognition
