Giter Club home page Giter Club logo

alts's Introduction

alts

( ๐ŸŽ™๏ธ listens | ๐Ÿ’ญ thinks | ๐Ÿ”Š speaks )


๐Ÿ’ฌ about

100% free, local and offline assistant with speech recognition and talk-back functionalities.

๐Ÿค– default usage

ALTS runs in the background and waits for you to press cmd+esc (or win+esc).

  • ๐ŸŽ™๏ธ While holding the hotkey, your voice will be recorded (saves in the project root).
  • ๐Ÿ’ญ On release, the recording stops and a transcript is sent to the LLM (the recording is deleted).
  • ๐Ÿ”Š The LLM responses then get synthesized and played back to you (also shown as desktop notifications).

You can modify the hotkey combination and other settings in your config.yaml.

ALL processes are local and NONE of your recordings or queries leave your environment; the recordings are deleted as soon as they are used; it's ALL PRIVATE by default

โš™๏ธ pre-requisites

  • python

    (tested on) version >=3.11 on macOS and version >=3.8 on windows

  • llm

    By default, the project is configured to work with Ollama, running the stablelm2 model (a very tiny and quick model). This setup makes the whole system completely free to run locally and great for low resource machines.

    However, we use LiteLLM in order to be provider agnostic, so you have full freedom to pick and choose your own combinations. Take a look at the supported Models/Providers for more details on LLM configuration.

    See .env.template and config-template.yaml for customizing your setup

  • stt

    We use openAI's whisper to transcribe your voice queries. It's a general-purpose speech recognition model.

    You will need to have ffmepg installed in your environment, you can download it from the official site.

    Make sure to check out their setup docs, for any other requirement.

    if you stumble into errors, one reason could be the model not downloading automatically. If that's the case you can run a whisper example transcription in your terminal (see examples) or manually download it and place the model-file in the correct folder

  • tts

    We use coqui-TTS for ALTS to talk-back to you. It's a library for advanced Text-to-Speech generation.

    You will need to install eSpeak-ng in your environment:

    • macOS โ€“ brew install espeak
    • linux โ€“ sudo apt-get install espeak -y
    • windows โ€“ download the executable from their repo

      on windows you'll also need Desktop development with C++ and .NET desktop build tools. Download the Microsoft C++ Build Tools and install these dependencies.

    Make sure to check out their setup docs, for any other requirement.

    if you don't have the configured model already downloaded it should download automatically during startup, however if you encounter any problems, the default model can be pre-downloaded by running the following:

    tts --text "this is a setup test" --out_path test_output.wav --model_name tts_models/en/vctk/vits --speaker_idx p364
    

    The default model has several "speakers" to choose from; running the following command will serve a demo site where you can test the different voices available:

    tts-server --model_name tts_models/en/vctk/vits
    

โœ… get it running

clone the repo

git clone https://github.com/alxpez/alts.git

go to the main folder

cd alts/

install the project dependencies

pip install -r requirements.txt

see the pre-requisites section, to make sure your machine is ready to start the ALTS

duplicate and rename the needed config files

cp config-template.yaml config.yaml
cp .env.template .env

modify the default configuration to your needs

start up ALTS

sudo python alts.py

the keyboard package requires to be run as admin (in macOS and Linux), it's not the case on Windows

alts's People

Contributors

alxpez avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.