
Local Whisper Cat

A plugin that transcribes audio files to text locally, on your GPU or CPU.

How it works

This plugin communicates with a local container running an API service that transcribes audio files to text.

In the settings panel you can set the location of the container and the audio_key.

The plugin should be agnostic to the service behind the URL, but in practice it targets Whisper ASR Webservice.
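As a rough illustration of the exchange described above, the sketch below builds a transcription request for the container. It assumes the Whisper ASR Webservice conventions (a POST to /asr, the query parameters task, language and output, and a multipart form field named audio_file); the helper names build_asr_request and transcribe are hypothetical and are not taken from the plugin's source.

```python
import urllib.parse
import urllib.request
import uuid

# Default service location, taken from the plugin settings described in this README.
DEFAULT_URL = "http://openai-whisper-asr-webservice:9000"


def build_asr_request(audio_bytes, filename, content_type,
                      base_url=DEFAULT_URL, language="en"):
    """Build (but do not send) a multipart POST for the /asr endpoint.

    Assumption: the endpoint path, query parameters and the "audio_file"
    form field follow Whisper ASR Webservice; other services may differ.
    """
    query = urllib.parse.urlencode(
        {"task": "transcribe", "language": language, "output": "txt"}
    )
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio_file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/asr?{query}",
        data=head + audio_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(audio_bytes, filename, content_type, **kwargs):
    """Send the request and return the transcription as plain text."""
    request = build_asr_request(audio_bytes, filename, content_type, **kwargs)
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read().decode("utf-8").strip()
```

Building the request separately from sending it makes it possible to inspect the URL and body without a running container.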

How to set up the plugin settings

Choose the model URL

"http://openai-whisper-asr-webservice:9000" by default

Choose an audio key field for your WebSocket message

"audio_key" by default

Choose a language

"en" by default
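The three settings and their defaults can be pictured as a small configuration object. This is only a sketch: the class name WhisperSettings and the field names api_url, audio_key and language are hypothetical, and the plugin's actual settings schema may use different names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WhisperSettings:
    """Hypothetical mirror of the plugin settings with the README defaults."""

    api_url: str = "http://openai-whisper-asr-webservice:9000"
    audio_key: str = "audio_key"
    language: str = "en"

    @classmethod
    def from_dict(cls, values):
        """Keep known, non-empty values and fall back to the defaults otherwise."""
        known = {
            name: value
            for name, value in values.items()
            if name in cls.__dataclass_fields__ and value
        }
        return cls(**known)


if __name__ == "__main__":
    # Override only the language; the other fields keep their defaults.
    print(WhisperSettings.from_dict({"language": "it"}))
```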

How to send audio files to the Cat

Your client should send a message with the following fields: text, user_id, audio_key, audio_type, audio_name, encodedBase64.

The audio_key field should contain the base64-encoded audio file, as in the following example:

your_json_fields = {
    "text": "",
    "user_id": "user69",
    "audio_key": "",  # base64-encoded content of the audio file
    "audio_type": "audio/ogg",
    "audio_name": "msg45430839-160807.ogg",
    "encodedBase64": True,
}
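A client could assemble such a message from a file on disk roughly as follows. This is a minimal sketch: build_audio_message and to_websocket_payload are hypothetical helper names, the extension-to-MIME table is an assumption, and the audio_key argument must match the key configured in the plugin settings.

```python
import base64
import json
from pathlib import Path

# Assumed extension -> MIME type table for some of the accepted formats.
AUDIO_TYPES = {
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",
    ".mp4": "audio/mp4",
}


def build_audio_message(path, user_id, audio_key="audio_key"):
    """Read an audio file and return the message fields as a dict."""
    path = Path(path)
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return {
        "text": "",
        "user_id": user_id,
        audio_key: encoded,  # field name follows the plugin's audio key setting
        "audio_type": AUDIO_TYPES.get(path.suffix.lower(), "application/octet-stream"),
        "audio_name": path.name,
        "encodedBase64": True,
    }


def to_websocket_payload(message):
    """Serialize the message to the JSON text sent over the WebSocket."""
    return json.dumps(message)
```

Note that json.dumps writes the Python value True as the JSON literal true.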

For convenience, you can use Chatty!, a compatible Python client, to send 10-second audio clips in the right format.

For GPU transcription you need the nvidia-container-toolkit set up and an adequate video card.

You also need a running container with whisper-asr-webservice.

The accepted audio formats are: mp3, wav, ogg, mpeg, mp4 (depending on the container settings).

Example of a full local instance with ollama and the NVIDIA container runtime, using docker-compose
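On the receiving side, a message can be checked against the accepted formats before the audio is decoded. The sketch below is an assumption about how such a check might look, not the plugin's actual code; extract_audio is a hypothetical helper, and the extension set simply mirrors the list above.

```python
import base64
from pathlib import PurePath

# Extensions mirroring the accepted formats listed in this README.
ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".ogg", ".mpeg", ".mp4"}


def extract_audio(message, audio_key="audio_key"):
    """Return (audio_bytes, audio_name) or raise ValueError for a bad message."""
    name = message.get("audio_name", "")
    extension = PurePath(name).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {extension or 'none'}")

    payload = message.get(audio_key)
    if not payload:
        raise ValueError(f"message has no '{audio_key}' content")
    if not message.get("encodedBase64"):
        raise ValueError("expected base64-encoded audio")

    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(payload, validate=True), name
```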

networks:
    fullcat-network:
services:
    cheshire-cat-core:
        build:
            context: ./core
        container_name: cheshire_cat_core
        depends_on:
            - cheshire-cat-vector-memory
            - ollama
            - openai-whisper-asr-webservice
        environment:
            - PYTHONUNBUFFERED=1
            - WATCHFILES_FORCE_POLLING=true
            - CORE_HOST=${CORE_HOST:-localhost}
            - CORE_PORT=${CORE_PORT:-1865}
            - QDRANT_HOST=${QDRANT_HOST:-cheshire_cat_vector_memory}
            - QDRANT_PORT=${QDRANT_PORT:-6333}
            - CORE_USE_SECURE_PROTOCOLS=${CORE_USE_SECURE_PROTOCOLS:-}
            - API_KEY=${API_KEY:-}
            - LOG_LEVEL=${LOG_LEVEL:-DEBUG}
            - DEBUG=${DEBUG:-true}
            - SAVE_MEMORY_SNAPSHOTS=${SAVE_MEMORY_SNAPSHOTS:-false}
        ports:
            - ${CORE_PORT:-1865}:80
        volumes:
            - ./cat/static:/app/cat/static
            - ./cat/public:/app/cat/public
            - ./cat/plugins:/app/cat/plugins
            - ./cat/metadata.json:/app/metadata.json
        restart: unless-stopped
        networks:
            - fullcat-network
            
    cheshire-cat-vector-memory:
        image: qdrant/qdrant:latest
        container_name: cheshire_cat_vector_memory
        expose:
            - 6333
        volumes:
            - ./cat/long_term_memory/vector:/qdrant/storage
        restart: unless-stopped
        networks:
            - fullcat-network
            
    ollama:
        container_name: ollama_cat
        image: ollama/ollama:latest
        volumes:
            - ./ollama:/root/.ollama
        expose:
            - 11434
        environment:
            - gpus=all
        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          count: 1
                          capabilities:
                              - gpu
        networks:
            - fullcat-network
            
    openai-whisper-asr-webservice:
        deploy:
            resources:
                reservations:
                    devices:
                        - driver: nvidia
                          count: all
                          capabilities:
                              - gpu
        ports:
            - 9000:9000
        expose:
            - 9000
        environment:
            - ASR_MODEL=base
            - ASR_ENGINE=openai_whisper
        image: onerahmet/openai-whisper-asr-webservice:latest-gpu
        networks:
            - fullcat-network
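After bringing the stack up, it can be useful to confirm that the published ports answer before sending audio. The following sketch only tests TCP reachability from the host, so it makes no assumption about any HTTP endpoint; wait_for_port is a hypothetical helper, and localhost is assumed to be where docker-compose publishes ports 1865 (core, by default) and 9000 (Whisper service).

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Ports published to the host in the compose file above.
    for name, port in (("cheshire-cat-core", 1865), ("whisper-asr", 9000)):
        state = "up" if wait_for_port("localhost", port, timeout=5) else "unreachable"
        print(f"{name}: {state}")
```

The qdrant and ollama services only use expose, so they are reachable from inside fullcat-network but not from the host.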

Contributors

lorenzosiena
