mau_local_stt

A maubot plugin that transcribes audio messages in Matrix rooms using local open-source libraries.

Installation

  1. FFmpeg must be in $PATH
  2. Activate the maubot virtual environment (source ./bin/activate), and run
    • pip install whispercpp numpy - if you want to use whisper as the backend.
    • pip install vosk - if you want to use vosk as the backend.
  3. Download maulocalstt from the releases (or download the repository and build with mbc build), and upload it to maubot.
  4. Download a model for your backend:
  5. Create an instance of the bot, and update the configuration:
    • For whisper, specify
      • model_name - the name of the model you downloaded (the name of the file without the ggml- and .bin)
      • language - the language the audio will be in (you can set it to auto for whisper to auto-detect the language)
      • translate - whether whisper should translate the transcription to English (true or false)
    • For vosk, specify
      • model_path - the path to the top directory of the model you downloaded (the one with the folders am, conf, graph, etc.), either absolute or relative to maubot's working directory.
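
The FFmpeg prerequisite and the whisper naming rule above can be checked with a short script. This is a minimal sketch, not part of the plugin: the helper names ffmpeg_available and whisper_model_name are hypothetical, and ggml-base.bin is only an example file name.

```python
import shutil
from pathlib import Path


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on $PATH."""
    return shutil.which("ffmpeg") is not None


def whisper_model_name(model_file: str) -> str:
    """Derive model_name from a model file name (hypothetical helper).

    Drops the leading "ggml-" and the trailing ".bin", as described
    in the configuration notes above.
    """
    name = Path(model_file).name
    if not (name.startswith("ggml-") and name.endswith(".bin")):
        raise ValueError(f"unexpected model file name: {name}")
    return name[len("ggml-"):-len(".bin")]


if __name__ == "__main__":
    print("ffmpeg found:", ffmpeg_available())
    print("model_name:", whisper_model_name("ggml-base.bin"))
```

Run it inside the maubot virtual environment so that $PATH matches what the bot will see.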

Usage

Simply invite the bot to a room, and it will reply to all audio messages with their transcription.

mau_local_stt's People

Contributors

chayleaf, elishaaz

mau_local_stt's Issues

Model can't be loaded after an error occurred

If any error occurs when switching from vosk to whisper (or vice versa), the model is set to none, but current_backend is not updated because execution never gets that far. As a result, current_backend is still vosk and the path is unchanged while the model is null, so the model is never loaded again.
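
One way to avoid this inconsistent state is to load the new model first and only then update the backend and the model together, so that a failed load leaves the previous working state untouched. The sketch below is an assumption about how that could look, not the plugin's actual code: BackendState, switch_backend, model_id and the loader callable are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class BackendState:
    """Hypothetical holder for the currently loaded backend and model."""
    current_backend: Optional[str] = None
    model_id: Optional[str] = None  # stands in for model_name or model_path
    model: Any = None

    def switch_backend(self, backend: str, model_id: str,
                       loader: Callable[[str, str], Any]) -> None:
        """Load the requested model, then commit all fields in one step."""
        unchanged = (backend, model_id) == (self.current_backend, self.model_id)
        if unchanged and self.model is not None:
            return  # the requested model is already loaded

        # If the loader raises, none of the fields below are touched,
        # so the previous backend and model stay usable.
        new_model = loader(backend, model_id)

        self.current_backend = backend
        self.model_id = model_id
        self.model = new_model
```

The early return also checks that the model is not empty, so a state that was left without a model triggers a reload instead of being treated as up to date.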

Vosk discards everything but the end.

The function rec.FinalResult() that is being called in transcribe_audio.py just returns the last batch, not the whole text.
The debug logs show the whole message, but no running total is kept.
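
A possible fix is to keep a running total. In Vosk's Python API, AcceptWaveform() returns true when an utterance is complete, Result() returns that utterance as a JSON string with a "text" field, and FinalResult() returns whatever remains at the end of the stream. The sketch below collects all of them. It is not the code from transcribe_audio.py: the function name transcribe_chunks is hypothetical, and it accepts any recognizer-like object so it can be tried without a model.

```python
import json
from typing import Iterable, List


def transcribe_chunks(rec, chunks: Iterable[bytes]) -> str:
    """Feed audio chunks to a Vosk-style recognizer and join every result.

    `rec` is expected to provide AcceptWaveform(), Result() and
    FinalResult(), as vosk.KaldiRecognizer does.
    """
    parts: List[str] = []
    for chunk in chunks:
        # True means an utterance just ended; collect it before moving on.
        if rec.AcceptWaveform(chunk):
            text = json.loads(rec.Result()).get("text", "")
            if text:
                parts.append(text)

    # FinalResult() only covers audio received since the last Result().
    tail = json.loads(rec.FinalResult()).get("text", "")
    if tail:
        parts.append(tail)
    return " ".join(parts)
```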

Off-topic: I have a working Docker image for maubot that's compatible with whisper and vosk: darmstadt/maubuntubot

Support for faster-whisper or OpenAI's API

Hi.

Thank you for this piece of software. It's very useful for transcribing into searchable text the ungodly number of WhatsApp voice messages (even work-related ones) one gets here in Brazil.

After living in text-based bliss for a couple of months, I noticed an issue with whispercpp. For some reason, long voice messages were not getting from ffmpeg to the model (brief messages were still getting through). The ffmpeg process would just sit there idle, presumably after completing its job, without passing the result down the pipe to whisper. After trying unsuccessfully to debug the issue for a while, I hacked the plugin to use faster-whisper instead. This seems to be working well.

I am now starting to prepare a proper pull request to add support for faster-whisper alongside whispercpp. However, before I do that, I would like to ask whether this is the right approach and something the project would be interested in.

Perhaps, at least for whisper (in contrast with vosk), it would be preferable to support OpenAI's API so that people can expose their locally running model (with either whisper.cpp or faster-whisper) for use by other tools besides maubot. Furthermore, I find myself forced to add quite a bit of complexity in order to support two different flavors of whisper, which could be avoided by relying on the API.
