mau_local_stt's Introduction

mau_local_stt

A maubot plugin to transcribe audio messages in Matrix rooms using local open-source libraries.

Installation

FFmpeg must be in $PATH
Activate the maubot virtual environment (source ./bin/activate), and run:
- pip install whispercpp numpy - if you want to use whisper as the backend.
- pip install vosk - if you want to use vosk as the backend.
Download maulocalstt from the releases (or download the repository and build with mbc build), and upload it to maubot.
Download a model for your backend:
- For whisper, download a model from https://huggingface.co/ggerganov/whisper.cpp and place it under models/whispercpp
- For vosk, download a zipped model from https://alphacephei.com/vosk/models and unpack it into models/vosk
Create an instance of the bot, and update the configuration:
- For whisper, specify
  - model_name - the name of the model you downloaded (the name of the file without the ggml- and .bin)
  - language - the language the audio will be in (you can set it to auto for whisper to auto-detect the language)
  - translate - whether whisper should translate the transcription to English (true or false)
- For vosk, specify
  - model_path - the path to the top directory of the model you downloaded (the one with the folders am, conf, graph, etc.), either absolute or relative to maubot's working directory.
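
The path conventions above can be sketched in Python. This is only an illustration, not the plugin's own code: the helper names ffmpeg_available, whisper_model_file and vosk_model_dir are hypothetical, and the ggml-<model_name>.bin file pattern is inferred from the description of model_name.

```python
import shutil
from pathlib import Path


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found in $PATH."""
    return shutil.which("ffmpeg") is not None


def whisper_model_file(model_name: str, models_dir: str = "models/whispercpp") -> Path:
    """Map a configured model_name such as "base" to its file on disk."""
    # Assumption: the downloaded file is named ggml-<model_name>.bin
    return Path(models_dir) / f"ggml-{model_name}.bin"


def vosk_model_dir(model_path: str, working_dir: str = ".") -> Path:
    """Resolve model_path, which may be absolute or relative to the working directory."""
    path = Path(model_path)
    return path if path.is_absolute() else Path(working_dir) / path
```

Checking whisper_model_file(...).is_file() and vosk_model_dir(...).is_dir() before creating the instance is a quick way to catch a misspelled model name or path.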

Usage

Invite the bot to a room, and it will reply to every audio message with its transcription.

mau_local_stt's Issues

Model can't be loaded after an error occurred

If any error occurs when switching from vosk to whisper (or vice versa), the model is set to None, but current_backend is not updated, because execution never gets that far. As a result, current_backend is still vosk and the path is unchanged, while the model is None.
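
One possible way to avoid this state is to update the model, backend and path together, and to clear all three when loading fails. The sketch below assumes the plugin skips reloading when backend and path are unchanged, which the report implies but does not show. BackendState, switch and loader are hypothetical names, not the plugin's actual code.

```python
from typing import Any, Callable, Optional


class BackendState:
    """Hypothetical holder for the loaded model and the backend it belongs to."""

    def __init__(self) -> None:
        self.model: Optional[Any] = None
        self.current_backend: Optional[str] = None
        self.current_path: Optional[str] = None

    def switch(self, backend: str, path: str, loader: Callable[[str], Any]) -> None:
        # Nothing to do if this exact model is already loaded.
        same = (backend, path) == (self.current_backend, self.current_path)
        if self.model is not None and same:
            return
        try:
            new_model = loader(path)
        except Exception:
            # Forget the old identity as well, so the next attempt reloads
            # instead of seeing "same backend, same path" and skipping the load.
            self.model = None
            self.current_backend = None
            self.current_path = None
            raise
        # Commit all three fields together, only after a successful load.
        self.model = new_model
        self.current_backend = backend
        self.current_path = path
```

With this layout a failed switch leaves the state empty rather than half-updated, so switching back to the previous backend triggers a fresh load.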

Vosk discards everything but the end.

The function rec.FinalResult() that is being called in transcribe_audio.py just returns the last batch, not the whole text.
The debug logs show the whole message, but no running total is kept.
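
A running total could be kept roughly as follows. This is a sketch, not the code in transcribe_audio.py: transcribe_chunks is a hypothetical helper, and rec stands for any object with vosk's KaldiRecognizer methods AcceptWaveform, Result and FinalResult.

```python
import json
from typing import Iterable, List, Protocol


class Recognizer(Protocol):
    """The subset of vosk's KaldiRecognizer interface used below."""

    def AcceptWaveform(self, data: bytes) -> int: ...
    def Result(self) -> str: ...
    def FinalResult(self) -> str: ...


def transcribe_chunks(rec: Recognizer, chunks: Iterable[bytes]) -> str:
    """Feed audio chunks to the recognizer and join every finished utterance."""
    parts: List[str] = []
    for chunk in chunks:
        # A truthy return value means an utterance was completed;
        # Result() then returns that utterance as a JSON string.
        if rec.AcceptWaveform(chunk):
            text = json.loads(rec.Result()).get("text", "")
            if text:
                parts.append(text)
    # FinalResult() only flushes what is still buffered after the last chunk.
    tail = json.loads(rec.FinalResult()).get("text", "")
    if tail:
        parts.append(tail)
    return " ".join(parts)
```

The design choice here is to collect text at every utterance boundary, so FinalResult() contributes only the trailing part instead of being treated as the whole transcription.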

Off-topic: I have a working Docker image for maubot that's compatible with whisper and vosk: darmstadt/maubuntubot

Unable to load without vosk

I can't start the instance without vosk.
Since there is no Alpine/musl build of vosk, I tried to run the plugin without it, but it fails when loading.

Here is the error from maubot: stacktrace.txt

And my Maubot image with the dependencies: Dockerfile.txt
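
A common pattern for this kind of problem is to import a backend only when the configuration selects it, so a missing package for the other backend does not stop the plugin from loading. The sketch below illustrates the idea with a hypothetical import_backend helper. It is not taken from the plugin, and whether the attached stack trace has exactly this cause is an assumption.

```python
import importlib
from types import ModuleType


def import_backend(name: str) -> ModuleType:
    """Import only the backend module that the configuration asks for."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Raise a readable error for the selected backend only; the other
        # backend is never imported, so it does not need to be installed.
        raise RuntimeError(
            f"backend {name!r} is configured but not installed"
        ) from exc
```

Calling import_backend("whispercpp") on a system without vosk then never touches the vosk package at all.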

Support for faster-whisper or OpenAI's API

Hi.

Thank you for this piece of software. It's very useful for turning the ungodly number of WhatsApp voice messages (even work-related ones) one gets here in Brazil into searchable text.

After living in text-based bliss for a couple of months, I noticed an issue with whispercpp. For some reason, long voice messages were not getting from ffmpeg to the model (brief messages were still getting through). The ffmpeg process would just sit there idle, presumably after completing its job, without passing the result down the pipe to whisper. After trying unsuccessfully to debug the issue for a while, I hacked the plugin to use faster-whisper instead. This seems to be working well.
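
For anyone debugging the same symptom: one well-known cause of a child process that appears idle on large outputs is a full pipe buffer, where the child blocks on writing while the parent is not reading. Whether that is what happens here is not confirmed. The sketch below shows the standard-library way to avoid it; run_and_collect is a hypothetical helper, not the plugin's code.

```python
import subprocess
from typing import Sequence


def run_and_collect(cmd: Sequence[str], stdin_data: bytes = b"", timeout: float = 120.0) -> bytes:
    """Run cmd, feed it stdin_data, and return everything it writes to stdout."""
    # With input= and capture_output=, subprocess.run uses communicate()
    # internally, which writes stdin and drains stdout/stderr concurrently,
    # so the child cannot stall on a full pipe.
    proc = subprocess.run(
        cmd,
        input=stdin_data,
        capture_output=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired instead of hanging forever
        check=True,       # raises subprocess.CalledProcessError on a non-zero exit code
    )
    return proc.stdout
```

As an assumed example invocation, a command such as ["ffmpeg", "-i", "pipe:0", "-f", "s16le", "-ar", "16000", "-ac", "1", "pipe:1"] would convert audio passed on stdin to 16 kHz mono PCM on stdout. In asyncio code, awaiting the communicate() method of the process object serves the same purpose.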

I am now starting to prepare a proper pull request to add support for faster-whisper alongside whispercpp. However, before I do that, I would like to ask whether this is the right approach and something the project would be interested in.

Perhaps, at least for whisper (in contrast with vosk), it would be preferable to support OpenAI's API so that people can expose their locally running model (with either whisper.cpp or faster-whisper) for use by other tools besides maubot. Furthermore, I find myself forced to add quite a bit of complexity in order to support two different flavors of whisper, which could be avoided by relying on the API.

Repository: elishaaz / mau_local_stt
