transcriberbot's Introduction

Transcriber Bot

Quick Start

  1. Create your own Telegram bot with @BotFather and copy the bot token

  2. Edit the file config/telegram.json

    { "username": "BOT USERNAME", "token": "BOT TOKEN", "admins": [ "YOUR TELEGRAM ID" ] }

  3. Create your own Wit token on the Wit website

  4. Edit the file config/wit.json (for example, with an Italian token)

    { "it-IT": "WIT TOKEN FOR Italian" }

    Repeat steps 3 and 4 to support multiple languages.

    You can test if your token is working by running:

    $ python src/audiotools/speech.py wit_api_key some_file.mp3 transcription.txt

  5. Create your own Yandex Translate token on the Yandex website

  6. Edit the file config/yandex.json

    { "translate_key": "YOUR YANDEX TOKEN" }
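Before starting the bot, it can help to confirm that the three configuration files exist and parse. The following is a minimal sketch, not part of the repository: the `check_config` helper and the `REQUIRED_KEYS` table are hypothetical, and the expected keys are taken only from the examples above.

```python
import json
from pathlib import Path

# Keys each file is expected to hold, based on the examples in this guide.
# wit.json maps language codes to tokens, so it has no fixed key names.
REQUIRED_KEYS = {
    "telegram.json": {"username", "token", "admins"},
    "wit.json": set(),
    "yandex.json": {"translate_key"},
}


def check_config(config_dir="config"):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, required in REQUIRED_KEYS.items():
        path = Path(config_dir) / name
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except FileNotFoundError:
            problems.append(f"{name}: file not found")
            continue
        except json.JSONDecodeError as exc:
            problems.append(f"{name}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(data, dict):
            problems.append(f"{name}: expected a JSON object")
            continue
        missing = required - set(data)
        if missing:
            problems.append(f"{name}: missing keys {sorted(missing)}")
        if name == "wit.json" and not data:
            problems.append(f"{name}: no language tokens defined")
    return problems


if __name__ == "__main__":
    for problem in check_config():
        print(problem)
```

Running the script from the repository root prints one line per problem and nothing when all three files look complete. It does not verify that the tokens themselves are valid.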

Installation with virtualenv

  1. Install the virtualenv and setuptools packages

    $ python3 -m pip install --upgrade pip
    $ pip3 install virtualenv setuptools

  2. Make a note of the full path to the Python 3 interpreter you want to use

    $ which python3

  3. Create the virtual environment, specifying the Python interpreter to use (replace /usr/bin/python3 with the path from step 2)

    $ virtualenv -p /usr/bin/python3 venv

  4. Activate the new virtual environment

    $ source venv/bin/activate

  5. Install the required packages

    (venv) $ pip3 install -r requirements.txt

  6. Run the bot

    (venv) $ python3 src/main.py

Installation with Docker

You can install easily with Docker.

  1. Run the script dockerBuild.sh to generate the docker image from the Dockerfile.

  2. Run the script dockerRun.sh to create and start the docker container.

    The run script bind-mounts the config, data and values directories of the repository into the container. To edit the files in these directories, stop the container, make your changes, and restart the container so that they take effect.

TODO

  • Voice Messages
  • Audio Files
  • Video notes
  • Pictures
  • Multithreading
  • Stop callback
  • Stats
  • Admin commands only in groups
  • Antiflood
  • Translations
  • Voice ask
  • Channels support

transcriberbot's People

Contributors

alexdicy, carloalbertobarbano, davte, dodekaphilist, maanuelmm, stefanodelbosco, thigschuch, turicas, vetu11

transcriberbot's Issues

Missing Getting Started and Test

A Getting Started guide or a testing tutorial is needed for the community to start working on the project. I just got the news that TranscriberBot is open source, but I don't know where to start. I think there should be a guide to help install and run the project and test the code.

Of course, someone who knows Python projects can get it running, but a guide is helpful for newcomers.

Bot can't transcribe

I sent an audio message in a private chat and also in a group, and TranscriberBot answered: "Não pude transcrever o áudio" ("I could not transcribe the audio", the transcription_failed string in pt_BR).

Wrong language selected

When I select Catalan (ca-ES) as the transcription language, the bot tries to transcribe from English.

The bot is not working again

Sorry to bother you, sir,
but the bot stopped working.
I'm so sorry, but I'm depending on it to study for the finals.

Not working on Telegram

Good morning, the bot does nothing on Telegram. It worked only once and then got stuck. What should I do?

Transcription often does not complete

Unless I keep retrying more than two times!
And sometimes: could not transcribe.
What are the best practices with this bot?
Is there a preferred audio length and quality
to ensure complete transcription on the first attempt?
Thanks

[feature request] Download transcription as text file (raw or in a subtitle format)

I had this idea during this use case:

  1. I needed to transcribe a 1-hour-long interview in Brazilian Portuguese (the file was 80MB+ and was in M4A format)
  2. I split the original file into 5 OGG parts using ffmpeg (~8MB each part)
  3. I sent the files to the bot, received lots of 4k-chars messages in reply and copied to a text editor (this was boring)
  4. I found some errors when reading the text in the editor, but it was hard to find the error chunk in the audio file (so I could listen and fix it manually)

Being able to download the transcription as a text file would solve the problem in item 3. Using a subtitle file format (like srt) would help a lot with item 4. Attaching the file could be triggered automatically for files longer than 1 minute.

I'm willing to implement this feature if the maintainers accept the proposal.
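
To illustrate the subtitle idea, here is a minimal sketch of building SRT text. It assumes the transcription is available as (start, end, text) segments with times in seconds; whether the speech backend returns such timestamps is not stated in this proposal, and the function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: sequence number, time range, text, then a blank line.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

For example, `to_srt([(0, 2.5, "Hello"), (2.5, 5, "world")])` yields two numbered cues, which makes it easier to locate a wrongly transcribed chunk in the audio.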

Commands from anonymous admins are ignored

The bot ignores all commands sent by anonymous admins in group chats. Fixing this probably requires changing FilterIsAdmin so that it also checks whether the message was sent as the group itself (messages sent by anonymous admins show up as if the group sent them, as happens on channels).
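
A minimal sketch of such a check, written against the raw Telegram Bot API Message object (decoded JSON) rather than the bot's own FilterIsAdmin, whose implementation is not shown here. In the Bot API, `sender_chat` is set to the supergroup itself for messages from anonymous administrators. The function names and the `admin_ids` parameter are hypothetical.

```python
def sent_as_group(message):
    """Return True if the message was sent on behalf of the chat itself.

    `message` is a Telegram Bot API Message object decoded from JSON.
    """
    sender_chat = message.get("sender_chat") or {}
    chat = message.get("chat") or {}
    sender_chat_id = sender_chat.get("id")
    return sender_chat_id is not None and sender_chat_id == chat.get("id")


def is_admin_message(message, admin_ids):
    """Hypothetical admin check: an anonymous admin, or a listed admin ID."""
    if sent_as_group(message):
        return True
    sender = message.get("from") or {}
    return sender.get("id") in admin_ids
```

Comparing `sender_chat` with `chat` keeps posts forwarded automatically from a linked channel from being treated as admin commands, since their `sender_chat` is the channel rather than the group.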

The bot is not working

I tried to use it in English and Italian, but it can't transcribe anything anymore. It just continues to fail again and again.

bot won't work anymore

Hi, the bot is currently down. It doesn't reply to commands and it doesn't transcribe audio files.

The app is not working

Sorry,
but the bot stopped working.
I'm so sorry, but I'm depending on it to study for the finals.

Merge `development` into `master`?

I found that the development branch is behind master by some commits, but it also has a new merge (PR #24) which hasn't been merged into master yet.
Are you going to merge it into master and deploy this new version (which supports video files)?

Missing CONTRIBUTING file

There are no instructions or guidelines for contributing. Is the repository open to contributions?

[Usability] Remove "text"/"recognized text" before result in private chat

The most common use case for me is to record a voice clip, wait for the transcription, copy the whole answer and then paste it somewhere else (I usually do this on a cellphone). I imagine this is the most common use case in general for private chats.
Since I don't want to paste the "Text: " part of the message and it's hard to copy only a portion of the text on cellphones, I suggest removing these prefixes (strings transcription_text and ocr_result) in private chats. It's already clear that the answer is the transcribed text for that audio since it's a reply to the sent audio clip and, well, the bot name is TranscriberBot! :)
What do you think?

Image transcription is not working


Figure 1. Result from bot after sending an image with /enable_photos enabled.

I have been trying to transcribe images in groups to make them accessible to blind members (they can use a screen-reader app, but many of them cannot read Telegram images). The bot gets stuck at "Recognizing" for several minutes (in fact, days so far).

I thought it was a misconfiguration, so I tried uploading a random image directly to the bot to see if it works. Sadly, that doesn't work either. I'm not sure if it's a problem with certain images or if it no longer works at all. I uploaded a sample image in case you want to test or need input on "failed" cases.


Figure 2. Sample image provided for recognition.

Don't get me wrong. Your bot is great and really helps transcribing audio. Great work!

Thanks in advance and hope to see this working again.

Long transcriptions segmentation

I'd like to report a possible bug that occurs when a file is long and the transcribed text exceeds the maximum allowed length of a single message. In this case the bot sends only one message and skips the rest.

This can be easily reproduced by sending any long file (20 minutes for example).
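
One possible approach is to split the transcription into several messages before sending. The sketch below is an illustration, not the bot's existing code: the `split_message` helper is hypothetical, and it relies on Telegram's limit of 4096 characters per text message.

```python
MAX_MESSAGE_LENGTH = 4096  # Telegram's limit for a text message, in characters


def split_message(text, limit=MAX_MESSAGE_LENGTH):
    """Split text into chunks of at most `limit` characters.

    Breaks at the last space before the limit when possible, and falls
    back to a hard cut for a single word longer than the limit.
    """
    chunks = []
    remaining = text.strip()
    while len(remaining) > limit:
        # A space at index `limit` is fine: the chunk before it fits exactly.
        cut = remaining.rfind(" ", 0, limit + 1)
        if cut <= 0:
            cut = limit  # no space to break on: hard cut
        chunks.append(remaining[:cut].rstrip())
        remaining = remaining[cut:].lstrip()
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each returned chunk can then be sent as its own message, in order, so that no part of a long transcription is dropped.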

The bot doesn't transcribe everything

Hi,

The bot has been quite buggy. Most of the time, if an audio file is long, the bot won't transcribe all of it. It transcribes only a portion, and the size of that portion varies: sometimes 20%, another time 40%, never 100%.

I was transcribing from Arabic (if that helps)

Any idea? Is there anything I can do for it to do it all? And if it's a bot issue then maybe it can be looked into
