jschmie / scraibe
Tool for automatic transcription and speaker diarization based on whisper and pyannote.
Home Page: https://jschmie.github.io/ScrAIbe/
License: GNU General Public License v3.0
Both the input value and the module are named json, which does not work: the parameter shadows the module.
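The clash can be reproduced in a few lines; broken and fixed are hypothetical names, not ScrAIbe functions, but they show why the shadowing breaks the call:

```python
import json

def broken(json):
    # The parameter shadows the json module: inside this function,
    # json is whatever the caller passed (e.g. a dict), so the
    # module's dumps() is unreachable and this raises AttributeError.
    return json.dumps(json)

def fixed(payload):
    # Renaming the parameter keeps the module reachable.
    return json.dumps(payload)
```

Renaming either the parameter or the import (import json as json_module) resolves the clash.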
Hello again, and by the way, thanks for the cool project 😊
Using the small whisper model and "Auto Transcribe" needs almost 11 GB of VRAM. After transcription and diarization are done, the model seems to be kept in VRAM at 11 GB. As a "GPU-poor" person I ask: would it be possible to flush it automatically after use? Is there maybe also a way to set the beam size?
Edit: this is weird: small uses about 11 GB of VRAM, medium about 9 GB, and large 11 GB. Does it batch, or maybe share RAM?
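A minimal sketch of what "flushing after use" could look like, assuming the model is a plain PyTorch nn.Module on the GPU; the helper name and flow are illustrative, not ScrAIbe's API:

```python
import gc
import torch

def release_model(model: torch.nn.Module) -> None:
    # Illustrative sketch: after a transcription run, move the weights
    # off the GPU, drop the reference, and ask PyTorch to hand cached
    # memory back to the driver.
    model.to("cpu")   # copies parameters back to host memory
    del model         # drops this function's reference; callers must drop theirs too
    gc.collect()      # collect lingering Python-side references
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release CUDA caching-allocator blocks
```

Note that empty_cache() only returns memory the caching allocator no longer uses, so every live reference to the model has to be dropped first.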
Btw: there are a few other projects with far less user-friendly web UIs, or just an API, that use faster-whisper or insanely-fast-whisper; they need less VRAM and are also faster, mostly through CTranslate2, batching, BetterTransformer, FlashAttention-2, or distil-whisper models (also available for German).
I've already used whisper-asr-webservice, which currently doesn't have diarization but tries to implement it via whisperX, and wordcab-transcribe, which uses NVIDIA NeMo for diarization. Maybe some of these resources are of use to you?
I've no serious programming knowledge; I just dabble a little. I really like your concept for the web UI. I actually tried something similar with a simple Gradio interface half a year ago, which transcribes, diarizes via the wordcab-transcribe API, and also formats the .json and associates names. But it never worked as robustly as I hoped, and I stopped working on it for lack of time and programming knowledge. Forgive me for this wall of text; I'm just a little excited about the possibilities and really glad I found your project 😄
Via the wordcab-transcribe API it used about 4 GB of VRAM, with a spike of 10 GB for the first 20 seconds (probably due to diarization), using the large-v2 model, and took about 2:30 min for a 22-minute file.
So maybe there's some room for improvement?
I also tried insanely-fast-whisper, which simply combines several optimizations, and it took about 33 seconds (same file, transcription task only; segmenting added about 1:20 min) with less than 8 GB of VRAM. That is 150 minutes of audio in under 5 minutes for transcription + diarization (I have to recheck the exact time/usage).
Traceback (most recent call last):
File "/Users/brianjking/opt/anaconda3/envs/scraibe/bin/scraibe", line 5, in <module>
from scraibe.cli import cli
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/scraibe/__init__.py", line 1, in <module>
from .autotranscript import *
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/scraibe/autotranscript.py", line 40, in <module>
from .diarisation import Diariser
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/scraibe/diarisation.py", line 34, in <module>
from pyannote.audio import Pipeline
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/pyannote/audio/__init__.py", line 29, in <module>
from .core.inference import Inference
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/pyannote/audio/core/inference.py", line 34, in <module>
from pyannote.audio.core.io import AudioFile
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/pyannote/audio/core/io.py", line 38, in <module>
import torchaudio
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/torchaudio/__init__.py", line 1, in <module>
from torchaudio import _extension # noqa: F401
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/torchaudio/_extension.py", line 67, in <module>
_init_extension()
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/torchaudio/_extension.py", line 61, in _init_extension
_load_lib("libtorchaudio")
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/torchaudio/_extension.py", line 51, in _load_lib
torch.ops.load_library(path)
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/torch/_ops.py", line 220, in load_library
ctypes.CDLL(path)
File "/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: dlopen(/Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/torchaudio/lib/libtorchaudio.so, 0x0006): Symbol not found: __ZN2at8internal15invoke_parallelExxxRKNSt3__18functionIFvxxEEE
Referenced from: <FDA92314-6B3C-3951-A6EA-674B8F2438DA> /Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/torchaudio/lib/libtorchaudio.so
Expected in: <BAC87571-ABAB-3E0E-AC71-304C308C3507> /Users/brianjking/opt/anaconda3/envs/scraibe/lib/python3.10/site-packages/torch/lib/libtorch_cpu.dylib
@JSchmie Any ideas? Thanks!
Similar to openai/whisper#928, I noticed whisper includes common phrases hinting at training-dataset bias.
The resulting segments are <1 s and thus can be filtered relatively easily.
Furthermore, this seems to be a problem only when using the whisper model large-v2. (In the newest model, large-v3, this seems to be fixed; available in openai-whisper >= v20231106.)
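The <1 s filter could be sketched as follows, assuming whisper's usual result dict whose "segments" entries carry "start" and "end" timestamps; the helper name is mine, not part of ScrAIbe:

```python
def drop_short_segments(result: dict, min_duration: float = 1.0) -> dict:
    # Keep only segments at least min_duration seconds long; very short
    # segments are the likely hallucinated stock phrases.
    kept = [s for s in result["segments"]
            if s["end"] - s["start"] >= min_duration]
    return {**result, "segments": kept}
```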
Just FYI: to build the docker image, it seems that an additional "models" folder needs to be created at the root when cloning the repo.
[+] Building 0.9s (10/21) docker:default
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 1.74kB 0.0s
=> [internal] load metadata for docker.io/pytorch/pytorch:1.11.0-cuda11.3-cudnn8-r 0.8s
=> [auth] pytorch/pytorch:pull token for registry-1.docker.io 0.0s
=> [ 1/16] FROM docker.io/pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime@sha256:99 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 133.34kB 0.0s
=> CACHED [ 2/16] WORKDIR /app 0.0s
=> CACHED [ 3/16] COPY requirements.txt /app/requirements.txt 0.0s
=> CACHED [ 4/16] COPY README.md /app/README.md 0.0s
=> ERROR [ 5/16] COPY models /app/models 0.0s
------
> [ 5/16] COPY models /app/models:
------
Dockerfile:26
--------------------
24 | COPY requirements.txt /app/requirements.txt
25 | COPY README.md /app/README.md
26 | >>> COPY models /app/models
27 | COPY scraibe /app/scraibe
28 | COPY setup.py /app/setup.py
--------------------
ERROR: failed to solve: failed to compute cache key: failed to calculate checksum of ref 4A4L:KJIP:A7CG:EM6K:KMYL:AP5S:ZADH:EBS7:RIFH:FWE7:IZH2:OET5::zctq8tubo5s6j4po9qojalgm7: "/models": not found
It also seems like there is a slight error in the Dockerfile. I tried:
COPY requirements.txt /app/requirements.txt
COPY scraibe /app/scraibe
COPY setup.py /app/setup.py
instead of
COPY requirements.txt /app/requirements.txt
COPY scraibe /app/Scraibe
COPY setup.py /app/setup.py
otherwise this error happens:
=> ERROR [10/12] RUN pip install /app/ 0.7s
------
> [10/12] RUN pip install /app/:
0.550 Processing /app
0.550 Preparing metadata (setup.py): started
0.635 Preparing metadata (setup.py): finished with status 'error'
0.637 error: subprocess-exited-with-error
0.637
0.637 × python setup.py egg_info did not run successfully.
0.637 │ exit code: 1
0.637 ╰─> [6 lines of output]
0.637 Traceback (most recent call last):
0.637 File "<string>", line 2, in <module>
0.637 File "<pip-setuptools-caller>", line 34, in <module>
0.637 File "/app/setup.py", line 16, in <module>
0.637 with open(verfile, "r") as fp:
0.637 FileNotFoundError: [Errno 2] No such file or directory: '/app/scraibe/version.py'
0.637 [end of output]
0.637
0.637 note: This error originates from a subprocess, and is likely not a problem with pip.
0.638 error: metadata-generation-failed
0.638
0.638 × Encountered error while generating package metadata.
0.638 ╰─> See above for output.
0.638
0.638 note: This is an issue with the package mentioned above, not pip.
0.638 hint: See above for details.
------
Dockerfile:20
--------------------
18 | RUN conda install -c conda-forge libsndfile
19 | RUN pip install torchaudio==0.11.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
20 | >>> RUN pip install /app/
21 | RUN pip install markupsafe==2.0.1 --force-reinstall
22 | RUN Scraibe --hf_token $hf_token
--------------------
ERROR: failed to solve: process "/bin/sh -c pip install /app/" did not complete successfully: exit code: 1
There may be a similar problem in the Dockerfile with
RUN Scraibe --hf_token $hf_token
=> ERROR [12/12] RUN Scraibe --hf_token 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 0.5s
------
> [12/12] RUN Scraibe --hf_token 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 :
0.448 /bin/sh: 1: Scraibe: not found
------
Dockerfile:22
--------------------
20 | RUN pip install /app/
21 | RUN pip install markupsafe==2.0.1 --force-reinstall
22 | >>> RUN Scraibe --hf_token $hf_token
23 | # Expose port
24 | EXPOSE 7860
--------------------
ERROR: failed to solve: process "/bin/sh -c Scraibe --hf_token $hf_token" did not complete successfully: exit code: 127
After changing it to
RUN scraibe --hf_token $hf_token
it runs further, up to:
=> ERROR [12/12] RUN scraibe --hf_token hf_jbOynJACnotrealtokenfwBYEqCRV 2.6s
------
> [12/12] RUN scraibe --hf_token hf_jbOynJACnotrealtokenwfBYEqCRV:
2.123 Traceback (most recent call last):
2.123 File "/opt/conda/bin/scraibe", line 5, in <module>
2.123 from scraibe.cli import cli
2.123 File "/opt/conda/lib/python3.8/site-packages/scraibe/__init__.py", line 10, in <module>
2.123 from .app.gradio_app import *
2.123 File "/opt/conda/lib/python3.8/site-packages/scraibe/app/__init__.py", line 2, in <module>
2.123 from .gradio_app import *
2.123 File "/opt/conda/lib/python3.8/site-packages/scraibe/app/gradio_app.py", line 37, in <module>
2.123 from tkinter import CURRENT
2.123 File "/opt/conda/lib/python3.8/tkinter/__init__.py", line 36, in <module>
2.123 import _tkinter # If this fails your Python may not be configured for Tk
2.123 ImportError: libX11.so.6: cannot open shared object file: No such file or directory
------
Dockerfile:22
--------------------
20 | RUN pip install /app/
21 | RUN pip install markupsafe==2.0.1 --force-reinstall
22 | >>> RUN scraibe --hf_token $hf_token
23 | # Expose port
24 | EXPOSE 7860
--------------------
ERROR: failed to solve: process "/bin/sh -c scraibe --hf_token $hf_token" did not complete successfully: exit code: 1
I haven't tried any further so far.
Traceback (most recent call last):
File "/home/usrPycharmProjects/autotranscript/transcribe.py", line 24, in
text = model.transcribe("test.MXF")
File "/home/usrPycharmProjects/autotranscript/autotranscript/autotranscript.py", line 78, in transcribe
diarisation = self.diariser.diarization(dia_audio,
File "/home/usr/PycharmProjects/autotranscript/autotranscript/diarisation.py", line 41, in diarization
diarization = self.model(audiofile,*args, **kwargs)
File "/home/usr/anaconda3/envs/whisper_new/lib/python3.9/site-packages/pyannote/audio/core/pipeline.py", line 238, in call
return self.apply(file, **kwargs)
File "/home/usr/anaconda3/envs/whisper_new/lib/python3.9/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 512, in apply
discrete_diarization = self.reconstruct(
File "/home/usr/anaconda3/envs/whisper_new/lib/python3.9/site-packages/pyannote/audio/pipelines/speaker_diarization.py", line 397, in reconstruct
clustered_segmentations = np.NAN * np.zeros(
ValueError: negative dimensions are not allowed
This is raised when a file does not contain any speech, or more precisely when it contains no speech audio but only noise. The fix is easy: just handle the error and finish with no transcription, but we need a more user-friendly error message.
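A minimal sketch of handling that error, assuming the diariser is callable on an audio file path; the wrapper name and message are illustrative, not the project's API:

```python
def safe_diarization(diariser, audiofile):
    # Wrap the pyannote call so silent or noise-only files yield a
    # clear message instead of a bare numpy ValueError.
    try:
        return diariser(audiofile)
    except ValueError as err:
        if "negative dimensions" in str(err):
            raise RuntimeError(
                f"No speech detected in {audiofile!r}; nothing to transcribe."
            ) from err
        raise  # unrelated ValueErrors still propagate unchanged
```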
I just found a little error in the README's example cmdline usage, where it shows:
scraibe -f "audio.wav" --language "german" --num_speakers 2
But the only language options are:
{af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}
So it should be "German" or "de" in the example.
Also, --num_speakers is not an available option for scraibe; scraibe --help only lists the following:
-h, --help show this help message and exit
-f AUDIO_FILES [AUDIO_FILES ...], --audio-files AUDIO_FILES [AUDIO_FILES ...]
--whisper-type {whisper,whisperx}
--whisper-model-name WHISPER_MODEL_NAME
--whisper-model-directory WHISPER_MODEL_DIRECTORY
--diarization-directory DIARIZATION_DIRECTORY
--hf-token HF_TOKEN HuggingFace token for private model download. (default: None)
--inference-device INFERENCE_DEVICE
--num-threads NUM_THREADS
--output-directory OUTPUT_DIRECTORY, -o OUTPUT_DIRECTORY
--output-format {txt,json,md,html}, -of {txt,json,md,html}
--verbose-output VERBOSE_OUTPUT
--task {autotranscribe,diarization,autotranscribe+translate,translate,transcribe}
--language
But that option would actually be nice, so adding it in cli.py is probably a good idea.
Thank you for a great package! I am wondering if you plan to support speaker recognition? Given a folder with voice samples for speakers, it assigns each speaker a name rather than a placeholder.
Thanks
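For what it's worth, the matching step of such a feature can be sketched with plain cosine similarity over speaker embeddings. The embeddings themselves would come from a model such as pyannote's; here they are plain numpy arrays, and the function name and threshold are illustrative assumptions:

```python
import numpy as np

def name_speakers(speaker_embeddings: dict, reference_embeddings: dict,
                  threshold: float = 0.7) -> dict:
    # Assign each diarized placeholder (e.g. "SPEAKER_00") the enrolled
    # name whose reference embedding is most similar; keep the
    # placeholder if nothing exceeds the threshold.
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    names = {}
    for placeholder, emb in speaker_embeddings.items():
        best_name, best_sim = placeholder, threshold
        for name, ref in reference_embeddings.items():
            sim = cosine(emb, ref)
            if sim > best_sim:
                best_name, best_sim = name, sim
        names[placeholder] = best_name
    return names
```

In practice one embedding per enrolled voice sample and per diarized speaker would be extracted with the same embedding model so the vectors are comparable.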