
vosk-cli's Introduction

vosk-cli


This Python package serves as a Vosk interface for Opencast. It generates subtitles (WebVTT files) from video and audio sources using Vosk.

Installation

1. Install vosk-cli

To install the latest stable version of vosk-cli, run

pip install vosk-cli

Alternatively, to install the latest development version, clone this project and inside the project directory run

pip install .

2. Install dependencies

  • FFmpeg
  • ffprobe

vosk-cli uses ffprobe to analyze input files and ffmpeg to preprocess them. ffprobe ships as part of FFmpeg, so installing FFmpeg normally provides both. The easiest way to install FFmpeg is with a package manager. If you want or need to install from source, visit FFmpeg.org/download.html and follow the instructions for your operating system.

3. Download the language model

Go to https://alphacephei.com/vosk/models and download at least the English language model. The larger models generally yield better results.

You can extract the language model archive into any directory, but creating and using a ./models folder in the project directory is recommended.

Usage

You can now run vosk-cli -i <input_file_path> -o <output_file_path> -m <model_name_or_path>.

For example, if there is a video.mp4 file in your download folder and a model named vosk-model-en-us-0.22 in the ./models folder you created, you can run

vosk-cli -i ~/Downloads/video.mp4 -o text -m vosk-model-en-us-0.22

This will create a text.vtt file (which contains the transcribed captions) in your current directory.
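
For readers unfamiliar with the output format, here is a minimal sketch of how caption cues are laid out in a WebVTT file. It is not vosk-cli's actual writer: the helper names format_timestamp and to_webvtt are hypothetical, and the cue data is made up for illustration.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def to_webvtt(cues):
    """Render (start, end, text) tuples as the text of a WebVTT file."""
    lines = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up cues, only to show the layout.
    print(to_webvtt([(0.0, 1.5, "hello world"), (1.5, 3.0, "this is a test")]))
```

Each cue consists of a timing line, the caption text, and a terminating blank line.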

vosk-cli's Issues

More robust model location autodetection?

# get all available models if we got the special value auto
if try_models == ['auto']:
    try_models = glob('./models/*') + glob('/usr/share/vosk/models/*')
    try_models = [model for model in try_models if os.path.isdir(model)]

# Try finding a matching module
modules = glob(f'/usr/share/vosk/models/*-{lang}-*') \
    or glob(f'./models/*-{lang}-*')
modules = [model for model in modules if os.path.isdir(model)]

Searching in $XDG_DATA_DIRS would be nice. (Spec.) You don't always want to install models system-wide.

As it's not always set, it should also look through default values of /usr/share/ (as currently), /usr/local/share/, and $HOME/.local/share/.

In addition to exposing manually installed models in non-root locations, this would also allow automatic use of models installed via e.g. Flatpak or Nix, which set $XDG_DATA_DIRS.
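
A rough sketch of what such a lookup could look like, using the fallback values from the XDG Base Directory spec. This is only an illustration, not a patch against the current code: the function name model_search_dirs and the subdir parameter are made up, and the search order (local ./models first, then user data, then system data) is an assumption.

```python
import os
from pathlib import Path


def model_search_dirs(environ=None, subdir="vosk/models"):
    """Return candidate model directories, most specific first.

    `subdir` is the path below each data directory; "vosk/models" mirrors
    the location the current code already checks under /usr/share.
    """
    env = os.environ if environ is None else environ
    home = env.get("HOME") or str(Path.home())

    # The spec's defaults apply when a variable is unset or empty.
    data_home = env.get("XDG_DATA_HOME") or os.path.join(home, ".local", "share")
    data_dirs = env.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"

    candidates = ["./models", os.path.join(data_home, subdir)]
    candidates += [os.path.join(d, subdir) for d in data_dirs.split(":") if d]

    # Drop duplicates while keeping the search order.
    return list(dict.fromkeys(candidates))


if __name__ == "__main__":
    for directory in model_search_dirs():
        print(directory)
```

The existing glob calls could then iterate over this list instead of the two hard-coded paths.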

Also, from the AUR, the share subdirectory seems to be vosk-models, not vosk/models:

$ pacman -Ql vosk-api-bin
vosk-api-bin /usr/
vosk-api-bin /usr/include/
vosk-api-bin /usr/include/vosk_api.h
vosk-api-bin /usr/lib/
vosk-api-bin /usr/lib/libvosk.so
vosk-api-bin /usr/local/
vosk-api-bin /usr/local/share/
vosk-api-bin /usr/local/share/vosk-models/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/README
vosk-api-bin /usr/local/share/vosk-models/small-en-us/am/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/am/final.mdl
vosk-api-bin /usr/local/share/vosk-models/small-en-us/conf/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/conf/mfcc.conf
vosk-api-bin /usr/local/share/vosk-models/small-en-us/conf/model.conf
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/Gr.fst
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/HCLr.fst
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/disambig_tid.int
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/phones/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/phones/word_boundary.int
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/final.dubm
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/final.ie
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/final.mat
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/global_cmvn.stats
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/online_cmvn.conf
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/splice.conf

But that may be a packaging issue in the vosk-api-bin package, as vosk-api does use vosk/models.

  • What is the canonical place to put models? Do the docs give a recommendation?
  • Windows/Non-LSB/Non-FHS OSes?

Just a thought/convenience that might be nice to have.

Running bandit

I decided to run bandit to check for potential security issues, as proposed in #10.

The only thing bandit found was the usage of subprocess to run ffmpeg, although it rates the severity as "low" since subprocess.Popen is already used in a fairly safe way (i.e. not spawning a command shell). So the only issue I can really see here is that the ffmpeg command accepts an arbitrary input file.

One way to "fix" the arbitrary input file issue would be to avoid running ffmpeg and instead have the user provide a valid file format, but I assume we don't want that.

The other option is general safety measures, like sandboxing, limiting process resources and limiting the input file to well-known formats. Not sure if any of these are "worth it" though; that would be up for discussion.
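
To make the "well-known formats" idea concrete, here is a minimal sketch. The allow-list and the function name build_ffmpeg_command are hypothetical, and the command is only modeled on the arguments visible in the bandit log below, so it may differ from what transcribe.py actually runs. Checking the suffix is a weak filter (it says nothing about the file's real contents), so treat it as a first line of defence at best.

```python
from pathlib import Path

# Hypothetical allow-list; the project does not currently define one.
ALLOWED_SUFFIXES = {".mp4", ".mkv", ".webm", ".mpg", ".mp3", ".wav", ".ogg", ".flac"}


def build_ffmpeg_command(input_path, sample_rate=16000):
    """Return an ffmpeg argument list, rejecting unexpected file types."""
    path = Path(input_path)
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported input format: {path.suffix!r}")
    # An argument list (no shell) means the file name is never parsed as
    # shell syntax. -nostdin stops ffmpeg from reading interactive input.
    return ["ffmpeg", "-nostdin", "-loglevel", "quiet", "-i", str(path),
            "-ar", str(sample_rate), "-ac", "1", "-f", "s16le", "-"]


if __name__ == "__main__":
    print(build_ffmpeg_command("lecture.mp4"))
```

The returned list can be passed to subprocess.Popen as before; the default sample rate of 16000 is an assumption.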

Bandit logs
Test results:
>> Issue: [B404:blacklist] Consider possible security implications associated with the subprocess module.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   Location: voskcli/transcribe.py:21:0
   More Info: https://bandit.readthedocs.io/en/1.7.4/blacklists/blacklist_imports.html#b404-import-subprocess
20	import os
21	import subprocess
22	import json

--------------------------------------------------
>> Issue: [B603:subprocess_without_shell_equals_true] subprocess call - check for execution of untrusted input.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   Location: voskcli/transcribe.py:189:14
   More Info: https://bandit.readthedocs.io/en/1.7.4/plugins/b603_subprocess_without_shell_equals_true.html
188	               '-ar', str(sample_rate), '-ac', '1', '-f', 's16le', '-']
189	    process = subprocess.Popen(command, stdout=subprocess.PIPE)
190	

--------------------------------------------------

Code scanned:
	Total lines of code: 195
	Total lines skipped (#nosec): 0

Run metrics:
	Total issues (by severity):
		Undefined: 0
		Low: 2
		Medium: 0
		High: 0
	Total issues (by confidence):
		Undefined: 0
		Low: 0
		Medium: 0
		High: 2

Collection of tasks

These are a bunch of smaller issues and possible improvements. If you want to work on one of these, please leave a comment and open an issue or PR, which I will link to the task below.

  • Deployment
    • Packaging: Create a distribution package for vosk-cli. See #12
    • Use Bandit to find possible security issues #16
    • Deployment to PyPI: Create entry for vosk-cli in PyPI and configure automatic deployment. See #13
    • Include language model(s) in some form
  • Usability/Documentation
    • Revise README with updated instructions on how to run vosk-cli. See #14
    • Extend README with specific installation instructions for different operating systems (if needed). See #14
    • Display detailed help when vosk-cli call is missing parameters
    • Create technical documentation (potentially larger task)
  • Features
    • Display average confidence coefficient for transcriptions. See #15
    • Enable the use of punctuation models: Collaborate on/contribute to #9
    • Language recognition: Use multiple models and automatically detect spoken language based on confidence coefficient (this might be a larger task)
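
As a starting point for the confidence task, a minimal sketch of averaging per-word confidences. It assumes the recognizer was created with word output enabled (SetWords(True) in the Vosk Python API), so that each result JSON carries a "result" list whose entries have a "conf" field. The function name average_confidence and the sample data are made up.

```python
import json


def average_confidence(result_jsons):
    """Average the per-word "conf" values over a list of Vosk result strings.

    Returns None when no words were recognized at all.
    """
    confidences = []
    for raw in result_jsons:
        # Results without recognized words have no "result" key.
        for word in json.loads(raw).get("result", []):
            confidences.append(word["conf"])
    if not confidences:
        return None
    return sum(confidences) / len(confidences)


if __name__ == "__main__":
    # Made-up results in the shape described above.
    sample = [
        '{"result": [{"conf": 1.0, "start": 0.0, "end": 0.4, "word": "hello"}], "text": "hello"}',
        '{"result": [{"conf": 0.5, "start": 0.5, "end": 0.9, "word": "world"}], "text": "world"}',
        '{"text": ""}',
    ]
    print(average_confidence(sample))
```

The same number, computed per candidate model, could also feed the language recognition idea, though whether confidences are comparable across models is an open question.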

Create distribution package

A distribution package for vosk-cli would simplify installation/deployment and would also enable us to publish to PyPI and potentially other package indexes.

Missing Dependencies

In testing opencast/opencast#3806, I installed vosk-cli per the current README. Functional testing with an Opencast workflow spat out this error:

2022-06-03 12:25:22,763 | ERROR | (AbstractJobProducer$JobRunner:343) - Error handling operation 'speechtotext':                                              
org.opencastproject.speechtotext.api.SpeechToTextServiceException: Error while generating subtitle from http://localhost:8080/files/mediapackage/a378a5fe-9cfd
-45e0-a9f0-69cdadbfbdb6/d5d24af2-d40c-4f39-b56f-91d81b5b9a0c/nonsegment_audio.mpg                                                                             
        at org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.process(SpeechToTextServiceImpl.java:166) ~[?:?]                                     
        at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:313) [!/:?]                                                
        at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:272) [!/:?]                                                
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]                                                                                     
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]                                                              
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]                                                              
        at java.lang.Thread.run(Thread.java:829) [?:?]                                                                                                        
Caused by: org.opencastproject.speechtotext.api.SpeechToTextEngineException: org.opencastproject.speechtotext.api.SpeechToTextEngineException: Vosk exited abn
ormally with status 1 (command: [vosk-cli, -i, /home/greg/opencast/upstream/build/opencast-dist-allinone/data/opencast/workspace/mediapackage/a378a5fe-9cfd-45
e0-a9f0-69cdadbfbdb6/d5d24af2-d40c-4f39-b56f-91d81b5b9a0c/nonsegment_audio.mpg, -o, /home/greg/opencast/upstream/build/opencast-dist-allinone/data/opencast/wo
rkspace/collection/subtitles/tmp_1773_nonsegment_audio.vtt, -l, ara])                                                                                         
 Output:                                                                                                                                                      
Traceback (most recent call last):                                                                                                                            
  File "/home/greg/.local/bin/vosk-cli", line 5, in <module>                                                                                                  
    from scripts.transcribe import main                                                                                                                       
ModuleNotFoundError: No module named 'scripts'                                                                                                                
                                                                                                                                                              
        at org.opencastproject.speechtotext.impl.engine.VoskEngine.generateSubtitlesFile(VoskEngine.java:123) ~[?:?]                                          
        at org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.process(SpeechToTextServiceImpl.java:156) ~[?:?]                                     
        ... 6 more                                                                                                                                            
Caused by: org.opencastproject.speechtotext.api.SpeechToTextEngineException: Vosk exited abnormally with status 1 (command: [vosk-cli, -i, /home/greg/opencast
/upstream/build/opencast-dist-allinone/data/opencast/workspace/mediapackage/a378a5fe-9cfd-45e0-a9f0-69cdadbfbdb6/d5d24af2-d40c-4f39-b56f-91d81b5b9a0c/nonsegme
nt_audio.mpg, -o, /home/greg/opencast/upstream/build/opencast-dist-allinone/data/opencast/workspace/collection/subtitles/tmp_1773_nonsegment_audio.vtt, -l, ar
a])                                                                                                                                                           
 Output:                                                                                                                                                      
Traceback (most recent call last):                                                                                                                            
  File "/home/greg/.local/bin/vosk-cli", line 5, in <module>                                                                                                  
    from scripts.transcribe import main                                                                                                                       
ModuleNotFoundError: No module named 'scripts'                                                                                                                
                                                                                                                                                              
        at org.opencastproject.speechtotext.impl.engine.VoskEngine.generateSubtitlesFile(VoskEngine.java:115) ~[?:?]                                          
        at org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.process(SpeechToTextServiceImpl.java:156) ~[?:?]                                     
        ... 6 more                                                                                                                                            
2022-06-03 12:25:25,656 | ERROR | (WorkflowOperationWorker:140) - Workflow operation 'operation:'speechtotext, state:'FAILED'' failed                         
org.opencastproject.workflow.api.WorkflowOperationException: Speech-to-Text job for media package 'a378a5fe-9cfd-45e0-a9f0-69cdadbfbdb6' failed
        at org.opencastproject.workflow.handler.speechtotext.SpeechToTextWorkflowOperationHandler.createSubtitle(SpeechToTextWorkflowOperationHandler.java:181) ~[?:?]
        at org.opencastproject.workflow.handler.speechtotext.SpeechToTextWorkflowOperationHandler.start(SpeechToTextWorkflowOperationHandler.java:146) ~[?:?]
        at org.opencastproject.workflow.impl.WorkflowOperationWorker.start(WorkflowOperationWorker.java:212) ~[!/:?]
        at org.opencastproject.workflow.impl.WorkflowOperationWorker.execute(WorkflowOperationWorker.java:117) [!/:?]
        at org.opencastproject.workflow.impl.WorkflowServiceImpl.runWorkflowOperation(WorkflowServiceImpl.java:719) [!/:?]
        at org.opencastproject.workflow.impl.WorkflowServiceImpl.process(WorkflowServiceImpl.java:1736) [!/:?]
        at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2097) [!/:?]
        at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2063) [!/:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]

Installing scripts with pip install scripts does not resolve the issue.
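
For anyone trying to reproduce this, a small diagnostic sketch that checks whether the module named in the traceback can be imported by the interpreter the launcher uses. The helper name module_available is made up. My reading, based only on the traceback, is that scripts.transcribe is expected to come from the vosk-cli source tree rather than from a separate PyPI distribution, which would explain why pip install scripts has no effect. That is an assumption, not something confirmed here.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when a parent package is itself missing.
        return False


if __name__ == "__main__":
    for name in ("scripts.transcribe", "vosk"):
        print(name, "->", "found" if module_available(name) else "missing")
```

Run it with the same Python that /home/greg/.local/bin/vosk-cli points to in its first line, otherwise the result may not match what the launcher sees.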
