
vosk-cli's Issues

Collection of tasks

These are a bunch of smaller issues and possible improvements. If you want to work on one of these, please leave a comment and open an issue or PR, which I will link to the task below.

  • Deployment
    • Packaging: Create a distribution package for vosk-cli. See #12
    • Use Bandit to find possible security issues. See #16
    • Deployment to PyPI: Create entry for vosk-cli in PyPI and configure automatic deployment. See #13
    • Include language model(s) in some form
  • Usability/Documentation
    • Revise README with updated instructions on how to run vosk-cli. See #14
    • Extend README with specific installation instructions for different operating systems (if needed). See #14
    • Display detailed help when vosk-cli is called with missing parameters
    • Create technical documentation (potentially larger task)
  • Features
    • Display average confidence coefficient for transcriptions. See #15
    • Enable the use of punctuation models: Collaborate on/contribute to #9
    • Language recognition: Use multiple models and automatically detect spoken language based on confidence coefficient (this might be a larger task)
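For the confidence and language-recognition items, one option is to average the per-word confidences Vosk already reports. This is a minimal sketch. It assumes each final result is the JSON string returned by the recognizer's Result()/FinalResult() with word output enabled via SetWords(True), so that it carries a "result" list of objects with "word", "start", "end" and "conf" keys. The helper names average_confidence and pick_language are hypothetical. Choosing the language with the highest plain mean is only one possible rule; for example, it ignores how many words each model recognized.

```python
import json
from statistics import mean
from typing import Dict, Iterable, Optional


def average_confidence(result_jsons: Iterable[str]) -> Optional[float]:
    """Average the per-word 'conf' values over a series of Vosk result JSON strings.

    Assumes word output was enabled (SetWords(True)); results without a
    'result' list (e.g. silence) simply contribute no words.
    Returns None if no words were recognized at all.
    """
    confs = []
    for raw in result_jsons:
        data = json.loads(raw)
        confs.extend(word["conf"] for word in data.get("result", []))
    return mean(confs) if confs else None


def pick_language(results_by_lang: Dict[str, Iterable[str]]) -> Optional[str]:
    """Return the language whose model produced the highest average confidence.

    results_by_lang maps a language code to the result JSON strings obtained
    by transcribing the same audio with that language's model.
    """
    scores = {}
    for lang, results in results_by_lang.items():
        score = average_confidence(results)
        if score is not None:
            scores[lang] = score
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    en = ['{"result": [{"conf": 0.9, "end": 1.0, "start": 0.5, "word": "hello"}], "text": "hello"}']
    de = ['{"result": [{"conf": 0.4, "end": 1.0, "start": 0.5, "word": "hallo"}], "text": "hallo"}']
    print(pick_language({"en": en, "de": de}))  # en
```

Language detection would then mean transcribing the same audio (or a short excerpt) once per installed model and passing the collected results to pick_language.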

Create distribution package

A distribution package for vosk-cli would simplify installation/deployment and would also enable us to publish to PyPI and potentially other package indexes.

Missing Dependencies

In testing opencast/opencast#3806, I installed vosk-cli per the current README. Functional testing with an Opencast workflow spat out this error:

2022-06-03 12:25:22,763 | ERROR | (AbstractJobProducer$JobRunner:343) - Error handling operation 'speechtotext':                                              
org.opencastproject.speechtotext.api.SpeechToTextServiceException: Error while generating subtitle from http://localhost:8080/files/mediapackage/a378a5fe-9cfd-45e0-a9f0-69cdadbfbdb6/d5d24af2-d40c-4f39-b56f-91d81b5b9a0c/nonsegment_audio.mpg
        at org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.process(SpeechToTextServiceImpl.java:166) ~[?:?]                                     
        at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:313) [!/:?]                                                
        at org.opencastproject.job.api.AbstractJobProducer$JobRunner.call(AbstractJobProducer.java:272) [!/:?]                                                
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]                                                                                     
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]                                                              
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]                                                              
        at java.lang.Thread.run(Thread.java:829) [?:?]                                                                                                        
Caused by: org.opencastproject.speechtotext.api.SpeechToTextEngineException: org.opencastproject.speechtotext.api.SpeechToTextEngineException: Vosk exited abnormally with status 1 (command: [vosk-cli, -i, /home/greg/opencast/upstream/build/opencast-dist-allinone/data/opencast/workspace/mediapackage/a378a5fe-9cfd-45e0-a9f0-69cdadbfbdb6/d5d24af2-d40c-4f39-b56f-91d81b5b9a0c/nonsegment_audio.mpg, -o, /home/greg/opencast/upstream/build/opencast-dist-allinone/data/opencast/workspace/collection/subtitles/tmp_1773_nonsegment_audio.vtt, -l, ara])
 Output:                                                                                                                                                      
Traceback (most recent call last):                                                                                                                            
  File "/home/greg/.local/bin/vosk-cli", line 5, in <module>                                                                                                  
    from scripts.transcribe import main                                                                                                                       
ModuleNotFoundError: No module named 'scripts'                                                                                                                
                                                                                                                                                              
        at org.opencastproject.speechtotext.impl.engine.VoskEngine.generateSubtitlesFile(VoskEngine.java:123) ~[?:?]                                          
        at org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.process(SpeechToTextServiceImpl.java:156) ~[?:?]                                     
        ... 6 more                                                                                                                                            
Caused by: org.opencastproject.speechtotext.api.SpeechToTextEngineException: Vosk exited abnormally with status 1 (command: [vosk-cli, -i, /home/greg/opencast/upstream/build/opencast-dist-allinone/data/opencast/workspace/mediapackage/a378a5fe-9cfd-45e0-a9f0-69cdadbfbdb6/d5d24af2-d40c-4f39-b56f-91d81b5b9a0c/nonsegment_audio.mpg, -o, /home/greg/opencast/upstream/build/opencast-dist-allinone/data/opencast/workspace/collection/subtitles/tmp_1773_nonsegment_audio.vtt, -l, ara])
 Output:                                                                                                                                                      
Traceback (most recent call last):                                                                                                                            
  File "/home/greg/.local/bin/vosk-cli", line 5, in <module>                                                                                                  
    from scripts.transcribe import main                                                                                                                       
ModuleNotFoundError: No module named 'scripts'                                                                                                                
                                                                                                                                                              
        at org.opencastproject.speechtotext.impl.engine.VoskEngine.generateSubtitlesFile(VoskEngine.java:115) ~[?:?]                                          
        at org.opencastproject.speechtotext.impl.SpeechToTextServiceImpl.process(SpeechToTextServiceImpl.java:156) ~[?:?]                                     
        ... 6 more                                                                                                                                            
2022-06-03 12:25:25,656 | ERROR | (WorkflowOperationWorker:140) - Workflow operation 'operation:'speechtotext, state:'FAILED'' failed                         
org.opencastproject.workflow.api.WorkflowOperationException: Speech-to-Text job for media package 'a378a5fe-9cfd-45e0-a9f0-69cdadbfbdb6' failed
        at org.opencastproject.workflow.handler.speechtotext.SpeechToTextWorkflowOperationHandler.createSubtitle(SpeechToTextWorkflowOperationHandler.java:181) ~[?:?]
        at org.opencastproject.workflow.handler.speechtotext.SpeechToTextWorkflowOperationHandler.start(SpeechToTextWorkflowOperationHandler.java:146) ~[?:?]
        at org.opencastproject.workflow.impl.WorkflowOperationWorker.start(WorkflowOperationWorker.java:212) ~[!/:?]
        at org.opencastproject.workflow.impl.WorkflowOperationWorker.execute(WorkflowOperationWorker.java:117) [!/:?]
        at org.opencastproject.workflow.impl.WorkflowServiceImpl.runWorkflowOperation(WorkflowServiceImpl.java:719) [!/:?]
        at org.opencastproject.workflow.impl.WorkflowServiceImpl.process(WorkflowServiceImpl.java:1736) [!/:?]
        at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2097) [!/:?]
        at org.opencastproject.workflow.impl.WorkflowServiceImpl$JobRunner.call(WorkflowServiceImpl.java:2063) [!/:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]

Installing a package named scripts (pip install scripts) does not resolve the issue.
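The launcher in the traceback (line 5: from scripts.transcribe import main) looks like the wrapper pip generates for a console_scripts entry point. If so, the entry point refers to a top-level scripts package that was not installed along with it. A separately installed package that happens to be named scripts would not contain vosk-cli's transcribe module, which would explain why pip install scripts does not help; the fix would have to be in the packaging metadata. The following standard-library sketch checks whether the module part of a module:function target can be imported in the current environment. The helper name is hypothetical. The target voskcli.transcribe:main is only an assumption, based on the voskcli/transcribe.py path that appears in the Bandit output.

```python
import importlib.util


def entry_point_importable(target: str) -> bool:
    """Check whether the module named in a 'package.module:function' target can be found.

    For the failing launcher above the target would be 'scripts.transcribe:main'.
    """
    module_name = target.split(":", 1)[0]
    try:
        # For dotted names, find_spec imports the parent package first and
        # raises ModuleNotFoundError if that parent (e.g. 'scripts') is missing.
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # 'voskcli.transcribe:main' is an assumed target, not confirmed project metadata.
    for target in ("scripts.transcribe:main", "voskcli.transcribe:main"):
        status = "ok" if entry_point_importable(target) else "NOT importable"
        print(f"{target}: {status}")
```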

More robust model location autodetection?

import os
from glob import glob

# try_models and lang are defined earlier in the script (not shown)

# get all available models if we got the special value auto
if try_models == ['auto']:
    try_models = glob('./models/*') + glob('/usr/share/vosk/models/*')
    try_models = [model for model in try_models if os.path.isdir(model)]

# Try finding a matching model
modules = glob(f'/usr/share/vosk/models/*-{lang}-*') \
    or glob(f'./models/*-{lang}-*')
modules = [model for model in modules if os.path.isdir(model)]

Searching in $XDG_DATA_DIRS would be nice (see the XDG Base Directory Specification). You don't always want to install models system-wide.

Since $XDG_DATA_DIRS is not always set, the search should also fall back to the default locations: /usr/share/ (as currently), /usr/local/share/, and $HOME/.local/share/ (the spec's default for $XDG_DATA_HOME).

In addition to exposing manually installed models in non-root locations, this would also allow automatic use of models installed via, e.g., Flatpak or Nix, which set $XDG_DATA_DIRS.

Also, in the AUR package vosk-api-bin, the share subdirectory seems to be vosk-models, not vosk/models:

$ pacman -Ql vosk-api-bin
vosk-api-bin /usr/
vosk-api-bin /usr/include/
vosk-api-bin /usr/include/vosk_api.h
vosk-api-bin /usr/lib/
vosk-api-bin /usr/lib/libvosk.so
vosk-api-bin /usr/local/
vosk-api-bin /usr/local/share/
vosk-api-bin /usr/local/share/vosk-models/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/README
vosk-api-bin /usr/local/share/vosk-models/small-en-us/am/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/am/final.mdl
vosk-api-bin /usr/local/share/vosk-models/small-en-us/conf/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/conf/mfcc.conf
vosk-api-bin /usr/local/share/vosk-models/small-en-us/conf/model.conf
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/Gr.fst
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/HCLr.fst
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/disambig_tid.int
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/phones/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/graph/phones/word_boundary.int
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/final.dubm
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/final.ie
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/final.mat
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/global_cmvn.stats
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/online_cmvn.conf
vosk-api-bin /usr/local/share/vosk-models/small-en-us/ivector/splice.conf

But that may be a packaging issue in the vosk-api-bin package, as vosk-api does use vosk/models.
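Here is a minimal sketch of the proposed lookup, using only the standard library. The defaults ($HOME/.local/share for $XDG_DATA_HOME, /usr/local/share:/usr/share for $XDG_DATA_DIRS) and the rule that relative paths are ignored come from the XDG Base Directory Specification. The function names, the search order (./models first, then the XDG directories by priority) and checking both vosk/models and vosk-models are illustrative choices, not existing vosk-cli behavior. The *-{lang}-* pattern is the one the current code uses.

```python
import os
from glob import glob
from pathlib import Path
from typing import List, Mapping, Optional

# Subdirectories of each XDG data dir that may hold Vosk models: 'vosk/models'
# is what vosk-cli searches today, 'vosk-models' is used by the AUR vosk-api-bin package.
MODEL_SUBDIRS = ("vosk/models", "vosk-models")


def xdg_data_dirs(env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """Return $XDG_DATA_HOME followed by $XDG_DATA_DIRS, applying the spec's defaults."""
    env = os.environ if env is None else env
    home = env.get("XDG_DATA_HOME", "")
    if not os.path.isabs(home):  # unset, empty or relative: use the default
        home = os.path.join(os.path.expanduser("~"), ".local", "share")
    dirs = env.get("XDG_DATA_DIRS") or "/usr/local/share:/usr/share"
    result: List[Path] = []
    for d in [home] + dirs.split(":"):
        # Relative paths are invalid per the spec and are skipped; duplicates too.
        if os.path.isabs(d) and Path(d) not in result:
            result.append(Path(d))
    return result


def model_search_dirs(env: Optional[Mapping[str, str]] = None) -> List[Path]:
    """./models first (as today), then every model subdir of every XDG data dir."""
    dirs = [Path("models")]
    for base in xdg_data_dirs(env):
        dirs.extend(base / sub for sub in MODEL_SUBDIRS)
    return dirs


def find_models(lang: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return model directories whose names match the existing '*-{lang}-*' pattern."""
    matches: List[str] = []
    for directory in model_search_dirs(env):
        pattern = os.path.join(str(directory), f"*-{lang}-*")
        matches.extend(p for p in sorted(glob(pattern)) if os.path.isdir(p))
    return matches
```

With the AUR layout listed above and no XDG variables set, find_models("en") would pick up /usr/local/share/vosk-models/small-en-us through the /usr/local/share default.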

  • What is the canonical place to put models? Do the docs give a recommendation?
  • What about Windows and other non-LSB/non-FHS operating systems?

Just a thought/convenience that might be nice to have.

Running bandit

I decided to run Bandit to check for potential security issues, as proposed in #10.

Bandit only flagged the use of subprocess to run ffmpeg (once for the import and once for the Popen call). It rates both findings as low severity, since subprocess.Popen is already used in a fairly safe way (i.e. without spawning a command shell). So the only real issue I can see is that the ffmpeg command accepts an arbitrary input file.

One way to "fix" the arbitrary-input-file issue would be to stop running ffmpeg and require the user to provide audio that is already in the format vosk-cli expects, but I assume we don't want that.

The other option is general hardening, such as sandboxing, limiting process resources and restricting the input file to well-known formats. I'm not sure any of these are worth it, though; that is up for discussion.
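The last two measures can be sketched without dropping ffmpeg. The sketch accepts only an existing regular file with an allow-listed extension before building the command, and caps the child's CPU time and address space through the POSIX-only resource module. The extension list, the limits, the extra -nostdin/-loglevel flags and the helper names are placeholders. Only the tail of the command ('-ar', sample rate, '-ac', '1', '-f', 's16le', '-') and the Popen call with stdout=subprocess.PIPE are taken from the code Bandit flagged. An extension check is not a format check: ffmpeg still parses the file's contents.

```python
import os
import resource  # POSIX only
import subprocess
from typing import List

# Placeholder allow-list; the real set of accepted formats is up for discussion.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".mp4", ".mkv", ".webm", ".mpg"}


def build_ffmpeg_command(input_path: str, sample_rate: int = 16000) -> List[str]:
    """Validate the input file and return an ffmpeg argv list (no shell involved)."""
    path = os.path.realpath(input_path)
    if not os.path.isfile(path):  # rejects URLs, devices, directories, missing files
        raise ValueError(f"not a regular file: {input_path}")
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file extension: {ext or '(none)'}")
    # realpath() yields an absolute path, so it never starts with '-' and
    # cannot be mistaken for an ffmpeg option.
    return ['ffmpeg', '-nostdin', '-loglevel', 'error', '-i', path,
            '-ar', str(sample_rate), '-ac', '1', '-f', 's16le', '-']


def _limit_resources() -> None:
    """Runs in the child before exec: cap CPU seconds and address space."""
    resource.setrlimit(resource.RLIMIT_CPU, (600, 600))  # placeholder: 10 CPU minutes
    two_gib = 2 * 1024 ** 3
    resource.setrlimit(resource.RLIMIT_AS, (two_gib, two_gib))  # placeholder: 2 GiB


def start_ffmpeg(input_path: str, sample_rate: int = 16000) -> subprocess.Popen:
    """Start ffmpeg on a validated input with resource limits applied."""
    command = build_ffmpeg_command(input_path, sample_rate)
    return subprocess.Popen(command, stdout=subprocess.PIPE, preexec_fn=_limit_resources)
```

The subprocess documentation warns that preexec_fn is not safe to use in the presence of threads. Resource limits also do not replace real sandboxing (a container, seccomp or a dedicated user).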

Bandit logs
Test results:
>> Issue: [B404:blacklist] Consider possible security implications associated with the subprocess module.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   Location: voskcli/transcribe.py:21:0
   More Info: https://bandit.readthedocs.io/en/1.7.4/blacklists/blacklist_imports.html#b404-import-subprocess
20	import os
21	import subprocess
22	import json

--------------------------------------------------
>> Issue: [B603:subprocess_without_shell_equals_true] subprocess call - check for execution of untrusted input.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   Location: voskcli/transcribe.py:189:14
   More Info: https://bandit.readthedocs.io/en/1.7.4/plugins/b603_subprocess_without_shell_equals_true.html
188	               '-ar', str(sample_rate), '-ac', '1', '-f', 's16le', '-']
189	    process = subprocess.Popen(command, stdout=subprocess.PIPE)
190	

--------------------------------------------------

Code scanned:
	Total lines of code: 195
	Total lines skipped (#nosec): 0

Run metrics:
	Total issues (by severity):
		Undefined: 0
		Low: 2
		Medium: 0
		High: 0
	Total issues (by confidence):
		Undefined: 0
		Low: 0
		Medium: 0
		High: 2
