coqui-ai / STT-examples
🐸STT integration examples
Home Page: https://github.com/coqui-ai/STT
License: Mozilla Public License 2.0
I am getting a segmentation fault (core dumped) error when doing live transcription through WebSockets, after approximately one second or less. I only get one word of transcription before the error appears. Sometimes I get a malloc error instead. I don't know how to fix this.
The web_microphone_websocket example works on localhost, but when served from a domain the socket.io code produces CORS errors. For example, in Chrome:
Access to XMLHttpRequest at 'https://example.com:4000/socket.io/?EIO=3&transport=polling&t=Nu-iT4E' from origin 'https://example.com.cz:3000' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
Several recipes from Stack Overflow and the socket.io docs did not work; perhaps someone with better JavaScript knowledge can help.
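For reference, the general shape of the fix is that the server must explicitly whitelist the client's origin. The example below is only a sketch of that idea using the python-socketio package, not the Node socket.io server the example actually uses, and the origin is taken from the error message above:

```python
# Sketch: allowing a cross-origin client with python-socketio.
# NOTE: this is the Python analogue of the server-side fix, not the Node
# socket.io code in web_microphone_websocket; the origin below is an
# assumption taken from the browser error message.
import socketio

sio = socketio.Server(cors_allowed_origins=[
    "https://example.com.cz:3000",  # the origin the browser reports
])
app = socketio.WSGIApp(sio)

@sio.event
def connect(sid, environ):
    print("client connected:", sid)
```

In Node's socket.io the equivalent knob is the `cors` option passed when constructing the server.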
Some example projects are not listed under the appropriate section of the main README. Notably, for Android the main README points to a deprecated example while a working example goes unlisted. Adding the missing entries manually would solve the issue temporarily, but a more permanent solution would be a checklist in the PR template or something similar.
In the vad_transcriber example README, the last link, to https://mozilla-voice-stt.readthedocs.io/en/latest/Error-Codes.html, returns a 404.
STT-examples/vad_transcriber/README.md
Line 106 in 6487e4f
After checking out the project and opening it in Android Studio (4.1), running the app throws this error:
Could not resolve all files for configuration ':app:debugRuntimeClasspath'.
Could not find ai.coqui:libstt:0.9.3.
Searched in the following locations:
- https://dl.google.com/dl/android/maven2/ai/coqui/libstt/0.9.3/libstt-0.9.3.pom
- https://jcenter.bintray.com/ai/coqui/libstt/0.9.3/libstt-0.9.3.pom
Hi,
I was curious whether it is possible to use the WASM file on Node.js for unsupported Node.js versions. I know I can load a WASM file, but I was hoping there might be an example, since I am not sure the JavaScript in a browser is 1:1 with that of Node.js (e.g. FileReader or audio input).
Thanks for any advice.
def four_to_one(self, frame):
    # [ch1,ch2,ch3,ch4 || ch1,ch2,ch3,ch4 || ch1,ch2,ch3,ch4 || ch1,ch2,ch3,ch4]
    frame = np.frombuffer(frame, np.int16)
    data = frame.reshape((self.CHANNELS, -1), order='F')
    b = 1 / self.CHANNELS
    x = np.int16(0)
    for c in data:
        x += c * b
    frame = (x.astype(np.int16)).tobytes()
    return frame
The code above is part of converting a 4-channel frame to 1 channel. The mic_vad_streaming.py file freezes when running on a Raspberry Pi and trying to record 4 channels. The function above is called inside the vad_collector function when the length of the frame is larger than 2560.
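For what it's worth, the per-channel loop above can be replaced by a single vectorized mean, which avoids the Python-level accumulation. This is a sketch assuming interleaved little-endian int16 input; `downmix_to_mono` is a hypothetical helper name:

```python
import numpy as np

def downmix_to_mono(frame: bytes, channels: int = 4) -> bytes:
    """Average interleaved int16 samples [c1,c2,...,cN, c1,...] down to mono."""
    samples = np.frombuffer(frame, dtype=np.int16)
    # reshape(-1, channels): each row holds one time step across all channels,
    # so mean(axis=1) averages the channels at every time step.
    mono = samples.reshape(-1, channels).mean(axis=1)
    return mono.astype(np.int16).tobytes()
```

This is equivalent to the original `reshape((CHANNELS, -1), order='F')` plus the accumulation loop, just expressed in one pass.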
I also posted this in the Deep speech examples repo, but I believe my chances for a reaction are better in here:
mozilla/DeepSpeech-examples#143
Hey everyone, thanks a lot for your great work.
To make Coqui more accessible to non-tech folks, it would be great to have a small desktop client. The electron example shows that this is doable and that a cross-platform app is relatively easy to create. I want to work on this, but I have very little experience with Electron apps, so it might take a while and I might need some help.
I imagine a very minimalistic frontend that contains:
What are your thoughts on this? I want to start working on it during August, beginning by adapting the example app in a separate repo. Do you think this is a doable plan? I am also open to other proposals.
For example:
pi@raspberrypi:~/Source/STT-examples $ git diff
diff --git a/mic_vad_streaming/requirements.txt b/mic_vad_streaming/requirements.txt
index e97d363..3eb12cc 100644
--- a/mic_vad_streaming/requirements.txt
+++ b/mic_vad_streaming/requirements.txt
@@ -1,7 +1,7 @@
-stt~=1.0.0
+stt~=1.3.0
pyaudio~=0.2.11
webrtcvad~=2.0.10
halo~=0.0.18
numpy>=1.15.1
scipy>=1.1.0
-pyautogui~=0.9.52
\ No newline at end of file
+pyautogui~=0.9.52
Unfortunately we forgot to add a license file to this repo when we moved the examples to a separate repository. The main repo had the Mozilla Public License 2.0 and our intention was always to keep it as is. I'm tagging contributors to this repo to confirm that you agree to license your contributions as MPL-2.0. If you agree, please reply with a comment here saying "I agree to license my contributions to this repository under the Mozilla Public License 2.0."
I am trying to change the channel count to 4 to work with 4 channels of audio in the mic_vad_streaming.py file. I have a ReSpeaker 4-Mic Array kit. When I change line 20 to "CHANNELS = 4" and line 134 to "is_speech = self.vad.is_speech(frame[0::self.CHANNELS], self.sample_rate)", it starts endlessly collecting frames until it reaches the limit, then starts again. How can I solve this problem?
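One likely culprit, as a hedged guess: `frame[0::self.CHANNELS]` slices raw bytes, but each int16 sample is two bytes wide, so that expression splits samples in half rather than extracting a channel. A sketch of sample-wise channel extraction, assuming interleaved little-endian int16 audio (`extract_channel` is a hypothetical helper name):

```python
import numpy as np

def extract_channel(frame: bytes, channels: int = 4, channel: int = 0) -> bytes:
    """Pull one channel out of an interleaved int16 multi-channel frame.

    Slicing the int16 view (not the raw bytes) keeps samples intact,
    producing the mono 16-bit PCM stream that webrtcvad's is_speech()
    expects.
    """
    samples = np.frombuffer(frame, dtype=np.int16)
    return samples[channel::channels].tobytes()
```

`is_speech(extract_channel(frame), sample_rate)` would then see a mono frame of the expected duration.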
I tried to run the Python websocket example, but I am getting an error.
I cloned the repo and first did
sudo docker build .
This succeeded. Then I ran the following. It is possible I am misunderstanding the build process, but I get this error:
$ sudo docker container run b6898d9a294d
TensorFlow: v2.2.0-24-g1c1b2b9
DeepSpeech: v0.8.2-0-g02e4c76
Warning: reading entire model file into memory. Transform model file into an mmapped graph to reduce heap usage.
2021-11-21 23:05:17.440646: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Not found: /opt/deepspeech/model.tflite; No such file or directory
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/opt/deepspeech/deepspeech_server/app.py", line 17, in <module>
scorer_path=Path(conf["deepspeech.scorer"]).absolute().as_posix(),
File "/opt/deepspeech/deepspeech_server/engine.py", line 30, in __init__
self.model = Model(model_path=model_path)
File "/usr/local/lib/python3.6/dist-packages/deepspeech/__init__.py", line 38, in __init__
raise RuntimeError("CreateModel failed with '{}' (0x{:X})".format(deepspeech.impl.ErrorCodeToErrorMessage(status),status))
RuntimeError: CreateModel failed with 'Error reading the proto buffer model file.' (0x3005)
I was wondering if a CPU-architecture check might be useful to have? I am a bit stuck on how to fix this issue given that, to my understanding, Coqui uses its own fork of TensorFlow.
Running it without the container,
python -m deepspeech_server.app
produces the same error.
Thanks!
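The CPU-architecture check suggested above could be sketched with the standard library alone. The supported-architecture set here is an illustrative assumption, not the project's actual support matrix:

```python
import platform

# Architectures for which prebuilt wheels are commonly published.
# NOTE: illustrative assumption, not an authoritative list for STT.
SUPPORTED_ARCHES = {"x86_64", "amd64", "aarch64", "arm64"}

def check_cpu_architecture() -> str:
    """Return the machine architecture, warning when it looks unsupported."""
    arch = platform.machine().lower()
    if arch not in SUPPORTED_ARCHES:
        print(f"Warning: architecture '{arch}' may not have a prebuilt "
              "STT/TensorFlow binary; consider building from source.")
    return arch
```

Running such a check at startup would turn the opaque proto-buffer/segfault failures into an actionable message.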
https://github.com/coqui-ai/STT-examples/tree/r0.9/vad_transcriber
e.g. contains a link to https://github.com/coqui-ai/STT-examples/blob/doc/audioTranscript.png, which returns a 404, and still contains Mozilla references.
Error log:
npm ERR! code E404
npm ERR! 404 Not Found - GET https://registry.npmjs.org/STT - Not found
npm ERR! 404
npm ERR! 404 'STT@^1.3.0' is not in this registry.
npm ERR! 404 This package name is not valid, because
npm ERR! 404 1. name can no longer contain capital letters
npm ERR! 404
npm ERR! 404 Note that you can also install from a
npm ERR! 404 tarball, folder, http url, or git url.
This is essentially a copy of
mozilla/DeepSpeech-examples#187
where the problem seems to be the same. Is there any support on this?
Do you have any example with C?