Thanks for the questions and all suggestions.
I think the global keyboard shortcuts feature is notable enough that it should be mentioned in the README.
You are perfectly right, but I need to polish these features more. In their current form they are quite unpredictable. For instance, not all shortcuts work out of the box, some apps don't accept "inserting to active window", and so on. At the very least, I have to identify the conditions under which everything works reliably.
I realise it probably depends on which model you are using, but what is the minimum amount of RAM required to run SN?
Everything depends on the model and engine. I can't give exact numbers because I haven't made any measurements or benchmarks. For STT tasks, the lightest is Vosk Small. In TTS, the lightest are eSpeak (obviously) and RHVoice. Piper is pretty efficient on CPU as well.
What is the longest speech recording that we can reasonably hope to feed into SN and expect it to cope?
Are you asking about transcribing a file? SN should not crash even on very long audio. There is a Voice Activity Detector and a non-speech removal pre-processing step that cuts the audio into smaller parts containing speech. The parts are processed one by one, so RAM usage should stay stable and should not depend on the duration of the audio.
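A minimal sketch of the chunking idea, assuming a simple energy-threshold detector rather than whatever VAD Speech Note actually uses. The names `wav_frames`, `speech_segments`, `frame_rms`, the threshold and the frame sizes are illustrative assumptions, not SN's code. Frames are read lazily from a 16-bit mono WAV file, quiet frames are dropped, and speech is yielded in capped segments, so memory depends on the segment cap rather than on the recording length.

```python
import array
import math
import sys
import wave
from typing import Iterable, Iterator, List


def wav_frames(path: str, frame_len: int = 480) -> Iterator[List[int]]:
    """Stream a 16-bit mono WAV file in fixed-size frames (480 samples = 30 ms at 16 kHz)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        while True:
            data = wf.readframes(frame_len)
            if not data:
                break
            samples = array.array("h", data)
            if sys.byteorder == "big":  # WAV samples are little-endian
                samples.byteswap()
            yield samples.tolist()


def frame_rms(frame: List[int]) -> float:
    """Root-mean-square energy of one frame; 0.0 for an empty frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def speech_segments(frames: Iterable[List[int]], threshold: float,
                    max_silence_frames: int = 10,
                    max_segment_frames: int = 1000) -> Iterator[List[int]]:
    """Yield speech-only segments, dropping frames whose energy is below `threshold`.

    A segment ends after `max_silence_frames` quiet frames in a row, or once it
    reaches `max_segment_frames` speech frames, so no segment grows without bound.
    """
    segment: List[int] = []
    speech_frames = 0
    silence_run = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            segment.extend(frame)
            speech_frames += 1
            silence_run = 0
        elif segment:
            silence_run += 1  # quiet frames are counted but not stored
            if silence_run >= max_silence_frames:
                yield segment
                segment, speech_frames, silence_run = [], 0, 0
        if speech_frames >= max_segment_frames:
            yield segment
            segment, speech_frames, silence_run = [], 0, 0
    if segment:
        yield segment
```

Used as `for seg in speech_segments(wav_frames("talk.wav"), threshold=500.0): ...`, each segment would be handed to the STT engine (a hypothetical `transcribe(seg)` call) and then discarded, so peak memory is bounded by roughly `max_segment_frames * frame_len` samples no matter how long the file is.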
Presuming the host machine has a decent amount of disk space and RAM and the user is prepared to wait for the results, could they let SN run for several hours and expect it to handle a recording of that length, or is it only capable of dealing with shorter recordings?
In the settings, you can change "Listening mode" to "Always on". In this mode, SN listens and transcribes continuously. It tries to detect silence and processes audio in chunks, and RAM is freed after each chunk is processed. There is no specific time limit, so it should be able to run in this mode indefinitely.
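A minimal sketch of an always-on loop with bounded state, assuming an energy detector whose silence threshold adapts to background noise. The adaptive noise floor, `ratio`, `alpha`, `hang_frames`, `max_chunk_frames` and the function names are illustrative assumptions; the thread does not say how SN detects silence. Each chunk is handed off and its buffer dropped, so memory does not grow with running time.

```python
import math
from typing import Callable, Iterable, List


def rms(frame: List[int]) -> float:
    """Root-mean-square energy of one frame; 0.0 for an empty frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def always_on(frames: Iterable[List[int]],
              on_chunk: Callable[[List[int]], None],
              ratio: float = 3.0,
              alpha: float = 0.05,
              hang_frames: int = 3,
              max_chunk_frames: int = 1000,
              min_floor: float = 50.0) -> None:
    """Consume a possibly endless frame stream and pass each speech chunk to `on_chunk`.

    A frame counts as speech when its energy exceeds `ratio` times the noise
    floor, an exponential moving average of quiet frames. A chunk is flushed
    after `hang_frames` quiet frames in a row, or after `max_chunk_frames`
    speech frames during non-stop speech; the buffer is then dropped.
    """
    noise_floor = min_floor
    chunk: List[int] = []
    speech = 0
    quiet = 0
    for frame in frames:
        level = rms(frame)
        if level > noise_floor * ratio:
            chunk.extend(frame)
            speech += 1
            quiet = 0
            if speech >= max_chunk_frames:  # cap chunk size during non-stop speech
                on_chunk(chunk)
                chunk, speech = [], 0
            continue
        # Quiet frame: update the background estimate, never below min_floor.
        noise_floor = max(min_floor, (1 - alpha) * noise_floor + alpha * level)
        if chunk:
            quiet += 1
            if quiet >= hang_frames:
                on_chunk(chunk)  # e.g. run STT on this chunk
                chunk, speech, quiet = [], 0, 0  # release the buffer
    if chunk:  # flush a trailing chunk if the stream ever ends
        on_chunk(chunk)
```

In a real loop `frames` would come from a microphone capture API (not shown); passing a finite list of frames is enough to exercise the logic.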
Something like this maybe:
Description
Speech Note lets you take, read and translate notes in multiple languages. It uses Speech to Text, Text to Speech and Machine Translation to do so. Text and voice processing take place entirely offline, locally on your computer, without using a network connection. Your privacy is always respected. No data is sent to the Internet.
Speech Note also offers virtual keyboard input via global keyboard shortcuts, but this feature currently works only with some X11 apps and not under Wayland.
Thank you. It is perfect! I will definitely use it, but first I need to at least determine under what conditions these features are usable and under which they simply cannot work. I don't want to advertise half-baked functionality.