Comments (2)

mkiol commented on June 12, 2024

Thanks for the questions and all suggestions.

I think the global keyboard shortcuts feature is a feature that is notable enough that it should be mentioned in the README.

You are perfectly right, but I need to polish these features more. In their current form they are quite unpredictable. For instance, not all shortcuts work out of the box, some apps don't accept "inserting to active window", and so on. At the very least I have to identify the conditions under which everything should work fine.

I realise it probably depends on which model you are using, but what is the minimum amount of RAM required to run SN?

Everything depends on the model and engine. I can't give exact numbers because I haven't made any measurements or benchmarks. For STT tasks, the lightest is Vosk Small. For TTS, eSpeak (obviously) and RHVoice. Piper is pretty efficient on CPU as well.

What is the longest speech recording that we can reasonably hope to feed into SN and expect it to cope?

Are you asking about transcribing a file? SN should not crash even on very long audio. A Voice Activity Detector and non-speech removal pre-processing cut the audio into smaller parts containing speech. The parts are processed one by one, so RAM demand should be stable and should not depend on the duration of the audio.
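
To illustrate the general idea only (this is not Speech Note's actual code), here is a minimal Python sketch of chunked processing. It assumes a simple energy threshold in place of a real Voice Activity Detector, and the names `split_on_silence`, `transcribe_stream`, `threshold`, `max_silent_frames` and the `transcribe` callable are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def split_on_silence(
    frames: Iterable[List[float]],
    threshold: float = 0.01,      # assumed energy cut-off, not a real VAD
    max_silent_frames: int = 3,   # assumed pause length that ends a segment
) -> Iterator[List[float]]:
    """Yield speech segments one at a time from a stream of audio frames.

    A frame counts as speech when its mean energy exceeds `threshold`.
    Silent frames are dropped; a segment is closed after
    `max_silent_frames` consecutive silent frames.
    """
    segment: List[float] = []
    silent_run = 0
    for frame in frames:
        energy = sum(s * s for s in frame) / len(frame) if frame else 0.0
        if energy > threshold:
            segment.extend(frame)
            silent_run = 0
        elif segment:
            silent_run += 1
            if silent_run >= max_silent_frames:
                yield segment
                segment = []
                silent_run = 0
    if segment:
        # Flush the last segment when the stream ends mid-speech.
        yield segment


def transcribe_stream(
    frames: Iterable[List[float]],
    transcribe: Callable[[List[float]], str],
) -> Iterator[str]:
    """Run a (hypothetical) transcribe function on one segment at a time."""
    for segment in split_on_silence(frames):
        yield transcribe(segment)
```

Because both functions are generators, only the segment currently being built or processed is held in memory, which is why memory use in a scheme like this depends on the longest speech segment and not on the total length of the recording.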

Presuming the host machine has a decent amount of disk space and RAM and they're prepared to wait for the results, could a user potentially let SN run for several hours and expect it to handle a recording of such length or is it only capable of dealing with shorter recordings?

In the settings, you can change "Listening mode" to "Always on". In this mode, SN always listens and transcribes. It tries to detect silence and processes audio in chunks. RAM is freed after each chunk is processed. There is no specific time limit. It should be able to run in this mode indefinitely.

Something like this maybe:

Thank you. It is perfect! I will definitely use it, but first I need to at least determine under what conditions these features are usable and under which they simply cannot work. I don't want to advertise half-baked functionalities.

danboid commented on June 12, 2024

Something like this maybe:

Description

Speech Note lets you take, read and translate notes in multiple languages. It uses Speech to Text, Text to Speech and Machine Translation to do so. Text and voice processing take place entirely offline, locally on your computer, without using a network connection. Your privacy is always respected. No data is sent to the Internet.

Speech Note also features some virtual keyboard input support via global keyboard shortcuts, but this feature is currently only supported by some X11 apps and not under Wayland.
