
dsnote's People

Contributors

albanobattistella, dashinfantry, devsjr, flimm, karry, lfd3v, mkiol, popanz, zishan-rahman


dsnote's Issues

This is really excellent software - I am genuinely impressed - but it needs a little bit of polishing.

  1. It needs better instructions and information.
    The old eSpeak Robot voice uses almost no processing power, and it's very fast for text-to-speech conversion.

OK, the most basic voice uses almost no processing power, while the very best voices use loads of it, and unless you have a very powerful computer they lag and buffer a lot.

Text to speech with "save audio file to MP3" is great, but it needs processing power; if that power does not exist, a high-rate conversion of a 400-page document to MP3 can take 12 to 18 hours.

  1. We need to see the data rate each voice operates at, kind of like internet speeds of dial-up, ADSL, etc.
  2. We need a selection of MP3 conversion rates. Super-high fidelity is excellent, but 44 kHz is just fine for small file sizes and fairly good audio quality, whereas super-large files at around 256 kbps might appeal to some people, such as for automated movie-production scripts; for most text-to-speech work on written documents it is overkill. We need a choice.
  3. The audio player needs speed and pitch controls, along with pause and stop. I get that it's a new-to-market product, and it's generally excellent, but having only Cancel rather defeats the point.

So the voice names need a scale beside them. I figure that the small, medium, and large designations MIGHT be linked to a data rate, but they might be linked to the download file size...

For most of my work I have to read LARGE documents, like 400 pages, and it's better to have them read out and saved as an MP3, so I can listen to them when driving long distances or when resting.

I don't need stereophonic high fidelity... low-resolution audio is just fine.

I also lack computers that are much beyond office work and playing a few videos, so the down-scale options are needed: "Oh, voice X uses 200 times the resources of the eSpeak Robot... brilliant, but I will be happy with 25 times the processing power of the eSpeak Robot."

I am REALLY impressed with what you all have done so far... It's incredible... I mean this is really good.

Add reading speed and export audio in other formats.

Hi,
thanks for this awesome app.
It is very useful for students and teachers, and for students with special needs.
I would suggest adding a choice of reading speeds when text is read.
Also, the ability to export audio to other formats like MP3, Ogg, etc.
Thank you.
V/R,
A.

Support for aprilasr

Hello,

I really appreciate your project! I think it's going in a very nice and useful direction!

LiveCaptions uses aprilasr, which is very fast and only needs the CPU.

I think it would be great if you could add aprilasr as one of the speech recognition options in your project.

It would add a lot of value to your project by offering a fast and lightweight option for users who don't have access to GPUs or who want to conserve battery life on mobile devices.

Thanks in advance! Good luck with the rest of the project ;)

Non-sandboxed package format (AUR, deb, rpm)

Flatpak is a great package format but has a few limitations. The major ones are as follows:

  • UI theme is not synced with the OS
    • this especially affects the Dark theme under GNOME
    • even on KDE Plasma, the app does not use the native theme
  • GPU compute acceleration does not work out-of-the-box
    • the Flatpak runtime lacks the CUDA and ROCm runtimes (all dependencies have to be shipped with the package)
    • ROCm requires the extra elevated permission --device=all to start working (see the sketch at the end of this issue)
  • Package size is huge

Non-sandboxed package formats for consideration:

  • distribution via AUR (probably the easiest option)
  • deb (Debian and all derivatives)
  • rpm (Fedora, OpenSUSE)
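
Until a non-sandboxed package exists, the ROCm permission mentioned above can be granted manually. A minimal sketch, assuming a user-level Flathub install (whether --device=all alone is enough on a given setup is an assumption):

# Grant the sandbox broad device access so ROCm can reach /dev/kfd and /dev/dri:
flatpak override --user --device=all net.mkiol.SpeechNote

# Undo the override if it causes problems:
flatpak override --user --reset net.mkiol.SpeechNote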

DS note inside any text box?

Are there any upcoming plans to introduce a feature that enables DS Note to seamlessly insert dictated text into any selected text box (wherever the cursor is), similar to the functionality found in Windows where you can simply press Windows + H?

As someone with SEVERELY limited dexterity and mobility due to a disability, this function is crucial for me to get through my normal working day, and personally it's a big barrier to making a full-time switch from Windows to Linux -- especially when I need to work. Unfortunately, I lack the programming skills or the capacity to grasp anything more complex than a basic "Hello, World!" program. I'm curious to know whether such a feature is feasible within the DS Note program.

But for what it's worth, right now just having something similar available on Flathub is a game changer.
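
Not a fix, but a possible stopgap until such a feature exists: a hypothetical sketch that types dictated text (assumed to already be on the clipboard) into whichever window has focus, assuming a Wayland session with the wl-clipboard and wtype utilities installed:

# Paste the clipboard contents into the focused window as synthetic keystrokes.
# wtype reads the text to type from stdin when given "-".
wl-paste --no-newline | wtype -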

GPU not working

Selecting GPU to transcribe an audio file is causing a crash

QIBusPlatformInputContext: invalid portal bus.
QSocketNotifier: Can only be used with threads started with QThread
qt.qpa.qgnomeplatform: Could not find color scheme  ""
whisper_init_from_file_no_state: loading model from '/home/user/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/en_whisper_small.ggml'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head  = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 768
whisper_model_load: n_text_head   = 12
whisper_model_load: n_text_layer  = 12
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 9
whisper_model_load: qntvr         = 2
whisper_model_load: type          = 3
whisper_model_load: mem required  =  459.00 MB (+   16.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  180.95 MB
ggml_opencl: selecting platform: 'Clover'
ggml_opencl: selecting device: 'AMD Radeon RX 6800M (navi22, LLVM 15.0.7, DRM 3.54, 6.5.5-1-linux)'
ggml_opencl: device FP16 support: false
ggml_opencl: kernel compile error:

fatal error: cannot open file '/usr/lib/x86_64-linux-gnu/GL/default/share/clc/gfx1031-amdgcn-mesa-mesa3d.bc': No such file or directory
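
The missing .bc file suggests the Flatpak GL runtime ships Mesa's Clover OpenCL without the libclc bitcode for this GPU (gfx1031). A way to confirm from inside the sandbox, sketched under the assumption that a shell is available in the runtime:

# List the libclc bitcode files visible inside the sandbox; the error above
# says gfx1031-amdgcn-mesa-mesa3d.bc should be there but is not.
flatpak run --command=sh net.mkiol.SpeechNote -c 'ls /usr/lib/x86_64-linux-gnu/GL/default/share/clc/ 2>/dev/null | head'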

Donate button

Hello,

the new version of your program is really nice. Is it possible to give you money for your hard work?
I looked on the GitHub page but I don't see any button.
:)

CUDA not recognized

I'm not sure why, but it does seem to be related to Flatpak.

On system:
NVIDIA-SMI 535.113.01 Driver Version: 535.113.01 CUDA Version: 12.2
NVIDIA GeForce 940M (2 GB VRAM) (should be enough to run the small Whisper model)

On flatpak:

nvidia-535-104-05 org.freedesktop.Platform.GL.nvidia-535-104-05 1.4 user
nvidia-535-113-01 org.freedesktop.Platform.GL.nvidia-535-113-01 1.4 user
nvidia-535-98 org.freedesktop.Platform.GL.nvidia-535-98 1.4 user
nvidia-535-104-05 org.freedesktop.Platform.GL32.nvidia-535-104-05 1.4 user
nvidia-535-113-01 org.freedesktop.Platform.GL32.nvidia-535-113-01 1.4 user
nvidia-535-98 org.freedesktop.Platform.GL32.nvidia-535-98 1.4 user

Logs

[D] 14:13:45.593 0x7f5d825ff600 process_buff:226 - vad: no speech
[D] 14:13:45.593 0x7f5d825ff600 set_processing_state:430 - processing state: idle => decoding
[D] 14:13:45.593 0x7f5d825ff600 set_speech_detection_status:508 - speech detection status: speech-detected => decoding (no-speech)
[D] 14:13:45.593 0x7f5d825ff600 () - service refresh status, new state: listening-single-sentence
[D] 14:13:45.593 0x7f5d825ff600 () - task state changed: 1 => 2
[D] 14:13:45.593 0x7f5d825ff600 process_buff:284 - speech frame: samples=51360
[D] 14:13:45.593 0x7f5d825ff600 decode_speech:350 - speech decoding started
[D] 14:13:45.597 0x7f5de77bbd80 () - app task state: speech-detected => processing
CUDA error 209 at /run/build/whispercpp-cublas/ggml-cuda.cu:6102: no kernel image is available for execution on the device
[W] 14:13:46.168 0x7f5d825ff600 () - QObject::killTimer: Timers cannot be stopped from another thread
[W] 14:13:46.169 0x7f5d825ff600 () - QObject::~QObject: Timers cannot be stopped from another thread
[D] 14:13:46.178 0x7f5d825ff600 () - speech service dtor
[W] 14:13:46.179 0x7f5d825ff600 () - QtDBus: cannot relay signals from parent speech_service(0x5647aeab6ea0 "") unless they are emitted in the object's thread QThread(0x5647af143ed0 ""). Current thread is QThread(0x7f5d5c0016e0 "").
[D] 14:13:46.179 0x7f5d825ff600 () - mic source dtor
[W] 14:13:46.179 0x7f5d825ff600 () - QObject::killTimer: Timers cannot be stopped from another thread
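
CUDA error 209 ("no kernel image is available") usually means the bundled whisper.cpp was not compiled for this GPU's compute capability (the GeForce 940M is a Maxwell-class part). A quick check on the host, assuming a driver new enough to support the compute_cap query (535 should be):

# Print the GPU's CUDA compute capability; the kernels shipped in the package
# must include a binary or PTX image for this (or a compatible) target.
nvidia-smi --query-gpu=name,compute_cap --format=csv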

why flatpak app is so big?

Does it have Whisper and other engines included?

Would it not be better to move those to download-on-demand, just like the language data?

Pause function :)

Hello,

thank you for this amazing program! It would be nice if you could add a pause button for TTS.
Have a nice day.

Unable to add a second language model on Mint

If you have a language and language model in place, and your only interest is in changing the language model, it is confusing to have to select a language before seeing alternative language models. Some explanation would be nice.

Drag and drop support

Is drag-and-drop support for .mp3 files a possibility? Having to go through File > Transcribe a file, select a directory, and change the filter from audio to all files just so .mp3 shows up is tedious. A bonus would be for the name of the audio file to auto-populate the text save dialog box. Maybe it could be fixed with Flatseal, but I am not sure how.
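
On the Flatseal question: the permission Flatseal would toggle can also be granted from the command line. A sketch, assuming a user-level install and that missing filesystem access is indeed the blocker (an assumption, not a confirmed diagnosis):

# Allow the sandboxed app to read files anywhere under the home directory:
flatpak override --user --filesystem=home net.mkiol.SpeechNote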

Side note:
Using the whisper model gives great results. I can confirm that enabling GPU support in the settings does work as I see the GPU memory and usage spike when the transcription is occurring using Mint 21.2 and an Nvidia RTX 3050.
Would love to make a monetary contribution but I am unable to find a link unless I overlooked it.

Support GNOME Wayland for dsnote

Summary

The challenge is that, as of August 23, 2023, dsnote does not support GNOME Wayland. This is a problem because most recent versions of Linux distributions (such as, but not limited to, Debian, Fedora, Manjaro, Red Hat Enterprise Linux, and Ubuntu) now use GNOME Wayland by default, not GNOME X11.

The suggested resolution is to configure your Flatpak package so that it supports Wayland, with the end result that both GNOME Wayland and X11 are supported. If you're interested, this documentation about the Flatpak sandbox might be useful. If that documentation is somehow unavailable, this archived page might be of interest. Alternatively, Flatpak support for maintainers is available here.
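
On the user side, it is possible to test whether the problem is only socket permissions. A sketch assuming a user-level install; whether the Qt build in the runtime has a working Wayland platform plugin is an assumption:

# Expose the Wayland socket to the sandbox, then ask Qt to use it:
flatpak override --user --socket=wayland net.mkiol.SpeechNote
flatpak run --env=QT_QPA_PLATFORM=wayland net.mkiol.SpeechNote

# Or force the XWayland path, which this report says does work:
flatpak run --env=QT_QPA_PLATFORM=xcb net.mkiol.SpeechNote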

Below is the same as above, but with details if you're interested.

Steps to reproduce

  1. Using Debian 10 (Buster) 64-bit with GNOME 3.30.2 in its Wayland mode, install the Flatpak for dsnote 4.1.0 following the steps from https://flathub.org
  2. dsnote will fail to start and will not open. In other words, it is not usable. This is the challenge. When starting dsnote from a terminal, this error message is displayed:
QSocketNotifier: Can only be used with threads started with QThread
qt.qpa.wayland: Creating a fake screen in order for Qt not to crash
qt.qpa.qgnomeplatform: Could not find color scheme  ""
  3. The needed end result is that dsnote is able to start with GNOME Wayland.
  4. Close the present GNOME Wayland session.
  5. On the GNOME log-in page, click the cogwheel button on the right side of the log-in field and, using its drop-down menu, temporarily change the GNOME session from Wayland to the less secure X11.
  6. Using GNOME X11, start dsnote.
  7. It opens successfully. I don't know why dsnote opens in X11 but not in Wayland. My guess is that, somehow, dsnote does not yet support GNOME Wayland.
  8. Log out of GNOME. If appropriate, switch back to Wayland.

Flatpak page

https://flathub.org/fr/apps/net.mkiol.SpeechNote

Contribute

If needed, the Ubertus.org team and I would be happy to contribute beta testing and documentation for this improvement or new feature. Any volunteers for a patch?

Transcribe a file does not work with mounted Google Drive on Gnome

[Screenshot: how it looks when it hangs.]

To reproduce:

  • I launch Speech Note
  • I go to Files
  • I go to my mounted drive
    • (This drive is integrated as a Gnome Settings/Online Account)
  • I select a file
  • It hangs forever

If I first move the file to Downloads and then select it, it will start transcribing.

Context

Device

[Screenshot: device details.]

Startup logs

Sorry for how long this is, I don't really know what's useful here...

[chrisshaw@chris-fedora ~]$ flatpak run net.mkiol.SpeechNote --verbose
QSocketNotifier: Can only be used with threads started with QThread
qt.qpa.qgnomeplatform: Could not find color scheme  ""
[I] 13:28:20.174 0x7f658be10d80 init:49 - logging to stderr enabled
[D] 13:28:20.174 0x7f658be10d80 () - translation: "en_US"
[W] 13:28:20.174 0x7f658be10d80 () - failed to install translation
[D] 13:28:20.174 0x7f658be10d80 () - starting standalone app
[D] 13:28:20.175 0x7f658be10d80 () - app: net.mkiol dsnote
[D] 13:28:20.175 0x7f658be10d80 () - config location: "/home/chrisshaw/.var/app/net.mkiol.SpeechNote/config"
[D] 13:28:20.175 0x7f658be10d80 () - data location: "/home/chrisshaw/.var/app/net.mkiol.SpeechNote/data/net.mkiol/dsnote"
[D] 13:28:20.175 0x7f658be10d80 () - cache location: "/home/chrisshaw/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote"
[D] 13:28:20.175 0x7f658be10d80 () - settings file: "/home/chrisshaw/.var/app/net.mkiol.SpeechNote/config/net.mkiol/dsnote/settings.conf"
[D] 13:28:20.176 0x7f658be10d80 () - available styles: ("Default", "Fusion", "Imagine", "Material", "org.kde.breeze", "org.kde.desktop", "Plasma", "Universal")
[D] 13:28:20.176 0x7f658be10d80 () - style paths: ("/usr/lib/qml/QtQuick/Controls.2")
[D] 13:28:20.176 0x7f658be10d80 () - switching to style: "org.kde.desktop"
[D] 13:28:20.343 0x7f658be10d80 () - supported audio input devices:
ALSA lib ../../oss/pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
[D] 13:28:20.359 0x7f658be10d80 () - "pulse"
[D] 13:28:20.427 0x7f658be10d80 () - "upmix"
[D] 13:28:20.588 0x7f658be10d80 () - "default"
ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio)
ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio)
[D] 13:28:20.598 0x7f658be10d80 () - "alsa_input.usb-046d_HD_Pro_Webcam_C920_2AE889FF-02.analog-stereo"
[D] 13:28:20.598 0x7f658be10d80 () - "alsa_output.pci-0000_00_1f.3.analog-stereo.monitor"
[D] 13:28:20.598 0x7f658be10d80 () - "alsa_input.pci-0000_00_1f.3.analog-stereo"
[D] 13:28:20.598 0x7f658be10d80 add_cuda_devices:226 - scanning for cuda devices
[D] 13:28:20.601 0x7f658be10d80 add_cuda_devices:235 - cuda version: driver=0, runtime=12020
[D] 13:28:20.601 0x7f658be10d80 add_cuda_devices:240 - cudaGetDeviceCount returned: 35
[D] 13:28:20.601 0x7f658be10d80 add_hip_devices:263 - scanning for hip devices
[D] 13:28:20.601 0x7f658be10d80 hip_api:170 - failed to open hip lib: libamdhip64.so: cannot open shared object file: No such file or directory
[D] 13:28:20.601 0x7f658be10d80 add_opencl_devices:300 - scanning for opencl devices
[D] 13:28:20.812 0x7f658be10d80 add_opencl_devices:317 - opencl number of platforms: 2
[D] 13:28:20.812 0x7f658be10d80 add_opencl_devices:342 - opencl platform: 0, name=Clover, vendor=Mesa
[D] 13:28:20.812 0x7f658be10d80 add_opencl_devices:356 - opencl number of devices: 0
[D] 13:28:20.812 0x7f658be10d80 add_opencl_devices:342 - opencl platform: 1, name=AMD Accelerated Parallel Processing, vendor=Advanced Micro Devices, Inc.
[D] 13:28:20.812 0x7f658be10d80 add_opencl_devices:356 - opencl number of devices: 0
[D] 13:28:20.815 0x7f6563fff600 loop:58 - py executor loop started
[D] 13:28:20.851 0x7f658be10d80 () - starting service: app-standalone
[D] 13:28:20.858 0x7f65621fe600 () - config version: 34 34
[D] 13:28:20.860 0x7f65621fe600 () - checksum ok: "6571cb18" "en_whisper_base.ggml"
[D] 13:28:20.860 0x7f65621fe600 () - found model: "en_whisper_base"
[D] 13:28:20.863 0x7f65621fe600 () - found model: "am_espeak_am"
[D] 13:28:20.863 0x7f65621fe600 () - found model: "ar_espeak_ar"
[D] 13:28:20.863 0x7f65621fe600 () - found model: "bg_espeak_bg"
[D] 13:28:20.863 0x7f65621fe600 () - found model: "bs_espeak_bs"
[D] 13:28:20.863 0x7f65621fe600 () - found model: "ca_espeak_ca"
[D] 13:28:20.863 0x7f65621fe600 () - found model: "cs_espeak_cs"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "da_espeak_da"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "de_espeak_de"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "el_espeak_el"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "en_espeak_en"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "eo_espeak_eo"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "es_espeak_es"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "et_espeak_et"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "eu_espeak_eu"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "is_espeak_is"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "fa_espeak_fa"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "fi_espeak_fi"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "fr_espeak_fr"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "hi_espeak_hi"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "hr_espeak_hr"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "hu_espeak_hu"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "id_espeak_id"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "it_espeak_it"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "ja_espeak_ja"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "kk_espeak_kk"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "ko_espeak_ko"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "lv_espeak_lv"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "lt_espeak_lt"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "mk_espeak_mk"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "ms_espeak_ms"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "ne_espeak_ne"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "nl_espeak_nl"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "no_espeak_no"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "pt_espeak_pt"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "pt_espeak_pt_br"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "ro_espeak_ro"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "ru_espeak_ru"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "sk_espeak_sk"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "sl_espeak_sl"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "sr_espeak_sr"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "sv_espeak_sv"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "sw_espeak_sw"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "th_espeak_th"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "tr_espeak_tr"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "uk_espeak_uk"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "ka_espeak_ka"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "ky_espeak_ky"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "la_espeak_la"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "tt_espeak_tt"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "sq_espeak_sq"
[D] 13:28:20.864 0x7f65621fe600 () - found model: "uz_espeak_uz"
[D] 13:28:20.864 0x7f658be10d80 () - module already unpacked: "rhvoicedata"
[D] 13:28:20.865 0x7f65621fe600 () - found model: "vi_espeak_vi"
[D] 13:28:20.865 0x7f65621fe600 () - found model: "zh_espeak_yue"
[D] 13:28:20.865 0x7f65621fe600 () - found model: "zh_espeak_hak"
[D] 13:28:20.865 0x7f65621fe600 () - found model: "zh_espeak_cmn"
[D] 13:28:20.865 0x7f65621fe600 () - found model: "ga_espeak_ga"
[D] 13:28:20.865 0x7f65621fe600 () - found model: "mt_espeak_mt"
[D] 13:28:20.865 0x7f65621fe600 () - found model: "bn_espeak_bn"
[D] 13:28:20.865 0x7f65621fe600 () - found model: "pl_espeak_pl"
[D] 13:28:20.865 0x7f658be10d80 () - module already unpacked: "rhvoiceconfig"
[D] 13:28:20.868 0x7f65621fe600 () - models changed
[D] 13:28:20.876 0x7f658be10d80 () - module already unpacked: "espeakdata"
[D] 13:28:20.877 0x7f658be10d80 () - default tts model not found: "en"
[D] 13:28:20.877 0x7f658be10d80 () - default mnt lang not found: "en"
[D] 13:28:20.877 0x7f658be10d80 () - new default mnt lang: "en"
[D] 13:28:20.877 0x7f658be10d80 () - service refresh status, new state: idle
[D] 13:28:20.877 0x7f658be10d80 () - service state changed: unknown => idle
[D] 13:28:21.115 0x7f658be10d80 () - starting app: app-standalone
[D] 13:28:21.115 0x7f658be10d80 () - app service state: unknown => idle
[D] 13:28:21.115 0x7f658be10d80 () - app stt available models: 0 => 1
[D] 13:28:21.115 0x7f658be10d80 () - update listen
[D] 13:28:21.115 0x7f658be10d80 () - app active stt model: "" => "en_whisper_base"
[D] 13:28:21.115 0x7f658be10d80 () - update listen
[W] 13:28:21.116 0x7f658be10d80 () - no available mnt langs
[W] 13:28:21.116 0x7f658be10d80 () - no available mnt out langs
[W] 13:28:21.116 0x7f658be10d80 () - no available tts models for in mnt
[W] 13:28:21.116 0x7f658be10d80 () - no available tts models for out mnt
[W] 13:28:21.116 0x7f658be10d80 () - invalid task, reseting task state
[D] 13:28:21.116 0x7f658be10d80 () - app stt configured: false => true
logger error: invalid format string
qrc:/qml/main.qml:165:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo(<arguments>) { ... }
logger error: invalid format string
qrc:/qml/main.qml:156:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo(<arguments>) { ... }
logger error: invalid format string
qrc:/qml/Notepad.qml:24:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo(<arguments>) { ... }
logger error: invalid format string
qrc:/qml/Translator.qml:29:5: QML Connections: Implicitly defined onFoo properties in Connections are deprecated. Use this syntax instead: function onFoo(<arguments>) { ... }
[D] 13:28:21.309 0x7f658be10d80 onCompleted:85 - default font pixel size: 14
[D] 13:28:21.328 0x7f658be10d80 () - default tts model not found: "en"
[D] 13:28:21.328 0x7f658be10d80 () - default mnt lang not found: "en"
[D] 13:28:21.328 0x7f658be10d80 () - new default mnt lang: "en"
[D] 13:28:21.328 0x7f658be10d80 () - service refresh status, new state: idle
[D] 13:28:21.328 0x7f658be10d80 () - service refresh status, new state: idle
[W] 13:28:21.380 0x7f658be10d80 ():164 - qrc:/qml/Translator.qml:164:9: QML ColumnLayout (parent or ancestor of QQuickLayoutAttached): Binding loop detected for property "preferredWidth"
[D] 13:28:21.524 0x7f658be10d80 () - stt models changed
[D] 13:28:21.525 0x7f658be10d80 () - update listen
[D] 13:28:21.525 0x7f658be10d80 () - tts models changed
[D] 13:28:21.525 0x7f658be10d80 () - update listen
[W] 13:28:21.525 0x7f658be10d80 () - no available tts models for in mnt
[W] 13:28:21.525 0x7f658be10d80 () - no available tts models for out mnt
[D] 13:28:21.525 0x7f658be10d80 () - ttt models changed
[D] 13:28:21.526 0x7f658be10d80 () - mnt langs changed
[D] 13:28:21.526 0x7f658be10d80 () - update listen
[W] 13:28:21.526 0x7f658be10d80 () - no available mnt langs
[W] 13:28:21.526 0x7f658be10d80 () - no available mnt out langs
[D] 13:28:35.806 0x7f658be10d80 () - default tts model not found: "en"
[D] 13:28:35.807 0x7f658be10d80 () - default mnt lang not found: "en"
[D] 13:28:35.807 0x7f658be10d80 () - new default mnt lang: "en"
[D] 13:28:35.807 0x7f658be10d80 () - choosing model for id: "en_whisper_base" "en"
[D] 13:28:35.807 0x7f658be10d80 () - restart stt engine config: "lang=en, model-files=[model-file=/home/chrisshaw/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/en_whisper_base.ggml, scorer-file=, ttt-model-file=], speech-mode=automatic, vad-mode=aggressiveness-3, speech-started=0, use-gpu=0, gpu-device=[id=-1, api=opencl, name=, platform-name=]"
[D] 13:28:35.807 0x7f658be10d80 () - new stt engine required
[D] 13:28:35.808 0x7f658be10d80 open_whisper_lib:109 - using whisper-openblas
[D] 13:28:37.109 0x7f658be10d80 make_wparams:340 - cpu info: arch=x86_64, cores=4
[D] 13:28:37.110 0x7f658be10d80 make_wparams:342 - using threads: 4/4
[D] 13:28:37.110 0x7f658be10d80 make_wparams:344 - system info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | COREML = 0 | OPENVINO = 0 | 
[D] 13:28:37.110 0x7f658be10d80 start:199 - starting engine
[D] 13:28:37.110 0x7f658be10d80 start:207 - engine started
[D] 13:28:37.110 0x7f658be10d80 () - creating audio source
[D] 13:28:37.110 0x7f658be10d80 () - mic source created
[D] 13:28:37.110 0x7f64fbc15600 start_processing:244 - processing started
[D] 13:28:37.110 0x7f64fbc15600 set_processing_state:430 - processing state: idle => initializing
[D] 13:28:37.110 0x7f64fbc15600 set_processing_state:437 - speech detection status: no-speech => initializing (no-speech)
[D] 13:28:37.110 0x7f64fbc15600 () - service refresh status, new state: idle
[D] 13:28:37.110 0x7f64fbc15600 () - task state changed: 0 => 3
[D] 13:28:37.110 0x7f64fbc15600 create_whisper_model:175 - creating whisper model
whisper_init_from_file_no_state: loading model from '/home/chrisshaw/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/en_whisper_base.ggml'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 9
whisper_model_load: qntvr         = 2
whisper_model_load: type          = 2
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =   56.51 MB
[D] 13:28:37.340 0x7f658be10d80 () - using audio input: "alsa_input.usb-046d_HD_Pro_Webcam_C920_2AE889FF-02.analog-stereo"
whisper_model_load: model size    =   56.38 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
whisper_init_state: compute buffer (conv)   =   14.10 MB
whisper_init_state: compute buffer (encode) =   81.85 MB
whisper_init_state: compute buffer (cross)  =    4.40 MB
whisper_init_state: compute buffer (decode) =   24.61 MB
[D] 13:28:37.440 0x7f64fbc15600 create_whisper_model:185 - whisper model created
[D] 13:28:37.440 0x7f64fbc15600 set_processing_state:430 - processing state: initializing => idle
[D] 13:28:37.440 0x7f64fbc15600 set_processing_state:437 - speech detection status: initializing => no-speech (no-speech)
[D] 13:28:37.440 0x7f64fbc15600 () - service refresh status, new state: idle
[D] 13:28:37.440 0x7f64fbc15600 () - task state changed: 3 => 0
[D] 13:28:37.657 0x7f658be10d80 () - audio state: IdleState
[D] 13:28:37.658 0x7f658be10d80 () - service refresh status, new state: listening-auto
[D] 13:28:37.658 0x7f658be10d80 () - service state changed: idle => listening-auto
[W] 13:28:37.660 0x7f658be10d80 () - ignore TaskStatePropertyChanged signal
[W] 13:28:37.660 0x7f658be10d80 () - ignore TaskStatePropertyChanged signal
[D] 13:28:37.660 0x7f658be10d80 () - app current task: -1 => 0
[W] 13:28:37.660 0x7f658be10d80 () - invalid task, reseting task state
[D] 13:28:37.660 0x7f658be10d80 () - app service state: idle => listening-auto
[W] 13:28:37.664 0x7f658be10d80 () - no available mnt langs
[W] 13:28:37.664 0x7f658be10d80 () - no available mnt out langs
[W] 13:28:37.664 0x7f658be10d80 () - no available tts models for in mnt
[W] 13:28:37.664 0x7f658be10d80 () - no available tts models for out mnt
[W] 13:28:37.664 0x7f658be10d80 () - invalid task, reseting task state
[D] 13:28:37.847 0x7f658be10d80 () - audio state: ActiveState
[D] 13:28:39.178 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=true, eof=false
[D] 13:28:39.210 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:40.762 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:40.795 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:42.162 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:42.194 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:43.561 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:43.597 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:45.162 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:45.201 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:46.561 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false

** (dsnote:2): WARNING **: 13:28:46.596: atk-bridge: get_device_events_reply: unknown signature
[D] 13:28:46.600 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:48.162 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:48.202 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:49.762 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:49.800 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:51.162 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:51.200 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:52.762 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:52.797 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:54.162 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:54.175 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:55.561 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:55.593 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:57.162 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:57.184 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:28:58.762 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:28:58.774 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:29:00.164 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:29:00.181 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:29:01.762 0x7f64fbc15600 process_buff:195 - process samples buf: mode=automatic, in-buf size=24000, speech-buf size=0, sof=false, eof=false
[D] 13:29:01.798 0x7f64fbc15600 process_buff:226 - vad: no speech
[D] 13:29:02.215 0x7f658be10d80 () - cancel
[D] 13:29:02.215 0x7f658be10d80 () - stop stt engine
[D] 13:29:02.215 0x7f658be10d80 stop:225 - stop requested
[D] 13:29:02.215 0x7f658be10d80 stop_processing_impl:166 - whisper cancel
[D] 13:29:02.215 0x7f64fbc15600 flush:446 - flush: exit
[D] 13:29:02.215 0x7f64fbc15600 reset_in_processing:356 - reset in processing
[D] 13:29:02.215 0x7f64fbc15600 start_processing:279 - processing ended
[D] 13:29:02.215 0x7f658be10d80 stop:240 - stop completed
[D] 13:29:02.215 0x7f658be10d80 () - mic source dtor
[D] 13:29:02.215 0x7f658be10d80 () - audio state: SuspendedState
[D] 13:29:02.215 0x7f658be10d80 () - audio ended
[D] 13:29:02.217 0x7f658be10d80 () - service refresh status, new state: idle
[D] 13:29:02.217 0x7f658be10d80 () - service state changed: listening-auto => idle
[D] 13:29:02.217 0x7f658be10d80 () - service refresh status, new state: idle
[D] 13:29:02.217 0x7f658be10d80 () - app current task: 0 => -1
[W] 13:29:02.217 0x7f658be10d80 () - invalid task, reseting task state
[D] 13:29:02.217 0x7f658be10d80 () - app service state: listening-auto => idle
[W] 13:29:02.221 0x7f658be10d80 () - no available mnt langs
[W] 13:29:02.221 0x7f658be10d80 () - no available mnt out langs
[W] 13:29:02.221 0x7f658be10d80 () - no available tts models for in mnt
[W] 13:29:02.221 0x7f658be10d80 () - no available tts models for out mnt
[W] 13:29:02.221 0x7f658be10d80 () - invalid task, reseting task state

Add support of FasterWhisper

Hello,

I really appreciate your project! I think it's going in a very nice and useful direction!

I note that you support the Coqui STT, Vosk and whisper.cpp engines.
Would it be possible to add guillaumekln's fasterwhisper STT engine? (Here)

FasterWhisper has the advantage of being dramatically faster than whisper.cpp while consuming relatively little extra RAM (the differences are shown in a table on its GitHub).
So I think it would be a great idea! The models have, if I've understood correctly, been modified, but they are available on Hugging Face (again, everything is very well documented on its GitHub).

Thanks in advance! Good luck with the rest of the project ;)

Breizhux

SpeechNote crashes

I have installed and run SpeechNote from flatpak. It starts up fine, but as soon as I press Listen, it loads the speech model and crashes.

$ flatpak run net.mkiol.SpeechNote
Gtk-Message: 13:44:53.142: Failed to load module "xapp-gtk3-module"
Qt: Session management error: Authentication Rejected, reason : None of the authentication protocols specified are supported and host-based authentication failed

I select the Speech to text model and press Listen

whisper_init_from_file_no_state: loading model from '/home/user/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/multilang_whisper_base.ggml'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  218,00 MB (+    6,00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     =  140,60 MB

And SpeechNote crashes.
The same situation occurs for each selected speech model.

(Linux Mint 21.1, Xfce 4.16)

Hotkeys would be nice :)

hello,

Enter for reading,
P for pause,
and more if you like.
Maybe you could add something in the settings to configure all the hotkeys.
:)

Spellcheck

Speech Note is excellent software that can solve a lot of my tasks. A small improvement proposal on my part would be the implementation of a spell checker (e.g., Hunspell, Aspell) in the notepad. This would be very useful, for example, if you want to have text translated and to make sure, before translation, that there are no unnecessary errors due to small typos. Probably the best solution would be an integration of grammar checking via LanguageTool (remote API or local server).
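
For the local-server variant, LanguageTool can be self-hosted. A sketch assuming the stand-alone LanguageTool distribution has been downloaded (the jar name and port follow its documentation, but treat them as assumptions):

# Start a local LanguageTool HTTP server; a client can then POST text to
# http://localhost:8081/v2/check for grammar and spell checking.
java -cp languagetool-server.jar org.languagetool.server.HTTPServer --port 8081 --allow-origin '*'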

TTS RHVoice and Coqui (and maybe others) fail to run the speech engine for texts with new lines.

Flatpak 4.1.0
For now I have tested only the following engines:

  • Coqui (MMS, Mai VITS)
  • Espeak (MBROLA, Robot)
  • Piper
  • RHVoice

Espeak and Piper work for every text so far. Coqui and RHVoice can't read a text if there is at least one new line.

The cause is probably that each newline creates an empty task:

[D] 20:14:48.26 0x7fc8df77ed80 encode_speech:174 - task: SENTENCE_BEFORE_NEW_LINE
[D] 20:14:48.26 0x7fc8df77ed80 encode_speech:174 - task: 
[D] 20:14:48.26 0x7fc8df77ed80 encode_speech:174 - task: SENTENCE_AFTER_NEW_LINE
[E] 20:14:59.438 0x7fc8d09ff600 operator():260 - py error: ValueError: You need to define either `text` (for sythesis) or a `reference_wav` (for voice conversion) to use the Coqui TTS API.
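
Until this is fixed, a possible workaround is to flatten the newlines before handing the text to the engine. A hypothetical sketch for an X11 session with xclip installed (the file name is made up):

# Replace newlines with spaces and put the result on the clipboard,
# so the pasted text contains no empty lines and thus no empty tasks.
tr '\n' ' ' < chapter.txt | xclip -selection clipboard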

Save to audio file seemingly not working for large texts

I must first say that this project is amazing, really a game changer for me since I don't need to fiddle with conda environments in terminals to get different models working.

I am right now trying to convert a book of about 700 pages to audio, since there is no audiobook version, and the Piper Joe Medium model in particular sounded amazing.

But it just doesn't save. It does, though, if I cut the text into smaller chunks. I tried WAV and Opus, thinking compression might have broken it, but nothing seems to make it save. It outputs an initialization error: "Error: text to speech initialization engine has failed"

Also, it refuses to initialize TTS again afterwards, and the app needs a restart.

I am on a Fedora Linux 38 system, using the latest version of Speech Note.

Here are the terminal outputs from trying to save the WAV file:

[Screenshot: terminal output.]

The same colorful text repeats until the very end.

Interestingly, Vorbis showed the same pattern, but something different at the very end:

[Screenshot: terminal output for the Vorbis attempt.]
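
A sketch of automating the manual workaround described above (cutting the text into smaller chunks); the chunk size and file names are arbitrary assumptions:

# Split the book into numbered pieces of about 200 lines each
# (chunk_00, chunk_01, ...) and feed them to the app one at a time.
split -l 200 -d book.txt chunk_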

'--action' and dbus issue

Thanks for this app!

I found the following issues while exploring the automation tools provided via the beta flatpak.

First, invoking any of the reading actions (start-reading, start-reading-clipboard, or pause-resume-reading) through the --action command-line option does not work; the program just prints:

Invalid action. Use one option from the following: start-listening, start-listening-active-window, start-listening-clipboard, stop-listening, start-reading, start-reading-clipboard, pause-resume-reading, cancel.

Second, I didn't have any problem using the D-Bus org.freedesktop.Application interface; calling ActivateAction works perfectly fine. But I could not find what is defined in dbus/org.mkiol.Speech.xml on the session bus; it seems that powerful interface isn't exposed at all. Is this normal?
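
For reference, the kind of call that works through the standard interface; a sketch where the object path /net/mkiol/SpeechNote is an assumption (the conventional path derived from the application ID), not something confirmed from the source:

# Trigger the 'start-reading' action via org.freedesktop.Application
# on the session bus; the signature is (action_name, parameter, platform_data).
gdbus call --session --dest net.mkiol.SpeechNote --object-path /net/mkiol/SpeechNote --method org.freedesktop.Application.ActivateAction 'start-reading' '[]' '{}'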

French translation (of the gui)

Some parts of Speechnote are not well translated.
I would like to help. I forked the repository. Can I just use Git, or do you have another tool for translations?

no main menu .desktop

I just installed the software through Flathub and it does not produce a main menu icon in my start menu; I am using Zorin OS 16. I can get it running from the command line, but there is no entry in my start menu.
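
A way to check whether Flatpak exported a desktop entry at all; a sketch assuming the standard export locations for user and system installs:

# Desktop entries exported by Flatpak live here; if the file exists,
# the menu just has not picked it up yet (logging out and back in may help).
ls ~/.local/share/flatpak/exports/share/applications/ /var/lib/flatpak/exports/share/applications/ 2>/dev/null | grep -i mkiol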

[idea] Translate option for non-english whisper models

As Whisper is now supported (great stuff, thank you), it would be really cool if one could tick a box to use Whisper's ability to translate to English. It would be really handy when going abroad to be able to just record people speaking the local language and get an instant translation.

Flathub description has wrong verb tense and no article before network connection.

Speech Note enables you to take and read notes with your voice with multiple languages. It uses Speech to Text and Text to Speech conversions to do so. All voice processing is entirely done off-line, locally on your computer without the use of network connection. Your privacy is always respected. No data is send to the Internet.

It should be

a network connection
sent

service isn't restarted after switching storage directory

In Speech Notes' settings, when changing the directory where the Deep Speech models are stored, the harbour-dsnote.service isn't restarted and keeps looking at the old (wrong) path.

Context:

  • I've been using Speech Notes on my Xperia XA2 (32GB edition, not that much storage available for /home)
  • I'm storing the models on the external SD card
  • I've recently switched to an Xperia 10 III and installed Speech Notes on it too
  • I've moved the SD card to the new device
  • I've changed the Location of language files to point to the path on the SD Card.
  • Speech Notes now sees the already installed language models.
  • Speech Notes can even download new models.
  • BUT the settings don't allow me to select a model, only to download new ones
  • and the main panel complains that no model has been configured.

Current work-around:

  • as the user (e.g., nemo or defaultUser)
systemctl --user restart harbour-dsnote.service

Request:

  • Would it be possible for the app to trigger the service restart?
  • Or could you change the API so clients such as Speech Notes could send a "please restart" command?

Rename dsnote repository to SpeechNote

I notice the only way I can find your Git repo is via the Flathub package.
Would you be willing to change the name to "SpeechNote"? It's better for SEO; your tool will likely rank higher in search results and more people will find it, I think.

Let me know what you think.

How to get GPU acceleration working? (Debian 12.2, Gnome, Wayland, X11, Nvidia P1000 GPU, Zbook Studio G5)

Hello,

I got the "Speech Note" Flatpak working on my Debian 12 system (Zbook Studio G5). I can use Whisper in offline mode here. After downloading Whisper (large and/or medium), the speech recognition is quite good, but very slow (50 sec.). GPU acceleration would help, so I installed the Nivida drivers for my P1000. They work just fine with games, eg., but not with "Speech Note" and Whisper. Any ideas how to fix this? How do I get my Nvidia card to accelerate the speech recognition of whisper on Debian 12? Maybe this is a bug?

My Nvidia Driver Version: 525.125.06

I already have libcudart11.0 and nvidia-cuda-toolkit installed.

I tried both Wayland and X11.

My card, the P1000, seems to support CUDA compute capability 6.1; this should be enough?

Terminal output when starting Speech Note:

flatpak run net.mkiol.SpeechNote 
QSocketNotifier: Can only be used with threads started with QThread
qt.qpa.qgnomeplatform: Could not find color scheme  ""
ALSA lib ../../oss/pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio)


Some screenshots:
![Bildschirmfoto vom 2023-10-16 19-43-06](https://github.com/mkiol/dsnote/assets/148144728/9fd7c5af-15b6-405c-bb53-d69e603fda99)
![Bildschirmfoto vom 2023-10-16 19-42-36](https://github.com/mkiol/dsnote/assets/148144728/a79775e9-9f99-43bf-b80c-36a9ed15a3a4)
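
The app's verbose log prints a GPU device scan at startup (visible in log excerpts elsewhere on this page), which shows whether the CUDA runtime loads inside the sandbox. A quick way to inspect it:

# Look for lines like "scanning for cuda devices" and any errors
# loading the CUDA/HIP/OpenCL runtimes.
flatpak run net.mkiol.SpeechNote --verbose 2>&1 | grep -iE 'cuda|hip|opencl'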



Random jumps in language downloading menu.

Flatpak 4.1.0

After clicking download, the list scrolls back to the top (most often when it was scrolled down at the moment of the click); then, when I click download on a model near the top, the list may jump down.

Nothing shows in the output when run with the dsnote --verbose command, except the following, and these lines appear only when opening the languages menu.

[W] 20:33:49.225 0x7fe9ee845d80 () -   OpenType support missing for "Unifont", script 12
[W] 20:33:49.312 0x7fe9d1066600 () -   OpenType support missing for "Unifont", script 12
[W] 20:33:49.371 0x7fe9ee845d80 () -   OpenType support missing for "Biwidth", script 11
[W] 20:33:49.380 0x7fe9ee845d80 () -   OpenType support missing for "Fixed", script 11
[W] 20:33:49.398 0x7fe9d1066600 () -   OpenType support missing for "Biwidth", script 11
[W] 20:33:49.407 0x7fe9d1066600 () -   OpenType support missing for "Fixed", script 11

Add OpenDyslexic font

Hi,
It might be very useful to add the OpenDyslexic font for the people who need it.
Also, the ability to import PDF files for conversion into audio files.
Thanks.
A.

Speech Note crashes on start

I'm on OpenSUSE Tumbleweed and I'm using the Flatpak version of Speech Note.

$ flatpak run net.mkiol.SpeechNote 
Qt: Session management error: Could not open network socket
ALSA lib ../../oss/pcm_oss.c:397:(_snd_pcm_oss_open) Cannot open device /dev/dsp
ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio)
ALSA lib ../../../src/pcm/pcm_direct.c:2045:(snd1_pcm_direct_parse_open_conf) The field ipc_gid must be a valid group (create group audio)
free(): invalid size

[TTS] Add Mimic 3 models

I would like to see the Mimic 3 models in this app

A link to the GitHub is HERE.

It does a better job than Piper, in my opinion, and sounds more natural.

P.S. Awesome project, keep up the good work.

"Error: couldn't download the model file"

Hello. First of all, thank you for your work. It looks fantastic. At least until now I couldn't try it, since the following problem appears (I took a screenshot so you can see it: "01. Text to speech Spanish - Error"). By the way, I had no problem downloading the English one.

[Screenshots: "02. English OK" and "01. Text to speech Spanish - Error".]

Sorry if this request is not well made; it's my first time using GitHub.

Thanks for your work.

Crash (Illegal instruction) with DeepSpeech model

Original issue #8

backtrace:

Thread 1 "dsnote" received signal SIGILL, Illegal instruction.
0x00007fffd02795a7 in ?? () from /app/lib/libkenlm.so

cpu flags:

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm pti dtherm
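
The flags line shows SSE4.1 but no AVX, so the crash is consistent with libkenlm being compiled with instructions this CPU lacks. A quick check to run on any affected machine (which instruction set kenlm was actually built for is an assumption):

# List the SIMD extensions the CPU supports; if avx/avx2 are absent,
# a binary compiled with them will die with SIGILL.
grep -o -w -E 'sse4_1|sse4_2|avx|avx2|avx512f' /proc/cpuinfo | sort -u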

Should OpenCL work on an Ice Lake v11 intel processor?

On both the stable and beta versions, it says that a suitable GPU isn't available. I've installed OpenCL packages on Fedora 38 and the equivalent Flatpak OpenCL packages, but it still says not available.

I understand it might not be useful given it isn't a powerful discrete GPU, but I wondered if a bug might be causing it to be reported as unavailable.

Add option for bigger whisper models

Looking through whisper.cpp, it needs about 3x less memory than the original, which would make it possible to run even the large model on an Xperia 10 III (3.3 GB vs 10 GB). That would probably be overkill, and speed would suffer a lot, but adding small and medium would probably make sense.
