Comments (6)
'Pause', 'save audio in MP3' and 'MP3 quality/compression level' are available since version 4.2.0.
- better instruction / help option in the app
- sample rate info in a voice description
- save audio in MP3
- option to set MP3 quality/compression level
- speed control
- pitch control (maybe)
- pause
- improved voice names
- info about how many resources a particular model requires
from dsnote.
Thank you for the valuable remarks! Let me answer them one by one.
It needs better instructions and information.
Totally agree. I know there are a lot of different models with weird names. Users simply don't know which one they should choose. This is something I need to improve.
So we need to see the data rate each voice operates
Do you mean sample rate? For example 16 kHz, 22.05 kHz or 44.1 kHz? It is a good idea, I think.
MP3 conversion rates
Currently you can save only in an uncompressed format (WAV), but I'm working right now on adding an option to save to MP3 and OGG. There will be an option for compression level as well.
The audio player needs a speed
Already implemented 😄 It will be included in the upcoming release.
and pitch control,
It is doable, but do you really need to change the pitch?
along with pause and stop
Yes. The pause feature is already on the roadmap.
So the voice names need a scale beside them - I figure that the small, medium and large designations MIGHT be linked to a data rate
Actually, they are just the names taken from the original models. For Piper voices, "Low", "Medium" and "High" usually relate to the sample rate of the output audio, and "Low" requires less memory and CPU power than "High".
Definitely, the names of all voices have to be improved.
So the down scale options are needed.. "Oh voice X uses 200 times the resources of E-Speak Robot..
In general, in terms of required resources it looks like this: espeak < espeak-mbrola < rhvoice < piper-low < piper-medium < piper-high < coqui.
I'm thinking about adding tags to the voice description. Something like: "fast", "slow", "very-slow"...
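To illustrate the tagging idea, here is a minimal Python sketch that derives a tag from the resource ordering given above. The engine order comes from this comment; the cut points between "fast", "slow" and "very-slow", and the names `ENGINES_BY_COST`, `cost_rank` and `speed_tag`, are assumptions made up for the example and are not how dsnote actually assigns tags.

```python
# Engines ordered from least to most resource-hungry, as listed in the comment.
ENGINES_BY_COST = [
    "espeak",
    "espeak-mbrola",
    "rhvoice",
    "piper-low",
    "piper-medium",
    "piper-high",
    "coqui",
]


def cost_rank(engine: str) -> int:
    """Position of an engine in the ordering (0 = cheapest)."""
    return ENGINES_BY_COST.index(engine)


def speed_tag(engine: str) -> str:
    """Map an engine to a coarse tag. The thresholds are arbitrary (assumption)."""
    rank = cost_rank(engine)
    if rank <= cost_rank("rhvoice"):
        return "fast"
    if rank <= cost_rank("piper-high"):
        return "slow"
    return "very-slow"


for name in ENGINES_BY_COST:
    print(f"{name}: {speed_tag(name)}")
```

A real implementation would more likely be based on measured synthesis time per engine and hardware rather than on a fixed ordering.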
I am REALLY impressed with what you all have done so far... It's incredible... I mean this is really good.
Thank you. It is very nice to know that my work is useful 😀.
Sum up:
- better instruction / help option in the app
- sample rate info in a voice description
- save audio in MP3
- option to set MP3 quality/compression level
- speed control
- pitch control (maybe)
- pause
- improved voice names
- info about how many resources a particular model requires
Using the voice: English British (Southern Low Female) - it's perfect enough - runs all right on the laptop. The highest rate voices - buffer badly.
Yeah, and somehow I got it wrong about saving to MP3... there is only MS WAV... No idea how I got that so wrong....
I had similar thoughts regarding getting a better understanding of which models will create larger files, and/or require more processing power.
But most of all, I just wanted to let you know how much I appreciate this software! It's already quite powerful and has great potential for the future! Keep up the good work! :)
It can already save in different formats, including MP3, but there is no bit rate setting.
The earliest MP3 players, which ran on an AAA battery and ear plugs and were actually quite good, had something like 128 meg of memory. That was excellent training in the trade-offs between audio quality and file size... Most of my recordings were the typical AA speaker at the podium telling their life story. So in order to get an hour's recording down to a "clear enough" sound quality, and to fit as many files as possible into the memory, one had to become quite creative...
Now even basic phones come with slots for terabyte micro-SD cards to store audio files on, so the need to be rather desperate about space is gone. The fundamental issue is whether a document saved as an audio file can be a 30 meg file instead of a 300 meg file. The 300 meg file, while technically better in sound quality, is not that much better - not enough to justify the file size, the processing time, and the overheads of saving files in very high fidelity audio...
The lower spec MP3 files are good enough for 98% of most people's work. But there might be people who have the processors, the time and the need for almost flawless MP3s and other formats, so cutting them out just because I am a cheapskate is not a good idea... But in terms of processing time and file sizes, if good enough audio is available from 44 kHz sampling, that is fine... whereas 66 kHz, 96 kHz, 128 kHz, 256 kHz and 516 kHz offer me no tangible benefit...
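The 30 meg versus 300 meg comparison can be checked with some simple arithmetic. The sketch below is only an illustration: the one-hour duration, mono 16-bit WAV and the example bit rates are assumptions chosen for the example, and the function names are made up here. It ignores container headers and assumes a constant bit rate.

```python
def compressed_size_mb(duration_s: float, bitrate_kbps: float) -> float:
    """Approximate size of a constant-bit-rate file in MB (1 MB = 1,000,000 bytes)."""
    return duration_s * bitrate_kbps * 1000 / 8 / 1_000_000


def wav_size_mb(duration_s: float, sample_rate_hz: int,
                bits_per_sample: int = 16, channels: int = 1) -> float:
    """Approximate size of uncompressed PCM audio in MB, ignoring the header."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return duration_s * bytes_per_second / 1_000_000


ONE_HOUR = 3600  # seconds

print(f"WAV 44.1 kHz mono:  {wav_size_mb(ONE_HOUR, 44100):.1f} MB")
print(f"WAV 22.05 kHz mono: {wav_size_mb(ONE_HOUR, 22050):.1f} MB")
print(f"MP3 128 kbps:       {compressed_size_mb(ONE_HOUR, 128):.1f} MB")
print(f"MP3 64 kbps:        {compressed_size_mb(ONE_HOUR, 64):.1f} MB")
```

Under these assumptions an hour of 44.1 kHz mono WAV is a little over 300 MB, while the same hour at 64 kbps is a little under 30 MB, which matches the roughly tenfold difference described above.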
This is what I mean by acceptably shit audio quality - it's HIGHLY compressed, the audio is a little bit tinny and a little bit hissy, but the file size is small and it's clear enough to listen to.
https://www.recoveryaudio.org/aa-speaker-tapes/scott-gallagher-all-addictions-anonymous-founder
Going much below this in audio quality, with a lower sampling rate and higher compression than the original, it started to go from "acceptably shit and understandable" to "kind of really shit and hard to understand".
Whereas making it heaps and heaps better doesn't make it THAT much better...
But when reading out the long documents, so I can listen to them on long drives, or when resting or doing housework etc.. a nicer quality voice and a little bit better is a good thing...
I mean, at the very beginning I used to convert Microsoft documents into plain text, and then convert them into the robot voice..
So anything above the most basic robot voice is an improvement - the question then becomes how much of an improvement is really necessary.
The other thing is to convert and have control of the audio level...
Now this is sort of kind of necessary... I have a lovely shit box work type car that has almost no sound insulation in the cabin...
So when I get recordings or zoom meetings where the speaker is very quiet, it's not hard to get my phone and the (protect your hearing and limited amplification) ear plugs drowned out by the car noises...
A preset of say 80% to 90% of the way towards clipping for everything could be a viable option.
Then all audio would be nice and strong, and not run out of sound before the amplifier does.
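The preset described here is essentially peak normalization: scale the whole recording so its loudest sample lands at a fixed fraction of full scale. Below is a minimal Python sketch under some assumptions: samples are signed 16-bit integers in a plain list, the 85% target is just a value picked from the 80% to 90% range mentioned above, and `normalize_peak` is a name invented for this example, not an existing dsnote option.

```python
def normalize_peak(samples, target=0.85, full_scale=32767):
    """Scale 16-bit samples so the loudest one sits at `target` of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        # Silence (or empty input): nothing to scale.
        return list(samples)
    gain = target * full_scale / peak
    # Round to integers and clamp to the valid 16-bit range as a safety net.
    return [max(-full_scale - 1, min(full_scale, round(s * gain)))
            for s in samples]


quiet = [0, 1000, -2000, 1500]  # a very quiet recording (made-up values)
louder = normalize_peak(quiet)
print(louder)
print(max(abs(s) for s in louder) / 32767)  # close to 0.85
```

Note that this only fixes the peak level. A recording with one loud click and an otherwise quiet speaker would still sound quiet, so loudness-based normalization or compression may be needed for cases like that.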