It needs better instructions and information. The OLD E-Speak Robot voice us

This is really excellent software - I am genuinely impressed - but it needs a little bit of polishing. about dsnote HOT 6 OPEN

mkiol commented on May 9, 2024 1

This is really excellent software - I am genuinely impressed - but it needs a little bit of polishing.

from dsnote.

Comments (6)

mkiol commented on May 9, 2024 3

'Pause', 'save audio in MP3' and 'MP3 quality/compression level' have been available since version 4.2.0.

better instruction / help option in the app
sample rate info in a voice description
save audio in MP3
option to set MP3 quality/compression level
speed control
pitch control (maybe)
pause
improved voice names
info about how many resources a particular model requires


mkiol commented on May 9, 2024 1

Thank you for the valuable remarks! Let me answer them one by one.

It needs better instructions and information.

Totally agree. I know, there are a lot of different models with weird names. Users simply don't know which one they should choose. This is something I need to improve.

So we need to see the data rate each voice operates

Do you mean sample rate? For example 16 kHz, 22.05 kHz or 44.1 kHz? It is a good idea, I think.

MP3 conversion rates

Currently you can save only in an uncompressed format (WAV), but I'm working right now on adding an option to save to MP3 and OGG. There will be an option for compression level as well.

The audio player needs a speed

Already implemented 😄 Will be included in upcoming release.

and pitch control,

It is doable, but do you really need to change the pitch?

along with pause and stop

Yes. The pause feature is already on the roadmap.

So the voice names need a scale beside them - I figure that the small, medium and large designations MIGHT be linked to a data rate

Actually they are just the names taken from the original models. For Piper voices, "Low", "Medium" and "High" usually relate to the sample rate of the output audio, and "Low" requires less memory and CPU power than "High".

Definitely, the names of all voices have to be improved.

So the down scale options are needed.. "Oh voice X uses 200 times the resources of E-Speak Robot..

In general, in terms of needed resources it looks like this: espeak < espeak-mbrola < rhvoice < piper-low < piper-medium < piper-high < coqui.

I'm thinking about adding tags to the voice descriptions. Something like: "fast", "slow", "very-slow"...
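As a rough illustration of how the ordering and the tag idea could fit together, here is a minimal Python sketch. Only the relative order of the engines comes from the comment above. The function names and the cut-offs that decide which engine gets which tag are hypothetical, and this is not how dsnote itself is implemented.

```python
# Engines ordered from lightest to heaviest, as listed in the comment above.
ENGINES_BY_COST = [
    "espeak",
    "espeak-mbrola",
    "rhvoice",
    "piper-low",
    "piper-medium",
    "piper-high",
    "coqui",
]


def is_lighter(engine_a: str, engine_b: str) -> bool:
    """Return True if engine_a comes before engine_b in the cost ordering."""
    return ENGINES_BY_COST.index(engine_a) < ENGINES_BY_COST.index(engine_b)


def speed_tag(engine: str) -> str:
    """Map an engine to a tag. The cut-offs below are assumptions, not measurements."""
    rank = ENGINES_BY_COST.index(engine)
    if rank <= 2:  # espeak, espeak-mbrola, rhvoice
        return "fast"
    if rank <= 5:  # the three piper tiers
        return "slow"
    return "very-slow"  # coqui


if __name__ == "__main__":
    for name in ENGINES_BY_COST:
        print(f"{name}: {speed_tag(name)}")
```

A tag derived from a fixed ordering like this only says which voice is heavier than another. It does not say by how much, so a "200 times the resources" style figure would still need real measurements.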

I am REALLY impressed with what you all have done so far... It's incredible... I mean this is really good.

Thank you. It is very nice to know that my work is useful 😀.

Sum up:

better instruction / help option in the app
sample rate info in a voice description
save audio in MP3
option to set MP3 quality/compression level
speed control
pitch control (maybe)
pause
improved voice names
info about how many resources a particular model requires


Me2U2 commented on May 9, 2024 1

Using the voice: English British (Southern Low Female) - it's perfect enough - runs all right on the laptop. The highest rate voices - buffer badly.

Yeah, and somehow I got it wrong about saving to MP3... there is only MS WAV... No idea how I got that so wrong....


Greenheart commented on May 9, 2024 1

I had similar thoughts regarding getting a better understanding of which models will create larger files, and/or require more processing power.

But most of all, I just wanted to let you know how much I appreciate this software! It's already quite powerful and has great potential for the future! Keep up the good work! :)


Me2U2 commented on May 9, 2024

It already can save in different formats, including MP3, but there is no bit rate setting.

In the early days of the earliest MP3 players, which ran on an AAA battery with ear plugs, were actually quite good, and had something like 128 meg of memory, this was excellent training in the trade-offs between audio quality and file size... Most of my recordings were the typical AA speaker at the podium telling their life story. So in order to get an hour's recording down to a "clear enough" sound quality, and to fit as many files as possible into the memory, one had to become quite creative...

Now even basic phones come with slots for Terror Bite micro-SD cards to store audio files on, so the need to be rather desperate for space is gone. The fundamental issue remains, though: whether a document saved as an audio file can be a 30 meg file instead of a 300 meg file. The 300 meg file, while technically better in sound quality, is not that much better - not enough to justify the file size, the processing time, and the overheads of saving files in very high fidelity audio...

The lower spec MP3 files are good enough for 98% of most people's work. But there might be people who have the processors, the time and the need for almost flawless MP3s and other formats, so cutting them out just because I am a cheapskate is not a good idea... But I care about processing time and file sizes: if good enough audio is available from 44 kHz sampling, that is fine... whereas 66 kHz, 96 kHz, 128 kHz, 256 kHz and 516 kHz offer me no tangible benefit...
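To put rough numbers on the size trade-off described above, here is a small Python sketch of the arithmetic. It assumes constant-bitrate MP3 and uncompressed 16-bit mono WAV, and it ignores headers and metadata. The function names and the example bit rates are illustrative choices, not dsnote settings.

```python
def mp3_size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Approximate size of a constant-bitrate MP3 in megabytes (10^6 bytes)."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * seconds / 1_000_000


def wav_size_mb(sample_rate_hz: int, seconds: float,
                bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Approximate size of uncompressed PCM audio in megabytes (10^6 bytes)."""
    return sample_rate_hz * bytes_per_sample * channels * seconds / 1_000_000


if __name__ == "__main__":
    one_hour = 3600
    # Example bit rates chosen only for illustration.
    for kbps in (32, 64, 128, 320):
        print(f"MP3 at {kbps} kbit/s: {mp3_size_mb(kbps, one_hour):.1f} MB per hour")
    print(f"WAV at 22050 Hz, 16-bit mono: {wav_size_mb(22050, one_hour):.1f} MB per hour")
```

Under these assumptions an hour of speech takes about 14.4 MB at 32 kbit/s and about 144 MB at 320 kbit/s, which is the same tenfold gap as the 30 meg versus 300 meg example.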


Me2U2 commented on May 9, 2024

This is what I mean by acceptably shit audio quality - it's HIGHLY compressed, the audio is a little bit tinny and a little bit hissy, but the file size is small and it's clear enough to listen to.

https://www.recoveryaudio.org/aa-speaker-tapes/scott-gallagher-all-addictions-anonymous-founder

Going much below this in audio quality, with a lower sampling rate and higher compression than the original, it started to go from "acceptably shit and understandable" to "kind of really shit and hard to understand".

Whereas making it heaps and heaps better doesn't make it THAT much better...

But when reading out long documents, so I can listen to them on long drives, or when resting or doing housework etc., a nicer quality voice that is a little bit better is a good thing...

I mean, I used to convert Microsoft documents into plain text, and then convert them into the robot voice, at the very beginning..
So anything above the most basic robot voice is an improvement - the question then becomes how much of an improvement is really necessary.

The other thing is to convert and have control of the audio level...

Now this is sort of kind of necessary... I have a lovely shit box work type car that has almost no sound insulation in the cabin...

So when I get recordings or zoom meetings where the speaker is very quiet, it's not hard to get my phone and the (protect your hearing and limited amplification) ear plugs drowned out by the car noises...

A preset of say 80% to 90% of the way towards clipping for everything could be a viable option.

Then all audio would be nice and strong, and not run out of sound before the amplifier does.
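The preset described above amounts to peak normalization: scale the whole recording so its loudest sample lands at a chosen fraction of full scale. Below is a minimal Python sketch, assuming samples are floats in the range -1.0 to 1.0 and using a hypothetical default of 85%. The function name is made up for illustration, and nothing here reflects how dsnote handles audio levels.

```python
from typing import List, Sequence


def normalize_peak(samples: Sequence[float], target: float = 0.85) -> List[float]:
    """Scale samples so the largest absolute value equals `target`.

    `target` is a fraction of full scale (1.0 is the clipping point).
    Silent input is returned unchanged to avoid dividing by zero.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target / peak
    return [s * gain for s in samples]


if __name__ == "__main__":
    quiet = [0.0, 0.1, -0.2, 0.05]  # a very quiet recording
    louder = normalize_peak(quiet)
    print(max(abs(s) for s in louder))
```

Peak normalization only guarantees headroom before clipping. A recording with one loud click and otherwise quiet speech would still sound quiet, so a loudness-based approach might suit the noisy-car case better.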

