
Comments (6)

mkiol commented on May 9, 2024

'Pause', 'save audio in MP3' and 'MP3 quality/compression level' are available since version 4.2.0.

  • better instruction / help option in the app
  • sample rate info in a voice description
  • save audio in MP3
  • option to set MP3 quality/compression level
  • speed control
  • pitch control (maybe)
  • pause
  • improved voice names
  • info about how many resources a particular model requires

from dsnote.

mkiol commented on May 9, 2024

Thank you for the valuable remarks! Let me answer them one by one.

It needs better instructions and information.

Totally agree. I know, there are a lot of different models with weird names. Users simply don't know which one they should choose. This is something I need to improve.

So we need to see the data rate each voice operates

Do you mean sample rate? For example 16 kHz, 22.05 kHz or 44.1 kHz? It is a good idea, I think.
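For illustration, the sample rate of a saved WAV file can be read with Python's standard `wave` module. This is only a minimal sketch and not how the app itself does it; the helper names `wav_sample_rate` and `format_rate` are made up for this example.

```python
import io
import wave


def wav_sample_rate(source):
    """Return the sample rate in Hz of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def format_rate(rate_hz):
    """Format a rate in Hz as a short kHz label, e.g. 22050 -> '22.05 kHz'."""
    return f"{rate_hz / 1000:g} kHz"


# Demo: build a tiny silent mono WAV in memory so no file on disk is needed.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)      # mono
    wav.setsampwidth(2)      # 16-bit samples
    wav.setframerate(22050)  # 22.05 kHz
    wav.writeframes(b"\x00\x00" * 100)
buffer.seek(0)

print(format_rate(wav_sample_rate(buffer)))  # prints "22.05 kHz"
```

A label like this could sit next to each voice name in a voice description.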

MP3 conversion rates

Currently you can save only in an uncompressed format (WAV), but I'm working right now to add an option to save to MP3 and OGG. There will be an option for compression level as well.

The audio player needs a speed

Already implemented 😄 It will be included in the upcoming release.

and pitch control,

It is doable, but do you really need to change the pitch?

along with pause and stop

Yes. The pause feature is already on the roadmap.

So the voice names need a scale beside them - I figure that the small, medium and large designations MIGHT be linked to a data rate

Actually they are just the names taken from the original models. For Piper voices, "Low", "Medium" and "High" usually relate to the sample rate of the output audio, and "Low" requires less memory and CPU power than "High".

Definitely, the names of all voices have to be improved.

So the down scale options are needed.. "Oh voice X uses 200 times the resources of E-Speak Robot..

In general, in terms of needed resources it looks like this: espeak < espeak-mbrola < rhvoice < piper-low < piper-medium < piper-high < coqui.

I'm thinking about adding tags to the voice description. Something like: "fast", "slow", "very-slow"...
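As a rough sketch of how the ordering above could be turned into such tags: the engine order is taken from the comment, but the cut-off points between "fast", "slow" and "very-slow" are purely an assumption for illustration, as are the names `ENGINE_ORDER`, `resource_rank` and `speed_tag`.

```python
# Engine families from lightest to heaviest, as listed in the comment above.
ENGINE_ORDER = [
    "espeak",
    "espeak-mbrola",
    "rhvoice",
    "piper-low",
    "piper-medium",
    "piper-high",
    "coqui",
]


def resource_rank(engine):
    """Return the position of an engine in the ordering (0 = lightest)."""
    return ENGINE_ORDER.index(engine)


def speed_tag(engine):
    """Map an engine to a coarse tag. The thresholds are hypothetical."""
    rank = resource_rank(engine)
    if rank <= resource_rank("rhvoice"):
        return "fast"
    if rank <= resource_rank("piper-high"):
        return "slow"
    return "very-slow"


for name in ENGINE_ORDER:
    print(f"{name}: {speed_tag(name)}")
```

Real tags would need measurements on actual hardware; the ranking only says which engine is heavier, not by how much.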

I am REALLY impressed with what you all have done so far... It's incredible... I mean this is really good.

Thank you. It is very nice to know that my work is useful 😀.

Sum up:

  • better instruction / help option in the app
  • sample rate info in a voice description
  • save audio in MP3
  • option to set MP3 quality/compression level
  • speed control
  • pitch control (maybe)
  • pause
  • improved voice names
  • info about how many resources a particular model requires


Me2U2 commented on May 9, 2024

Using the voice: English British (Southern Low Female) - it's perfect enough - runs all right on the laptop. The highest rate voices - buffer badly.

Yeah, and somehow I got it wrong about saving to MP3... there is only MS WAV... No idea how I got that so wrong....


Greenheart commented on May 9, 2024

I had similar thoughts regarding getting a better understanding of which models will create larger files, and/or require more processing power.

But most of all, I just wanted to let you know how much I appreciate this software! It's already quite powerful and has great potential for the future! Keep up the good work! :)


Me2U2 commented on May 9, 2024

It already can save in different formats, including MP3, but there is no bit rate setting.

In the early days of the earliest MP3 players, which ran on an AAA battery and ear plugs, and were quite good actually, and had like 128 meg of memory, well, this was excellent training for the trade-offs between audio quality and file size... Most of my recordings were the typical AA speaker at the podium telling their life story. So in order to get an hour's recording down to a "clear enough" sound quality, and to be able to fit as many files into the memory as possible, one had to become quite creative...

Now even basic phones come with slots for Terror Bite micro-SD cards to store audio files on, so the necessity to be rather desperate about space is gone. The fundamental issue of whether a document saved as an audio file can be saved as a 30 meg file instead of a 300 meg file arises because the 300 meg file, while technically better in sound quality, is not that much better, not enough to justify the file size, the processing time, and the overheads of saving files in very high fidelity audio...

The lower-spec MP3 files are good enough for 98% of most people's work. But there might be people who have the processors, the time and the need for almost flawless MP3s and other formats, so cutting them out just because I am a cheapskate is not a good idea... But considering processing time and file sizes, if good enough audio is available from 44 kHz sampling, that is fine... whereas 66 kHz, 96 kHz, 128 kHz, 256 kHz and 516 kHz offer me no tangible benefit...
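To put rough numbers on the 30 meg versus 300 meg comparison, here is a back-of-the-envelope sketch. It assumes one hour of mono audio, 16-bit samples at 44.1 kHz for the uncompressed case, and a constant 64 kbit/s MP3 for the compressed case; those settings are assumptions chosen for illustration, not values from the app, and container headers are ignored.

```python
def pcm_size_bytes(seconds, sample_rate_hz=44100, bits_per_sample=16, channels=1):
    """Size of raw uncompressed PCM audio (WAV header not counted)."""
    return seconds * sample_rate_hz * channels * bits_per_sample // 8


def mp3_size_bytes(seconds, bitrate_kbps):
    """Approximate size of a constant-bitrate MP3 (tags and padding ignored)."""
    return seconds * bitrate_kbps * 1000 // 8


ONE_HOUR = 3600  # seconds

wav_bytes = pcm_size_bytes(ONE_HOUR)
mp3_bytes = mp3_size_bytes(ONE_HOUR, bitrate_kbps=64)

print(f"WAV: {wav_bytes / 1e6:.1f} MB")  # prints "WAV: 317.5 MB"
print(f"MP3: {mp3_bytes / 1e6:.1f} MB")  # prints "MP3: 28.8 MB"
```

Under these assumptions the compressed file is about eleven times smaller, which is in line with the 300 meg versus 30 meg figures mentioned above.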


Me2U2 commented on May 9, 2024

This is what I mean by acceptably shit audio quality - it's HIGHLY compressed, the audio is a little bit tinny and a little bit hissy, but the file size is small and it's clear enough to listen to.

https://www.recoveryaudio.org/aa-speaker-tapes/scott-gallagher-all-addictions-anonymous-founder

Going much below this in audio quality, with a lower sampling rate and higher compression than the original, it started to go from "acceptably shit and understandable" to "kind of really shit and hard to understand".

Whereas making it heaps and heaps better doesn't make it THAT much better...

But when reading out the long documents, so I can listen to them on long drives, or when resting or doing housework etc.. a nicer quality voice and a little bit better is a good thing...

I mean I used to convert Microsoft documents into plain text, and then convert them into the robot voice, at the very beginning..
So anything above the most basic robot voice is an improvement - the question then becomes how much of an improvement is really necessary.

The other thing is to convert and have control of the audio level...

Now this is sort of kind of necessary... I have a lovely shit box work type car that has almost no sound insulation in the cabin...

So when I get recordings or zoom meetings where the speaker is very quiet, it's not hard to get my phone and the (protect your hearing and limited amplification) ear plugs drowned out by the car noises...

A preset of say 80% to 90% of the way towards clipping for everything could be a viable option.

Then all audio would be nice and strong, and not run out of sound before the amplifier does.
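The preset described above is essentially peak normalization. A minimal sketch, assuming signed 16-bit samples held in a plain Python list and a target of 85% of full scale; the function name `normalize_peak` and its defaults are made up for this example, and this is not a feature of the app.

```python
def normalize_peak(samples, target=0.85, full_scale=32767):
    """Scale samples so the loudest one sits at `target` of full scale.

    `samples` is a list of signed 16-bit integers. Silent input is returned
    unchanged, because there is no peak to scale from.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)
    gain = target * full_scale / peak
    scaled = (int(round(s * gain)) for s in samples)
    # Clamp as a safety net so rounding can never push a sample past full scale.
    return [max(-full_scale, min(full_scale, s)) for s in scaled]


quiet = [100, -200, 50, 0]
loud = normalize_peak(quiet)
print(max(abs(s) for s in loud))  # prints 27852
```

Note that this only matches peaks, not perceived loudness: a recording with one loud click and otherwise quiet speech would still sound quiet after this step.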

