Great project! Is it possible that gtts does not make audio pauses between tokens

No pauses if 100 characters limit about gtts HOT 6 CLOSED

Rom888 commented on August 25, 2024

No pauses if 100 characters limit

from gtts.

Comments (6)

pndurette commented on August 25, 2024

Hi @Rom888, that's a tough one!
So the upstream API will introduce a break after 100 characters, and there's no way to control this. That's why gTTS tries to pre-emptively split (tokenize) the text where pauses would naturally occur (e.g. at punctuation), which works pretty well most of the time. But if your input has more than 100 characters without any break that gTTS' tokenizer could split on, there will be a break no matter what.

Edit: So your best bet if you control the input is to introduce punctuation (commas, etc.).

Edit 2: Wondering if I understood your question correctly actually. Do you have an example where this occurs?
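
To make the two kinds of split concrete, here is a minimal sketch of the idea described above. It is not gTTS's actual tokenizer or minimizer: `split_for_tts`, the punctuation set, and the "natural"/"forced" labels are assumptions made for illustration, and only the 100-character limit comes from this thread.

```python
import re

MAX_CHARS = 100  # limit mentioned in this thread


def split_for_tts(text, max_chars=MAX_CHARS):
    """Return (chunk, boundary) pairs.

    boundary is "natural" when the chunk ends at punctuation or at the end
    of the text, and "forced" when it was cut only because of its length.
    Hypothetical helper, not part of gTTS.
    """
    # Step 1: split after common punctuation marks (assumed set).
    pieces = [p.strip() for p in re.split(r"(?<=[.,;:!?])\s+", text) if p.strip()]

    chunks = []
    for piece in pieces:
        # Step 2: a piece that is still too long gets cut at the last space
        # that fits, or mid-word if it contains no space at all.
        while len(piece) > max_chars:
            cut = piece.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            chunks.append((piece[:cut].strip(), "forced"))
            piece = piece[cut:].strip()
        if piece:
            chunks.append((piece, "natural"))
    return chunks
```

Under these assumptions, a chunk labelled "forced" is where an unnatural break would be heard, which is why adding punctuation to the input moves the break to a place where a pause sounds normal.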

Rom888 commented on August 25, 2024

Here is an example:

split-string-1 <split by tokenizer>
split-string-2 <split by tokenizer>
split-string-3 <split by minimizer (because larger than 100 characters)>
split-string-4 <split by tokenizer>

If I understand correctly, gTTS gets all the split strings, makes audio, and then joins all audio fragments into one and adds pauses between those audio fragments.

Is it possible to not add pauses between audio fragments 3 and 4 when joining?

pndurette commented on August 25, 2024

@Rom888 Sorry for the delay—

So what you said is almost correct. gTTS splits the strings (where the speech would typically pause), then generates the audio for each piece, and puts the audio bits together. It doesn't add any breaks in the audio because it doesn't have to. What you hear is only the natural break between the end of one audio phrase and the start of the next.

So to answer your question, it's not something we can easily control other than by changing the text that is sent, i.e. with some punctuation, to make it sound at least more natural.
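
The joining step described here can be sketched roughly as follows. This is an illustration of the behaviour as described in this comment, not gTTS's code: `fake_synthesize` is a hypothetical stand-in for the remote request and returns placeholder bytes instead of real audio.

```python
import io


def fake_synthesize(fragment):
    """Hypothetical stand-in for the upstream TTS call (placeholder bytes)."""
    return f"<audio:{fragment}>".encode("utf-8")


def join_fragments(fragments, synthesize=fake_synthesize):
    """Write each fragment's audio back to back into one buffer."""
    out = io.BytesIO()
    for fragment in fragments:
        # Nothing is written between fragments: any audible gap is already
        # inside the audio returned for each fragment.
        out.write(synthesize(fragment))
    return out.getvalue()
```

Since no silence is inserted at the joins, there is nothing to switch off at that stage; the gap would have to be removed from the returned audio itself.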

Rom888 commented on August 25, 2024

Okay, do you think we could add an option to gtts-cli, for example:
--cut-if-minimizer=500ms
that cuts the end of an audio fragment if that fragment was produced by the minimizer?
(the audio from split-string-3 in the example above).

pndurette commented on August 25, 2024

Sorry for the delay—
Hmm, that would be pretty hard. It's pretty much the same conclusion as what I wrote in #398 (comment). This library has no knowledge of the data it gets (audio, words, timing); it just saves it to a file.

keisanng commented on August 25, 2024

If there are consistent pauses, you could do some post-processing with MoviePy or FFmpeg on the generated audio to trim them off.
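
As a rough sketch of the FFmpeg route, the snippet below builds a command around FFmpeg's `silenceremove` audio filter. The helper names, file names, and the 0.5 s / -50 dB defaults are assumptions chosen for illustration and would need tuning against real output; it also assumes `ffmpeg` is installed and on the PATH.

```python
import subprocess


def build_silence_trim_command(src, dst, min_silence_s=0.5, threshold_db=-50):
    """Build an ffmpeg command that removes silences longer than min_silence_s.

    Hypothetical helper: the defaults are guesses, not recommended values.
    """
    audio_filter = (
        "silenceremove="
        "stop_periods=-1:"                 # apply to every silence in the file
        f"stop_duration={min_silence_s}:"  # only silences at least this long
        f"stop_threshold={threshold_db}dB" # what counts as silence
    )
    return ["ffmpeg", "-y", "-i", src, "-af", audio_filter, dst]


def trim_silences(src, dst, **kwargs):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_silence_trim_command(src, dst, **kwargs), check=True)
```

For example, `trim_silences("speech.mp3", "speech_trimmed.mp3")` would process a file saved by gTTS. Note that this acts on every silence above the chosen length, including natural pauses at punctuation, so it cannot target only the breaks caused by the 100-character limit.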
