epub2tts's People

Contributors

aedocw, k-aito, rejuce, setimike, wonka929

epub2tts's Issues

Question: Should I use XTTS-v2?

I find it a bit confusing to decide which model to use. At first I wanted to use XTTS-v2 because the coqui team claims it is their best model. However, the xtts parameter requires samples, presumably for voice cloning. I assume the default model is vits. My question: is it possible to use xtts without voice cloning to get better quality than vits? After listening to the samples provided, I think sample-p307-coquiTTS sounds better than sample-shadow-coquiXTTS.

Keep up the great work! Your efforts are making a difference!

Don't read links to footnotes

Some books have links to footnotes, with the link text something like "[15]". This is really annoying for books that have many footnotes. When pulling out the text to read, it would be great if any linked text was ignored.
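
For what it's worth, a minimal sketch of how linked text could be dropped during extraction, assuming the HTML is parsed with BeautifulSoup as elsewhere in the project:

    from bs4 import BeautifulSoup

    def extract_text_without_links(html):
        # Remove every <a> element (e.g. footnote markers like "[15]")
        # before pulling out the text to be read.
        soup = BeautifulSoup(html, "html.parser")
        for link in soup.find_all("a"):
            link.decompose()
        return soup.get_text()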

Can not run on GPU

The program works fine, but from the task manager I only see spikes in CPU usage when it starts.

The closest issue I could find was coqui-ai/TTS#2267 , but they are talking about the training part, while I'm going for simple tts from a base model.

Trying:

    import torch
    torch.cuda.is_available()

previously returned False, but that was fixed by reinstalling torch and torchaudio, so they now recognize my GPU. (I also checked compatibility with the CUDA version using nvidia-smi.)

Still, even when adding --use_cuda or gpu=True flags to epub2tts.py in the TTS call, I can see no GPU usage. Am I doing something wrong?
I'm no dev and I would not be surprised if I was just adding arguments in the wrong place or something, but this really has me confused as I have been playing around with stable diffusion with no issues.
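
For reference, a minimal sketch of the pattern recent Coqui TTS versions use: the model is moved to the device with .to() (as the tracebacks further down also show), so a gpu= flag added elsewhere may simply be ignored:

    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")
    # If the model is never moved to the GPU, inference silently runs on CPU.
    tts = TTS("tts_models/en/vctk/vits").to(device)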

Include good Windows install instructions

Right now there are no instructions for running this under Windows, other than one line on how to run it under Docker.

There should be a section specifically for doing a Windows install, with as many helpful links as possible.

If anyone has gone through a Windows install and would like to include their steps here, it would be much appreciated. I can validate them and update the README. I took a quick first pass last night and was surprised at the level of hassle, but maybe it's easier than I realize and I just did not follow a good path.

Add ability to take plain text file as input

It would be nice if the script took a guess at what the input file type was based on the extension, and was able to handle plain text files. Other than determining where chapter breaks might be so as to cut this up into small enough chunks for Coqui-TTS, this should be pretty easy to implement.
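
A sketch of the kind of dispatch this would need; the epub hand-off is left as a stub since that path already exists:

    import os

    def load_source(path):
        # Guess the input type from the extension; treat anything that
        # isn't an epub as plain text.
        ext = os.path.splitext(path)[1].lower()
        if ext == ".epub":
            raise NotImplementedError("hand off to the existing epub path")
        with open(path, "r", encoding="utf-8") as f:
            return f.read()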

Dimension out of range error

I have a book that ends up throwing an error when trying to encode it.

There's an odd bit of markup/regex/something in the middle of the chapter (|\xa0\xa0|\xa0\xa0|\xa0\xa0|\xa0\xa0|\xa0\xa0|\xa0\xa0|\xa0\xa0|\xa0\xa0|) which I'm guessing is causing the issue?

Here's the error output:

Traceback (most recent call last):
  File "/home/tymme/epub2tts/epub2tts.py", line 72, in <module>
    tts.tts_to_file(text=chapters_to_read[i], speaker='p307', file_path=outputwav)
  File "/home/tymme/epub2tts/lib/python3.10/site-packages/TTS/api.py", line 220, in tts_to_file
    wav = self.tts(text=text, speaker=speaker, language=language, speaker_wav=speaker_wav)
  File "/home/tymme/epub2tts/lib/python3.10/site-packages/TTS/api.py", line 183, in tts
    wav = self.synthesizer.tts(
  File "/home/tymme/epub2tts/lib/python3.10/site-packages/TTS/utils/synthesizer.py", line 276, in tts
    outputs = synthesis(
  File "/home/tymme/epub2tts/lib/python3.10/site-packages/TTS/tts/utils/synthesis.py", line 213, in synthesis
    outputs = run_model_torch(
  File "/home/tymme/epub2tts/lib/python3.10/site-packages/TTS/tts/utils/synthesis.py", line 50, in run_model_torch
    outputs = _func(
  File "/home/tymme/epub2tts/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/tymme/epub2tts/lib/python3.10/site-packages/TTS/tts/models/vits.py", line 1147, in inference
    attn = generate_path(w_ceil.squeeze(1), attn_mask.squeeze(1).transpose(1, 2))
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

Punctuation not cleaned up for text files

When reading an epub file, troublesome punctuation is properly removed. This does not happen when reading a text file. It should be fixed, because some punctuation can cause random issues with text-to-speech (smart quotes, semicolons, etc.).
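
A sketch of the normalization pass that could be applied to text input as well; the exact replacement set here is an assumption, since the epub path presumably has its own list:

    def clean_text(text):
        # Normalize punctuation known to trip up TTS engines.
        replacements = {
            "\u2018": "'", "\u2019": "'",  # curly single quotes
            "\u201c": '"', "\u201d": '"',  # curly double quotes
            ";": ",",                      # semicolons often read badly
        }
        for old, new in replacements.items():
            text = text.replace(old, new)
        return text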

Add ability to read copy from a URL

Adding something like this would be nice: treat the input as text, just fetched from a URL rather than a local file:

(add newspaper3k to requirements.txt)

from newspaper import Article

def get_main_body(url):
    # newspaper3k downloads the page and extracts the main article
    # text; requests/BeautifulSoup aren't needed for this.
    article = Article(url)
    article.download()
    article.parse()
    return article.text

url = "https://<url>"
text = get_main_body(url)

Make this pip installable

This project really should follow standard python conventions and be pip installable. Make it so!

instructions don't cd back to project root

The instructions say to cd into the venv directory to source bin/activate, but that means the user isn't at the project root when they try to run pip3 install, and thus requirements.txt isn't found.

failing on long texts, sample tested was 430000 words

Hi again,
I'm reporting what I believe to be an edge case: epub2tts seems to fail when writing longer texts. I tested a sample that was 437,000 words.
The wav files are written fine, and silences also seem to be removed.
However, when ffmpeg is called it fails with the error below, and I'm left with an m4a file of 0 bytes.
Low priority on this one of course, but I wanted to flag it so you know.
If there's any more detail that might help, let me know.

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Scripts\epub2tts.exe\__main__.py", line 7, in <module>
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\epub2tts.py", line 382, in main
    mybook.read_book(voice_samples=args.xtts, engine=args.engine, openai=args.openai, model_name=args.model, speaker=args.speaker, bitrate=args.bitrate)
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\epub2tts.py", line 334, in read_book
    concatenated.export(outputm4a, format="ipod", bitrate=bitrate)
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\pydub\audio_segment.py", line 895, in export
    wave_data.writeframesraw(pcm_for_wav)
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 547, in writeframesraw
    self._ensure_header_written(len(data))
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 588, in _ensure_header_written
    self._write_header(datasize)
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 600, in _write_header
    self._file.write(struct.pack('<L4s4sLHHLLHH4s',
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: argument out of range
Exception ignored in: <function Wave_write.__del__ at 0x00000137269E8180>
Traceback (most recent call last):
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 447, in __del__
    self.close()
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 565, in close
    self._ensure_header_written(0)
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 588, in _ensure_header_written
    self._write_header(datasize)
  File "C:\Users\daniel\AppData\Local\Programs\Python\Python311\Lib\wave.py", line 600, in _write_header
    self._file.write(struct.pack('<L4s4sLHHLLHH4s',
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
struct.error: argument out of range

mp3 duration usually wrong on iOS

When adding output mp3s to the macOS "Books" application, the audio duration is correct, but it rarely shows the full duration on iOS devices (e.g. an 8-hour mp3 will show 3:55 on an iPhone). The files continue to play past that duration, but if you skip ahead or back, it resets to whatever it thinks the end is.

Based on https://superuser.com/questions/607703/wrong-audio-duration-with-ffmpeg it looks like adding the option "write_xing 0" to the command pydub calls should fix this.
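
pydub passes extra ffmpeg output options through the parameters= argument of export(), so the fix from that superuser answer would presumably look like:

    from pydub import AudioSegment

    audio = AudioSegment.from_wav("book.wav")
    # "-write_xing 0" drops the Xing/VBR header that confuses
    # duration detection on iOS players.
    audio.export("book.mp3", format="mp3", bitrate="69k",
                 parameters=["-write_xing", "0"])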

Speech cut off mid sentence and punctuation error

Not sure if I raised this issue here before.

Issue 1: When using the epub2tts script with the default settings, I noticed many sentences were cut off in the middle, and the speech got muddled and skipped to the next sentence. This happens across models.

For example, this text from the epub:

“I’m sorry we’re late,” Captain Malloy said to Yousif’s father.

“You’re welcome any time,” the doctor answered, shaking his hand.

“The District Commissioner planned to be here,” Malloy explained. “But at the last minute something came up and he couldn’t make it. He asked me to convey to you his regrets and his congratulations. Mabrook.”

“Thank you,” the doctor said.

“The house is truly magnificent.”

“You’re very kind.”

The audio:

OTOH_error.-.Copy.mp4

Issue 2: There are punctuation errors where the letters after an apostrophe (') are vocalized separately. E.g. "They're" is spoken as "They Re".

Save book in m4b format with chapters

Professionally made audiobooks usually include chapters rather than just being one very long mp3. If the files are saved as m4a, then by adding a metadata file that records when each chapter starts and ends, the book can be saved in m4b format with chapters. That would be nice.
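
One way to do this (a sketch, not necessarily how it should be wired in) is to write an FFMETADATA file with one [CHAPTER] block per chapter and mux it in with ffmpeg:

    def write_ffmetadata(chapters, path="metadata.txt"):
        # chapters: list of (title, start_ms, end_ms) tuples
        with open(path, "w", encoding="utf-8") as f:
            f.write(";FFMETADATA1\n")
            for title, start, end in chapters:
                f.write("[CHAPTER]\nTIMEBASE=1/1000\n")
                f.write(f"START={start}\nEND={end}\ntitle={title}\n")

    # Then: ffmpeg -i book.m4a -i metadata.txt -map_metadata 1 -codec copy book.m4b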

loading finetuned xtts v2 model

Hi again,
This may very well be something I'm doing wrong; however, I'm trying to load a fine-tuned XTTS v2 model.
I've placed the config, model, and vocab files in a directory (for this example we'll refer to it as modelname) in the root where all of the other models are stored.
However, when I call epub2tts --xtts --model=modelname sample.txt I receive the following error:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Scripts\epub2tts.exe\__main__.py", line 7, in <module>
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\epub2tts.py", line 385, in main
    mybook.read_book(voice_samples=args.xtts, engine=args.engine, openai=args.openai, model_name=args.model, speaker=args.speaker, bitrate=args.bitrate)
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\epub2tts.py", line 259, in read_book
    self.tts = TTS(model_name).to(self.device)
               ^^^^^^^^^^^^^^^
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\TTS\api.py", line 85, in __init__
    self.load_model_by_name(model_name, gpu)
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\TTS\api.py", line 166, in load_model_by_name
    self.load_tts_model_by_name(model_name, gpu)
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\TTS\api.py", line 195, in load_tts_model_by_name
    model_path, config_path, vocoder_path, vocoder_config_path, model_dir = self.download_model_by_name(
                                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\TTS\api.py", line 149, in download_model_by_name
    model_path, config_path, model_item = self.manager.download_model(model_name)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\TTS\utils\manage.py", line 407, in download_model
    model_item, model_full_name, model, md5sum = self._set_model_item(model_name)
                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\daniel\Documents\epub2tts\.venvgpu\Lib\site-packages\TTS\utils\manage.py", line 322, in _set_model_item
    model_type, lang, dataset, model = model_name.split("/")
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 4, got 1)
Is there something I'm doing wrong here?
Thanks in advance.

NameError: name 'ensure_punkt' is not defined

Hi again,
Getting the following error after pulling the most recent changes.
As such, I've reverted to the previous commit for now.
Is there something special I have to do, as punkt is already downloaded in my case?

Traceback (most recent call last):
  File "C:\Users\daniel\Documents\epub2tts\epub2tts.py", line 419, in <module>
    main()
  File "C:\Users\daniel\Documents\epub2tts\epub2tts.py", line 408, in main
    mybook = EpubToAudiobook(source=args.sourcefile, start=args.start, end=args.end, skiplinks=args.skiplinks, engine=args.engine, minratio=args.minratio, model_name=args.model, debug=args.debug)
  File "C:\Users\daniel\Documents\epub2tts\epub2tts.py", line 67, in __init__
    ensure_punkt()
    ^^^^^^^^^^^^
NameError: name 'ensure_punkt' is not defined
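
For reference, a helper along these lines is presumably what the missing function was meant to be (a guess from the name, not the actual definition):

    import nltk

    def ensure_punkt():
        # Download the punkt tokenizer data only if it isn't already present.
        try:
            nltk.data.find("tokenizers/punkt")
        except LookupError:
            nltk.download("punkt")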

chapter markers become offset after removing silence

Back again with another issue I'm afraid.
Although this is quite minor in the grand scheme of things, I've noticed that the chapter markers in m4b audiobooks become more offset throughout the book.
They seem to run ahead in time, which makes me suspect that the chapter breaks need to be recalculated after silence is removed, since the initial flacs are fine.
Let me know if there's any info I can provide on this.

With XTTS, sometimes the audio makes no sense

Sometimes when using XTTS, one sentence group/chunk will sound like nonsense. I can't reproduce it at will, but it has come up a few times in one of the first long books I created using XTTS.

Run in docker?

This would be much easier for folks to use if it was in docker and did not require any installation.

Unrecognized option 'p335.m4b'

Getting this error on the final concatenation of the m4a into m4b:

 > Processing time: 107.58983039855957
 > Real-time factor: 0.06694619643155328
100.00% spoken so far.
Elapsed: 35 minutes, ETA: 0 minutes
Bitrate: 69k
ffmpeg version 5.1.3-1 Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12 (Debian 12.2.0-14)
  configuration: --prefix=/usr --extra-version=1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
Unrecognized option 'p335.m4b'.
Error splitting the argument list: Option not found

It deleted everything when it crashed, but I believe the .m4a file was named -p335.m4a (with the dash) and nothing else. Could that be the problem? Unfortunately my terminal doesn't show the ffmpeg command it was trying to run.

Looking at the code, it looks like maybe the book's name didn't parse? I used a shell script that did epub2tts ./books/*.epub, could that be the reason? Looks like you're pulling bookname from args?

If I can't *.epub the files in bash, do you have any recommendations for batch processing a folder of .epubs with minimal overhead?

I'm thinking this is a nothingburger caused by the wildcard. It's late so I'll try again tomorrow morning without the asterisk.

I am running Debian 12 bookworm, using a GPU.

Crash if sentence is too long

Some books seem to cause the sentence segmenter TTS uses (https://pypi.org/project/pysbd/) to fail to actually detect the end of the sentence. I can't tell what is causing this, but then the resulting sentence is very long as it's made up of multiple sentences strung together. That in turn causes epub2tts to get killed due to exceeding available memory.

Because this is external to epub2tts (and even external to coqui TTS) I don't think there's much I can do about it. Logging the bug here though in case anyone else runs into this. Best way to tell is to look at the output where it says "> Text splitted to sentences." followed by the list containing the sentences to be read. If you look through that you'll probably find one sentence that is enormously long. As of right now that just means this is a book that can't be turned into an m4b.

Add "resume" functionality

Sometimes coqui-tts crashes. It would be nice to check to see if the intermediate wave files exist, and start back up from where we left off.
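
A sketch of the check, assuming one intermediate wav per chunk (names are illustrative):

    import os

    def synth_chunk(tts, text, outputwav):
        # If the intermediate wav already exists and is non-empty,
        # assume a previous run produced it and skip re-synthesis.
        if os.path.isfile(outputwav) and os.path.getsize(outputwav) > 0:
            print(f"{outputwav} exists, skipping")
            return
        tts.tts_to_file(text=text, file_path=outputwav)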

Make an epub2tts service with an API

Make this so it can run as a service with an API, to allow for a future enhancement that adds a web UI so the service and UI can be run easily under docker or docker-compose.

Proposed API endpoints:
/upload/ - for uploading an epub file
/list/ - list available epub files
/scan/ - print first few lines of each chapter
/read/ - create m4b file
/download/ - for downloading completed m4b files
/delete/ - for deleting epub or m4b files
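
A rough shape for such a service, sketched with FastAPI; the framework choice and handler bodies are assumptions, not a design decision:

    import glob
    from fastapi import FastAPI, File, UploadFile

    app = FastAPI()

    @app.post("/upload/")
    async def upload(file: UploadFile = File(...)):
        # Save the uploaded epub next to the service.
        with open(file.filename, "wb") as f:
            f.write(await file.read())
        return {"uploaded": file.filename}

    @app.get("/list/")
    def list_files():
        return {"epub": glob.glob("*.epub"), "m4b": glob.glob("*.m4b")}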

BUG: LookupError: Resource punkt not found

After following the macOS installation guide

#install dependencies
brew install espeak pyenv ffmpeg mecab
#install epub2tts
git clone https://github.com/aedocw/epub2tts
cd epub2tts
pyenv install 3.11
pyenv local 3.11
#OPTIONAL - install this in a virtual environment
python -m venv .venv && source .venv/bin/activate
pip install .

and running epub2tts my-book.txt I got

LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

After running it in the shell I successfully generated my audiobook. I think this step should be added to the setup script.

P.S. Thank you very much for a really great project 🙂!

Extract epub metadata

It would be useful to extract epub metadata and add these as tags to the generated audio file.

Also, it would be nice to be able to specify bitrate of the audio file.
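
A sketch of both asks, assuming the epub is parsed with ebooklib and the audio is exported with pydub as elsewhere in this tracker:

    from ebooklib import epub
    from pydub import AudioSegment

    book = epub.read_epub("book.epub")
    # Each get_metadata entry is a (value, attributes) tuple.
    title = book.get_metadata("DC", "title")[0][0]
    author = book.get_metadata("DC", "creator")[0][0]

    audio = AudioSegment.from_wav("book.wav")
    # export() covers both requests: tags= for metadata, bitrate= for quality.
    audio.export("book.m4a", format="ipod", bitrate="128k",
                 tags={"title": title, "artist": author})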

Add <pause> between paragraphs

Coqui-tts did not seem to have a way to add pauses or adjust timing between sentences, but they might add that feature in the future. Tortoise-TTS does have this ability, via markers like [pause] or [laugh]. Assuming there will be some option for a TTS engine to accept an instruction for a brief pause, we will need to detect paragraph changes and be able to insert some text at that point.
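
A sketch of the paragraph-detection half, with [pause] standing in for whatever instruction the engine ends up accepting:

    def mark_paragraph_pauses(raw_text, pause_token="[pause]"):
        # Treat blank lines as paragraph boundaries and splice in a
        # pause token the TTS engine could (hypothetically) honor.
        paragraphs = [p.strip() for p in raw_text.split("\n\n") if p.strip()]
        return f" {pause_token} ".join(paragraphs)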

Add XTTS support option

Now that Coqui has opened XTTS up for non-commercial use, add support for using that instead of VITS.

Something like including:
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True)

and:
tts.tts_to_file(text = chapters_to_read[i], file_path = outputwav, speaker_wav="sample.wav", language="en")

Note this requires a speaker sample file to be included.

Read url text when not a hyperlink or footnote

Some of my books have hyperlinks whose text is important as a pointer to the URL. For example:

There are many features that could lead to new research opportunities in the field.

I believe the current skip-links flag would obliterate the contextual text and just read the sentence as “There are many features.”

Is there a way we could skip the reading only if the hyperlink text starts with "http" or is a number (for footnotes)? The system works well for footnotes and plain URLs but kills contextual links.
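
The proposed test, sketched against a BeautifulSoup link element (an assumption about how links are represented during extraction):

    def should_skip_link(link):
        # Skip bare URLs and numeric footnote markers like "[15]",
        # but keep links whose text carries meaning in the sentence.
        text = link.get_text().strip().strip("[]")
        return text.startswith("http") or text.isdigit()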

Thank you for the work on this repo, it is very exciting to have such a quality voice tts for free.

xtts v2: sentences exceed length of 400 tokens

Hello,
I'm currently having an issue when using XTTS v2.
Although it works fine normally, if there is a very long paragraph that exceeds 400 tokens as judged by the model, it crashes and refuses to continue.
Not sure how doable this is, but is there any possibility of splitting paragraphs further so they don't exceed this length, or is manually editing the files my best bet?
Thanks for this project nonetheless, as it's proving extremely useful.
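
A sketch of one way to split oversized paragraphs, packing sentences greedily under a character cap as a rough proxy for the model's 400-token limit:

    from nltk.tokenize import sent_tokenize  # requires the punkt data (see above)

    def split_paragraph(paragraph, max_chars=800):
        # 800 characters is an arbitrary stand-in for 400 tokens.
        chunks, current = [], ""
        for sentence in sent_tokenize(paragraph):
            if current and len(current) + len(sentence) > max_chars:
                chunks.append(current.strip())
                current = ""
            current += " " + sentence
        if current.strip():
            chunks.append(current.strip())
        return chunks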

xtts mode doesn't download model?

epub2tts cosa.txt --xtts voz1/voz1-1.wav,voz1/voz1-2.wav,voz1/voz1-3.wav

Returns:

Namespace(sourcefile='cosa.txt', engine='tts', xtts='voz1/voz1-1.wav,voz1/voz1-2.wav,voz1/voz1-3.wav', openai='zzz', model='tts_models/en/vctk/vits', speaker='p335', scan=False, start=1, end=999, minratio=88, skiplinks=False, bitrate='69k', debug=False)

--- my Spanish text --

Saving to cosa-voz.m4b
Total characters: 214
Loading model: /home/ubuntu/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2
Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/epub2tts", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.10/site-packages/epub2tts.py", line 371, in main
    mybook.read_book(voice_samples=args.xtts, engine=args.engine, openai=args.openai, model_name=args.model, speaker=args.speaker, bitrate=args.bitrate)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/epub2tts.py", line 225, in read_book
    config.load_json(model_json)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/coqpit/coqpit.py", line 726, in load_json
    with open(file_name, "r", encoding="utf8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2/config.json'

and the folder tts_models--multilingual--multi-dataset--xtts_v2/ doesn't exist.

Then I ran epub2tts --bitrate 128k --model tts_models/multilingual/multi-dataset/xtts_v2 cosa.txt and agreed to the terms and conditions of the license, just to download the model, then broke the operation, re-ran the initial command, and it worked.

how to launch program

When I type

epub2tts new 1.txt --xtts sample-1.wav

into PowerShell I get

The term 'epub2tts' is not recognized as the name of a cmdlet, function, script file, or operable program.

Please help, thank you.

Address BeautifulSoup deprecation warning

The following deprecation warning is thrown each time this is run:

DeprecationWarning: The 'text' argument to find()-type methods is deprecated. Use 'string' instead.
  text = soup.find_all(text=True)
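
The fix is the one-line rename the warning suggests:

    text = soup.find_all(string=True)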

Skip chapter footnotes

I have a book with footnotes at the end of each chapter, rather than at the end of the book. It's really annoying since they're read out of context, and in this particular book lots of them have written out hyperlinks so there is lots of reading of "colon slash slash double-u double-u double-u".
There is probably a way to detect this "footnotes" section at the end of each chapter and stop reading. This would be a great option to add.

enhancement: add StyleTTS2 support

You may very well be aware of this already, although there is a fairly recent project called StyleTTS2 which raises the bar even further for open-source, local TTS generation.
No pressure of course, although it would be great to have this integrated at some point in the future.
I've tested the demo on a CPU and it runs fairly quickly.
As of now there's an HTTP API and also Python integration at this repo:
https://github.com/NeuralVox/StyleTTS2

using newly merged xtts v2 speakers

Opening a new issue to keep things tidy.
Is there a way currently to use the new speakers that were recently released in coqui tts v0.22.0?
I've tried passing the --speaker flag, although it currently doesn't seem to accept spaces even when quoted properly; at least, it's giving me a syntax error.
Completely fine if not, however I assume the initial framework is already in place for this.
Thanks in advance.

after recent merge, unable to use GPU for xtts

Hi again,
I'm currently using a 4 GB graphics card, and find that after the most recent merge I'm unable to use XTTS v2 with my GPU as before.
This very much may be by design, particularly for folks who don't have a GPU capable of running the model, although I see on the coqui Discord that the model is capable of running on cards with 4 GB, and it has performed fine on mine.
I see that currently torch.cuda.get_device_properties(0).total_memory > 7500000000 is the test, and am just wondering if this could please be lowered to 4000000000 or similar, as if my math is right the minimum currently listed is approximately 7.5 GB.
Thanks, and do of course let me know if my thinking is wrong here.
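
The change being requested amounts to lowering one constant; a sketch with the threshold made explicit:

    import torch

    MIN_XTTS_VRAM = 4_000_000_000  # requested: ~4 GB instead of ~7.5 GB

    use_gpu = (torch.cuda.is_available() and
               torch.cuda.get_device_properties(0).total_memory > MIN_XTTS_VRAM)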

EPUB3 Support

Any chance you are able to add EPUB3 support? What I mean by that is, instead of an m4b file, the output would still be an epub, with a multimedia layer. This would mean you could read the book normally, with TTS and highlighted words, as well as use it just like an audiobook.

It's really fantastic for people with dyslexia, or just those who like immersive reading, like Amazon's Whispersync.

Here is an article that explains SMIL and the EPUB 3.3 standard:
https://kb.daisy.org/publishing/docs/sync-media/overlays.html

Other languages?

Hello. Just discovered this.

Is there a way to set the language? Maybe by changing the speaker?

I'd like to read a Spanish epub.

Add windows instructions

Docker instructions are specific to Linux/macOS. They should be updated to indicate that, and also include instructions for running on Windows with Docker Desktop.

Docker doesn't work well

I think it still works via Docker as noted in the README, but using this through Docker is clunky and barely works. You can't use things like "--scan" or specify which chapters to start and end on. It should operate more or less as a command-line replacement, where you can alias epub2tts to something like 'docker run -it --rm -u $(id -u):$(id -g) -v "$(pwd)":/mnt ghcr.io/aedocw/epub2tts:latest'. Or maybe exactly that.

xtts: running out of vram after recent change (#80)

Hi there,
After merging pull request #80, I notice that I am now running out of VRAM on my modest 4 GB graphics card.
For the moment I've solved the problem on my end by commenting out tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(self.device)
I'm just getting started in the AI space, although I assume that the model is effectively getting loaded into VRAM twice and not unloaded the first time.
I'm not sure if there is a way to easily solve this, although I just thought I would let you know.
Thanks for all of your work on this.

punctuation: ’ character appears to not be processed with vits model

Hello again,
I've just been listening to some of the output I've processed, and have noticed that with the vits model words like "I'd" or "I'm" aren't getting processed correctly.
I now see that instead of the traditional apostrophe (') another variation (’) is being used.
There is a similar pattern I've observed for quotation marks, although I believe those are stripped regardless.
I'm not sure if there is a way to normalize these characters, or if the ' and ’ are stripped if they cause issues for the models to process as well.
This happens on both Windows and Linux.
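
A minimal normalization that would cover the case described:

    def normalize_apostrophes(text):
        # Map the typographic apostrophe (U+2019) to ASCII so
        # contractions like "I'm" phonemize correctly.
        return text.replace("\u2019", "'")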

Specifying --start 1 with --xtts throws an index error

epub2tts mybook.epub --start 1 --end 3 --xtts speaker-1.wav --model adam
[...]
  File "/home/doc/repos/epub2tts/epub2tts.py", line 354, in read_book
    position += len(self.chapters_to_read[i])
IndexError: list index out of range

This did not happen when I did the same thing without the xtts options.
