rany2 / edge-srt-to-speech

Convert SubRip to speech using Microsoft Edge's TTS service

Home Page: https://pypi.org/project/edge-srt-to-speech/

License: GNU General Public License v3.0

Shell 2.80% Python 97.20%

edge-srt-to-speech's Issues

[Feature Request] Progress bar shows progress

Show progress bar when debug mode is disabled

Unspoken lines, leaving an empty stretch

A few lines are still not spoken, which leaves silent gaps in the audio.

I have attached a screenshot showing where the speech was missing from the spoken sentence.
I can't send you the audio and .srt files to analyze.

oerro.mp4

skipping a few sentences

When I watch the clip with the subtitles, I see that some phrases are not spoken; it seems that some words are skipped.
Is there an option that pushes the next line back, so that every line of the subtitle is spoken?
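
One way to read this request is as a scheduling rule: a clip never starts before the previous one has finished. The sketch below only illustrates that idea; `schedule_clips` and its arguments are hypothetical names, and it is not taken from edge-srt-to-speech.

```python
def schedule_clips(cues, durations):
    """Return the start time (in seconds) at which each clip is placed.

    cues      -- list of (start, end) subtitle times in seconds
    durations -- synthesized clip lengths in seconds, in the same order
    """
    starts = []
    last_end = 0.0
    for (cue_start, _cue_end), clip_len in zip(cues, durations):
        # Start at the subtitle time, or later if the previous clip
        # is still playing, so that no clip is dropped or cut short.
        start = max(cue_start, last_end)
        starts.append(start)
        last_end = start + clip_len
    return starts
```

With this rule a long clip delays the ones after it, so the audio can drift behind the subtitles until a pause lets it catch up.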

Failed to read frame size: Could not seek to 1026

It stopped working for me; I now get this error with every .srt file I use:
[mp3 @ 0x57ff4b431c80] Failed to read frame size: Could not seek to 1026.
/tmp/tmplxc7l66i/15.mp3: Invalid argument
Traceback (most recent call last):
File "/content/drive/MyDrive/edge-srt-to-speech/src/edge_srt_to_speech/main.py", line 346, in
main()
File "/content/drive/MyDrive/edge-srt-to-speech/src/edge_srt_to_speech/main.py", line 332, in main
asyncio.get_event_loop().run_until_complete(
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/content/drive/MyDrive/edge-srt-to-speech/src/edge_srt_to_speech/main.py", line 251, in _main
last_end += get_duration(j)
File "/content/drive/MyDrive/edge-srt-to-speech/src/edge_srt_to_speech/main.py", line 39, in get_duration
duration = subprocess.check_output(
File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffprobe', '-v', 'error', '-show_entries', 'format=duration', '-of', 'default=noprint_wrappers=1:nokey=1', '/tmp/tmplxc7l66i/15.mp3']' returned non-zero exit status 1.
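
The traceback shows `get_duration` failing because `ffprobe` exits with a non-zero status on one segment file. A minimal sketch of a more tolerant wrapper follows, reusing the `ffprobe` arguments from the log above. The name `get_duration_or_none` and the `runner` parameter are hypothetical; how the caller should treat a `None` result (skip the segment, retry the synthesis) is left open.

```python
import subprocess


def get_duration_or_none(path, runner=subprocess.check_output):
    """Return the media duration in seconds, or None if it cannot be read."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]
    try:
        output = runner(cmd)
    except (subprocess.CalledProcessError, FileNotFoundError):
        # ffprobe rejected the file, or ffprobe is not installed.
        return None
    try:
        return float(output.decode().strip())
    except ValueError:
        # ffprobe printed something that is not a number.
        return None
```

This does not explain why the segment is unreadable in the first place; it only stops one bad file from aborting the whole run.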

cannot generate audio

Generating audio fails with the following error:
TypeError: Communicate.run() got an unexpected keyword argument 'customspeak'

Commenting out the argument line customspeak=bool(ssml_template), in audio_gen(queue, ssml_template, ssml_variables) seems to solve it.

My approach was quite simple: I installed edge-srt-to-speech with pip and ran edge-srt-to-speech --enable-debug --voice pl-PL-MarekNeural --disable-enhanced-srt W.srt W.mp3.

Sorry for the limited insight; I am not proficient with Python.
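
The error says that the `Communicate.run()` being called does not accept a `customspeak` keyword. As a generic illustration (not the project's fix), a caller can check a function's signature and drop keywords it does not declare. `call_with_supported_kwargs` is a hypothetical helper.

```python
import inspect


def call_with_supported_kwargs(func, *args, **kwargs):
    """Call func, dropping keyword arguments that func does not declare."""
    params = inspect.signature(func).parameters
    accepts_any = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_any:
        # Keep only the keywords named in the signature.
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return func(*args, **kwargs)
```

Dropping an argument silently also drops the behaviour it was meant to enable, so this suits a diagnostic workaround better than a permanent change.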

Error when not using SSML template

edge-srt-to-speech .\example.srt test.mp3 --voice zh-CN-YunyangNeural --default-speed +1% --default-pitch +1Hz --default-volume +1%

Generating audio:   0%|                                 | 0/591 [00:00<?, ?it/s]Traceback (most recent call last):
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 256, in _main
    await f
  File "d:\develop\python\python38\lib\asyncio\tasks.py", line 619, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 146, in audio_gen
    arg, text = get_enhanced_srt_params(text, arg, ssml_variables)
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 126, in get_enhanced_srt_params
    raise ValueError("edge_tts{} is invalid")
ValueError: edge_tts{} is invalid

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\shutil.py", line 616, in _rmtree_unsafe
    os.unlink(fullname)
PermissionError: [WinError 32] Another program is using this file and the process cannot access it.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpx18h1lws\\0.mp3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\tempfile.py", line 802, in onerror
    _os.unlink(path)
PermissionError: [WinError 32] Another program is using this file and the process cannot access it.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpx18h1lws\\0.mp3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 367, in main
    asyncio.get_event_loop().run_until_complete(
  File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 317, in _main
    pdbar.close()
  File "d:\develop\python\python38\lib\tempfile.py", line 827, in __exit__
    self.cleanup()
  File "d:\develop\python\python38\lib\tempfile.py", line 831, in cleanup
    self._rmtree(self.name)
  File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 618, in _rmtree_unsafe
    onerror(os.unlink, fullname, sys.exc_info())
  File "d:\develop\python\python38\lib\tempfile.py", line 805, in onerror
    cls._rmtree(path)
  File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "d:\develop\python\python38\lib\shutil.py", line 599, in _rmtree_unsafe
    onerror(os.scandir, path, sys.exc_info())
  File "d:\develop\python\python38\lib\shutil.py", line 596, in _rmtree_unsafe
    with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpx18h1lws\\0.mp3'
Generating audio:   0%|                                 | 0/591 [00:00<?, ?it/s]

[Feature Request] GUI Interface

Functions similar to: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/#features
I made a simple sketch based on this page and included some reference data:

roleList = ['Default', 'YoungAdultFemale', 'YoungAdultMale', 'OlderAdultFemale', 'OlderAdultMale', 'SeniorFemale', 'SeniorMale', 'Girl', 'Boy']
voices = {
    'zh-CN': {
        'zh-CN-YunyeNeural': {
            'name': 'Yunye',
            'style': ['General', 'angry', 'calm', 'cheerful', 'disgruntled', 'embarrassed', 'fearful', 'sad', 'serious'],
            'role': roleList,
            'style_support': 'Optimized for story narrating, multiple role-play and styles available using SSML'},
        'zh-CN-XiaoxiaoNeural': {
            'name': 'Xiaoxiao',
            'style': ['General', 'affectionate', 'angry', 'assistant', 'calm', 'chat', 'cheerful', 'customerservice', 'disgruntled', 'fearful', 'gentle', 'lyrical', 'newscast', 'sad', 'serious'],
            'role': None,
            'style_support': 'General, multiple voice styles available using SSML'},
        'zh-CN-XiaohanNeural': {
            'name': 'Xiaohan',
            'style': ['General', 'affectionate', 'angry', 'calm', 'cheerful', 'disgruntled', 'embarrassed', 'fearful', 'gentle', 'sad', 'serious'],
            'role': None,
            'style_support': 'General, multiple styles available using SSML'},
        'zh-CN-XiaoruiNeural': {
            'name': 'Xiaorui',
            'style': ['General', 'angry', 'calm', 'fearful', 'sad'],
            'role': None,
            'style_support': 'Senior voice, multiple styles available using SSML'},
        'zh-CN-XiaomoNeural': {
            'name': 'Xiaomo',
            'style': ['General', 'affectionate', 'angry', 'calm', 'cheerful', 'depressed', 'disgruntled', 'embarrassed', 'envious', 'fearful', 'gentle', 'sad', 'serious'],
            'role': roleList,
            'style_support': 'General, multiple role-play and styles available using SSML'},
        'zh-CN-XiaochenNeural': {
            'name': 'Xiaochen',
            'style': None,
            'role': None,
            'style_support': 'Optimized for spontaneous conversation'},
        'zh-CN-XiaoqiuNeural': {
            'name': 'Xiaoqiu',
            'style': None,
            'role': None,
            'style_support': 'Optimized for narrating'}
    },

    'en-US': {
        'en-US-GuyNeural': {
            'name': 'Guy',
            'style': ['General', 'newscast'],
            'role': None,
            'style_support': 'General, multiple voice styles available using SSML'},
        'en-US-AriaNeural': {
            'name': 'Aria',
            'style': ['General', 'chat', 'cheerful', 'customerservice', 'empathetic', 'narration-professional', 'newscast-casual', 'newscast-formal'],
            'role': None, 'style_support': 'General, multiple voice styles available using SSML'},
        'en-US-JennyNeural': {
            'name': 'Jenny',
            'style': ['General', 'assistant', 'chat', 'customerservice', 'newscast'],
            'role': None,
            'style_support': 'General, multiple voice styles available using SSML'},
        'en-US-SaraNeural': {
            'name': 'Sara',
            'style': ['General', 'angry', 'cheerful', 'sad'],
            'role': None,
            'style_support': 'General, multiple voice styles available using SSML'}
    }
}

# role and style data from: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/#features
# style_support data from: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech

The GUI should accept command-line arguments at startup: srt_file, out_file, voice.

An SSML template needs to be built in, e.g.

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="{lang}">
    <voice name="{voice}">
        <mstts:express-as role="{role}" style="{style}" styledegree="{styledegree}">
        <prosody pitch="{pitch}" rate="{rate}" volume="{volume}">
            {text}
        </prosody>
        </mstts:express-as>
    </voice>
</speak>

How to change the voice?

I like your project and need it for my studies to turn SRT files into audio, but I can't get it to work. Could you give me instructions on how to set it up?

How to solve the directory error problem

After running the command:

edge-srt-to-speech --voice zh-CN-XiaoxiaoNeural test.srt test.mp3

I got the following error:

`
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Python\Python310\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
File "D:\Python\Python310\lib\site-packages\edge_srt_to_speech\__main__.py", line 323, in main
asyncio.get_event_loop().run_until_complete(
File "D:\Python\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
return future.result()
File "D:\Python\Python310\lib\site-packages\edge_srt_to_speech\__main__.py", line 190, in _main
with tempfile.TemporaryDirectory() as temp_dir:
File "D:\Python\Python310\lib\tempfile.py", line 869, in __exit__
self.cleanup()
File "D:\Python\Python310\lib\tempfile.py", line 873, in cleanup
self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
File "D:\Python\Python310\lib\tempfile.py", line 855, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "D:\Python\Python310\lib\shutil.py", line 750, in rmtree
return _rmtree_unsafe(path, onerror)
File "D:\Python\Python310\lib\shutil.py", line 620, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
File "D:\Python\Python310\lib\tempfile.py", line 846, in onerror
cls._rmtree(path, ignore_errors=ignore_errors)
File "D:\Python\Python310\lib\tempfile.py", line 855, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "D:\Python\Python310\lib\shutil.py", line 750, in rmtree
return _rmtree_unsafe(path, onerror)
File "D:\Python\Python310\lib\shutil.py", line 601, in _rmtree_unsafe
onerror(os.scandir, path, sys.exc_info())
File "D:\Python\Python310\lib\shutil.py", line 598, in _rmtree_unsafe
with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] 目录名称无效。: 'C:\Users\suife\AppData\Local\Temp\tmpeph4xjnc\50.mp3'
Generating audio: 94%|███████████████████████████████████████████████████████▌ | 80/85 [00:28<00:01, 2.83it/s]
`

The command-line tool shows "目录名称无效", which means "The directory name is invalid". How can I solve this problem?

Thank you!
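
The NotADirectoryError is raised while `tempfile.TemporaryDirectory` cleans up. In the similar traceback under "Error when not using SSML template", the first cleanup failure is a PermissionError (WinError 32) on an .mp3 that is still in use. One possible workaround, offered as a sketch and not as the project's fix, is a temporary directory whose cleanup ignores files that cannot be deleted. `forgiving_temp_dir` is a hypothetical name.

```python
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def forgiving_temp_dir():
    """Yield a temporary directory; cleanup errors are ignored on exit."""
    path = tempfile.mkdtemp()
    try:
        yield path
    finally:
        # ignore_errors=True: a file that is still open elsewhere is
        # left behind on disk, but removal no longer raises.
        shutil.rmtree(path, ignore_errors=True)
```

This hides the symptom only. Any exception raised earlier still needs its own fix, and files that stay locked remain in the system temp folder.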

[Feature Request] Support for SSML templates

--ssml_template path/example.xml

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
        <mstts:express-as style="cheerful">
            {text}
        </mstts:express-as>
    </voice>
</speak>

where "{text}" is replaced by the current sentence.
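
A minimal sketch of that substitution, assuming the template is read into a string first. `render_ssml` is a hypothetical helper; the sentence is XML-escaped so that characters such as `&` or `<` in a subtitle do not break the SSML.

```python
from xml.sax.saxutils import escape


def render_ssml(template, text):
    """Insert one escaped sentence into the template's {text} placeholder."""
    # str.replace is used instead of str.format so that any other
    # braces in the template are left untouched.
    return template.replace("{text}", escape(text))
```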

--ssml_elements "voice:en-US-SaraNeural,speed:+1%,style:cheerful"
This format is potentially simpler for the user to write, but because the values must first be written into the SSML template, it requires more work on your part.
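
To show what parsing such a value could involve, here is a small sketch that turns the comma-separated `key:value` string into a dictionary. `parse_ssml_elements` is a hypothetical name, and the accepted keys are not validated.

```python
def parse_ssml_elements(spec):
    """Parse 'voice:en-US-SaraNeural,speed:+1%' into a dict of strings."""
    result = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        key, sep, value = item.partition(":")
        if not sep or not key.strip():
            raise ValueError(f"expected key:value, got {item!r}")
        result[key.strip()] = value.strip()
    return result
```

The resulting dictionary could then supply the values for placeholders in a built-in SSML template.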

In addition to SSML templates applied to each sentence, please add support for full SSML files in place of templates, which would enable finer control of the read-aloud effect.

<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
    <voice name="en-US-AriaNeural">
        <mstts:express-as style="cheerful">
            That'd be just amazing!
        </mstts:express-as>
    </voice>
</speak>

These features can be tricky to implement, so please decide which parts you want to implement based on the level of difficulty and your own time.

When a subtitle line is empty, the program gets stuck and cannot continue.

I am testing with automatically generated subtitle files, but sometimes there are empty lines in the subtitles, for example:

253
00:23:46,280 --> 00:23:50,280
If I just plug this in here, you can see there's probably the opposite way around.

254
00:23:51,280 --> 00:23:51,280


255
00:23:51,280 --> 00:23:55,280
It is. But that doesn't matter. We can just change these values around here.

When the program synthesizes speech, it gets stuck and doesn't continue running.
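
Until this is handled by the tool, the subtitles can be filtered before synthesis. The sketch below assumes the file has already been parsed into cue objects; the `Cue` class and `drop_empty_cues` are hypothetical and not part of edge-srt-to-speech.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: str
    end: str
    text: str


def drop_empty_cues(cues):
    """Return only the cues whose text contains something to speak."""
    return [cue for cue in cues if cue.text.strip()]
```

Applied to the example above, cue 254 is removed and cues 253 and 255 are kept with their original timestamps.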

[Feature Request] Easy GUI

I've been able to generate mp3 files in one click with another program, but its parameters are fixed, which isn't ideal.
I was wondering if it would be possible to make a simple GUI for the time being that could still be started by passing in parameters, which could then be tweaked in the GUI.
I looked at a library that might make this easier to implement, but I have too little programming knowledge to use it myself. Would it be helpful for you to look at it? https://github.com/chriskiehl/Gooey

Attached is a picture showing how I'm using it now. Please ignore the Chinese in the menu; those are just the names of different voices.

[Feature Request] Export rate information in JSON format

In actual use, certain segments are sometimes read too quickly, making them difficult to hear clearly. I would therefore like the tool to export the contents of the original srt file in JSON format along with the audio file, including the actual reading speed for each text segment. With this JSON file, we can achieve better results by re-editing the video or the srt text.

srt:

1
00:05:00,400 --> 00:05:15,300
If you want to use the edge-tts command, you can simply run it with the following command:

2
00:05:16,400 --> 00:05:25,300
Note the above requires the installation of the mpv command line player.

json:
In the following example snippet, rate represents the actual reading speed.

{
  "subtitles": [
    {
      "id": "1",
      "text": "If you want to use the edge-tts command, you can simply run it with the following command:",
      "start_time": "00:05:00.400",
      "end_time": "00:05:15.300",
      "rate": 1.8
    },
    {
      "id": "2",
      "text": "Note the above requires the installation of the mpv command line player.",
      "start_time": "00:05:16.400",
      "end_time": "00:05:25.300",
      "rate": 1.2
    }
  ]
}
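
A sketch of how such a report could be produced. The request does not define how rate is computed, so this assumes rate = synthesized clip length divided by the length of the subtitle slot; that definition, `build_report` and `to_ms` are all assumptions made for illustration.

```python
import json


def to_ms(timestamp):
    """Convert an SRT timestamp such as '00:05:00,400' to milliseconds."""
    hms, ms = timestamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + int(ms)


def build_report(cues, durations):
    """cues: (id, start, end, text) tuples; durations: clip lengths in seconds."""
    subtitles = []
    for (cue_id, start, end, text), spoken in zip(cues, durations):
        slot = (to_ms(end) - to_ms(start)) / 1000
        subtitles.append({
            "id": cue_id,
            "text": text,
            "start_time": start.replace(",", "."),
            "end_time": end.replace(",", "."),
            # Assumed definition: above 1 means the clip had to be
            # sped up to fit its slot. None for zero-length slots.
            "rate": round(spoken / slot, 2) if slot > 0 else None,
        })
    return json.dumps({"subtitles": subtitles}, indent=2, ensure_ascii=False)
```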

[Feature Request] setting parameters for a sentence in an SRT file

The SSML parameters for a single sentence could be set in the SRT file in either of the following two formats (for reference only).

1
00:00:00,498 --> 00:00:02,827
Here's what I love most about food and diet.

2
00:00:02,827 --> 00:00:06,383
We all eat several times a day,and we're totally in charge
voice:en-US-SaraNeural,speed:+10%,volume:-5%,style:cheerful,styledegree:2,role:OlderAdultMale……

3
00:00:06,383 --> 00:00:09,427
of what goes on our plate and what stays off.

1
00:00:00,498 --> 00:00:02,827
Here's what I love most about food and diet.

2
00:00:02,827 --> 00:00:06,383
{voice:en-US-SaraNeural,speed:+10%,volume:-5%,style:cheerful,styledegree:2,role:OlderAdultMale……}We all eat several times a day,and we're totally in charge

3
00:00:06,383 --> 00:00:09,427
of what goes on our plate and what stays off.
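
For the second format, the parameters could be separated from the spoken text with a short parser. This is only a sketch of the proposed syntax, not the syntax the tool implements; `split_inline_params` is a hypothetical name.

```python
import re

# A leading {...} block with no nested braces.
_PREFIX = re.compile(r"^\{([^{}]*)\}")


def split_inline_params(line):
    """Split '{k:v,k:v}text' into (params dict, text)."""
    match = _PREFIX.match(line)
    if not match:
        return {}, line  # no prefix: the whole line is spoken
    params = {}
    for item in match.group(1).split(","):
        key, sep, value = item.partition(":")
        if sep:
            params[key.strip()] = value.strip()
    return params, line[match.end():]
```

Lines without a prefix pass through unchanged, so existing SRT files would behave as before.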