rany2 / edge-srt-to-speech
Convert SubRip to speech using Microsoft Edge's TTS service
Home Page: https://pypi.org/project/edge-srt-to-speech/
License: GNU General Public License v3.0
Show progress bar when debug mode is disabled
When I watch the clip with the subtitles, some phrases are not spoken; it seems that some words are skipped.
Is there an option to delay the following line of speech so that every line of the subtitle is spoken in full?
It stopped working for me; I now get this error with every .srt file I use:
[mp3 @ 0x57ff4b431c80] Failed to read frame size: Could not seek to 1026.
/tmp/tmplxc7l66i/15.mp3: Invalid argument
Traceback (most recent call last):
File "/content/drive/MyDrive/edge-srt-to-speech/src/edge_srt_to_speech/main.py", line 346, in <module>
main()
File "/content/drive/MyDrive/edge-srt-to-speech/src/edge_srt_to_speech/main.py", line 332, in main
asyncio.get_event_loop().run_until_complete(
File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/content/drive/MyDrive/edge-srt-to-speech/src/edge_srt_to_speech/main.py", line 251, in _main
last_end += get_duration(j)
File "/content/drive/MyDrive/edge-srt-to-speech/src/edge_srt_to_speech/main.py", line 39, in get_duration
duration = subprocess.check_output(
File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffprobe', '-v', 'error', '-show_entries', 'format=duration', '-of', 'default=noprint_wrappers=1:nokey=1', '/tmp/tmplxc7l66i/15.mp3']' returned non-zero exit status 1.
There seems to be a problem when generating audio, which results in this error:
TypeError: Communicate.run() got an unexpected keyword argument 'customspeak'
Commenting out the argument line customspeak=bool(ssml_template), in audio_gen(queue, ssml_template, ssml_variables) seems to solve it.
My approach was quite simple: I installed edge-srt-to-speech with pip and ran:
edge-srt-to-speech --enable-debug --voice pl-PL-MarekNeural --disable-enhanced-srt W.srt W.mp3
Sorry for the limited insight; I am not proficient with Python.
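The TypeError suggests that the installed version of the TTS library no longer accepts the customspeak keyword. As a rough sketch of a more defensive call, the helper below keeps only the keyword arguments a callable actually declares. The names supported_kwargs, run_old and run_new are hypothetical stand-ins for illustration, not part of edge-tts or edge-srt-to-speech, and dropping the argument may change behavior when an SSML template is in use.

```python
import inspect


def supported_kwargs(func, **kwargs):
    """Return only the keyword arguments that `func` declares.

    If `func` accepts arbitrary **kwargs, everything is passed through.
    """
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {name: value for name, value in kwargs.items() if name in params}


# Hypothetical stand-in for an older API that still accepts `customspeak`.
def run_old(messages, voice="en-US-AriaNeural", customspeak=False):
    return messages, voice, customspeak


# Hypothetical stand-in for a newer API where `customspeak` was removed.
def run_new(messages, voice="en-US-AriaNeural"):
    return messages, voice


if __name__ == "__main__":
    wanted = {"voice": "pl-PL-MarekNeural", "customspeak": True}
    # The unsupported keyword is dropped instead of raising TypeError.
    print(run_new("hello", **supported_kwargs(run_new, **wanted)))
    print(run_old("hello", **supported_kwargs(run_old, **wanted)))
```

Pinning the dependency to a version known to accept the argument would be the other obvious option; which one is appropriate depends on how the project uses customspeak.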
edge-srt-to-speech .\example.srt test.mp3 --voice zh-CN-YunyangNeural --default-speed +1% --default-pitch +1Hz --default-volume +1%
Generating audio: 0%| | 0/591 [00:00<?, ?it/s]
Traceback (most recent call last):
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 256, in _main
await f
File "d:\develop\python\python38\lib\asyncio\tasks.py", line 619, in _wait_for_one
return f.result() # May raise f.exception().
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 146, in audio_gen
arg, text = get_enhanced_srt_params(text, arg, ssml_variables)
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 126, in get_enhanced_srt_params
raise ValueError("edge_tts{} is invalid")
ValueError: edge_tts{} is invalid
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\shutil.py", line 616, in _rmtree_unsafe
os.unlink(fullname)
PermissionError: [WinError 32] Another program is using this file and the process cannot access it.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpx18h1lws\\0.mp3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\tempfile.py", line 802, in onerror
_os.unlink(path)
PermissionError: [WinError 32] Another program is using this file and the process cannot access it.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpx18h1lws\\0.mp3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\develop\python\python38\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "d:\develop\python\python38\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "D:\Develop\Python\Python38\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 367, in main
asyncio.get_event_loop().run_until_complete(
File "d:\develop\python\python38\lib\asyncio\base_events.py", line 616, in run_until_complete
return future.result()
File "d:\develop\python\python38\lib\site-packages\edge_srt_to_speech\__main__.py", line 317, in _main
pdbar.close()
File "d:\develop\python\python38\lib\tempfile.py", line 827, in __exit__
self.cleanup()
File "d:\develop\python\python38\lib\tempfile.py", line 831, in cleanup
self._rmtree(self.name)
File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
return _rmtree_unsafe(path, onerror)
File "d:\develop\python\python38\lib\shutil.py", line 618, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
File "d:\develop\python\python38\lib\tempfile.py", line 805, in onerror
cls._rmtree(path)
File "d:\develop\python\python38\lib\tempfile.py", line 813, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "d:\develop\python\python38\lib\shutil.py", line 740, in rmtree
return _rmtree_unsafe(path, onerror)
File "d:\develop\python\python38\lib\shutil.py", line 599, in _rmtree_unsafe
onerror(os.scandir, path, sys.exc_info())
File "d:\develop\python\python38\lib\shutil.py", line 596, in _rmtree_unsafe
with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] The directory name is invalid.: 'C:\\Users\\tuike\\AppData\\Local\\Temp\\tmpx18h1lws\\0.mp3'
Generating audio: 0%| | 0/591 [00:00<?, ?it/s]
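In the chain of exceptions above, the original ValueError ends up buried under errors raised while the temporary directory is being removed, because Windows refuses to delete a file that is still open. The following is a minimal sketch of a cleanup that does not mask the original error. It assumes that leaving a stray temporary file behind is acceptable; forgiving_temp_dir is a hypothetical helper name, not something the project provides.

```python
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def forgiving_temp_dir():
    """Yield a temporary directory and remove it on exit, ignoring cleanup errors."""
    path = tempfile.mkdtemp()
    try:
        yield path
    finally:
        # ignore_errors=True: a file that is still open (common on Windows)
        # is left behind instead of raising from the cleanup step.
        shutil.rmtree(path, ignore_errors=True)


if __name__ == "__main__":
    try:
        with forgiving_temp_dir() as tmp:
            # Simulate the failure from the report; cleanup must not hide it.
            raise ValueError("edge_tts{} is invalid")
    except ValueError as exc:
        print("original error preserved:", exc)
```

On Python 3.10 and later, tempfile.TemporaryDirectory(ignore_cleanup_errors=True) offers similar behavior, but the traceback above comes from Python 3.8, where that parameter is not available.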
Feature request: functionality similar to https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/#features
I made a simple sketch based on that page and included some additional reference data:
roleList = ['Default', 'YoungAdultFemale', 'YoungAdultMale', 'OlderAdultFemale', 'OlderAdultMale', 'SeniorFemale', 'SeniorMale', 'Girl', 'Boy']
voices = {
    'zh-CN': {
        'zh-CN-YunyeNeural': {
            'name': 'Yunye',
            'style': ['General', 'angry', 'calm', 'cheerful', 'disgruntled', 'embarrassed', 'fearful', 'sad', 'serious'],
            'role': roleList,
            'style_support': 'Optimized for story narrating, multiple role-play and styles available using SSML'},
        'zh-CN-XiaoxiaoNeural': {
            'name': 'Xiaoxiao',
            'style': ['General', 'affectionate', 'angry', 'assistant', 'calm', 'chat', 'cheerful', 'customerservice', 'disgruntled', 'fearful', 'gentle', 'lyrical', 'newscast', 'sad', 'serious'],
            'role': None,
            'style_support': 'General, multiple voice styles available using SSML'},
        'zh-CN-XiaohanNeural': {
            'name': 'Xiaohan',
            'style': ['General', 'affectionate', 'angry', 'calm', 'cheerful', 'disgruntled', 'embarrassed', 'fearful', 'gentle', 'sad', 'serious'],
            'role': None,
            'style_support': 'General, multiple styles available using SSML'},
        'zh-CN-XiaoruiNeural': {
            'name': 'Xiaorui',
            'style': ['General', 'angry', 'calm', 'fearful', 'sad'],
            'role': None,
            'style_support': 'Senior voice, multiple styles available using SSML'},
        'zh-CN-XiaomoNeural': {
            'name': 'Xiaomo',
            'style': ['General', 'affectionate', 'angry', 'calm', 'cheerful', 'depressed', 'disgruntled', 'embarrassed', 'envious', 'fearful', 'gentle', 'sad', 'serious'],
            'role': roleList,
            'style_support': 'General, multiple role-play and styles available using SSML'},
        'zh-CN-XiaochenNeural': {
            'name': 'Xiaochen',
            'style': None,
            'role': None,
            'style_support': 'Optimized for spontaneous conversation'},
        'zh-CN-XiaoqiuNeural': {
            'name': 'Xiaoqiu',
            'style': None,
            'role': None,
            'style_support': 'Optimized for narrating'}
    },
    'en-US': {
        'en-US-GuyNeural': {
            'name': 'Guy',
            'style': ['General', 'newscast'],
            'role': None,
            'style_support': 'General, multiple voice styles available using SSML'},
        'en-US-AriaNeural': {
            'name': 'Aria',
            'style': ['General', 'chat', 'cheerful', 'customerservice', 'empathetic', 'narration-professional', 'newscast-casual', 'newscast-formal'],
            'role': None,
            'style_support': 'General, multiple voice styles available using SSML'},
        'en-US-JennyNeural': {
            'name': 'Jenny',
            'style': ['General', 'assistant', 'chat', 'customerservice', 'newscast'],
            'role': None,
            'style_support': 'General, multiple voice styles available using SSML'},
        'en-US-SaraNeural': {
            'name': 'Sara',
            'style': ['General', 'angry', 'cheerful', 'sad'],
            'role': None,
            'style_support': 'General, multiple voice styles available using SSML'}
    }
}
# role and style data from: https://azure.microsoft.com/en-us/services/cognitive-services/text-to-speech/#features
# style_support data from: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support#text-to-speech
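One way such a table could be used is to validate a requested style or role before building the SSML. The sketch below works on a trimmed copy of the structure above so that it runs on its own; check_voice_options and SAMPLE_VOICES are hypothetical names for illustration, and the sample entries are only a small subset of the data listed above.

```python
# Trimmed copy of the structure above, just enough to run the example.
SAMPLE_ROLES = ["Default", "Girl", "Boy"]
SAMPLE_VOICES = {
    "zh-CN": {
        "zh-CN-YunyeNeural": {
            "name": "Yunye",
            "style": ["General", "angry", "calm"],
            "role": SAMPLE_ROLES,
        },
    },
    "en-US": {
        "en-US-GuyNeural": {
            "name": "Guy",
            "style": ["General", "newscast"],
            "role": None,
        },
    },
}


def check_voice_options(voices, voice, style=None, role=None):
    """Return a list of problems; an empty list means the combination is listed."""
    # The locale key is the first two dash-separated parts, e.g. "en-US".
    locale = "-".join(voice.split("-")[:2])
    entry = voices.get(locale, {}).get(voice)
    if entry is None:
        return [f"unknown voice: {voice}"]
    problems = []
    # A value of None in the table means "no styles/roles listed".
    if style is not None and style not in (entry["style"] or []):
        problems.append(f"{voice} does not list style {style!r}")
    if role is not None and role not in (entry["role"] or []):
        problems.append(f"{voice} does not list role {role!r}")
    return problems


if __name__ == "__main__":
    print(check_voice_options(SAMPLE_VOICES, "en-US-GuyNeural", style="newscast"))
    print(check_voice_options(SAMPLE_VOICES, "en-US-GuyNeural", role="Boy"))
```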
Support starting the tool with command-line arguments: srt_file, out_file, voice.
An SSML template needs to be built in, e.g.:
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="{lang}">
<voice name="{voice}">
<mstts:express-as role="{role}" style="{style}" styledegree="{styledegree}">
<prosody pitch="{pitch}" rate="{rate}" volume="{volume}">
{text}
</prosody>
</mstts:express-as>
</voice>
</speak>
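As a rough illustration of how a template like this could be filled in, the sketch below substitutes the placeholders with str.format and escapes the values so the result stays well-formed XML. The function name render_ssml and the default values are assumptions made for the example; they have not been checked against what the TTS service accepts.

```python
from xml.sax.saxutils import escape

SSML_TEMPLATE = """<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="{lang}">
  <voice name="{voice}">
    <mstts:express-as role="{role}" style="{style}" styledegree="{styledegree}">
      <prosody pitch="{pitch}" rate="{rate}" volume="{volume}">
        {text}
      </prosody>
    </mstts:express-as>
  </voice>
</speak>"""

# Illustrative defaults only (assumptions, not verified against the service).
DEFAULTS = {
    "lang": "en-US",
    "voice": "en-US-AriaNeural",
    "role": "Default",
    "style": "General",
    "styledegree": "1",
    "pitch": "+0Hz",
    "rate": "+0%",
    "volume": "+0%",
}


def render_ssml(text, template=SSML_TEMPLATE, **overrides):
    """Fill the template, escaping attribute values and the sentence text."""
    values = {**DEFAULTS, **overrides}
    # Attribute values also need double quotes escaped.
    safe = {key: escape(str(value), {'"': "&quot;"}) for key, value in values.items()}
    safe["text"] = escape(text)
    return template.format(**safe)


if __name__ == "__main__":
    print(render_ssml("That'd be just amazing!", voice="en-US-SaraNeural", style="cheerful"))
```

Escaping matters here because subtitle text frequently contains characters such as & or <, which would otherwise break the document.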
I like your project; I need it for my studies to convert SRT files into audio, but I can't get it to work. Could you give me instructions on how to set it up?
After running the command:
edge-srt-to-speech --voice zh-CN-XiaoxiaoNeural test.srt test.mp3
I got the following error:
`
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "D:\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Python\Python310\Scripts\edge-srt-to-speech.exe\__main__.py", line 7, in <module>
File "D:\Python\Python310\lib\site-packages\edge_srt_to_speech\__main__.py", line 323, in main
asyncio.get_event_loop().run_until_complete(
File "D:\Python\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
return future.result()
File "D:\Python\Python310\lib\site-packages\edge_srt_to_speech\__main__.py", line 190, in _main
with tempfile.TemporaryDirectory() as temp_dir:
File "D:\Python\Python310\lib\tempfile.py", line 869, in __exit__
self.cleanup()
File "D:\Python\Python310\lib\tempfile.py", line 873, in cleanup
self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
File "D:\Python\Python310\lib\tempfile.py", line 855, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "D:\Python\Python310\lib\shutil.py", line 750, in rmtree
return _rmtree_unsafe(path, onerror)
File "D:\Python\Python310\lib\shutil.py", line 620, in _rmtree_unsafe
onerror(os.unlink, fullname, sys.exc_info())
File "D:\Python\Python310\lib\tempfile.py", line 846, in onerror
cls._rmtree(path, ignore_errors=ignore_errors)
File "D:\Python\Python310\lib\tempfile.py", line 855, in _rmtree
_shutil.rmtree(name, onerror=onerror)
File "D:\Python\Python310\lib\shutil.py", line 750, in rmtree
return _rmtree_unsafe(path, onerror)
File "D:\Python\Python310\lib\shutil.py", line 601, in _rmtree_unsafe
onerror(os.scandir, path, sys.exc_info())
File "D:\Python\Python310\lib\shutil.py", line 598, in _rmtree_unsafe
with os.scandir(path) as scandir_it:
NotADirectoryError: [WinError 267] 目录名称无效。: 'C:\Users\suife\AppData\Local\Temp\tmpeph4xjnc\50.mp3'
Generating audio: 94%|███████████████████████████████████████████████████████▌ | 80/85 [00:28<00:01, 2.83it/s]
`
The command-line tool shows "目录名称无效", which means "The directory name is invalid". How can I solve this problem?
Thank you!
--ssml_template path/example.xml
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
<voice name="en-US-AriaNeural">
<mstts:express-as style="cheerful">
{text}
</mstts:express-as>
</voice>
</speak>
where "{text}" is replaced by the current sentence.
--ssml_elements "voice:en-US-SaraNeural,speed:+1%,style:cheerful"
This format is potentially simpler for the user to write, but because the values must first be written into the SSML template, it requires more work on your part.
In addition to supporting SSML templates for single sentences, please add support for using full SSML files instead of SSML templates, which would allow finer control over how the voice reads the text aloud.
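To give an idea of the extra work involved, the comma-separated key:value string could be turned into a dictionary before being merged into a template. This is only a sketch of the proposed format; --ssml_elements is the flag suggested in this request rather than an existing option, and parse_ssml_elements is a hypothetical helper name.

```python
def parse_ssml_elements(spec):
    """Parse 'key:value,key:value' into a dict of SSML settings."""
    result = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        # Split on the first colon only, so values may contain colons.
        key, sep, value = item.partition(":")
        if not sep or not key.strip() or not value.strip():
            raise ValueError(f"expected key:value, got {item!r}")
        result[key.strip()] = value.strip()
    return result


if __name__ == "__main__":
    print(parse_ssml_elements("voice:en-US-SaraNeural,speed:+1%,style:cheerful"))
```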
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
<voice name="en-US-AriaNeural">
<mstts:express-as style="cheerful">
That'd be just amazing!
</mstts:express-as>
</voice>
</speak>
These features can be tricky to implement, so please decide which parts you want to implement based on the level of difficulty and your own time.
I am testing with automatically generated subtitle files, but sometimes there are empty lines in the subtitles, for example:
253
00:23:46,280 --> 00:23:50,280
If I just plug this in here, you can see there's probably the opposite way around.
254
00:23:51,280 --> 00:23:51,280
255
00:23:51,280 --> 00:23:55,280
It is. But that doesn't matter. We can just change these values around here.
When the program synthesizes speech, it gets stuck and doesn't continue running.
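Whether the empty cues are really what causes the hang is not confirmed here, but one possible workaround is to drop them before handing the file to the tool. The sketch below is a deliberately small SRT reader written for this example (iter_nonempty_cues is a hypothetical name); it assumes cues are separated by blank lines and does not try to handle every SRT variant.

```python
import re

# Matches a standard SRT timing line such as "00:23:46,280 --> 00:23:50,280".
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def iter_nonempty_cues(srt_text):
    """Yield (index, timing, text) for every cue whose text is not blank."""
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines()]
        if len(lines) < 2 or not TIMING.fullmatch(lines[1]):
            continue  # not a well-formed cue header
        text = "\n".join(line for line in lines[2:] if line)
        if text:
            yield lines[0], lines[1], text


if __name__ == "__main__":
    sample = (
        "253\n00:23:46,280 --> 00:23:50,280\nIf I just plug this in here.\n\n"
        "254\n00:23:51,280 --> 00:23:51,280\n\n"
        "255\n00:23:51,280 --> 00:23:55,280\nIt is. But that doesn't matter.\n"
    )
    for index, timing, text in iter_nonempty_cues(sample):
        print(index, timing, text)
```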
I've been able to generate mp3 files in one click with other software, but its parameters are fixed, which isn't ideal.
I was wondering whether it would be possible to make a simple GUI for the time being, one that can still be started by passing in parameters but lets them be tweaked in the interface.
I looked at a library that might make this easier to implement, but unfortunately I have too little programming knowledge to use it. Would it help for you to take a look? https://github.com/chriskiehl/Gooey
Attached is a picture showing how I'm using it now. Please ignore the Chinese in the menu; those are just the names of different voices.
In actual use, certain segments are sometimes read too quickly, making them difficult to hear clearly. I would therefore like to request that the original SRT content be exported in JSON format along with the audio file, including the actual reading speed for each text segment. With this JSON file, we can get better results by re-editing the video or the SRT text.
srt:
1
00:05:00,400 --> 00:05:15,300
If you want to use the edge-tts command, you can simply run it with the following command:
2
00:05:16,400 --> 00:05:25,300
Note the above requires the installation of the mpv command line player.
json:
In the following example snippet, rate represents the actual reading speed.
{
"subtitles": [
{
"id": "1",
"text": "If you want to use the edge-tts command, you can simply run it with the following command:",
"start_time": "00:05:00.400",
"end_time": "00:05:15.300",
"rate": 1.8
},
{
"id": "2",
"text": "Note the above requires the installation of the mpv command line player.",
"start_time": "00:05:16.400",
"end_time": "00:05:25.300",
"rate": 1.2
}
]
}
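A report in this shape could be produced along the following lines. The request does not define how rate is measured, so this sketch assumes one possible definition: the duration of the speech at normal speed divided by the length of the subtitle's time slot. build_report and natural_durations are hypothetical names, and the durations passed in would have to come from measuring the generated audio.

```python
import json
from datetime import timedelta


def parse_srt_time(stamp):
    """Convert an SRT timestamp such as '00:05:00,400' to a timedelta."""
    hms, millis = stamp.split(",")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds,
                     milliseconds=int(millis))


def build_report(cues, natural_durations):
    """Build the JSON-ready structure.

    cues: list of (id, start, end, text) using SRT timestamps.
    natural_durations: seconds each cue takes when spoken at normal speed.
    """
    subtitles = []
    for (cue_id, start, end, text), natural in zip(cues, natural_durations):
        slot = (parse_srt_time(end) - parse_srt_time(start)).total_seconds()
        # Assumed definition of rate: speed-up needed to fit the time slot.
        rate = round(natural / slot, 2) if slot > 0 else None
        subtitles.append({
            "id": cue_id,
            "text": text,
            "start_time": start.replace(",", "."),
            "end_time": end.replace(",", "."),
            "rate": rate,
        })
    return {"subtitles": subtitles}


if __name__ == "__main__":
    cues = [("1", "00:05:00,400", "00:05:15,300", "First sentence."),
            ("2", "00:05:16,400", "00:05:25,300", "Second sentence.")]
    print(json.dumps(build_report(cues, [26.82, 10.68]), indent=2))
```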
Set the SSML parameters separately for an individual sentence in the SRT file, in either of the following two formats (for reference only):
1
00:00:00,498 --> 00:00:02,827
Here's what I love most about food and diet.
2
00:00:02,827 --> 00:00:06,383
We all eat several times a day,and we're totally in charge
voice:en-US-SaraNeural,speed:+10%,volume:-5%,style:cheerful,styledegree:2,role:OlderAdultMale……
3
00:00:06,383 --> 00:00:09,427
of what goes on our plate and what stays off.
1
00:00:00,498 --> 00:00:02,827
Here's what I love most about food and diet.
2
00:00:02,827 --> 00:00:06,383
{voice:en-US-SaraNeural,speed:+10%,volume:-5%,style:cheerful,styledegree:2,role:OlderAdultMale……}We all eat several times a day,and we're totally in charge
3
00:00:06,383 --> 00:00:09,427
of what goes on our plate and what stays off.
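For the second format, where the parameters sit in braces at the start of the subtitle text, the prefix could be separated from the spoken text roughly as follows. This is a sketch of the format proposed in this request only, not of the syntax the tool actually implements, and split_inline_params is a hypothetical name.

```python
import re

# A leading "{...}" block with no nested braces, plus any following spaces.
INLINE_PARAMS = re.compile(r"^\{([^{}]*)\}\s*")


def split_inline_params(line):
    """Split '{key:value,...}text' into (params, text).

    A line without a leading brace block is returned unchanged with no params.
    """
    match = INLINE_PARAMS.match(line)
    if not match:
        return {}, line
    params = {}
    for item in match.group(1).split(","):
        key, sep, value = item.partition(":")
        # Silently skip fragments that are not key:value pairs.
        if sep and key.strip() and value.strip():
            params[key.strip()] = value.strip()
    return params, line[match.end():]


if __name__ == "__main__":
    print(split_inline_params("{voice:en-US-SaraNeural,speed:+10%}We all eat several times a day"))
    print(split_inline_params("of what goes on our plate and what stays off."))
```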