srt-to-ssml's Introduction

SRT-To-SSML

Converts an SRT subtitle file to an SSML file with speech durations.

Note: If you are looking for a more comprehensive tool that also generates synced and translated dubs, visit my other repo.

Use Cases

  • Using TTS to generate speech for a video using only subtitles
  • Automated translation and dubbing of videos while keeping the dub in sync. You can translate the text portions of the subtitles before feeding them into the script. Because the timestamps are unchanged, each translated line is spoken over the same duration as the original speech, so the generated speech should theoretically be a drop-in replacement for the original.

How it Works:

  • It takes the text lines from the subtitle file and puts each on a separate line within the speak tag
  • It takes the start and end timestamps of each subtitle line and calculates the difference in milliseconds. That value becomes the duration attribute of the prosody tag, which tells the TTS how long it should take to say the line so that it stays in sync with the original video.
    • Note: Not every neural TTS service supports the duration feature. Azure Speech and Amazon Polly's non-neural voices do, but they use their own tags, which this script uses automatically instead.
  • It also calculates the time difference between the end of one subtitle line and the beginning of the next, and uses that as the time attribute for the break tag at the end of each text line. This also keeps the speech in sync with the original video.
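
The timing arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual code: the names `TIMESTAMP_LINE`, `to_ms`, `parse_span`, and `durations_and_breaks` are hypothetical, and it assumes timestamps in the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` form.

```python
import re

# Matches an SRT timestamp line such as "00:00:00,140 --> 00:00:05,050".
TIMESTAMP_LINE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert the four captured timestamp fields to milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def parse_span(timestamp_line):
    """Return (start_ms, end_ms) for one timestamp line."""
    match = TIMESTAMP_LINE.match(timestamp_line.strip())
    if match is None:
        raise ValueError(f"Not an SRT timestamp line: {timestamp_line!r}")
    parts = match.groups()
    return to_ms(*parts[:4]), to_ms(*parts[4:])


def durations_and_breaks(timestamp_lines):
    """Return a (duration_ms, break_ms) pair per subtitle.

    break_ms is the gap to the next subtitle, or None for the last one.
    """
    spans = [parse_span(line) for line in timestamp_lines]
    result = []
    for i, (start, end) in enumerate(spans):
        gap = spans[i + 1][0] - end if i + 1 < len(spans) else None
        result.append((end - start, gap))
    return result


print(durations_and_breaks([
    "00:00:00,140 --> 00:00:05,050",
    "00:00:05,240 --> 00:00:13,290",
]))
```

For the first two subtitles of the example further down, this yields a 4910 ms duration followed by a 190 ms break, then an 8050 ms duration with no break after the last line passed in.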

Other Notable Features

  • Automatic tag configuration based on TTS service (currently supports Microsoft Azure and Amazon Polly non-neural voices)
    • Note: Currently only Azure Speech seems to support specifying the duration of speech for neural voices, so it is the only service that can take full advantage of this script. Amazon Polly also supports a duration setting, but only for standard (non-neural) voices.

SSML Options Changeable With Variables

  • Language
  • TTS Voice Name
  • SSML Version
  • xmlns attributes for the speak tag
  • Whether to include the xmlns:xsi and xsi:schemaLocation attributes
  • Input and Output file names (Defaults: subtitles.srt for input and SSML.txt for output)
  • Duration Attribute Name
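
As a rough illustration of how these options might fit together, the sketch below groups them in a dataclass and builds the opening tags from them. The class `SsmlOptions`, its field names, and `opening_tags` are hypothetical and not taken from the script. The default values are copied from the example output below, and the Azure-specific `xmlns:mstts` and `xmlns:emo` attributes are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class SsmlOptions:
    # Hypothetical field names; defaults mirror the README example.
    language: str = "en-US"
    voice_name: str = "en-US-DavisNeural"
    ssml_version: str = "1.0"
    xmlns: str = "http://www.w3.org/2001/10/synthesis"
    include_schema_attributes: bool = True
    input_file: str = "subtitles.srt"
    output_file: str = "SSML.txt"
    duration_attribute: str = "duration"


def opening_tags(options):
    """Build the opening <speak> and <voice> tags from the options."""
    attrs = [f'xmlns="{options.xmlns}"']
    if options.include_schema_attributes:
        attrs.append('xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"')
        attrs.append(
            'xsi:schemaLocation="http://www.w3.org/2001/10/synthesis '
            'http://www.w3.org/TR/speech-synthesis/synthesis.xsd"'
        )
    attrs.append(f'version="{options.ssml_version}"')
    attrs.append(f'xml:lang="{options.language}"')
    joined = " ".join(attrs)
    return f'<speak {joined}><voice name="{options.voice_name}">'


print(opening_tags(SsmlOptions(include_schema_attributes=False)))
```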

Example

Input (SRT Subtitle File)

1
00:00:00,140 --> 00:00:05,050
This is an example of a subtitle file with a bunch of random words I've added with various timestamps.

2
00:00:05,240 --> 00:00:13,290
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim

3
00:00:13,480 --> 00:00:14,250
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

4
00:00:14,340 --> 00:00:19,930
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla.

5
00:00:20,130 --> 00:00:23,419
Now some examples of some escaped characters such as & and ' and " and < and > just to name a few

Output

<?xml version="1.0" encoding="UTF-8"?>
<speak xmlns="http://www.w3.org/2001/10/synthesis" xmlns:mstts="http://www.w3.org/2001/mstts" xmlns:emo="http://www.w3.org/2009/10/emotionml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/10/synthesis http://www.w3.org/TR/speech-synthesis/synthesis.xsd" version="1.0" xml:lang="en-US"><voice name="en-US-DavisNeural">
	<prosody duration="4910ms">This is an example of a subtitle file with a bunch of random words I&apos;ve added with various timestamps.</prosody><break time="190ms"/>
	<prosody duration="8050ms">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim</prosody><break time="190ms"/>
	<prosody duration="770ms">veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.</prosody><break time="90ms"/>
	<prosody duration="5590ms">Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla.</prosody><break time="200ms"/>
	<prosody duration="3289ms">Now some examples of some escaped characters such as &amp; and &apos; and &quot; and &lt; and &gt; just to name a few</prosody>
</voice></speak>
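
The escaping seen in the last output line can be reproduced with Python's standard library. The sketch below is one way to do it and is not necessarily how the script does it: `escape_for_ssml` and `prosody_line` are hypothetical helper names, and the generic `duration` attribute is used as in the example above.

```python
from xml.sax.saxutils import escape


def escape_for_ssml(text):
    """Escape &, < and >, plus both quote characters, for use inside SSML."""
    return escape(text, {"'": "&apos;", '"': "&quot;"})


def prosody_line(text, duration_ms, break_ms=None):
    """Wrap one subtitle line in a prosody tag, with an optional trailing break."""
    line = f'<prosody duration="{duration_ms}ms">{escape_for_ssml(text)}</prosody>'
    if break_ms is not None:
        line += f'<break time="{break_ms}ms"/>'
    return line


print(prosody_line("I've added & removed <tags>", 4910, 190))
```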

srt-to-ssml's People

Contributors

minionguyjpro, thiojoe

srt-to-ssml's Issues

Set encoding when opening file

Hit the following while testing with a Danish SRT file.
image

Setting the encoding to utf-8-sig when opening the file fixed the issue for me:

with open(srtFile, 'r', encoding='utf-8-sig') as f:
    lines = f.readlines()
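
The original error screenshot is not reproduced here, so the exact failure is unknown. One plausible cause, offered only as an assumption, is a UTF-8 byte order mark (BOM) at the start of the file: decoded as plain `utf-8`, it stays glued to the first subtitle number, so a check such as `isdigit()` fails on that line. The small demonstration below uses made-up sample data.

```python
import codecs

# Sample SRT bytes with a UTF-8 BOM in front, as some editors save them.
raw = codecs.BOM_UTF8 + "1\n00:00:00,140 --> 00:00:05,050\nHej\n".encode("utf-8")

# Plain utf-8 keeps the BOM as the character U+FEFF at the start of line one.
first_plain = raw.decode("utf-8").splitlines()[0]

# utf-8-sig drops a leading BOM if present, and is harmless when there is none.
first_clean = raw.decode("utf-8-sig").splitlines()[0]

print(repr(first_plain), first_plain.strip().isdigit())
print(repr(first_clean), first_clean.strip().isdigit())
```

Note that `strip()` does not help in the first case, because U+FEFF is not treated as whitespace.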

SSML creates longer times and not in sync.

Thank you for your work on this project. I have an SRT audio file and an SSML Audio file from Amazon Polly.

For some reason which I cannot figure out, the SSML audio files created by Amazon Polly are much longer and out of sync with original SRT timestamps.

Here is a view of the audio files in Resolve showing different timestamps and SSML being longer. I used the example 23 second SRT listed in this project.

1
00:00:00,140 --> 00:00:05,050
This is an example of a subtitle file with a bunch of random words I've added with various timestamps.

2
00:00:05,240 --> 00:00:13,290
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim

3
00:00:13,480 --> 00:00:14,250
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.

4
00:00:14,340 --> 00:00:19,930
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla.

5
00:00:20,130 --> 00:00:23,419
Now some examples of some escaped characters such as & and ' and " and < and > just to name a few


This is an example of a subtitle file with a bunch of random words I've added with various timestamps.
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla.
Now some examples of some escaped characters such as & and ' and " and < and > just to name a few

Screenshot 2024-07-20 at 9 43 29 PM

Support multi-line subtitle lines

I noticed that the current version of this script doesn't support multi-line subtitles like the following.

1
00:00:23,650 --> 00:00:27,862
FORLAD BRONX OG BLIV
SKREVET OP TIL ET NYT HUS

2
00:00:27,946 --> 00:00:31,157
I DET FORTRYLLENDE
NEW MEXICO

I knocked together a PoC of how you could fix this if you want to keep the current "enumerate each subtitle line" concept. Another option would be to go back and append the text in subsDict until we hit line 97.

for lineNum, line in enumerate(lines):
    line = line.strip()
    lineWithSubtitleText = ""
    # A subtitle block starts with an index line followed by a timestamp line
    if line.isdigit() and lineNum + 1 < len(lines) and subtitleTimeLineRegex.match(lines[lineNum + 1]):
        lineWithTimestamps = lines[lineNum + 1].strip()

        # We may have a multi-line sentence, so read each line after the
        # timestamps until we hit a blank line or the next subtitle block.
        for count in range(lineNum + 2, len(lines)):
            subline = lines[count].strip()
            # Bounds check avoids an IndexError on the last line of the file
            startsNextBlock = (subline.isdigit() and count + 1 < len(lines)
                               and subtitleTimeLineRegex.match(lines[count + 1]))
            if startsNextBlock or subline == "":
                break
            lineWithSubtitleText += f"{subline}\n"

Here's what it looks like with the code above.
image
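
A different way to handle multi-line subtitles is to split the file into blank-line-separated blocks first and then join the text lines of each block. The sketch below is a self-contained illustration of that idea, not a patch against the script: `parse_blocks` and `TIMESTAMP_LINE` are hypothetical names, and joining the text lines with a space (rather than a newline, as in the PoC above) is a design choice made here.

```python
import re

# Matches an SRT timestamp line such as "00:00:23,650 --> 00:00:27,862".
TIMESTAMP_LINE = re.compile(
    r"\d+:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d+:\d{2}:\d{2}[,.]\d{3}"
)


def parse_blocks(srt_text):
    """Return a list of (timestamp_line, text) pairs, one per subtitle block."""
    blocks = []
    # Subtitle blocks are separated by one or more blank lines.
    for chunk in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in chunk.splitlines() if line.strip()]
        looks_valid = (
            len(lines) >= 2
            and lines[0].isdigit()
            and TIMESTAMP_LINE.match(lines[1]) is not None
        )
        if not looks_valid:
            continue  # Skip anything that does not look like a subtitle block.
        # Everything after the timestamp line is subtitle text.
        blocks.append((lines[1], " ".join(lines[2:])))
    return blocks


sample = (
    "1\n00:00:23,650 --> 00:00:27,862\nFORLAD BRONX OG BLIV\nSKREVET OP TIL ET NYT HUS\n\n"
    "2\n00:00:27,946 --> 00:00:31,157\nI DET FORTRYLLENDE\nNEW MEXICO\n"
)
for timestamps, text in parse_blocks(sample):
    print(timestamps, "|", text)
```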

Can't Recreate Example

So I'm trying to get SRT to SSML working for another TTS service (https://blakify.com/).

I (mostly) have SRT-To-SSML working with my own SRT file... yet as a baseline, I can't get the example SRT from README.md to even work. See the example below.

Note: I've modified the script to take its arguments from sys.argv.
image

I also noticed in both my own SRT file and by commenting out lines 102-109... that the SSML generated won't include elements.

I'd appreciate any guidance or help, thanks!
