Comments (12)

cglatot commented on September 25, 2024

The way I have the track / subtitle matching set up is that it will look at the name, title, language, and language code.

It will try to make a perfect match if it can - so this should already be doing what you need, and if it can't it will go back and see which track matches the most criteria. Matching based on name alone is actually pretty low down in the priority.

I could add a button to have exact matching only, but if there isn't an exact match across all 4 criteria, then it will not change the tracks, and honestly I don't see the value in doing that.
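
The matching described above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the `Track` class, its field names, and the `exact_only` switch are assumptions made for the example, and every criterion is weighted equally here even though the comment says name matching ranks lower.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field names; the real tool's track model may differ.
CRITERIA = ("name", "title", "language", "language_code")


@dataclass(frozen=True)
class Track:
    name: str = ""
    title: str = ""
    language: str = ""
    language_code: str = ""


def match_score(wanted: Track, candidate: Track) -> int:
    """Count how many of the four criteria agree (case-insensitive)."""
    return sum(
        getattr(wanted, field).lower() == getattr(candidate, field).lower()
        for field in CRITERIA
    )


def pick_track(wanted: Track, candidates: list[Track],
               exact_only: bool = False) -> Optional[Track]:
    """Return the best candidate, or None when nothing qualifies."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: match_score(wanted, c))
    score = match_score(wanted, best)
    if score == len(CRITERIA):
        return best   # perfect match on all four criteria
    if exact_only or score == 0:
        return None   # leave the current track untouched
    return best       # closest match
```

With `exact_only=True` the function returns `None` unless all four criteria agree, which corresponds to the "exact matching only" button mentioned above.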

from pasta.

phexgmbh commented on September 25, 2024

@cglatot Thanks for your reply!

Let's take Game of Thrones, for instance. There's a lot of "alien language" where forced subtitles are required to understand the dialogue. However, like most people, I don't want a full English subtitle on my movie, only the forced English subtitle.
So instead of choosing forced subtitles only on the episodes that contain alien language, the tool activates the full subtitle on the episodes that don't have a forced subtitle. I then have to turn off the full subtitle manually, which is exactly the kind of manual work I'm trying to avoid with your tool (turning off the full subtitle manually instead of turning on the forced subtitle manually).

I hope you understand what I'm trying to say :c

cglatot commented on September 25, 2024

Ahhhhhhhhhhhhhhhhhh I see what you mean now. That is actually a very valid point.

I will need to give some thought to how best to implement this, as it is a bit of a niche case. It's also not as straightforward as matching only on "everything", because the "forced" keyword can exist in either the Name or the Title, and the tracks might be named differently across episodes or seasons.

phexgmbh commented on September 25, 2024

@cglatot gotcha! If it's possible, that'd be great. If not, that's alright too :) Still a great tool.
Maybe a general option to just search for forced subtitles and activate them (rather than choosing one track and trying to duplicate it on other episodes/movies)?
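
A minimal sketch of that suggestion, assuming each item exposes its subtitle tracks as dictionaries. The keys `name`, `title`, `forced`, and `language_code` are hypothetical and chosen only for this example. The point is that an item without a forced track yields `None` rather than falling back to the full subtitle.

```python
from typing import Optional


def looks_forced(track: dict) -> bool:
    """True when the flag is set or "forced" appears in the name or title."""
    if track.get("forced"):
        return True
    text = f'{track.get("name") or ""} {track.get("title") or ""}'
    return "forced" in text.lower()


def pick_forced_subtitle(tracks: list[dict], language_code: str) -> Optional[dict]:
    """Return the first forced subtitle in the given language, else None."""
    for track in tracks:
        if track.get("language_code") == language_code and looks_forced(track):
            return track
    # No forced track: the caller should switch subtitles off,
    # not fall back to the full subtitle.
    return None
```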

david-kalbermatten commented on September 25, 2024

This seems to be an issue with a lot of uploads, especially anime. MKV supports the "forced" flag, but many uploaders set it incorrectly or not at all, so you end up with subtitle tracks that have the keyword "forced" in the name but lack the flag. I made a Python script that updates MKV files with the correct flags. It needs MKVToolNix installed and added to the PATH, otherwise the script can't find "mkvpropedit.exe". It can be run like so:

py script-name.py path/to/files/to/update

#!/usr/bin/env python3
import concurrent.futures as futures
import json
import os
import subprocess
import sys

languages = [
    ('ger', 'German', 'Deutsch'),
    ('eng', 'English', 'Englisch'),
    ('jpn', 'Japanese', 'Japanisch'),
    ('fre', 'French', 'Französisch', 'Franzoesisch'),
    ('rus', 'Russian', 'Russisch'),
]


def get_all_subfolders(rootdir):
    subfolders = []
    for it in os.scandir(rootdir):
        if it.is_dir():
            subfolders.append(it.path)
            subfolders.extend(get_all_subfolders(it))
    return subfolders


def get_all_files(folder):
    mkv_files = []
    for file in os.listdir(folder):
        if file.endswith(".mkv"):
            mkv_files.append(os.path.join(folder, file))
    return mkv_files


def subtitle_name_is_forced(text):
    forced_terms = ["forced", "sign", "song", "s&s"]
    if not text:
        return False
    return any([(term in text.lower()) for term in forced_terms])


def subtitle_name_is_language(text, language_key_words: tuple):
    # Case-insensitive substring match against every keyword in the tuple.
    terms = [word.lower() for word in language_key_words]
    if not text:
        return False
    return any(term in text.lower() for term in terms)


def update_mkv_file(mkv_file, props: dict, property_to_be_set: str):
    # mkvpropedit edits the track header in place; the track is selected by its UID.
    subprocess.run(
        ["mkvpropedit", mkv_file, "--edit", "track:=" + str(props["uid"]), "--set",
         property_to_be_set], stdout=subprocess.DEVNULL)


def repair_forced_subtitles(mkv_file, track_name: str, props: dict) -> str:
    report = ""
    if subtitle_name_is_forced(track_name):
        if not props.get("forced_track"):
            report += f"\n- Track \"{track_name}\" set to forced"
            update_mkv_file(mkv_file, props, "flag-forced=true")
        else:
            report += f"\n- Track \"{track_name}\" was already set to forced"
    else:
        report += f"\n- Track \"{track_name}\" is not forced"
    return report


def repair_language_subtitles(mkv_file, track_name: str, props: dict, language: tuple[str]) -> str:
    report = ""
    if subtitle_name_is_language(props.get("track_name"), language):
        if not props.get("language") or not props.get("language") == language[0]:
            report += f"\n- Track \"{track_name}\" set to {language[1]}"
            update_mkv_file(mkv_file, props, f"language={language[0]}")
        else:
            report += f"\n- Track \"{track_name}\" was already set to {language[1]}"
    return report


def analyse_and_correct_mkv(mkv_file: str):
    metadata = json.loads(subprocess.check_output(
        ["mkvmerge", "-J", mkv_file]).decode())
    orig_report = f"File: \"{os.path.basename(mkv_file)}\" had the following changes applied:"
    report = orig_report
    for track in metadata["tracks"]:
        props = track["properties"]
        track_name = props.get("track_name") or "Unnamed"
        if track["type"] == "subtitles":
            report += repair_forced_subtitles(mkv_file, track_name, props)
            for lang_tuple in languages:
                report += repair_language_subtitles(mkv_file, track_name, props, lang_tuple)
    if report == orig_report:
        return f"File: \"{os.path.basename(mkv_file)}\" was not modified"
    else:
        return report


def main(argv):
    folder_list = get_all_subfolders(argv[1])
    folder_list.append(argv[1])
    mkv_files = [
        file for sublist in folder_list for file in get_all_files(sublist)]
    with futures.ThreadPoolExecutor() as ex:
        # ex.map yields each file's report string (not a Future), in input order.
        for report in ex.map(analyse_and_correct_mkv, mkv_files):
            print(report)


if __name__ == "__main__":
    main(sys.argv)

Maybe this is useful to someone. Be aware that this script only works if the subtitle track names contain one of the following keywords: ["forced", "sign", "song", "s&s"].

Ninelpienel commented on September 25, 2024

Why isn't it possible to set "subtitle track 2" instead of "subtitle tracks with name XY"? That would be so much easier.

david-kalbermatten commented on September 25, 2024

> Why isn't it possible to set "subtitle track 2" instead of "subtitle tracks with name XY"? That would be so much easier.

Because not all videos have the subs in the same order. I've seen releases that add director's commentary subs, which throws off the order of the subs. Just one example off the top of my head...
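
To illustrate the ordering problem, here is a small sketch with two made-up track lists. The helper names and the track names are hypothetical; the only point is that a fixed position can land on a different track once an extra subtitle is inserted.

```python
def pick_by_position(track_names, position):
    """1-based lookup, like choosing "subtitle track 2" in a player menu."""
    return track_names[position - 1] if 0 < position <= len(track_names) else None


def pick_by_name(track_names, wanted):
    """Return the first track whose name equals `wanted`, or None."""
    return next((name for name in track_names if name == wanted), None)


# Hypothetical track lists for two releases of the same show.
episode_1 = ["English", "English (Forced)", "German"]
episode_2 = ["English", "Director's Commentary", "English (Forced)", "German"]

print(pick_by_position(episode_1, 2))
print(pick_by_position(episode_2, 2))
print(pick_by_name(episode_2, "English (Forced)"))
```

Position 2 selects the forced track in the first list but the commentary track in the second, while the name lookup finds the forced track in both.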

Ninelpienel commented on September 25, 2024

> Why isn't it possible to set "subtitle track 2" instead of "subtitle tracks with name XY"? That would be so much easier.
>
> Because not all videos have the subs in the same order. I've seen releases that add director's commentary subs, which throws off the order of the subs. Just one example off the top of my head...

Would be nice if you could add this as an optional function. I don't want to remux thousands of anime episodes.

david-kalbermatten commented on September 25, 2024

One way of handling this would be to treat subtitles as forced if they either have the "forced"-flag set to true OR have a substring match for user-provided keywords. The keyword match could then be treated as a quasi forced flag.
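
A rough sketch of that "quasi forced flag", assuming tracks are dictionaries shaped like the `properties` object the script above reads from `mkvmerge -J` (`forced_track`, `track_name`). The function name and the default keyword list are assumptions for illustration.

```python
from typing import Iterable


def is_effectively_forced(track: dict, keywords: Iterable[str] = ("forced",)) -> bool:
    """Treat a track as forced if the flag is set OR its name contains a keyword."""
    if track.get("forced_track"):
        return True
    name = (track.get("track_name") or "").lower()
    return any(keyword.lower() in name for keyword in keywords)
```

Because this is a plain substring test, short keywords such as "sign" will also match unrelated names, so user-provided keywords would need to be chosen with some care.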

But I just noticed that this issue is actually about another bug. Here the issue revolves around the logic picking the closest match in a show that might not have forced subtitles for every episode and picks the full-sub for those that have no forced sub.

@Ninelpienel
Your issue is more about the way it decides what it picks. In your screenshots, I'd guess, it's the episode numbers in the names throwing the logic off.

> Would be nice if you could add this as an optional function. I don't want to remux thousands of anime episodes.

The script I posted above wouldn't remux the files. It would just rewrite the tag information which doesn't cause the entire file to be rewritten. Also, I made it so it runs multi-threaded which should make it run rather quickly.
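
For anyone who wants to see what would be changed before touching a large library, a dry run along these lines could help. It is only a sketch: `build_propedit_command` and `dry_run` are hypothetical helpers that assemble the same argument list the script passes to mkvpropedit, without executing anything.

```python
import concurrent.futures as futures


def build_propedit_command(mkv_file, track_uid, property_to_be_set):
    """Same argument list the script passes to mkvpropedit, without running it."""
    return ["mkvpropedit", mkv_file, "--edit", f"track:={track_uid}",
            "--set", property_to_be_set]


def dry_run(jobs):
    """jobs: iterable of (mkv_file, track_uid, property) tuples."""
    with futures.ThreadPoolExecutor() as ex:
        # map() returns results in input order even though work runs concurrently.
        return list(ex.map(lambda job: " ".join(build_propedit_command(*job)), jobs))
```

Calling `dry_run([("a.mkv", 1, "flag-forced=true")])` returns the command line as a string, which can be reviewed before the real script is run.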

cglatot commented on September 25, 2024

> Why isn't it possible to set "subtitle track 2" instead of "subtitle tracks with name XY"? That would be so much easier.
>
> Because not all videos have the subs in the same order. I've seen releases that add director's commentary subs, which throws off the order of the subs. Just one example off the top of my head...
>
> Would be nice if you could add this as an optional function. I don't want to remux thousands of anime episodes.

The issue that you had should be resolved when I resolve this issue (#53). But if you want the option of choosing a specific track number (knowing that it may not always be the same track), then please raise a separate feature request and I can take a look at that in time.

Ninelpienel commented on September 25, 2024

> This seems to be an issue with a lot of uploads, especially anime. MKV supports the "forced"-flag, but many uploaders don't set it correctly or not at all, so you end up with subtitle tracks that have the keyword "forced" in the name but lack the flag. I made a python script that updates mkv files with the correct flags which can be run like so:
>
> It needs MKVToolNix installed which in turn has to be added to the PATH otherwise the script can't find "mkvpropedit.exe"
> py script-name.py path/to/files/to/update

> [quoted script snipped; it is identical to the one in david-kalbermatten's comment above]

> Maybe this is useful to someone. Be aware, that this script only works if the subtitle track names have any of the following keywords in them ["forced", "sign", "song", "s&s"].

There is a small bug: the script interprets the anime name "Tengen Toppa Gurren Lagann" as English and sets the language to "eng". Maybe you can tweak the script a little bit.

david-kalbermatten commented on September 25, 2024

Yeah, I passed the whole tuple ('eng', 'English', 'Englisch') to the matcher, so the "eng" in "Tengen" was falsely flagged. It should have used every element of the tuple except the first one (the language code).

Before, the keywords passed to subtitle_name_is_language were e.g. ('ger', 'German', 'Deutsch'):

def repair_language_subtitles(mkv_file, track_name: str, props: dict, language: tuple[str]) -> str:
    report = ""
    if subtitle_name_is_language(props.get("track_name"), language):
        if not props.get("language") or not props.get("language") == language[0]:
            report += f"\n- Track \"{track_name}\" set to {language[1]}"
            update_mkv_file(mkv_file, props, f"language={language[0]}")
        else:
            report += f"\n- Track \"{track_name}\" was already set to {language[1]}"
    return report

After, with language[1:], the keywords are e.g. ('German', 'Deutsch'):

def repair_language_subtitles(mkv_file, track_name: str, props: dict, language: tuple[str]) -> str:
    report = ""
    if subtitle_name_is_language(props.get("track_name"), language[1:]):
        if not props.get("language") or not props.get("language") == language[0]:
            report += f"\n- Track \"{track_name}\" set to {language[1]}"
            update_mkv_file(mkv_file, props, f"language={language[0]}")
        else:
            report += f"\n- Track \"{track_name}\" was already set to {language[1]}"
    return report
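
An alternative that keeps the short language code usable is to match whole words instead of substrings. The sketch below is not part of the posted script; `name_mentions_language` is a hypothetical replacement for `subtitle_name_is_language` that uses regular-expression word boundaries, so "eng" matches "[ENG]" but not "Tengen".

```python
import re


def name_mentions_language(track_name, language_key_words):
    """Whole-word, case-insensitive match of any keyword in the track name."""
    if not track_name or not language_key_words:
        return False
    alternatives = "|".join(re.escape(word) for word in language_key_words)
    pattern = r"\b(?:" + alternatives + r")\b"
    return re.search(pattern, track_name, flags=re.IGNORECASE) is not None
```

Note that `\b` treats the underscore as a word character, so a name like "subs_eng" would not match under this approach.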
