scinoephile

Introduction

Scinoephile is a Python package for working with Chinese/English bilingual subtitles. It can convert image-based Chinese subtitles into text using OCR, and combine separate Chinese and English subtitles into synchronized bilingual subtitles. It can optionally add romanization below the Chinese subtitles, using Hanyu Pinyin for Mandarin or Yale romanization for Cantonese.

Dependencies

All features require the following modules:

Selected features may also require:

Installation

python setup.py install

Usage

Derasterizer

usage: Derasterizer.py [-h] [-v | -q] -if FILE [-rm FILE] [-sf FILE] [-t]
                       [-of FILE] [-o]

Converts image-based subtitles into text using a deep neural network-based
optical character recognition model.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         enable verbose output, may be specified more than once
  -q, --quiet           disable verbose output

input arguments:
  -if FILE, --infile FILE
                        image-based Chinese Hanzi subtitle infile
  -rm FILE, --recognition_model FILE
                        character recognition model infile
  -sf FILE, --standard FILE
                        known accurate text-based Chinese Hanzi subtitle
                        infile for validation of OCR results

operation arguments:
  -t, --tesseract       use tesseract library for OCR rather than scinoephile

output arguments:
  -of FILE, --outfile FILE
                        text-based Chinese Hanzi subtitle outfile
  -o, --overwrite       overwrite outfile if it exists
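
As a rough illustration of how the flags above fit together, the following Python sketch assembles a Derasterizer command line from the documented options. The helper name `build_derasterizer_command`, the file paths, and the assumption that the script is launched as `python Derasterizer.py` are hypothetical and not part of the package.

```python
import shlex
from typing import List, Optional


def build_derasterizer_command(
    infile: str,
    outfile: Optional[str] = None,
    standard: Optional[str] = None,
    overwrite: bool = False,
    script: str = "Derasterizer.py",  # assumed launch path, adjust as needed
) -> List[str]:
    """Build an argument list using only flags shown in the help text."""
    command = ["python", script, "-if", infile]
    if standard is not None:
        # Known-accurate text subtitles used to validate the OCR results
        command += ["-sf", standard]
    if outfile is not None:
        command += ["-of", outfile]
        if overwrite:
            # Only meaningful when an outfile is given
            command.append("-o")
    return command


# Hypothetical paths, for illustration only
command = build_derasterizer_command(
    "/image/infile", outfile="/chinese/outfile", overwrite=True
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the paths point at real files.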

Compositor

usage: Compositor.py [-h] [-v | -q] [-bif FILE] [-cif FILE] [-eif FILE]
                     [-pif FILE] [-c] [-m] [-s] [-bof FILE] [-cof FILE]
                     [-eof FILE] [-pof FILE] [-o]

Compiles Chinese and English subtitles into a single file, optionally adding
Mandarin or Cantonese pinyin, converting traditional characters to simplified,
or adding machine translation.

Operations are inferred from provided infiles and outfiles, e.g.:

  Merge Chinese and English:
    Compositor.py -cif /chinese/infile
                  -eif /english/infile
                  -bof /bilingual/outfile

  Convert Chinese Hanzi to Cantonese Yale pinyin:
    Compositor.py -cif /chinese/infile
                  -pof /chinese/outfile
                  --cantonese

  Translate Chinese Hanzi to English, overwriting if necessary:
    Compositor.py -cif /chinese/infile
                  -eof /english/outfile
                  -o

  Convert traditional Chinese to simplified, translate to English, and merge:
    Compositor.py -cif /chinese/infile
                  -bof /bilingual/outfile
                  --simplify

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         enable verbose output, may be specified more than once
  -q, --quiet           disable verbose output

input arguments:
  -bif FILE, --bilingual_infile FILE
                        bilingual subtitle infile
  -cif FILE, --chinese_infile FILE
                        Chinese Hanzi subtitle infile
  -eif FILE, --english_infile FILE
                        English subtitle infile
  -pif FILE, --pinyin_infile FILE
                        Chinese pinyin subtitle infile

operation arguments:
  -c, --cantonese       add Cantonese Yale pinyin (耶鲁粤语拼音); mainly useful for
                        older Hong Kong movies (1980s to early 1990s) whose
                        Chinese subtitles are in 粤文 (i.e. using 係, 喺, and 唔
                        rather than 是, 在, and 不, etc.)
  -m, --mandarin        add Mandarin Hanyu pinyin (汉语拼音)
  -s, --simplify        convert traditional Hanzi characters to simplified

output arguments:
  -bof FILE, --bilingual_outfile FILE
                        bilingual subtitle outfile
  -cof FILE, --chinese_outfile FILE
                        Chinese Hanzi subtitle outfile
  -eof FILE, --english_outfile FILE
                        English subtitle outfile
  -pof FILE, --pinyin_outfile FILE
                        Chinese pinyin subtitle outfile
  -o, --overwrite       overwrite outfiles if they exist
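
To make the idea of merging separate Chinese and English subtitles more concrete, here is a minimal Python sketch that pairs each Chinese cue with the English cue it overlaps most in time. This is not scinoephile's actual algorithm, which is not described here; the `Cue` class, the `merge_bilingual` function, and the sample timings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds
    end: float  # seconds
    text: str


def overlap(a: Cue, b: Cue) -> float:
    """Return the duration in seconds during which both cues are on screen."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def merge_bilingual(chinese: List[Cue], english: List[Cue]) -> List[Cue]:
    """Attach the best-overlapping English line below each Chinese line.

    Chinese timings are kept; English cues that overlap no Chinese cue
    are dropped in this simplified sketch.
    """
    merged = []
    for zh in chinese:
        best = max(english, key=lambda en: overlap(zh, en), default=None)
        if best is not None and overlap(zh, best) > 0:
            text = f"{zh.text}\n{best.text}"
        else:
            text = zh.text
        merged.append(Cue(zh.start, zh.end, text))
    return merged


# Hypothetical sample cues
chinese = [Cue(1.0, 3.0, "你好"), Cue(4.0, 6.0, "再見")]
english = [Cue(1.2, 3.1, "Hello"), Cue(4.1, 5.9, "Goodbye")]
bilingual = merge_bilingual(chinese, english)
for cue in bilingual:
    print(cue.start, cue.end, repr(cue.text))
```

A real merge would also need to handle cues that split differently between the two languages, which this sketch ignores.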

Authorship

Scinoephile is developed by Karl T. Debiec.

License

Released under a 3-clause BSD license.
