LiveTranslator

LiveTranslator uses OCR to find text in images captured from a selected region or window, translates it, and overlays the translated text on the image.

Getting Started

Requirements

  • Python, Pip, Pipenv

Setup Instructions

  1. Clone the repository:

    git clone <repository-url>
    cd LiveTranslator
  2. Run the environment setup script:

    python setup_env.py
  3. Install the dependencies:

    pipenv install
  4. Use the virtual environment:

    pipenv shell
  5. Exit/Deactivate the virtual environment:

    exit

Usage

Translate Text from Images

Use the following command to translate text from images:

python3 scripts/main.py translate --src <path_to_image> --print --print-boxes --from <lang> --to <lang> --translator <translator>
CLI Options Explanation
  • --src (Required): Specify the path to the image file to be translated.
  • --print (Optional): Print extracted texts to the console.
  • --print-boxes (Optional): Draw bounding boxes around the detected text in the image.
  • --from (Optional): Specify the source language for OCR (e.g., "eng", "jpn"). If not specified, it's auto-detected.
  • --to (Optional): Specify the target language for translation (default is "eng").
  • --show (Optional): Show the final translated image.
  • --translator (Optional): Choose the translation service (default is "google"). Options: google, deepl.
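The options above could be wired up with argparse roughly as follows. This is a hedged sketch, not the actual parser in scripts/main.py: the flag names come from the list above, while the `dest` names and defaults are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of the `translate` subcommand's CLI (flag names from the docs above)."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command")

    translate = sub.add_parser("translate", help="Translate text found in an image")
    translate.add_argument("--src", required=True, help="Path to the image file")
    translate.add_argument("--print", action="store_true", dest="print_texts",
                           help="Print extracted texts to the console")
    translate.add_argument("--print-boxes", action="store_true",
                           help="Draw bounding boxes around detected text")
    translate.add_argument("--from", dest="src_lang", default=None,
                           help="Source OCR language, e.g. eng, jpn (auto-detected if omitted)")
    translate.add_argument("--to", dest="dst_lang", default="eng",
                           help="Target translation language")
    translate.add_argument("--show", action="store_true",
                           help="Show the final translated image")
    translate.add_argument("--translator", choices=["google", "deepl"],
                           default="google", help="Translation service")
    return parser

# Example: parse the flags from the command shown above.
args = build_parser().parse_args(
    ["translate", "--src", "image.png", "--print", "--from", "eng", "--to", "jpn"]
)
print(args.src, args.src_lang, args.dst_lang, args.translator)
```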

Run the GUI

Use the following command to run the application's GUI:

python3 scripts/main.py gui

GUI Features

The LiveTranslator GUI allows users to capture and translate text from specific windows or regions of their screen. The main features include:

  1. Select Capture Mode:

    • Capture in Intervals: Capture the selected window/region at regular intervals.
    • Capture and Wait: Capture the selected window/region once and wait until the translated window is closed.
  2. Set Capture Interval:

    • Set the interval (in seconds) for capturing the screen in Capture in Intervals mode.
  3. Select Source and Target Languages:

    • Choose from supported languages (e.g., English, Japanese, French) for OCR and translation.
  4. Select Capture Type:

    • Window: Capture a specific window.
    • Region: Capture a user-defined region of the screen.
  5. Window List:

    • Refresh and select from a list of currently open windows for capturing.
  6. Region Selection:

    • Manually select a region of the screen for capturing and translation.
  7. Start and Stop Capture:

    • Start capturing the selected window/region and translating the text.
    • Stop the capture process.
  8. Status Display:

    • A status label displays the current state of the capture process, with tooltips for additional information.
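Conceptually, Capture in Intervals reduces to a timed loop. The sketch below uses only the standard library; `capture_and_translate` is a hypothetical stand-in for the real capture/OCR/translate pipeline, which is not shown here.

```python
import threading
import time

def run_interval_capture(capture_and_translate, interval: float,
                         stop_event: threading.Event) -> int:
    """Call capture_and_translate() every `interval` seconds until stop_event is set.

    Returns the number of captures performed (handy for testing).
    """
    count = 0
    while not stop_event.is_set():
        capture_and_translate()
        count += 1
        # wait() returns early if the stop event fires mid-interval.
        stop_event.wait(interval)
    return count

# Demo with a stub capture function that stops itself after three captures.
stop = threading.Event()
calls = []

def fake_capture():
    calls.append(time.monotonic())
    if len(calls) >= 3:
        stop.set()

done = run_interval_capture(fake_capture, interval=0.01, stop_event=stop)
print(done)  # 3
```

Using an `Event` rather than `time.sleep` lets the Stop button interrupt the loop immediately instead of waiting out the current interval.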

Testing

Use the following command to run tests:

python3 tests/run_tests.py <modules>
CLI Options Explanation
  • modules (Optional): Paths to specific test modules or classes. If omitted, all tests are run.
  • -v, --verbosity (Optional): Verbosity level for test output.
    • 0: Minimal output
    • 1: Normal output (default)
    • 2: Detailed output
    • 3: Debug-level output
  • --failfast (Optional): Stop running tests on the first failure.
  • --buffer (Optional): Buffer stdout and stderr during test execution.
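These flags map naturally onto the standard-library unittest runner. A minimal sketch of what tests/run_tests.py might do (the real script may differ):

```python
import argparse
import unittest

def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="run_tests.py")
    parser.add_argument("modules", nargs="*",
                        help="Test modules/classes to run; empty runs all tests")
    parser.add_argument("-v", "--verbosity", type=int, default=1,
                        choices=[0, 1, 2, 3])
    parser.add_argument("--failfast", action="store_true")
    parser.add_argument("--buffer", action="store_true")
    return parser.parse_args(argv)

def run(args):
    loader = unittest.TestLoader()
    if args.modules:
        suite = loader.loadTestsFromNames(args.modules)
    else:
        suite = loader.discover("tests")  # run everything under tests/
    runner = unittest.TextTestRunner(verbosity=args.verbosity,
                                     failfast=args.failfast, buffer=args.buffer)
    return runner.run(suite)

# Example: parse flags without running a suite.
args = parse_args(["-v", "2", "--failfast"])
print(args.verbosity, args.failfast, args.modules)
```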

Troubleshooting

  • If you encounter issues, you can remove the virtual environment and reinstall dependencies:
    pipenv --rm
    pipenv --python 3.9
    pipenv install

Example Usage

To translate text from an image using the Google Translator, run:

python3 scripts/main.py translate --src path/to/your/image.png --print --print-boxes --from eng --to jpn --translator google --show

Running the Application

To run the application GUI, use:

python3 scripts/main.py gui

livetranslator's People

Contributors

nlmeng

livetranslator's Issues

Refresh App's Status

  • Reset the status between mode switches.
  • After capture stops, reset all state (currently the stored region persists and blocks the Window option).

Dynamic OCR

  • Dynamic preprocess: chooses appropriate methods according to the image
  • Dynamic segmenting: use different psm according to the image
  • Separate translate logic and allow different methods of translation #40

Grouping Characters (horizontal)

A pre-processing (pre-translation) step that groups nearby characters into clusters.

Candidate clustering algorithms: DBSCAN, k-means.

TODO:

  • Use a clustering algorithm to group text_boxes, then concatenate them, ensuring the reading order is correct (horizontally: left-to-right, then top-to-bottom), going pixel by pixel if necessary.
  • Fit the translated text within its bounding box.
  • deal with outliers
  • #21
  • Translate points that are not clustered individually, as normal.
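As a pure-Python stand-in for DBSCAN, nearby boxes can be grouped by single-link center distance and then sorted into reading order. The (x, y, w, h) box format and the distance threshold are assumptions for illustration:

```python
def box_center(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def group_boxes(boxes, max_dist=30.0):
    """Single-link clustering of text boxes by center distance (DBSCAN stand-in)."""
    clusters = []  # list of lists of boxes
    for box in boxes:
        cx, cy = box_center(box)
        merged = None
        for cluster in clusters:
            if any(abs(cx - box_center(b)[0]) <= max_dist and
                   abs(cy - box_center(b)[1]) <= max_dist for b in cluster):
                if merged is None:
                    cluster.append(box)       # join the first matching cluster
                    merged = cluster
                else:
                    merged.extend(cluster)    # this box bridges two clusters
                    cluster.clear()
        clusters = [c for c in clusters if c]
        if merged is None:
            clusters.append([box])            # outlier so far: its own cluster
    # Reading order within a cluster: top-to-bottom, then left-to-right.
    for cluster in clusters:
        cluster.sort(key=lambda b: (b[1], b[0]))
    return clusters
```

Singleton clusters fall out naturally as the "outliers" above, which get translated individually.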

Improve OCR

  • translate only one selected lang to another selected lang

  • separate capture logic for later; send in a frame instead

  • preprocess: #26
    https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

    • Noise removal using median blur.
    • Add a 10px border around the image to improve OCR accuracy at edges.
    • Use pytesseract OSD to detect orientation and script detection.
    • Correct skew in images if orientation is detected as non-zero.
    • There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be "noise removed".
    • Big borders (especially when processing a single letter/digit or one word on a large background) can cause problems (“empty page”). Try to crop your input image to the text area with a reasonable border (e.g. 10 px).
  • segmenting #29
    https://pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/

    • segment image into blocks and use psm 3
  • postprocess #29

    • blur background for ocr'd texts
    • overlay translated text
  • translate to non-English: boxes (need font?)
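Two of the preprocessing bullets above, adding a border and enforcing a minimum x-height, can be sketched with NumPy. The 10 px border and the ~10 px x-height threshold come from the notes above; function names and the grayscale-array representation are assumptions, not the project's actual code.

```python
import numpy as np

def add_border(gray: np.ndarray, pad: int = 10, value: int = 255) -> np.ndarray:
    """Pad a grayscale image with a constant white border to help OCR at the edges."""
    return np.pad(gray, pad_width=pad, mode="constant", constant_values=value)

def x_height_ok(x_height_px: int, min_px: int = 10) -> bool:
    """Per the Tesseract guidance quoted above, accuracy collapses below ~10 px x-height."""
    return x_height_px >= min_px

img = np.zeros((20, 30), dtype=np.uint8)  # dummy "text" image
padded = add_border(img)
print(padded.shape)   # (40, 50)
print(x_height_ok(8))  # False
```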
