LiveTranslator

LiveTranslator uses OCR to find text in images captured from a selected region or window, translates it, and overlays the translated text on the image.

Getting Started

Requirements

  • Python, Pip, Pipenv

Setup Instructions

  1. Clone the repository:

    git clone <repository-url>
    cd LiveTranslator
  2. Run the environment setup script:

    python setup_env.py
  3. Install the dependencies:

    pipenv install
  4. Use the virtual environment:

    pipenv shell
  5. Exit/Deactivate the virtual environment:

    exit

Usage

Translate Text from Images

Use the following command to translate text from images:

python3 scripts/main.py translate --src <path_to_image> --print --print-boxes --from <lang> --to <lang> --translator <translator>
CLI Options Explanation
  • --src (Required): Specify the path to the image file to be translated.
  • --print (Optional): Print extracted texts to the console.
  • --print-boxes (Optional): Draw bounding boxes around the detected text in the image.
  • --from (Optional): Specify the source language for OCR (e.g., "eng", "jpn"). If not specified, it's auto-detected.
  • --to (Optional): Specify the target language for translation (default is "eng").
  • --show (Optional): Show the final translated image.
  • --translator (Optional): Choose the translation service (default is "google"). Options: google, deepl.
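The options above could be wired up with argparse roughly as follows. This is a hedged sketch, not the actual parser in scripts/main.py: the flag names come from the list above, while the `dest` names and defaults are assumptions.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Sketch of the `translate` subcommand's CLI (flag names from the docs above)."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command")

    translate = sub.add_parser("translate", help="Translate text found in an image")
    translate.add_argument("--src", required=True, help="Path to the image file")
    translate.add_argument("--print", action="store_true", dest="print_texts",
                           help="Print extracted texts to the console")
    translate.add_argument("--print-boxes", action="store_true",
                           help="Draw bounding boxes around detected text")
    translate.add_argument("--from", dest="src_lang", default=None,
                           help="Source OCR language, e.g. eng, jpn (auto-detected if omitted)")
    translate.add_argument("--to", dest="dst_lang", default="eng",
                           help="Target translation language")
    translate.add_argument("--show", action="store_true",
                           help="Show the final translated image")
    translate.add_argument("--translator", choices=["google", "deepl"],
                           default="google", help="Translation service")
    return parser

# Example: parse the flags from the command shown above.
args = build_parser().parse_args(
    ["translate", "--src", "image.png", "--print", "--from", "eng", "--to", "jpn"]
)
print(args.src, args.src_lang, args.dst_lang, args.translator)
```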

Run the GUI

Use the following command to run the application's GUI:

python3 scripts/main.py gui

GUI Features

The LiveTranslator GUI allows users to capture and translate text from specific windows or regions of their screen. The main features include:

  1. Select Capture Mode:

    • Capture in Intervals: Capture the selected window/region at regular intervals.
    • Capture and Wait: Capture the selected window/region once and wait until the translated window is closed.
  2. Set Capture Interval:

    • Set the interval (in seconds) for capturing the screen in Capture in Intervals mode.
  3. Select Source and Target Languages:

    • Choose from supported languages (e.g., English, Japanese, French) for OCR and translation.
  4. Select Capture Type:

    • Window: Capture a specific window.
    • Region: Capture a user-defined region of the screen.
  5. Window List:

    • Refresh and select from a list of currently open windows for capturing.
  6. Region Selection:

    • Manually select a region of the screen for capturing and translation.
  7. Start and Stop Capture:

    • Start capturing the selected window/region and translating the text.
    • Stop the capture process.
  8. Status Display:

    • A status label displays the current state of the capture process, with tooltips for additional information.
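Conceptually, Capture in Intervals reduces to a timed loop. The sketch below uses only the standard library; `capture_and_translate` is a hypothetical stand-in for the real capture/OCR/translate pipeline, which is not shown here.

```python
import threading
import time

def run_interval_capture(capture_and_translate, interval: float,
                         stop_event: threading.Event) -> int:
    """Call capture_and_translate() every `interval` seconds until stop_event is set.

    Returns the number of captures performed (handy for testing).
    """
    count = 0
    while not stop_event.is_set():
        capture_and_translate()
        count += 1
        # wait() returns early if the stop event fires mid-interval.
        stop_event.wait(interval)
    return count

# Demo with a stub capture function that stops itself after three captures.
stop = threading.Event()
calls = []

def fake_capture():
    calls.append(time.monotonic())
    if len(calls) >= 3:
        stop.set()

done = run_interval_capture(fake_capture, interval=0.01, stop_event=stop)
print(done)  # 3
```

Using an `Event` rather than `time.sleep` lets the Stop button interrupt the loop immediately instead of waiting out the current interval.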

Testing

Use the following command to run tests:

python3 tests/run_tests.py <modules>
CLI Options Explanation
  • modules (Optional): Paths to specific test modules or classes. If omitted, all tests are run.
  • -v, --verbosity (Optional): Verbosity level for test output.
    • 0: Minimal output
    • 1: Normal output (default)
    • 2: Detailed output
    • 3: Debug-level output
  • --failfast (Optional): Stop running tests on the first failure.
  • --buffer (Optional): Buffer stdout and stderr during test execution.
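These flags map naturally onto the standard-library unittest runner. A minimal sketch of what tests/run_tests.py might do (the real script may differ):

```python
import argparse
import unittest

def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="run_tests.py")
    parser.add_argument("modules", nargs="*",
                        help="Test modules/classes to run; empty runs all tests")
    parser.add_argument("-v", "--verbosity", type=int, default=1,
                        choices=[0, 1, 2, 3])
    parser.add_argument("--failfast", action="store_true")
    parser.add_argument("--buffer", action="store_true")
    return parser.parse_args(argv)

def run(args):
    loader = unittest.TestLoader()
    if args.modules:
        suite = loader.loadTestsFromNames(args.modules)
    else:
        suite = loader.discover("tests")  # run everything under tests/
    runner = unittest.TextTestRunner(verbosity=args.verbosity,
                                     failfast=args.failfast, buffer=args.buffer)
    return runner.run(suite)

# Example: parse flags without running a suite.
args = parse_args(["-v", "2", "--failfast"])
print(args.verbosity, args.failfast, args.modules)
```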

Troubleshooting

  • If you encounter issues, you can remove the virtual environment and reinstall dependencies:
    pipenv --rm
    pipenv --python 3.9
    pipenv install

Example Usage

To translate text from an image using the Google Translator, run:

python3 scripts/main.py translate --src path/to/your/image.png --print --print-boxes --from eng --to jpn --translator google --show

Running the Application

To run the application GUI, use:

python3 scripts/main.py gui

livetranslator's People

Contributors

nlmeng

livetranslator's Issues

Refresh App's Status

  • Reset the status between mode switches.
  • After capture stops, reset all state (currently the stored region persists and blocks the Window option).

Dynamic OCR

  • Dynamic preprocess: chooses appropriate methods according to the image
  • Dynamic segmenting: use different psm according to the image
  • Separate translate logic and allow different methods of translation #40

Grouping Characters (horizontal)

A pre-processing (pre-translation) step that groups nearby characters into clusters.

Candidate clustering algorithms: DBSCAN, k-means.

TODO:

  • Use a clustering algorithm to group text_boxes, then concatenate them, ensuring the reading order is correct (horizontally: left-to-right, then top-to-bottom), going pixel by pixel if necessary.
  • Fit the translated text within its bounding box.
  • deal with outliers
  • #21
  • Translate points that are not clustered individually, as normal.
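As a pure-Python stand-in for DBSCAN, nearby boxes can be grouped by single-link center distance and then sorted into reading order. The (x, y, w, h) box format and the distance threshold are assumptions for illustration:

```python
def box_center(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def group_boxes(boxes, max_dist=30.0):
    """Single-link clustering of text boxes by center distance (DBSCAN stand-in)."""
    clusters = []  # list of lists of boxes
    for box in boxes:
        cx, cy = box_center(box)
        merged = None
        for cluster in clusters:
            if any(abs(cx - box_center(b)[0]) <= max_dist and
                   abs(cy - box_center(b)[1]) <= max_dist for b in cluster):
                if merged is None:
                    cluster.append(box)       # join the first matching cluster
                    merged = cluster
                else:
                    merged.extend(cluster)    # this box bridges two clusters
                    cluster.clear()
        clusters = [c for c in clusters if c]
        if merged is None:
            clusters.append([box])            # outlier so far: its own cluster
    # Reading order within a cluster: top-to-bottom, then left-to-right.
    for cluster in clusters:
        cluster.sort(key=lambda b: (b[1], b[0]))
    return clusters
```

Singleton clusters fall out naturally as the "outliers" above, which get translated individually.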

Improve OCR

  • translate only one selected lang to another selected lang

  • separate capture logic for later; send in a frame instead

  • preprocess: #26
    https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html

    • Noise removal using median blur.
    • Add a 10px border around the image to improve OCR accuracy at edges.
    • Use pytesseract OSD to detect orientation and script detection.
    • Correct skew in images if orientation is detected as non-zero.
    • There is a minimum text size for reasonable accuracy. You have to consider resolution as well as point size. Accuracy drops off below 10pt x 300dpi, rapidly below 8pt x 300dpi. A quick check is to count the pixels of the x-height of your characters. (X-height is the height of the lower case x.) At 10pt x 300dpi x-heights are typically about 20 pixels, although this can vary dramatically from font to font. Below an x-height of 10 pixels, you have very little chance of accurate results, and below about 8 pixels, most of the text will be "noise removed".
    • Big borders (especially when processing a single letter/digit or one word on a large background) can cause problems (“empty page”). Try to crop your input image to the text area with a reasonable border (e.g. 10 px).
  • segmenting #29
    https://pyimagesearch.com/2021/11/15/tesseract-page-segmentation-modes-psms-explained-how-to-improve-your-ocr-accuracy/

    • segment image into blocks and use psm 3
  • postprocess #29

    • blur background for ocr'd texts
    • overlay translated text
  • translate to non-English: boxes (need font?)
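Two of the preprocessing bullets above, adding a border and enforcing a minimum x-height, can be sketched with NumPy. The 10 px border and the ~10 px x-height threshold come from the notes above; function names and the grayscale-array representation are assumptions, not the project's actual code.

```python
import numpy as np

def add_border(gray: np.ndarray, pad: int = 10, value: int = 255) -> np.ndarray:
    """Pad a grayscale image with a constant white border to help OCR at the edges."""
    return np.pad(gray, pad_width=pad, mode="constant", constant_values=value)

def x_height_ok(x_height_px: int, min_px: int = 10) -> bool:
    """Per the Tesseract guidance quoted above, accuracy collapses below ~10 px x-height."""
    return x_height_px >= min_px

img = np.zeros((20, 30), dtype=np.uint8)  # dummy "text" image
padded = add_border(img)
print(padded.shape)   # (40, 50)
print(x_height_ok(8))  # False
```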
