Interlinearizer

An interlinear book is an annotated version of a text in which each word of the original is accompanied by a translation directly underneath. interlinearize is a tool that converts EPUB files (or any other format supported by Calibre) into their interlinearized versions.

For example, see below the interlinearized version of the first passage of Voltaire's Candide:

Each word is translated individually and without context. This means that in most cases the translations will read awkwardly, and sometimes imprecisely as well. For the intended use this is not much of an issue, as a competent reader in a target language only looks up the occasional low-frequency word and should in most cases be able to contextualize it within the larger sentence.
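As a rough illustration of what "individually and without context" means, here is a minimal sketch that pairs each word with a dictionary lookup. The function name `interlinear_pairs`, the regex tokenizer, and the sample dictionary are assumptions made for this example; the actual script may tokenize and look up words differently.

```python
import re


def interlinear_pairs(text, dictionary):
    """Pair every word with its standalone translation (hypothetical helper).

    Words missing from the dictionary get an empty translation.
    """
    # Assumption: a simple regex tokenizer stands in for real tokenization.
    words = re.findall(r"\w+", text)
    return [(word, dictionary.get(word.lower(), "")) for word in words]


if __name__ == "__main__":
    sample = {"il": "he", "avait": "had"}
    for word, translation in interlinear_pairs("Il y avait", sample):
        print(f"{word} -> {translation}")
```

Because each lookup ignores the neighbouring words, an idiom such as "il y avait" comes out word by word rather than as a phrase, which is the awkwardness described above.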

The program uses Calibre's ebook-convert to convert between formats and googletrans to translate words. So far interlinearize has only been tested on Linux, on books in the .epub format, and on translations from French to English. In theory, though, it should work with any book format supported by Calibre and any language supported by Google Translate.
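The conversion step can be sketched as a thin wrapper around the ebook-convert command line, which takes an input file and an output file and infers both formats from the extensions. The helper names below (`build_convert_command`, `convert_book`) are hypothetical and not taken from interlinearize.py.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src_book, out_book):
    """Return the argument list for Calibre's ebook-convert CLI."""
    # ebook-convert infers the formats from the two file extensions.
    return ["ebook-convert", str(src_book), str(out_book)]


def convert_book(src_book, out_book):
    """Run ebook-convert, failing early if Calibre is not on PATH."""
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("ebook-convert not found; is Calibre installed?")
    subprocess.run(build_convert_command(src_book, out_book), check=True)
    return Path(out_book)


if __name__ == "__main__":
    print(build_convert_command("Candide - Voltaire.epub", "Candide - Voltaire.htmlz"))
```

Checking for the executable first gives a clearer error than a failed subprocess call when Calibre is missing.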

The repository comes with two examples, Candide by Voltaire and Madame Bovary by Gustave Flaubert, along with their interlinear versions. You can find them inside the compressed folder interlinearized_books.zip.

Installation

interlinearize does not need to be installed and can be run straight out of the repository. However, a few dependencies need to be satisfied. The interlinearizer depends on googletrans, BeautifulSoup and nltk. To install these, cd into the repository and run the command:

pip install -r requirements.txt

You will also have to install Calibre, as interlinearize makes use of its ebook-convert. Furthermore, you'll need to have Python 3 installed.

If you do want to "install" interlinearize (i.e. make it accessible from anywhere within the terminal), you can copy interlinearize.py to /usr/local/bin:

chmod +x interlinearize.py
cp interlinearize.py /usr/local/bin/interlinearize

or any other folder that is in your PATH environment variable. If you do this, then interlinearize will automatically generate configuration files in ~/.interlinearize/ the first time it is run.

Usage

Use the interlinearizer in the following way:

python interlinearize.py src dest book.format1 output.format2

where src and dest are the source and destination languages respectively, book.format1 is the input book in a given format, and output.format2 is the output in the desired format.

For example, to translate the copy of Candide in the repository, execute the following command:

python interlinearize.py fr en "Candide - Voltaire.epub" "Candide - Voltaire (interlinearized).epub"

If you want an HTML version of the interlinearized book, omit the file extension for the output:

python interlinearize.py fr en "Candide - Voltaire.epub" "Candide - Voltaire (interlinearized)"

Whenever you use interlinearize, the dictionaries it assembles are stored as text files in the dict folder (for more details see the Settings section below), so the more books you use it on, the less time translation will take.
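The speed-up comes from only translating words that are not yet in the stored dictionary. A minimal sketch of that caching idea follows, assuming a tab-separated "word, translation" line format; the real dictionary file layout is not documented here, and the function names are hypothetical.

```python
from pathlib import Path


def load_dictionary(path):
    """Read 'word<TAB>translation' lines into a dict (assumed file format)."""
    path = Path(path)
    entries = {}
    if path.exists():
        for line in path.read_text(encoding="utf-8").splitlines():
            if "\t" in line:
                word, translation = line.split("\t", 1)
                entries[word] = translation
    return entries


def save_dictionary(path, entries):
    """Write the dictionary back in the same assumed format."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    lines = [f"{word}\t{translation}" for word, translation in sorted(entries.items())]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def words_needing_translation(words, entries):
    """Return the unique words not yet cached, in first-seen order."""
    seen = set()
    missing = []
    for word in words:
        if word not in entries and word not in seen:
            seen.add(word)
            missing.append(word)
    return missing
```

With a cache like this, a second book in the same language pair only sends its previously unseen words to the translation service.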

See this for a list of language codes.

Other commands

Open the config file:

python interlinearize.py -c config

Open the style file:

python interlinearize.py -c css

Open a dictionary:

python interlinearize.py -c dict src dest

Reset the config|css|all to default:

python interlinearize.py -c clear config|css|all

Reset a dictionary:

python interlinearize.py -c cleardict src dest

Settings

You can change the formatting of the interlinearized text by editing the interlinear.css style file. Other settings can be found in the interlinearize.config file.

You can also edit the dictionaries directly in the dict folder. Each dictionary is named in the format src_dest.txt, so a French to English dictionary would be in dict/fr_en.txt for example. See this for a list of language codes.

interlinearize always looks for the configuration and dictionary files within the execution folder first. If it cannot find the files there, it will then look in ~/.interlinearize. If that folder does not exist, it will then create it and place default config files there.
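The lookup order described above can be sketched as follows. This is an illustration only: the function names are hypothetical, using interlinearize.config as the file that marks a settings folder is an assumption, and so is the dict subfolder living inside whichever folder is chosen. The sketch creates the fallback folder but does not write the default config files.

```python
from pathlib import Path


def resolve_settings_dir(exec_dir, home_dir=None, marker="interlinearize.config"):
    """Pick the folder to read settings from, following the order above."""
    exec_dir = Path(exec_dir)
    home_dir = Path(home_dir) if home_dir is not None else Path.home()
    # 1. Prefer files that sit in the execution folder.
    if (exec_dir / marker).exists():
        return exec_dir
    # 2. Otherwise fall back to ~/.interlinearize, creating it if needed.
    fallback = home_dir / ".interlinearize"
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback


def dictionary_path(settings_dir, src, dest):
    """Build the src_dest.txt path inside the dict folder."""
    return Path(settings_dir) / "dict" / f"{src}_{dest}.txt"
```

For a French to English run, `dictionary_path(resolve_settings_dir("."), "fr", "en")` would point at dict/fr_en.txt in whichever folder was selected.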

Script not working with included test epub.

It appears there is a problem with googletrans. Tested on Fedora. Attached is the traceback.

Converting book to HTMLZ
Finding translations of new words
Exception in thread Thread-18 (translate_words):
Traceback (most recent call last):
  File "/usr/lib64/python3.10/threading.py", line 1009, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.10/threading.py", line 946, in run
    self._target(*self._args, **self._kwargs)
  File "/home/amichaygiuili/Downloads/interlinearize/interlinearize.py", line 320, in translate_words
    ts = translator.translate(words_to_translate, src=src, dest=dest)
  File "/home/amichaygiuili/.local/lib/python3.10/site-packages/googletrans/client.py", line 127, in translate
    translated = self.translate(item, dest=dest, src=src)
  File "/home/amichaygiuili/.local/lib/python3.10/site-packages/googletrans/client.py", line 132, in translate
    data = self._translate(text, dest, src)
  File "/home/amichaygiuili/.local/lib/python3.10/site-packages/googletrans/client.py", line 57, in _translate
    token = self.token_acquirer.do(text)
  File "/home/amichaygiuili/.local/lib/python3.10/site-packages/googletrans/gtoken.py", line 180, in do
    self._update()
  File "/home/amichaygiuili/.local/lib/python3.10/site-packages/googletrans/gtoken.py", line 59, in _update
    code = unicode(self.RE_TKK.search(r.text).group(1)).replace('var ', '')
AttributeError: 'NoneType' object has no attribute 'group'
^CTraceback (most recent call last):
  File "/home/amichaygiuili/Downloads/interlinearize/interlinearize.py", line 568, in <module>
    construct_word_list_from_text(word_list, word_dict, src_lan, dest_lan, service_urls, words_per_request)
  File "/home/amichaygiuili/Downloads/interlinearize/interlinearize.py", line 343, in construct_word_list_from_text
    t_dict = que.get()
  File "/usr/lib64/python3.10/queue.py", line 171, in get
    self.not_empty.wait()
  File "/usr/lib64/python3.10/threading.py", line 320, in wait
    waiter.acquire()
KeyboardInterrupt
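The final AttributeError suggests that the regular-expression search in gtoken.py found no match, so the search call returned None before .group(1) was called on it. Below is a minimal sketch of that failure mode with a guard around it. The pattern and the names `TOKEN_PATTERN` and `extract_token` are hypothetical stand-ins, not googletrans code.

```python
import re

# Hypothetical pattern, used only to reproduce the failure mode.
TOKEN_PATTERN = re.compile(r"tkk:'(.+?)'")


def extract_token(page_text):
    """Return the captured token, or raise a descriptive error if it is absent."""
    match = TOKEN_PATTERN.search(page_text)
    if match is None:
        # Without this check, match.group(1) raises
        # AttributeError: 'NoneType' object has no attribute 'group'.
        raise ValueError("token pattern not found in response")
    return match.group(1)
```

A guard like this would not make the translation succeed, but it would replace the bare AttributeError with a message that points at the missing token. The second traceback shows the main thread waiting on que.get() until it was interrupted, which is consistent with the worker thread having died before it could put a result on the queue.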

