
pngtotxt

Convert images of text (e.g. pages of a book, a scientific paper, etc.) into ONE concatenated .txt file.

pngtotxt is a simple bash script based on the Tesseract Open Source OCR Engine (main repository):

https://tesseract-ocr.github.io/

https://github.com/tesseract-ocr/tesseract
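Tesseract itself processes one image at a time. Its command-line form is `tesseract IMAGE OUTPUTBASE -l LANG`, and it writes the recognized text to `OUTPUTBASE.txt`. The minimal sketch below wraps that call for a single page. The function names `out_base` and `ocr_page` are hypothetical, and this is not necessarily how pngtotxt invokes Tesseract.

```shell
#!/usr/bin/env bash
# Minimal sketch: OCR one image with the tesseract CLI.
# tesseract takes an "output base" and appends ".txt" to it itself.

# Derive the output base from an image path, e.g. "page_01.png" -> "page_01".
out_base() {
  local img=$1
  printf '%s\n' "${img%.*}"
}

# OCR a single image into <base>.txt using language code $2 (default "eng").
ocr_page() {
  local img=$1 lang=${2:-eng}
  tesseract "$img" "$(out_base "$img")" -l "$lang"
}
```

For example, `ocr_page page_01.png eng` would write `page_01.txt` next to the image.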

INSTALL

DEPENDENCIES
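Because pngtotxt is built on Tesseract, the `tesseract` command has to be on your PATH. A minimal sketch of a pre-flight check follows. `missing_deps` is a hypothetical helper, and any other tools the script may rely on are not listed here, so pass them as extra arguments if you know them.

```shell
#!/usr/bin/env bash
# Report which of the given commands are missing from PATH.
# Only "tesseract" is known to be required (pngtotxt is based on it).
# Prints one "missing: <cmd>" line per absent command; returns 1 if any are missing.
missing_deps() {
  local cmd rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$cmd"
      rc=1
    fi
  done
  return "$rc"
}
```

`missing_deps tesseract` prints nothing and returns 0 when Tesseract is installed.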

INSTALL

Clone this repo:

git clone https://gitlab.com/christosangel/pngtotxt.git

Change directory to pngtotxt:

cd pngtotxt

Make the pngtotxt file executable:

chmod +x pngtotxt

You may copy this file to /usr/local/bin if you like:

sudo cp pngtotxt /usr/local/bin/

LANGUAGE SUPPORT

To convert images of text in a specific language:

  1. Check whether the language is supported by Tesseract (see the list of supported languages in the Tesseract documentation) and, if it is,

  2. Change or add lines in the pngtotxt script (lines 56 to 65), adding the name and code of the desired language(s).
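Tesseract can only recognize a language whose trained data is installed. A rough way to check this from the shell is sketched below, assuming a Tesseract version whose `--list-langs` output is a header line followed by one language code per line. `has_lang` and `has_all_langs` are hypothetical helpers. Tesseract also accepts several codes joined with `+`, such as `eng+ell` for English plus Greek.

```shell
#!/usr/bin/env bash
# Return 0 if the given Tesseract language code (e.g. "ell") is installed.
# The header line of `tesseract --list-langs` never matches a whole-line code.
has_lang() {
  local code=$1
  tesseract --list-langs 2>/dev/null | grep -x -- "$code" >/dev/null
}

# Return 0 only if every code in a "+"-joined list (e.g. "eng+ell") is installed.
has_all_langs() {
  local -a codes
  local code
  IFS=+ read -r -a codes <<< "$1"
  for code in "${codes[@]}"; do
    has_lang "$code" || return 1
  done
}
```

For example, `has_all_langs eng+ell && echo ready` prints `ready` only if both the English and Greek data are installed.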

RUN

$ pngtotxt

HOW TO USE IT

Gather all the numbered images you wish to convert on your desktop.

CAUTION: Remove all other, unrelated images from the desktop directory.

Images can be either png or gif.
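Because the pages have to end up in the right order, it helps to know how numbered file names sort: a plain alphabetical sort puts `page10.png` before `page2.png`. The sketch below lists the PNG and GIF files in a directory using GNU `sort -V` (version sort), which orders embedded numbers numerically. `list_pages` is a hypothetical helper; pngtotxt may order files differently, so zero-padded names such as `page02.png` are the safest choice.

```shell
#!/usr/bin/env bash
# List PNG/GIF files directly inside a directory (default: current one) in
# natural numeric order, so page2.png sorts before page10.png.
# Relies on GNU sort's -V (version sort).
list_pages() {
  local dir=${1:-.}
  find "$dir" -maxdepth 1 -type f \( -iname '*.png' -o -iname '*.gif' \) \
    | sort -V
}
```

For example, `list_pages ~/Desktop` prints the full paths of the images in the order they should be read.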

Run pngtotxt, and you will be greeted with this message:

image 1

After you press Enter, you will be asked to enter a name for the final text:

image 2

After naming the final text, you will be asked for the language of the text:

image 3

After language selection, the conversion commences:

image 4

Finally, you get a message when the conversion is complete:

image 5

On the desktop you will find a folder named after the text, containing:

  1. The final text .txt file.
  2. An image scroll with the images concatenated.
  3. A folder that contains all the png images.
  4. A folder that contains all the txt files from the conversion.
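To make the output layout concrete, here is a rough sketch of a pipeline that would produce it: OCR each page into its own .txt file, append each result to one final text, and stack the images into a single scroll. Everything here is an assumption rather than pngtotxt's actual code: the function name `build_text`, copying (rather than moving) the images, and the use of ImageMagick's `convert ... -append` to build the scroll.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a pipeline producing the folder layout described
# above; the real pngtotxt script may differ. Assumes tesseract,
# ImageMagick's `convert`, and GNU `sort -V` are available.
build_text() {
  local src=$1 name=$2 lang=${3:-eng}
  local out="$src/$name" img base
  local -a pages=()

  # Collect every PNG/GIF directly inside $src, in natural numeric order.
  while IFS= read -r img; do
    pages+=("$img")
  done < <(find "$src" -maxdepth 1 -type f \( -iname '*.png' -o -iname '*.gif' \) | sort -V)
  if [ "${#pages[@]}" -eq 0 ]; then
    echo "no images found in $src" >&2
    return 1
  fi

  mkdir -p "$out/images" "$out/txt"
  : > "$out/$name.txt"   # start the final text empty

  for img in "${pages[@]}"; do
    cp -- "$img" "$out/images/"
    base=$(basename -- "${img%.*}")
    # OCR one page; tesseract appends ".txt" to the output base.
    if ! tesseract "$img" "$out/txt/$base" -l "$lang" 2>/dev/null; then
      echo "OCR failed: $img" >&2
      return 1
    fi
    # Append this page's text to the single concatenated file.
    cat -- "$out/txt/$base.txt" >> "$out/$name.txt"
  done

  # Stack all pages vertically into one tall "scroll" image.
  convert "${pages[@]}" -append "$out/$name.png"
}
```

For example, `build_text ~/Desktop mybook eng` would create `~/Desktop/mybook/` containing `mybook.txt`, `mybook.png`, `images/` and `txt/`.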
