Giter Club home page Giter Club logo

tesstrainsh-win's Introduction

tesstrainsh-win

Train Tesseract LSTM with tesstrain.sh on Windows

About tesstrainsh-win

The file structure in tesstrainsh-win: image

Recommendations for choosing a training method

image

Requirements

tesseract

You will need a recent version (>= 4.0.0beta1) of tesseract built with the training tools and matching leptonica bindings.

Build instructions and more can be found in the Tesseract project wiki.

Build tesseract instructions on Windows can be found in the Tesseract4.0+VS2017+win10.

Git for Windows

Install the Git for Windows,and add the path where Git / bin is located to the Path path of the environment variable.

How to Use tesstrainsh-win

  1. Copy the font file to be trained to the tesstrain / fonts path. tesstrain.sh supports training multiple font files at the same time. e. g., There is a font file in tesstrain / fonts / Impact.ttf.

  2. Copy the langdata_lstm files of the font to be trained to tesstrainsh-win / langdata_lstm. e. g., all files downloaded from the langdata_lstm/eng place in the tesstrainsh-win / langdata_lstm / eng. If you need to train Simplified Chinese, create a new chi_sim folder under the tesstrainsh-win / langdata_lstm path, download all files under langdata_lstm/chi_sim and place them under the tesstrainsh-win / langdata_lstm / chi_sim path.

  3. Copy the basic traineddata to be trained to the path tesstrainsh-win / tessdata. e. g., The path of the eng.traineddata is tesstrainsh-win / tessdata / eng.traineddata. The .traineddata download from tessdata_best.

  4. Copy lstm.train to the path tesstrainsh-win/tessdata/configs. The file is under the tesseract installation path.(e.g., C: / Program Files/tesseract/tessdata/configs). I have prepared the file in tesstrainsh-win. But my Tesseract version is 4.1, if your Tesseract version is different,it is recommended that the file version also be consistent with your Tesseract version. Find the file from the tesseract installation path and copy them to tesstrainsh-win / tessdata / configs to overwrite the existing files.

  5. The tesstrain.sh, tesstrain_utils.sh and language-specific.sh under the tesstrainsh-win project are copied from the Tesseract4.1 source code (Tesseract / src / training). If you are using other versions of training tools, it is recommended that these three file versions also be consistent. Find these three files in the source code path and copy to the path tesstrainsh-win to overwrite the existing files.

  6. Run the command prompt as an administrator, Go to the tesstrainsh-win directory, e. g.:

cd %USERPROFILE%/tesstrainsh-win
  1. run sh tesstrainDone.sh
sh tesstrainDone.sh

When running this command, the following error may occur:

Reducing Trie to SquishedDawg
Error during conversion of wordlists to DAWGs!!
ERROR: Program Program failed. Abort. 

It because that the LF in Unix corresponding to the CRLF in Windows. Please correct all the CRLF in the files in the path of tesstrainsh-win/langdata_lstm/eng to LF.

If the command is executed successfully, these tips will be showed:

Finished! Error rate = 0.864
Loaded file output/impact_checkpoint, unpacking... 
  1. During the training process, two folders will be created under the tesstrainsh-win path:train and output.

train: The intermediate files generated during the training process are in this folder, for example, .box / .tif / .lstmf and other files are in this folder.

output: The staged checkpoints generated during the training process and the Impact.traineddata will be placed in this folder.

  1. if you want to evaluate training results, you could run the command:
sh eval.sh

The code to evaluate the eng.traineddata:

lstmeval \
--model train/eng.lstm \
--traineddata tessdata/eng.traineddata \
--eval_listfile train/eng.training_files.txt 

The code to evaluate the new trained traineddata, e.g.: Impact.traineddata:

lstmeval \
--model output/Impact_checkpoint \
--traineddata tessdata/eng.traineddata \
--eval_listfile train/eng.training_files.txt 

More Information About Train Tesseract LSTM

More information about Train Tesseract LSTM could refer to:

Train Tesseract LSTM methods Comparison

Train Tesseract LSTM with make on Windows

How the makefile in tesstrain-win work

Train Tesseract LSTM with tesstrain.sh on Windows

Win10 Tesseract4.1 LSTM training

tesstrainsh-win's People

Contributors

livezingy avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.