openalpr / train-ocr

Input files and scripts necessary to train the license plate OCR

License: GNU Affero General Public License v3.0

Python 100.00%

train-ocr's Introduction

train-ocr

This repository provides code and data that can be used to train custom license plate fonts in support of the OpenALPR library.

The OCR library used by OpenALPR is Tesseract. Many of the tedious aspects of OCR training have been automated via a Python script. However, the input data still needs to be in a specific format to satisfy Tesseract.

For more information about training using Tesseract OCR, please read this tutorial: https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3

To get started, first clone the repository and get familiar with the input files. In the "eu/input" folder, there are a number of tif files and box files. Each "font" has at least one tif and box file. A country's license plates may use many fonts; each one simply uses a different name.

The naming convention is: l[country_code].[fontname].exp[pagenumber].box

For example, the European German license plate font would look like: leu.germany.exp0.box
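The naming convention can be captured in a few lines of Python (the language of this repository's scripts). This is a minimal sketch: the helper names are hypothetical, and the regular expression assumes a two-letter lowercase country code and a font name without dots, as in every example on this page.

```python
import re

# l[country_code].[fontname].exp[pagenumber].box|tif, e.g. leu.germany.exp0.box.
# Assumption: two lowercase letters for the country, no dots in the font name.
TRAINING_NAME = re.compile(
    r"^l(?P<country>[a-z]{2})\.(?P<font>[A-Za-z0-9_-]+)\.exp(?P<page>\d+)\.(?P<ext>box|tif)$"
)


def training_file_name(country: str, font: str, page: int, ext: str) -> str:
    """Build a training file name such as leu.germany.exp0.box."""
    return f"l{country}.{font}.exp{page}.{ext}"


def parse_training_file_name(name: str) -> dict:
    """Split a training file name into its parts, or raise ValueError."""
    match = TRAINING_NAME.match(name)
    if match is None:
        raise ValueError(f"does not follow l[cc].[font].exp[N].box/tif: {name}")
    parts = match.groupdict()
    parts["page"] = int(parts["page"])
    return parts


print(training_file_name("eu", "germany", 0, "box"))  # leu.germany.exp0.box
```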

Open a tif file. Notice that it contains a series of similar-looking letters and numbers. The best way to generate these is from actual license plate images, and OpenALPR has a couple of utilities to help generate these input files. The first step is to find many pictures of your license plates. Make sure to separate them by font. Sometimes, even within a single region, the license plate fonts vary (e.g., between old and new plates, digital vs. stamped plates, or vehicle vs. bicycle plates). Each unique font should be a different file to achieve the highest accuracy.

Adding a new Country

If you plan on training OCR for a completely new country, you will first need to configure the dimensions of the plate and characters. Add a new file in runtime_data/config/ named with your country's two-letter code. You can copy and paste a section from another country's file (e.g., us or eu).

You should tweak the following values:

plate_width_mm = [width of full plate in mm]
plate_height_mm = [height of full plate in mm]
char_width_mm = [width of a single character in mm]
char_height_mm = [height of a single character in mm]
char_whitespace_top_mm = [whitespace between the character and the top of the plate in mm]
char_whitespace_bot_mm = [whitespace between the character and the bottom of the plate in mm]
template_max_width_px = [maximum width of the plate before processing. Should be proportional to the plate dimensions]
template_max_height_px = [maximum height of the plate before processing. Should be proportional to the plate dimensions]
min_plate_size_width_px = [minimum width of a plate region for it to be considered valid]
min_plate_size_height_px = [minimum height of a plate region for it to be considered valid]
ocr_language = [name of the OCR language -- typically just the letter l followed by your country code]
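As a starting point, the derived values can be computed from the measured ones. The sketch below is an illustration under stated assumptions, not a generator shipped with OpenALPR: `country_config` is a hypothetical helper, it writes the `key = value` form shown above, it keeps `template_max_height_px` proportional to the plate as the list recommends, and the measurements are placeholders for a made-up country code `xx`.

```python
def country_config(country: str, mm: dict, template_max_width_px: int,
                   min_plate_size_px: tuple) -> str:
    """Render key = value lines for a new runtime_data/config/ file (sketch).

    `mm` holds the six *_mm measurements listed above. The template height is
    derived so the template keeps the plate's aspect ratio.
    """
    used_mm = mm["char_height_mm"] + mm["char_whitespace_top_mm"] + mm["char_whitespace_bot_mm"]
    if used_mm > mm["plate_height_mm"]:
        raise ValueError("character height plus top/bottom whitespace exceeds the plate height")
    ratio = mm["plate_height_mm"] / mm["plate_width_mm"]
    values = dict(mm)
    values["template_max_width_px"] = template_max_width_px
    values["template_max_height_px"] = round(template_max_width_px * ratio)
    values["min_plate_size_width_px"], values["min_plate_size_height_px"] = min_plate_size_px
    values["ocr_language"] = "l" + country  # the letter l followed by the country code
    return "\n".join(f"{key} = {value}" for key, value in values.items())


# Placeholder measurements for a hypothetical country "xx"; not real plate specs.
example_mm = {
    "plate_width_mm": 500,
    "plate_height_mm": 100,
    "char_width_mm": 40,
    "char_height_mm": 80,
    "char_whitespace_top_mm": 10,
    "char_whitespace_bot_mm": 10,
}
print(country_config("xx", example_mm, template_max_width_px=200, min_plate_size_px=(60, 12)))
```

With these placeholders the template comes out at 200 x 40 px, matching the 5:1 plate. Replace every number with measurements of your own plates.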

Understanding Your Country's Plates

The first thing you need to know is how many fonts your country's license plates use. In the US, for example, many states use very different fonts for their plates, while some countries use only one font. Here is an example of New York and West Virginia plates. Notice how different the "6" character is on the two plates:

Each font needs to be trained separately. Do not combine characters across fonts; doing so greatly decreases accuracy. After each font is trained, the fonts can be combined into one dataset for your entire country.

Creating the character tiles

Once you're ready to start training, you'll need to create a library of character tiles. Each tile is a small image file that contains a single black-and-white character and is named after that character. For example, here are a few character tiles:

부-0-0-2.png

0-0-az2012.png

c-1-az2012.png

d-9-az2012.jpg

d-9-2-az2012.jpg

You will want many of these character tiles for each character and each font. The tiles will all be slightly different; this variation is necessary for the OCR training to learn how to detect characters. Notice that in the examples above, the "D" tiles have pixels in different places, but they are clearly the same character.
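To see which characters still need more samples, you can count tiles by the character that starts each file name. A small sketch, assuming (as in the examples above) that the character is everything before the first "-" and that tiles are png or jpg files; the minimum count is a threshold you choose, not a documented requirement. Run it once per font directory.

```python
from collections import Counter
from pathlib import Path

TILE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # formats seen in the examples above


def tile_character(filename: str) -> str:
    """The character a tile represents: everything before the first '-'."""
    return filename.split("-", 1)[0]


def count_tiles(filenames) -> Counter:
    """Count tiles per character, ignoring files that are not images."""
    return Counter(
        tile_character(name)
        for name in filenames
        if Path(name).suffix.lower() in TILE_SUFFIXES
    )


def underrepresented(counts: Counter, minimum: int) -> list:
    """Characters with fewer than `minimum` tiles, sorted."""
    return sorted(ch for ch, n in counts.items() if n < minimum)


examples = ["0-0-az2012.png", "c-1-az2012.png", "d-9-az2012.jpg", "d-9-2-az2012.jpg"]
counts = count_tiles(examples)
print(counts["d"])                  # 2
print(underrepresented(counts, 2))  # ['0', 'c']
```

For a real directory, pass `(p.name for p in Path("tiles").iterdir())`. The prefix can be any Unicode character, so tiles like the Korean example are counted the same way.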

Producing Tiles

There are two good ways to produce character tiles.

Use actual images from license plates
Use a TTF font that looks like the license plate font

Producing Tiles from Actual Plates

You should gather a large library of license plate images (at least 100). These images should be cropped around the plate, and the aspect ratio should match the width/height configured for your license plates. Make sure each image is at least 250px wide. The imageclipper program (in a separate repo) is helpful for quickly cropping large numbers of images. Save the crops as png files.
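Before classifying, it can help to flag crops that are too narrow or whose shape doesn't match the configured plate. This sketch reads the width and height straight from the PNG header (the IHDR chunk defined by the PNG specification), so it needs only the standard library. The function names are hypothetical and the 10% aspect-ratio tolerance is an arbitrary assumption.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(header: bytes) -> tuple:
    """(width, height) from a PNG's first 24 bytes.

    Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and
    height as big-endian unsigned 32-bit integers.
    """
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    return struct.unpack(">II", header[16:24])


def crop_problems(header: bytes, plate_width_mm: float, plate_height_mm: float,
                  min_width_px: int = 250, tolerance: float = 0.10) -> list:
    """List reasons a cropped plate image may be unsuitable (empty list = OK)."""
    width, height = png_size(header)
    problems = []
    if width < min_width_px:
        problems.append(f"only {width}px wide (need at least {min_width_px}px)")
    expected = plate_width_mm / plate_height_mm
    actual = width / height
    if abs(actual - expected) / expected > tolerance:
        problems.append(f"aspect ratio {actual:.2f} vs configured {expected:.2f}")
    return problems
```

To check a file, pass its first bytes, e.g. `crop_problems(Path(path).read_bytes()[:24], plate_width_mm, plate_height_mm)` with the values from your country's config.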

Each file name should be prefixed with a two-character identifier for the font/region. For example, for Maryland plates, we would name the file: mdplate1.png
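If your crops have arbitrary names, a hypothetical helper like the one below can work out the new names. It only plans the names; copying the files into a fresh directory (for example with `shutil.copy2(src, out_dir / new_name)`) avoids overwriting a crop that already uses one of the target names. The lowercase two-character check is an assumption based on the "md" example.

```python
import re
from pathlib import Path


def plan_plate_names(paths, prefix: str) -> dict:
    """Map cropped PNGs to <prefix>plate<N>.png, numbering from 1 (e.g. mdplate1.png)."""
    if not re.fullmatch(r"[a-z0-9]{2}", prefix):
        raise ValueError("use a two-character lowercase identifier such as 'md'")
    plan = {}
    for number, path in enumerate(sorted(Path(p) for p in paths), start=1):
        if path.suffix.lower() != ".png":
            raise ValueError(f"{path} is not a png file")
        plan[path] = f"{prefix}plate{number}.png"
    return plan
```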

Create an empty output directory.

To start classifying characters, use the classifychars utility program included in OpenALPR.

Execute the command: classifychars [country] [input image directory] [empty output directory]
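The two preparation steps (an empty output directory, then the classifychars call) can be scripted. This is a sketch with hypothetical helper names; the binary name is a parameter because some builds install the tool as `openalpr-utils-classifychars`, as in one of the issue logs on this page.

```python
from pathlib import Path


def prepare_output_dir(path) -> Path:
    """Create the output directory; refuse to reuse one that already holds files."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    if any(out.iterdir()):
        raise RuntimeError(f"{out} is not empty")
    return out


def classifychars_command(country, input_dir, output_dir, binary="classifychars"):
    """Argument list for: classifychars [country] [input image directory] [empty output directory]."""
    return [binary, country, str(input_dir), str(prepare_output_dir(output_dir))]
```

Passing the result to `subprocess.run(classifychars_command("us", "plates", "tiles"), check=True)` would then launch the GUI.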

A GUI will open up and analyze each license plate image in your input folder. The steps to classify each plate are:

1. Press the "Enter" key and type the letter or number for each position that you wish to classify. Pressing 'Space' skips the character.
2. Use the arrow keys and press 'Space' to select the rendering that you wish to extract characters for. The box is highlighted in blue when it is selected. Each plate may have good and bad characters; pick the best ones, since significant imperfections may confuse the OCR.
3. Press the 's' key to save each character as a separate file in your output folder.
4. Press the 'n' key to move on to the next plate, and repeat this process until you've classified all the plates.

Producing Tiles from a TTF Font

A TTF font can be used to produce tiles. However, we need to add some realistic distortion to the characters. This is necessary to make a robust OCR detector.

The process is as follows:

1. Figure out all the characters that could possibly appear on a license plate.
2. Create a Word document with all of these characters. Make sure there is plenty of spacing between lines and characters.
3. Copy and paste all of these characters into a text file (no spaces or line breaks).
4. Print the Word document.
5. Take a few pictures (5 should be sufficient) of the printed document with a digital camera. Vary the angle/rotation very slightly (1-2 degrees) with each picture.
6. Save the pictures to a folder.
7. Run the openalpr-utils-binarizefontsheet program to produce tiles from each of the images. Provide the program with the text file from step #3 and each image file.
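The text file for step 3 and the spaced-out layout for step 2 can be generated from a single list of characters, so the two stay in the same order (we assume, but have not confirmed, that the tool relies on reading order). A minimal sketch with hypothetical helper names; the default of digits plus uppercase Latin letters is an assumption, so replace it with your country's actual character set.

```python
import string

DEFAULT_CHARACTERS = string.digits + string.ascii_uppercase  # assumption: 0-9 and A-Z


def charset_line(characters: str = DEFAULT_CHARACTERS) -> str:
    """Characters for the step-3 text file: whitespace removed, duplicates dropped, order kept."""
    seen = []
    for ch in characters:
        if not ch.isspace() and ch not in seen:
            seen.append(ch)
    return "".join(seen)


def font_sheet_text(characters: str = DEFAULT_CHARACTERS, per_row: int = 10,
                    gap: str = "   ") -> str:
    """Layout to paste into the printed document: gaps between characters, blank lines between rows."""
    chars = charset_line(characters)
    rows = [gap.join(chars[i:i + per_row]) for i in range(0, len(chars), per_row)]
    return "\n\n".join(rows)
```

Write `charset_line()` to the text file without a trailing newline, and paste `font_sheet_text()` into the document before setting it in your license plate-like TTF font.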

Building a Tesseract Training Sheet

Once you've classified all the characters, it may be a good idea to scan through the directory to make sure that the classifications match the images. Each image filename should be prefaced with the character that it represents. Once you've done this, it's time to create a training sheet.
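A quick way to scan the classified directory is to list every file whose leading character is not one you expect. A sketch with a hypothetical helper, assuming tiles are named `<character>-...`; pass the characters (and letter case) your tiles actually use, e.g. `string.digits + string.ascii_lowercase` for names like c-1-az2012.png.

```python
def misnamed_tiles(filenames, allowed) -> list:
    """Tile names whose leading character (before the first '-') is not in `allowed`."""
    allowed = set(allowed)  # a set, so "" or multi-character prefixes never match
    bad = []
    for name in filenames:
        prefix, separator, _ = name.partition("-")
        if not separator or prefix not in allowed:
            bad.append(name)
    return sorted(bad)
```

Anything it returns is worth opening by hand before building the training sheet.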

The "openalpr-utils-prepcharsfortraining" utility program in OpenALPR will create the Tesseract training sheet for you. Execute the following command: openalpr-utils-prepcharsfortraining [output directory from above]

The output will be:

combined.box
combined.tif

Rename these files to match the naming convention used by Tesseract (explained above). For example, leu.germany.exp0.box
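The rename, together with the `<country>/input` directory described under "Training the OCR", could be scripted like this. `install_training_sheet` is a hypothetical helper; it copies rather than moves, and refuses to overwrite an existing sheet so that several fonts can share one input directory.

```python
import shutil
from pathlib import Path


def install_training_sheet(sheet_dir, repo_dir, country, font, page=0) -> list:
    """Copy combined.box/.tif to <repo>/<country>/input/l<country>.<font>.exp<page>.*"""
    input_dir = Path(repo_dir) / country / "input"
    pairs = []
    for ext in ("box", "tif"):
        src = Path(sheet_dir) / f"combined.{ext}"
        dst = input_dir / f"l{country}.{font}.exp{page}.{ext}"
        if not src.is_file():
            raise FileNotFoundError(src)
        if dst.exists():
            raise FileExistsError(dst)  # don't clobber another font's sheet
        pairs.append((src, dst))
    # Only copy once both files are known to be present and free.
    input_dir.mkdir(parents=True, exist_ok=True)
    for src, dst in pairs:
        shutil.copy2(src, dst)
    return [dst.name for _, dst in pairs]
```

For the German example, `install_training_sheet(<directory containing combined.*>, ".", "eu", "germany")` would produce eu/input/leu.germany.exp0.box and eu/input/leu.germany.exp0.tif.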

You should create a training sheet for each unique license plate font that you wish to train.

Training the OCR

Lastly, you'll use the box/tif files created above to train your country's license plate OCR. Create a new directory using your country code, and create an input directory within it. Copy all the box/tif files created in the previous steps into this directory.

Run the "train.py" script and type in your country code when prompted.

If all went well, you should have a new file named l[countrycode].traineddata. Copy this file into your runtime data directory (runtime_data/ocr/tessdata/), and it is ready for OpenALPR to use.

Tesseract may report issues. Most commonly it will complain that it could not line up the boxes on the provided image. If you are getting many of these warnings, you can re-run the openalpr-utils-prepcharsfortraining utility and provide values for --tile_width and --tile_height. Using different values will change how Tesseract sees the image and potentially improve results.
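Many of the train.py failures quoted in the issues on this page ("not found", "No such file or directory", "Permission denied") come from TESSERACT_DIR pointing somewhere without the expected programs. A minimal pre-flight check could look like this, assuming train.py builds its paths as TESSERACT_DIR + '/tesseract' and TESSERACT_DIR + '/training/<tool>', as the quoted train.py snippet and logs show. The `preflight` function is hypothetical and not part of this repository.

```python
import os
import shutil
from pathlib import Path

# Programs that train.py ran in the logs quoted on this page.
TRAINING_TOOLS = ("unicharset_extractor", "mftraining", "cntraining", "combine_tessdata")


def expected_programs(tesseract_dir) -> dict:
    """Paths built like TESSERACT_BIN and TESSERACT_TRAINDIR in the quoted train.py."""
    base = Path(tesseract_dir)
    programs = {"tesseract": base / "tesseract"}
    programs.update({tool: base / "training" / tool for tool in TRAINING_TOOLS})
    return programs


def preflight(tesseract_dir) -> list:
    """Return one message per program train.py would fail to run; an empty list means OK."""
    problems = []
    for name, path in expected_programs(tesseract_dir).items():
        if path.is_dir():
            # Executing a directory (e.g. tessdata) is what produces "Permission denied".
            problems.append(f"{name}: {path} is a directory, not a program")
        elif not (path.is_file() and os.access(path, os.X_OK)):
            message = f"{name}: no executable at {path}"
            on_path = shutil.which(name)
            if on_path:
                # A package install may put the programs on PATH instead.
                message += f" (one is on PATH at {on_path})"
            problems.append(message)
    return problems
```

Running `preflight("/path/you/set/as/TESSERACT_DIR")` before train.py shows which paths need fixing; an empty list means every program is where train.py will look for it.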


train-ocr's Issues

using train.py

Hello friends,

I'm having some trouble with train.py.
I have managed to install tesseract by following https://github.com/tesseract-ocr/tesseract/wiki/Compiling
But I don't understand one thing in train.py:

```
TESSERACT_DIR='..../tesseract'

os.environ["TESSDATA_PREFIX"] = TESSERACT_DIR

os.system("export TESSDATA_PREFIX=" + TESSERACT_DIR)

TESSERACT_BIN=TESSERACT_DIR + '/tesseract'
TESSERACT_TRAINDIR= TESSERACT_DIR + '/training'
```

In my tesseract folder there is a training directory, but there isn't any tesseract.

Thank you

Where is openalpr-utils-prepcharsfortraining?

I have OpenALPR, Tesseract and OpenCV installed in Windows 10. Where can I find the file openalpr-utils-prepcharsfortraining to create the tif from the character images? Please help! Thanks in advance!

My new country training will be available in Cloud API?

Hi,
I will train openALPR for Cambodia, but I am using the Cloud API in my app. Will my training be available within the Cloud API?

How to train ALPR for chinese plate?

I am training OpenALPR for Chinese characters, but I found it very difficult. I can't understand the alpr-train document. Are there any examples to learn from? Any help is appreciated.

openalpr-utils-binarizefontsheet program

I installed OpenALPR, OpenCV and Tesseract following the tutorial, and I need to use openalpr-utils-binarizefontsheet, but it gives me this error: bash: openalpr-utils-binarizefontsheet : command not found
I'm using Raspbian Stretch.

Outdated Instructions

Can someone give me clear, up-to-date instructions for training the OpenALPR OCR? I'm about to add a new country and train it with its plate numbers.

Help to train OCR

I am training OpenALPR for Chinese characters. I have a problem while training my images: the code accepts only A-Z and 0-9 as inputs. Can you tell me how to give Chinese characters as input? I have scim installed to switch between languages, but I am not able to use it with this trainer. Any help is appreciated. Thanks in advance.

Error when doing chinese plate training!

I use TTF training on a Linux server to train Chinese plates.
The lcn.china.exp0.box and lcn.china.exp0.tif files are copied into the directory train-ocr/cn/input

Then I run train.py (path: train-ocr/), and this error happens:
Generated training data for 9 words
Warning in pixReadMemTiff: tiff page 1 not found
sh: /data/tesseract-3.04.00/training/unicharset_extractor: No such file or directory
Executing: /data/tesseract-3.04.00/training/mftraining -F ./tmp/font_properties -U unicharset -O ./tmp/lcn.unicharset ./tmp/*.tr
sh: /data/tesseract-3.04.00/training/mftraining: No such file or directory
rm: cannot remove `./unicharset': No such file or directory
mv: cannot stat `./tmp/lcn.unicharset': No such file or directory
cp: cannot stat `./cn/input/unicharambigs': No such file or directory
sh: /data/tesseract-3.04.00/training/cntraining: No such file or directory
mv: cannot stat `./shapetable': No such file or directory
mv: cannot stat `./pffmtable': No such file or directory
mv: cannot stat `./inttemp': No such file or directory
mv: cannot stat `./normproto': No such file or directory
sh: /data/tesseract-3.04.00/training/combine_tessdata: No such file or directory
mv: cannot stat `./lcn.unicharset': No such file or directory
mv: cannot stat `./lcn.shapetable': No such file or directory
mv: cannot stat `./lcn.pffmtable': No such file or directory
mv: cannot stat `./lcn.inttemp': No such file or directory
mv: cannot stat `./lcn.normproto': No such file or directory
mv: cannot stat `./lcn.unicharambigs': No such file or directory

library not working with indian number plate

Each "font" will have at least one tif and box file. A country's license plate may have many fonts, each one would just use a different name.

What does this mean?

Plates with varying character height

Is there a support of plates with varying character height?
I'm trying to make a training set for Russian plates, but don't know which height should I specify in config.

Train.py for windows

Can you please post a version of train.py that works on Windows? It is designed for Linux only: on my Windows system it reports that some commands (rm, mv, cp) are not found. Please help me out.

Ideally, post code that works on Windows-based systems.

Thanks in advance.
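One possible approach is to replace the shell calls with Python's standard library, which behaves the same on Windows and Linux. The helpers below are hypothetical and only mimic the rm/mv/cp calls visible in the train.py logs on this page; like those shell commands, they print a message and carry on when a file is missing. `expand` handles patterns such as ./tmp/*.tr, which the Windows shell may not expand for the program. The Tesseract tools themselves would still need to be invoked, e.g. with `subprocess.run([...])` instead of `os.system`.

```python
import glob
import os
import shutil


def rm(path) -> bool:
    """Portable stand-in for `rm path`: report failure instead of raising."""
    try:
        os.remove(path)
        return True
    except OSError as err:
        print(f"rm: cannot remove '{path}': {err.strerror}")
        return False


def mv(src, dst) -> bool:
    """Portable stand-in for `mv src dst`."""
    try:
        shutil.move(src, dst)
        return True
    except OSError as err:
        print(f"mv: cannot stat '{src}': {err.strerror}")
        return False


def cp(src, dst) -> bool:
    """Portable stand-in for `cp src dst`."""
    try:
        shutil.copy(src, dst)
        return True
    except OSError as err:
        print(f"cp: cannot stat '{src}': {err.strerror}")
        return False


def expand(pattern) -> list:
    """Expand a glob such as ./tmp/*.tr into a sorted list of matching paths."""
    return sorted(glob.glob(pattern))
```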

Error in train.py specifying tesseract-ocr

I have properly installed tesseract-ocr and pointed train.py to where it is installed using an absolute path. When I run train.py, it returns this error:

Two-Letter Country Code to Train: gb
Processing: ./gb/input/luk.uk.exp0.box
Executing: /home/nigel/tesseract-ocr/tesseract -l eng ./gb/input/luk.uk.exp0.tif luk.uk.exp0 nobatch box.train.stderr
sh: 1: /home/nigel/tesseract-ocr/tesseract: not found
mv: cannot stat './luk.uk.exp0.tr': No such file or directory
mv: cannot stat './luk.uk.exp0.txt': No such file or directory
sh: 1: /home/nigel/tesseract-ocr/training/unicharset_extractor: not found
Executing: /home/nigel/tesseract-ocr/training/mftraining -F ./tmp/font_properties -U unicharset -O ./tmp/lgb.unicharset ./tmp/*.tr
sh: 1: /home/nigel/tesseract-ocr/training/mftraining: not found
rm: cannot remove './unicharset': No such file or directory
mv: cannot stat './tmp/lgb.unicharset': No such file or directory
cp: cannot stat './gb/input/unicharambigs': No such file or directory
sh: 1: /home/nigel/tesseract-ocr/training/cntraining: not found
mv: cannot stat './shapetable': No such file or directory
mv: cannot stat './pffmtable': No such file or directory
mv: cannot stat './inttemp': No such file or directory
mv: cannot stat './normproto': No such file or directory
sh: 1: /home/nigel/tesseract-ocr/training/combine_tessdata: not found
mv: cannot stat './lgb.unicharset': No such file or directory
mv: cannot stat './lgb.shapetable': No such file or directory
mv: cannot stat './lgb.pffmtable': No such file or directory
mv: cannot stat './lgb.inttemp': No such file or directory
mv: cannot stat './lgb.normproto': No such file or directory
mv: cannot stat './lgb.unicharambigs': No such file or directory

This is my current directory. /home/nigel/train/train-ocr
My tesseract-ocr installed on /home/nigel/tesseract-ocr

This is my code in train.py: TESSERACT_DIR='/home/nigel/tesseract-ocr/src'

Please help me with this.

What is the difference between train-ocr and train-detector?

I'm quite new to openalpr. I'd like to know what is the difference between train-ocr and train-detector. Which one should I use or both have to be used to train data for new number plates? Thanks.

Tesseract Path in Linux/Ubuntu

I installed Tesseract on Ubuntu using the command sudo apt-get install tesseract-ocr. In your repository, train.py needs the location of Tesseract [TESSERACT_DIR]. Where is it installed on Ubuntu or other Linux-based systems?

Reply as soon as possible.

Thank you in advance.

trained new data

Hi, I tested an image that I cropped with imageclipper and then tried to generate tiles, but I get this:

(test) losabio@losabio-System-Product-Name:~/env/openalpr/src/build/misc_utilities$ openalpr-utils-classifychars ph ./WINPUT/ ./train-ocr/ph/input
Usage:
n -- Next plate
p -- Previous plate
W -- Select image and save characters according to OCR results, then go to next image
s -- Save characters
<- and -> -- Cycle between images
Ent/space -- Select plate

Within a plate
<- and -> -- Cycle between characters
[0-9A-Z] -- Identify a character (saves the image)
ESC/Ent/Space -- Back to plate selection
./WINPUT//AQA6542.jpg_0000_0253_0390_0147_0056.png
Segmentation fault (core dumped)

Data for [eu]

Hello,

Do you know why leu.germany.exp0.tif looks so different from the Belgium/Netherlands ones? I mean, it's not "sorted" like the others.
Is the data used to generate these available somewhere? The eu folder in train-detector just contains raw plates.

Thanks.

documentation images

Getting the training set

Hi guys, I am sorry if my request has already been answered here. I could not find a training set that I can use to train my Tesseract.
I would like to train for alphabetic and digits characters.

Can you please help me?

Multiline license plate

Assume I have training data for single line license plate. The OCR result is quite accurate. I want to extend it for multiline license plate. Have a few questions:

The fonts used for single-line and multi-line license plates are exactly the same, so I think I don't need to do the training. Am I right?
I should do the configuration for multiline license plate. Is there any guide for this?
Can I use two separated configuration files at the same time? How?

Any idea? Thanks.

GH plates training error

Hi,
I'm trying to train the Ghana plates, but I have only 33 plate images.
I'm getting the error below, what is the cause and how to fix it?
Thanks.

openalpr-utils-binarizefontsheet no result file

I use openalpr-utils-binarizefontsheet to produce tiles, but why is there no result file in my directory?

What am I doing wrong?

Error: You have not tagged any characters

When I use the command "openalpr-utils-classifychars us [folder_with_plates] [empty folder]"

Appears this: http://prntscr.com/n7ok3o

Then I press space to mark it (in blue), like this: http://prntscr.com/n7oki1

Then what am I supposed to do? I tried moving with the arrow keys and pressing Enter or the character keys, but nothing works.

If I press Space and then the letter S, or when I press the arrow keys, it shows: 'You have not tagged any characters'. Obviously I have not labeled any characters, because I can't.

The files are named like [country]plate1.png -> chplate1.png
And they are at least 250px wide.

Please do you know what can I do??

tif character spacing

Hi, I'm trying to train the OCR for a specific font based on about 200 plates, and I noticed that the generated combined.tif file has the characters very close to each other, which causes the trainer program to fail. When I compared it to the tif files in the eu directory, I could tell right away that those seem to be generated better. Is there a way to set the spacing when generating the tif file? I could train with about 130 plates, but once I reached 200 plates and my character set grew, Tesseract failed on every character and couldn't find any blobs in the tif file.

Failed to create .traineddata file from box and tif for Indonesian license plate

Greetings Matt, I have an issue when executing the last step of training for Indonesia with OpenALPR. In the train-ocr repo, the last step ("Training the OCR") says that we have to execute "train.py" to generate the .traineddata file (in my case lid.traineddata, if it succeeds) from the .box and .tif files. I am completely stuck at this step because the program gives a 'permission denied' error on the tessdata directory, even though I already changed the permissions with chmod so that train.py can read and write the tessdata directory.

Here's the directory from code:

TESSERACT_DIR='/media/gspeintercon/GSPE1/GIT/tesseract-4.0.0-beta.1'
os.environ["TESSDATA_PREFIX"] = '/media/gspeintercon/GSPE1/GIT/tesseract-4.0.0-beta.1'
os.system("export TESSDATA_PREFIX=" + '/media/gspeintercon/GSPE1/GIT/tesseract-4.0.0-beta.1')
TESSERACT_BIN= '/media/gspeintercon/GSPE1/GIT/tesseract-4.0.0-beta.1/tessdata'
TESSERACT_TRAINDIR= '/media/gspeintercon/GSPE1/GIT/tesseract-4.0.0-beta.1/training'

Here's the error result:

$ sudo python train.py
Two-Letter Country Code to Train: id
Processing: ./id/input/lid.indonesia.exp0.box
Executing: /media/gspeintercon/GSPE1/GIT/tesseract-4.0.0-beta.1/tessdata -l eng ./id/input/lid.indonesia.exp0.tif lid.indonesia.exp0 nobatch box.train.stderr
sh: 1: /media/gspeintercon/GSPE1/GIT/tesseract-4.0.0-beta.1/tessdata: Permission denied
mv: cannot stat './lid.indonesia.exp0.tr': No such file or directory
mv: cannot stat './lid.indonesia.exp0.txt': No such file or directory
Extracting unicharset from box file ./id/input/lid.indonesia.exp0.box
Other case a of A is not in unicharset
Other case b of B is not in unicharset
Other case c of C is not in unicharset
Other case d of D is not in unicharset
Other case e of E is not in unicharset
Other case f of F is not in unicharset
Other case g of G is not in unicharset
Other case h of H is not in unicharset
Other case i of I is not in unicharset
Other case j of J is not in unicharset
Other case k of K is not in unicharset
Other case l of L is not in unicharset
Other case m of M is not in unicharset
Other case n of N is not in unicharset
Other case o of O is not in unicharset
Other case p of P is not in unicharset
Other case q of Q is not in unicharset
Other case r of R is not in unicharset
Other case s of S is not in unicharset
Other case t of T is not in unicharset
Other case u of U is not in unicharset
Other case v of V is not in unicharset
Other case w of W is not in unicharset
Other case x of X is not in unicharset
Other case y of Y is not in unicharset
Other case z of Z is not in unicharset
Wrote unicharset file unicharset
Executing: /media/gspeintercon/GSPE1/GIT/tesseract-4.0.0-beta.1/training/mftraining -F ./tmp/font_properties -U unicharset -O ./tmp/lid.unicharset ./tmp/*.tr
Warning: No shape table file present: shapetable
Reading ./tmp/*.tr ...

Error: Unable to open ./tmp/*.tr!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75
Segmentation fault (core dumped)
mv: cannot stat './tmp/lid.unicharset': No such file or directory
cp: cannot stat './id/input/unicharambigs': No such file or directory
Reading ./tmp/*.tr ...

Error: Unable to open ./tmp/*.tr!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75
Segmentation fault (core dumped)
mv: cannot stat './shapetable': No such file or directory
mv: cannot stat './pffmtable': No such file or directory
mv: cannot stat './inttemp': No such file or directory
mv: cannot stat './normproto': No such file or directory
Combining tessdata files
Error: traineddata file must contain at least (a unicharset fileand inttemp) OR an lstm file.
Error combining tessdata files into lid.traineddata
Version string:4.00.00alpha
23:version:size=12, offset=192
mv: cannot stat './lid.unicharset': No such file or directory
mv: cannot stat './lid.shapetable': No such file or directory
mv: cannot stat './lid.pffmtable': No such file or directory
mv: cannot stat './lid.inttemp': No such file or directory
mv: cannot stat './lid.normproto': No such file or directory
mv: cannot stat './lid.unicharambigs': No such file or directory

Can you help me find what is causing the error? I also tried another country whose box and tif files are available in the repo, but I still can't generate the .traineddata file and get the same error. Thank you.

Add trained data for Swiss license plates

We are currently training the Swiss license plate font and I would like to create a pull request out of this. Since Switzerland is surrounded by EU countries, we also need to deal with EU plates. That is why we actually got the best results by adding the Swiss license plate style as an additional font to the EU trained data.
When creating the pull request, should I add the data to eu or create a new config for ch?

Issue running train.py

Hi Matt,

I am getting the following error, please help :)

Two-Letter Country Code to Train: sr
Processing: ./sr/input/leu.serbia.exp0.box
Executing: /home/harsh/Documents/Number_Plate_Training/tesseract -l eng ./sr/input/leu.serbia.exp0.tif leu.serbia.exp0 nobatch box.train.stderr
sh: 1: /home/harsh/Documents/Number_Plate_Training/tesseract: Permission denied
mv: cannot stat './leu.serbia.exp0.tr': No such file or directory
mv: cannot stat './leu.serbia.exp0.txt': No such file or directory
sh: 1: /home/harsh/Documents/Number_Plate_Training/training/unicharset_extractor: not found
Executing: /home/harsh/Documents/Number_Plate_Training/training/mftraining -F ./tmp/font_properties -U unicharset -O ./tmp/lsr.unicharset ./tmp/*.tr
sh: 1: /home/harsh/Documents/Number_Plate_Training/training/mftraining: not found
rm: cannot remove './unicharset': No such file or directory
mv: cannot stat './tmp/lsr.unicharset': No such file or directory
cp: cannot stat './sr/input/unicharambigs': No such file or directory
sh: 1: /home/harsh/Documents/Number_Plate_Training/training/cntraining: not found
mv: cannot stat './shapetable': No such file or directory
mv: cannot stat './pffmtable': No such file or directory
mv: cannot stat './inttemp': No such file or directory
mv: cannot stat './normproto': No such file or directory
sh: 1: /home/harsh/Documents/Number_Plate_Training/training/combine_tessdata: not found
mv: cannot stat './lsr.unicharset': No such file or directory
mv: cannot stat './lsr.shapetable': No such file or directory
mv: cannot stat './lsr.pffmtable': No such file or directory
mv: cannot stat './lsr.inttemp': No such file or directory
mv: cannot stat './lsr.normproto': No such file or directory
mv: cannot stat './lsr.unicharambigs': No such file or directory

Installation of Tesseract in Local Directory

I'm having trouble installing Tesseract. I always encounter an error after running make, in every set of instructions I have found for installing tesseract-ocr.

Has anyone installed it successfully? If so, can you link me to the instructions you followed to make it work?

My issue is that train.py cannot find tesseract-ocr even though I installed it correctly using sudo. I also tried other ways, like installing it manually or into a local directory.

Please help me get rid of this. Thank you very much.

How can I get the .xml file in /runtime_data/region for my country?

Hello, I have generated the traineddata from a TTF font, but the error "--(!)Error loading CPU classifier" is returned when I run alpr. So, how can I get the xml file in /runtime_data/region for my country? Thanks!

Instructions on adding a country

Hi, I have installed OpenALPR, but it only recognizes plates from the USA and Europe. How can I add plates from Ukraine? Could you give some step-by-step instructions for adding a new country? Thank you.

Training Tesseract 4

The new version of Tesseract has a different training process because of the LSTM engine integration. Files from version >= 3.04 will work on the new version but they won't be using the latest and more accurate features. A new training process should be created in order to properly use Tesseract 4 in OpenALPR.

The openalpr python API

When I read the API page (http://doc.openalpr.com/bindings.html#python),
the first line 'from openalpr import Alpr' causes an error: 'ImportError: No module named openalpr'.
So I think it needs a module named openalpr, but where can I get that module?
I hope you can help me! Thanks!

Train for two different kind of license plates

Hi
Currently Argentina has two license plate models (6 chars vs 7 chars), and we are transitioning from one to the other. Is it possible to train OpenALPR to recognize both models? Or would I have to try recognizing one model and, if no results are found, try the other?

Thanks!

error while using classifychars

One week ago I used the openalpr-utils-classifychars program without problems (only "W -- Select image and save characters ..." was not working). Today I rebuilt OpenALPR on a different PC; I think there have been some updates, and now I can't use the program. Only n (next plate), p (previous plate) and the space bar work. For the other keys, the terminal says "Did not select any boxes". Am I doing something wrong? How do you think I can fix the problem?

Issues with train.py for new country

I'm getting the following errors while running train.py on Ubuntu 16.04 with Python 2.7:

Processing: ./ae/input/lae.abudhabi.exp0.box
./ae/input/lae.abudhabi.exp0.tif
Executing: /home/user123/train-ocr/tesseract-ocr/tesseract -l eng ./ae/input/lae.abudhabi.exp0.tif  lae.abudhabi.exp0 nobatch box.train.stderr
sh: 1: /home/user123/train-ocr/tesseract-ocr/tesseract: Permission denied
mv: cannot stat './lae.abudhabi.exp0.tr': No such file or directory
mv: cannot stat './lae.abudhabi.exp0.txt': No such file or directory
sh: 1: /home/user123/train-ocr/tesseract-ocr/tesseract/training/unicharset_extractor: not found
Executing: /home/user123/train-ocr/tesseract-ocr/tesseract/training/mftraining -F   ./tmp/font_properties -U unicharset -O ./tmp/lae.unicharset ./tmp/*.tr
sh: 1: /home/user123/train-ocr/tesseract-ocr/tesseract/training/mftraining: not found
rm: cannot remove './unicharset': No such file or directory
mv: cannot stat './tmp/lae.unicharset': No such file or directory
cp: cannot stat './ae/input/unicharambigs': No such file or directory
sh: 1: /home/user123/train-ocr/tesseract-ocr/tesseract/training/cntraining: not found
mv: cannot stat './shapetable': No such file or directory
mv: cannot stat './pffmtable': No such file or directory
mv: cannot stat './inttemp': No such file or directory
mv: cannot stat './normproto': No such file or directory
sh: 1: /home/user123/train-ocr/tesseract-ocr/tesseract/training/combine_tessdata: not found
./ae/ae.config
Applying config file: ./ae/ae.config
lae.traineddata
sh: 1: /home/user123/train-ocr/tesseract-ocr/tesseract/training/combine_tessdata: not found
config file: /home/user123/train-ocr/tesseract-ocr/tesseract/training/combine_tessdata -o   lae.traineddata ./ae/ae.config
status:  32512
mv: cannot stat './lae.unicharset': No such file or directory
mv: cannot stat './lae.shapetable': No such file or directory
mv: cannot stat './lae.pffmtable': No such file or directory
mv: cannot stat './lae.inttemp': No such file or directory
mv: cannot stat './lae.normproto': No such file or directory
mv: cannot stat './lae.unicharambigs': No such file or directory

What's wrong? Also, where do I get the .config file from?

Can train-ocr work with Tesseract 3.05?

I only downloaded the Tesseract 3.05 binary for Windows. I managed to train with the eu data, but the resulting training file is less accurate than Mathew's file. Why, and how can I improve the accuracy?

Issue with mac directories in train.py

Hello, I am trying to train the OCR on a sample data set that I collected myself, and I am using OS X 10.11. However, when I run train.py from the shell, I get the error below. I think it is a directory issue, since the script is written for a Linux OS, but I don't know how to set TESSERACT_DIR for the Mac directory layout. Help is much appreciated; please let me know if anyone has a solution!
Thank you!!!
P.S. here is the error I am getting:

Two-Letter Country Code to Train: us
Processing: ./us/input/lus.florida.exp0.box
Executing: /storage/projects/alpr/libraries/tesseract-ocr/tesseract -l eng ./us/input/lus.florida.exp0.tif lus.florida.exp0 nobatch box.train.stderr
sh: /storage/projects/alpr/libraries/tesseract-ocr/tesseract: No such file or directory
mv: rename ./lus.florida.exp0.tr to ./tmp/lus.florida.exp0.tr: No such file or directory
mv: rename ./lus.florida.exp0.txt to ./tmp/lus.florida.exp0.txt: No such file or directory
sh: /storage/projects/alpr/libraries/tesseract-ocr/training/unicharset_extractor: No such file or directory
Executing: /storage/projects/alpr/libraries/tesseract-ocr/training/mftraining -F ./tmp/font_properties -U unicharset -O ./tmp/lus.unicharset ./tmp/*.tr
sh: /storage/projects/alpr/libraries/tesseract-ocr/training/mftraining: No such file or directory
rm: ./unicharset: No such file or directory
mv: rename ./tmp/lus.unicharset to ./lus.unicharset: No such file or directory
cp: ./us/input/unicharambigs: No such file or directory
sh: /storage/projects/alpr/libraries/tesseract-ocr/training/cntraining: No such file or directory
mv: rename ./shapetable to ./lus.shapetable: No such file or directory
mv: rename ./pffmtable to ./lus.pffmtable: No such file or directory
mv: rename ./inttemp to ./lus.inttemp: No such file or directory
mv: rename ./normproto to ./lus.normproto: No such file or directory
sh: /storage/projects/alpr/libraries/tesseract-ocr/training/combine_tessdata: No such file or directory
mv: rename ./lus.unicharset to ./tmp/lus.unicharset: No such file or directory
mv: rename ./lus.shapetable to ./tmp/lus.shapetable: No such file or directory
mv: rename ./lus.pffmtable to ./tmp/lus.pffmtable: No such file or directory
mv: rename ./lus.inttemp to ./tmp/lus.inttemp: No such file or directory
mv: rename ./lus.normproto to ./tmp/lus.normproto: No such file or directory
mv: rename ./lus.unicharambigs to ./tmp/lus.unicharambigs: No such file or directory
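Every failing command in this log sits under `/storage/projects/alpr/libraries/tesseract-ocr`, which suggests TESSERACT_DIR was left at a value that does not exist on this machine. A minimal sketch for checking a candidate directory before running train.py follows. The relative layout (`tesseract` at the top level, the other tools under `training/`) is copied from this log and is an assumption: other logs in these issues show `api/tesseract` or `tesseract/training/` instead.

```python
import os
import sys
from typing import Dict, List

# Relative tool locations as they appear in this log (assumed layout;
# other checkouts or package installs may differ).
EXPECTED_TOOLS: Dict[str, str] = {
    "tesseract": "tesseract",
    "unicharset_extractor": "training/unicharset_extractor",
    "mftraining": "training/mftraining",
    "cntraining": "training/cntraining",
    "combine_tessdata": "training/combine_tessdata",
}


def missing_tools(tesseract_dir: str) -> List[str]:
    """Return the tools that are absent or not executable under tesseract_dir."""
    missing = []
    for name, rel_path in EXPECTED_TOOLS.items():
        path = os.path.join(tesseract_dir, rel_path)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    base = sys.argv[1] if len(sys.argv) > 1 else "."
    for tool in missing_tools(base):
        print(f"missing or not executable: {tool}")
```

Run it with the directory you intend to use as TESSERACT_DIR; an empty result means every tool was found where this layout expects it.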

Issues in path of tesseract in train.py

I specified the path to my Tesseract installation correctly, but I still get this error.

Or is there something wrong with my OCR installation?

openalpr-utils-binarizefontsheet

Does the openalpr-utils-binarizefontsheet utility support Chinese?

Netherlands training data

Hi Matt,

I noticed you updated the Netherlands OCR training data.
Attached is the data I am using currently, which works very well for me:
EDIT 20-3-2016 Updated with latest version
nethe.zip
END EDIT
It has been produced using this font:
http://www.dafont.com/kenteken.font
Kind regards,
Kees

Processing issue with openalpr-utils-binarizefontsheet

Hi, I'm using OS X 10.10.2.

Is that OK? How long should the process take?

Here is what I did:

openalpr-utils-binarizefontsheet --out_dir /tiles/out --character_file /tiles/characters.txt -- /tiles/font_sheet_1.png
Processing: /tiles/font_sheet_1.png

Then a black "Temp Window" appears, and no output files are produced.

My source files:
characters.txt
QWERTYUIOPASDFGHJKLZXCVBNM1234567890

font_sheet_1.png
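As a quick sanity check on the input, and assuming the character file is meant to list each plate character exactly once (the utility's exact requirements are not documented in this thread), a short sketch can report duplicates, gaps, and unexpected characters:

```python
import string
from collections import Counter


def check_character_file(text: str) -> dict:
    """Report duplicates, gaps and unexpected characters in a character list.

    Assumes (hypothetically) that every A-Z and 0-9 character should appear once.
    """
    chars = [c for c in text if not c.isspace()]  # ignore newlines/spaces
    counts = Counter(chars)
    expected = set(string.ascii_uppercase + string.digits)
    return {
        "count": len(chars),
        "duplicates": sorted(c for c, n in counts.items() if n > 1),
        "missing": sorted(expected - set(chars)),
        "unexpected": sorted(set(chars) - expected),
    }


report = check_character_file("QWERTYUIOPASDFGHJKLZXCVBNM1234567890")
print(report)  # {'count': 36, 'duplicates': [], 'missing': [], 'unexpected': []}
```

For the `characters.txt` shown above, the check reports 36 characters covering A to Z and 0 to 9 with no duplicates, so the character list itself does not look malformed.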

./train new country code

Hi,
When I try to proceed with training, I get the following error:
root@ubuntu:/usr/local/src/train-ocr# ./train.py
Two-Letter Country Code to Train: ua
Processing: ./ua/input/leu.ukraine.exp0.box
Executing: /usr/local/src/tesseract-ocr/api/tesseract -l fra ./ua/input/leu.ukraine.exp0.tif leu.ukraine.exp0 box.train.stderr
Error opening data file /usr/local/src/tesseract-ocr/tessdata/fra.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'fra'
Tesseract couldn't load any languages!
Could not initialize tesseract.
mv: cannot stat ‘./leu.ukraine.exp0.tr’: No such file or directory
mv: cannot stat ‘./leu.ukraine.exp0.txt’: No such file or directory
Extracting unicharset from ./ua/input/leu.ukraine.exp0.box
Wrote unicharset file ./unicharset.
Executing: /usr/local/src/tesseract-ocr/training/mftraining -F ./tmp/font_properties -U unicharset -O ./tmp/lua.unicharset ./tmp/.tr
Warning: No shape table file present: shapetable
Reading ./tmp/.tr ...

Error: Unable to open ./tmp/.tr!
signal_termination_handler:Error:Signal_termination_handler called:Code 3000
Segmentation fault (core dumped)
mv: cannot stat ‘./tmp/lua.unicharset’: No such file or directory
cp: cannot stat ‘./ua/input/unicharambigs’: No such file or directory
Reading ./tmp/.tr ...

Error: Unable to open ./tmp/*.tr!
signal_termination_handler:Error:Signal_termination_handler called:Code 3000
Segmentation fault (core dumped)
rm: cannot remove ‘./shapetable’: No such file or directory
mv: cannot stat ‘./pffmtable’: No such file or directory
mv: cannot stat ‘./inttemp’: No such file or directory
mv: cannot stat ‘./normproto’: No such file or directory
Combining tessdata files
Error opening unicharset file
Error combining tessdata files into lua.traineddata
mv: cannot stat ‘./lua.unicharset’: No such file or directory
mv: cannot stat ‘./lua.pffmtable’: No such file or directory
mv: cannot stat ‘./lua.inttemp’: No such file or directory
mv: cannot stat ‘./lua.normproto’: No such file or directory
mv: cannot stat ‘./lua.unicharambigs’: No such file or directory

Additional info:
echo $TESSDATA_PREFIX
//usr/local/src/openalpr/runtime_data/ocr

root@ubuntu:/usr/local/src/train-ocr/ua/input# ls -la
total 32
drwxr-xr-x 2 root root 4096 Nov 1 21:38 .
drwxr-xr-x 3 root root 4096 Nov 1 21:37 ..
-rw-r--r-- 1 root root 1060 Nov 1 21:38 leu.ukraine.exp0.box
-rw-r--r-- 1 root root 19298 Nov 1 21:38 leu.ukraine.exp0.tif
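Two things stand out in this log. Tesseract failed to load `fra` before writing any `.tr` file, so there was nothing for `mftraining` to read. Separately, the input files are named `leu.ukraine.*` inside `./ua/input`, while the README's naming convention, l[country_code].[fontname].exp[pagenumber].box, would give `lua.ukraine.exp0.box` for country code `ua`. A minimal sketch that checks an input folder against that convention (the regular expression is our own reading of it, not code from train.py):

```python
import os
import re
from typing import List

# Our reading of the README convention: l[country_code].[fontname].exp[pagenumber].box
BOX_NAME = re.compile(r"^l(?P<cc>[a-z]+)\.(?P<font>[^.]+)\.exp(?P<page>\d+)\.box$")


def naming_problems(input_dir: str, country_code: str) -> List[str]:
    """List .box files that break the naming convention or lack a matching .tif."""
    problems = []
    for name in sorted(os.listdir(input_dir)):
        if not name.endswith(".box"):
            continue
        match = BOX_NAME.match(name)
        if match is None:
            problems.append(f"{name}: does not match l<cc>.<font>.exp<N>.box")
            continue
        cc = match.group("cc")
        if cc != country_code:
            problems.append(f"{name}: prefix 'l{cc}' does not match country code '{country_code}'")
        tif = name[: -len(".box")] + ".tif"
        if not os.path.exists(os.path.join(input_dir, tif)):
            problems.append(f"{name}: no matching {tif}")
    return problems
```

Against the listing above, `naming_problems("./ua/input", "ua")` would flag `leu.ukraine.exp0.box` for its `leu` prefix.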

What does template_max_width_px and min_plate_size_width_px specify?

I have a question about these 2 parameters:
Does template_max_width_px specify the maximum width of the plate in the image stream that comes from the video, or the maximum width of the plate in the training sample images?
Moreover, what does min_plate_size_width_px specify, and what is checked against it to decide whether a plate is valid?
Thank you for your help.

Error when trying to train new country license plate

Hi,
I'm using Ubuntu 14.04 to train OpenALPR to recognize my country characters and license plates.

I succeeded in doing so, but I discovered that ocr_language was set to leu rather than what I want, which is lil.

Once I changed the ocr_language parameter to lil, almost every OpenALPR utility shows this error:

read_params_file: parameter not found: ;

As you can see above, it doesn't say which parameter is missing. All I did was change ocr_language from leu to lil and create a new configuration file for the new language, identical to the European one.

I've googled it and it seems to be an error from tesseract. However, no solution was good enough for this issue.

What should I do?

Thanks.
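The `read_params_file: parameter not found: ;` message is what Tesseract 3.x prints when a line in a params (config) file starts with a token it does not recognise as a parameter; as far as we know, its reader only treats lines beginning with `#` as comments. One plausible cause, which the post does not confirm, is that the new config was copied from an INI-style OpenALPR file that uses `;` comments or `[section]` headers. A heuristic sketch that flags such lines:

```python
import re
from typing import List, Tuple

# Heuristic: a Tesseract parameter name looks like an identifier.
PARAM_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def suspicious_param_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs whose first token is not a plausible param name."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comments
        first_token = line.split()[0]
        if not PARAM_NAME.match(first_token):
            flagged.append((number, line))
    return flagged


sample = "; OpenALPR-style comment\n[eu]\nload_system_dawg F\n# a real comment\n"
print(suspicious_param_lines(sample))
```

Lines it flags (here the `;` comment and the `[eu]` header) are candidates to delete or convert to `#` comments.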

Segmentation fault if "char_width_mm = 20"

Hi,

train-ocr wouldn't recognize the I in http://www.licenseplates.tv/images/swissai.gif, so I reduced char_width_mm to 20 for the [eu] country. ./classifychars now displays the I correctly, but when I press Enter to input which character it is, it segfaults.

Path to tesseract

Hi
I would like to run the train.py script, but I'm not sure what the correct path to Tesseract is.
I used this command: brew install tesseract --devel
to install tesseract on OS X. I think this puts tesseract in the following path:
/usr/local/Cellar/tesseract/3.03rc1_3
The only subfolders in there are: bin, lib, include and share.
In the train.py script it refers to other subfolders as well e.g. /training.

So my question is, what path do I set for TESSERACT_DIR in train.py?

Thank you!
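The subfolders train.py refers to (such as `training/`) look like a Tesseract source checkout, whereas this Homebrew install only has `bin`, `lib`, `include` and `share`. A minimal sketch, assuming the tools you need are on your `PATH`, uses Python's `shutil.which` to print where each one resolves; whether the training tools are installed at all depends on how Tesseract was built, which this does not check.

```python
import shutil
from typing import Dict, Iterable, Optional

# Tool names that appear in the train.py logs in these issues.
TOOLS = ("tesseract", "unicharset_extractor", "mftraining", "cntraining", "combine_tessdata")


def locate_tools(tools: Iterable[str] = TOOLS,
                 search_path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not found.

    search_path defaults to the PATH environment variable, as in shutil.which.
    """
    return {tool: shutil.which(tool, path=search_path) for tool in tools}


for name, location in locate_tools().items():
    print(f"{name:22} {location or 'not found on PATH'}")
```

If `tesseract` resolves but the training tools come back as not found, they are likely not part of that installation, and changing TESSERACT_DIR alone will not make them appear.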