Document OCR by template

This is an OCR program designed for travel documents. It currently supports 23 document types, each with a pre-defined template, and you can add templates for any other document you need.

Passport
China ID card
HK ID card (new format)
HK ID card (old format)
Macau ID card (new format)
Macau ID card (old format)
Macau ID card - backside with MRZ
China to HK/Macau Entry Permit card
China to HK/Macau Entry Permit (Old)
China to Taiwan Entry Permit card
HK/Macau to China Entry Permit card
HK/Macau to China Entry Permit card (Old)
Taiwan to China Entry Permit card
Taiwan to China Entry Permit (Old)
Australia Driver Licence - New South Wales
Australia Driver Licence - Victoria
Australia Driver Licence - Australian Capital Territory
Australia Driver Licence - Queensland
Australia Driver Licence - Western Australia
Australia Driver Licence - Northern Territory
Australia Driver Licence - Tasmania
Australia Driver Licence - South Australia
New Zealand Driver Licence

Environment

CentOS / Windows
Python 3.7+

Installation

git clone --recursive https://github.com/wisebobo/doc_ocr_by_template
cd doc_ocr_by_template
pip3 install -r requirements.txt

How to use?

Go to the project folder and edit settings.py, replacing the APP_ID/APP_KEY values with your own credentials.

Then execute

./startServer.sh

or

python3 startServer.py
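Once the server is running, a client sends the document image to it as base64. The sketch below assumes a JSON body with a single `image` field POSTed to `http://localhost:8080/ocr`. The URL, port, path and field name are all hypothetical placeholders, so check the handlers registered in startServer.py for the real ones.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: replace with the host, port and path that
# startServer.py actually registers.
API_URL = "http://localhost:8080/ocr"


def build_payload(image_bytes: bytes) -> bytes:
    """Encode raw image bytes as base64 inside a JSON body.

    The field name "image" is an assumption, not the project's documented API.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"image": encoded}).encode("utf-8")


def send_image(path: str, url: str = API_URL, timeout: float = 30.0) -> dict:
    """Read an image file, POST it to the OCR service and return the parsed JSON reply."""
    with open(path, "rb") as f:
        body = build_payload(f.read())
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send_image("passport.jpg")` would return whatever JSON the service replies with; only the standard library is used, so the client needs no extra packages.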

Design Concept

Tornado is used to expose the API service.
After a base64-encoded image is received, it is passed to a pre-trained ResNet50 model that classifies the image to determine the document type.
Once the document type is known, multiple threads call the Tencent/Baidu/Face++/Netease/JD OCR APIs concurrently to obtain the first round of OCR results.
The first-round OCR results are matched against the pre-defined template for that document type. Templates are created with [project folder]/templates/template_generator.html. If the template matches, each recognition area is cropped into a new image (the idea is to strip out unnecessary information and get a more accurate OCR result), and the cropped images are sent to the Tencent/Baidu/Face++/Netease/JD OCR APIs again.
The second-round OCR results are matched against the template fields.
Data-cleansing logic specific to the document type is applied.
The score is calculated.
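The multi-threaded OCR step can be sketched with the standard library's `ThreadPoolExecutor`. This is an illustration under assumptions, not the project's code: each provider is modelled as a hypothetical callable that takes image bytes and returns a list of text lines, standing in for wrappers around the Tencent, Baidu, Face++, Netease and JD APIs.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

# Hypothetical provider signature: image bytes in, recognised text lines out.
# Real wrappers around the vendor APIs should enforce their own HTTP timeouts.
OcrProvider = Callable[[bytes], List[str]]


def run_ocr_round(image: bytes, providers: Dict[str, OcrProvider]) -> Dict[str, List[str]]:
    """Call every OCR provider in its own thread and collect the results.

    A provider that raises (network error, quota exceeded, ...) is recorded
    with an empty result instead of aborting the whole round.
    """
    results: Dict[str, List[str]] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {pool.submit(fn, image): name for name, fn in providers.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                results[name] = []
    return results
```

Threads suit this step because the work is almost entirely waiting on remote HTTP calls, so the GIL is not a bottleneck.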
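Template matching and cropping might look roughly like the sketch below. The `Template` layout here is hypothetical (the real format is whatever templates/template_generator.html produces): a template is assumed to list fixed "anchor" strings printed on every card of that type, plus field areas given as fractions of the image size. A template counts as matched when enough anchors appear in the first-round OCR text, and each field area is then converted into a pixel box.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower) in pixels


@dataclass
class Template:
    """Hypothetical template layout, for illustration only."""
    doc_type: str
    anchors: List[str]  # fixed text expected on every document of this type
    fields: Dict[str, Tuple[float, float, float, float]]  # name -> (x, y, w, h) as fractions
    min_anchor_hits: int = 2


def template_matches(template: Template, first_round_lines: List[str]) -> bool:
    """True if enough of the template's anchor strings occur in the OCR text."""
    text = " ".join(first_round_lines).upper()
    hits = sum(1 for anchor in template.anchors if anchor.upper() in text)
    return hits >= template.min_anchor_hits


def field_boxes(template: Template, width: int, height: int) -> Dict[str, Box]:
    """Convert each relative field area into a pixel crop box, clamped to the image.

    The boxes use Pillow's Image.crop() convention: (left, upper, right, lower).
    """
    boxes: Dict[str, Box] = {}
    for name, (x, y, w, h) in template.fields.items():
        left, upper = round(x * width), round(y * height)
        right = min(width, round((x + w) * width))
        lower = min(height, round((y + h) * height))
        boxes[name] = (left, upper, right, lower)
    return boxes
```

With Pillow installed, `image.crop(box)` on each returned box yields the per-field images for the second OCR round; because each crop holds exactly one field, matching second-round results to template fields reduces to keeping the field name alongside each crop.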
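The README does not say how the data cleansing or the score works, so the following is only one plausible approach: a hypothetical registry maps (document type, field) to a cleansing function, and the score for a field is the fraction of OCR providers whose cleaned reading agrees with the majority value. The document-type key `hk_id_new` and the field name `id_number` are illustrative names, not identifiers from the project.

```python
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def clean_hk_id_number(raw: str) -> str:
    """Uppercase and drop spaces/brackets, e.g. 'a123456 (7)' -> 'A1234567'."""
    return re.sub(r"[\s()]", "", raw).upper()


def clean_default(raw: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(raw.split())


# Hypothetical registry: (doc_type, field) -> cleansing function.
CLEANERS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("hk_id_new", "id_number"): clean_hk_id_number,
}


def vote_field(doc_type: str, field: str, candidates: List[str]) -> Tuple[str, float]:
    """Clean each provider's reading of one field and return the majority value
    with the fraction of all providers (empty readings included) that agreed."""
    cleaner = CLEANERS.get((doc_type, field), clean_default)
    cleaned = [cleaner(c) for c in candidates if c and c.strip()]
    if not cleaned:
        return "", 0.0
    value, count = Counter(cleaned).most_common(1)[0]
    return value, count / len(candidates)
```

Cleansing before voting matters: without it, `"A123456(7)"` and `"a123456 (7)"` would be counted as disagreeing readings of the same ID number.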

Reference

MRZ: PassportEye, https://github.com/konstantint/PassportEye
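PassportEye locates and parses the machine-readable zone (MRZ) on passports and on documents such as the back of the Macau ID card. When cleaning MRZ fields, each one can be validated with the ICAO Doc 9303 check-digit rule: digits keep their value, letters A to Z map to 10 to 35, the filler `<` counts as 0, the values are weighted 7, 3, 1 repeating, and the sum is taken modulo 10. The sketch below is the standard algorithm written from the specification, not code taken from this project or from PassportEye.

```python
def mrz_check_digit(field: str) -> int:
    """Compute the ICAO 9303 check digit for one MRZ field."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0  # filler character
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * weights[i % 3]
    return total % 10
```

For the ICAO specimen passport, `mrz_check_digit("L898902C3")` gives 6 and `mrz_check_digit("740812")` gives 2, matching the check digits printed in its MRZ.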
