In this capstone project I present a prototype photo OCR (optical character recognition) pipeline, based on a sliding-window algorithm, that automatically detects and transcribes text (digits) in images. The CNN classifiers used in the pipeline were trained on images synthesized from a large collection of fonts. A demo application is included that detects and transcribes digits from a webcam feed.
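The sliding-window idea mentioned above can be illustrated with a minimal sketch. This is not the project's actual code: the function names `sliding_windows` and `detect`, the 32x32 window, the stride of 8 and the threshold are all assumptions chosen for illustration, and a simple brightness score stands in for the trained CNN classifier.

```python
import numpy as np


def sliding_windows(image, window=(32, 32), stride=8):
    """Yield (x, y, patch) for every window position in a 2-D grayscale image."""
    win_h, win_w = window
    height, width = image.shape[:2]
    for y in range(0, height - win_h + 1, stride):
        for x in range(0, width - win_w + 1, stride):
            yield x, y, image[y:y + win_h, x:x + win_w]


def detect(image, score_fn, window=(32, 32), stride=8, threshold=0.5):
    """Return (x, y, w, h, score) for every window scoring above the threshold."""
    win_h, win_w = window
    boxes = []
    for x, y, patch in sliding_windows(image, window, stride):
        score = float(score_fn(patch))
        if score > threshold:
            boxes.append((x, y, win_w, win_h, score))
    return boxes


def mean_brightness(patch):
    """Hypothetical placeholder scorer; a real pipeline would call a CNN here."""
    return patch.mean()


# Synthetic test image: black background with one bright 32x32 "character".
image = np.zeros((64, 96), dtype=np.float32)
image[16:48, 32:64] = 1.0

boxes = detect(image, mean_brightness, threshold=0.9)
```

In a real pipeline the scorer would be a text/no-text classifier, the image would usually be scanned at several scales, and overlapping detections would be merged, for example with non-maximum suppression.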
- Report.pdf
- Execute `prepare_project.py`, which downloads and extracts the fonts from https://github.com/google/fonts and creates a font cache.
- Run `demo.py` with a webcam connected to the computer. Digits are detected and transcribed in the webcam feed. Requires a GPU!
- Test the whole OCR pipeline with `test_ocr.py`. This creates random images containing digit sequences, which are then detected and transcribed.
- Test the character segmentation on randomly generated text bounding boxes with `test_segmentation.py`. The script places its test images in `TestImagesSegmentation`.
- Test the separate classifiers with `test_classifiers.py`, which runs them on randomly generated test images. Some examples are collected in `test/classifier.png`, `test/segmentation.png` and `test/detection.png`.
- Test the character classifier on SVHN test images with `test_char_classifier_on_svhn_test_data.py`. This downloads `test.tar.gz` from http://ufldl.stanford.edu/housenumbers/ (264 MB). Some examples of the test images are collected in `svhn/test.png`.
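When testing a pipeline like this, a transcribed digit sequence is commonly compared with the expected one using the Levenshtein (edit) distance. The sketch below is a pure-Python illustration of that metric, not the project's evaluation code: how the test scripts actually score their results is an assumption here, and the names `edit_distance` and `transcription_accuracy` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def transcription_accuracy(predicted, expected):
    """Hypothetical score in [0, 1]: 1.0 means a perfect transcription."""
    if not predicted and not expected:
        return 1.0
    longest = max(len(predicted), len(expected))
    return 1.0 - edit_distance(predicted, expected) / longest


# One substituted digit out of five.
distance = edit_distance("12345", "12845")
accuracy = transcription_accuracy("12345", "12845")
```

Unlike exact string matching, this gives partial credit when only a single digit is missed or misread.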
- keras / tensorflow
- OpenCV
- Pillow
- numpy
- python-levenshtein