Creates a thumbnail from a given form image. It detects logos, text, borders, and lines, and converts them into a small-sized thumbnail.
`main.py` is the entry point of the code and is used as follows:
$ python main.py inputfile outputfile
For example:
$ python src/main.py testdata/form.jpg testdata/thumbnail.png
The algorithm follows these steps:
- Thresholding. Otsu's method is used to find the binarization threshold automatically.
- Thinning. Assuming a form consists only of logos, text, and lines, the foreground elements are thinned using the Zhang-Suen algorithm.
- Line detection. Lines are detected using a Hough transform implementation, assuming every line is either horizontal or vertical, possibly with a slight rotation. The detected lines are then removed from the image.
- Logo detection. Connected components are found in the image, and those larger than a size threshold are classified as logos.
- Text detection. The remaining image is convolved with a horizontal filter, and the resulting connected components are detected as text.
- Thumbnail creation. The detected elements are converted into the small-sized output thumbnail.
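The Thresholding step can be illustrated with a minimal NumPy sketch of Otsu's method. It is not the code in `main.py`; the function names `otsu_threshold` and `binarize` are hypothetical, and it assumes an 8-bit grayscale input with dark ink on a light background, so pixels at or below the threshold are treated as foreground.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the Otsu threshold t for a uint8 image; pixels <= t form the dark class."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # weight of the dark class for each candidate t
    mu = np.cumsum(prob * np.arange(256))  # cumulative first moment up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance: (mu_total * omega - mu)^2 / (omega * (1 - omega)).
    # Candidates where one class is empty (denom == 0) get a variance of 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Foreground (ink) mask; ASSUMES dark ink on a light background."""
    return gray <= otsu_threshold(gray)
```

For a page whose pixels are either 10 (ink) or 200 (paper), the threshold falls between the two values, so `binarize` marks exactly the ink pixels.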
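The Thinning step names the Zhang-Suen algorithm. Below is a straightforward, unoptimized sketch of it on a boolean mask, written with plain Python loops for readability rather than speed; the function name `zhang_suen_thin` is hypothetical and this is not the repository's implementation.

```python
import numpy as np


def zhang_suen_thin(binary: np.ndarray) -> np.ndarray:
    """Thin a boolean foreground mask with the two-subiteration Zhang-Suen algorithm."""
    img = np.pad(binary.astype(np.uint8), 1)  # zero border so every pixel has 8 neighbours
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r, c in zip(*np.nonzero(img)):
                # Neighbours P2..P9, clockwise starting from north.
                p = [int(v) for v in (img[r - 1, c], img[r - 1, c + 1], img[r, c + 1],
                                      img[r + 1, c + 1], img[r + 1, c], img[r + 1, c - 1],
                                      img[r, c - 1], img[r - 1, c - 1])]
                b = sum(p)  # number of foreground neighbours
                a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)  # 0->1 transitions
                if not (2 <= b <= 6 and a == 1):
                    continue
                p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                if step == 0:
                    ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0  # south-east boundary / north-west corner
                else:
                    ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0  # north-west boundary / south-east corner
                if ok:
                    to_delete.append((r, c))
            # Deletions are applied after the scan, so each subiteration is parallel.
            for r, c in to_delete:
                img[r, c] = 0
            changed = changed or bool(to_delete)
    return img[1:-1, 1:-1].astype(bool)
```

Applied to a 5-pixel-thick bar, the result is a much thinner stroke that stays inside the original shape.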
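The Line detection step restricts the Hough transform to near-horizontal and near-vertical lines. The sketch below shows one way to do that under stated assumptions: it uses the normal form rho = x cos(theta) + y sin(theta), only votes for theta within a hypothetical `max_tilt_deg` of 0 (vertical lines) or 90 degrees (horizontal lines), and then erases pixels within `half_width` of each detected line. The names, the 0.5-degree step, and the vote threshold are illustrative choices, not the repository's values.

```python
import numpy as np


def hough_axis_lines(binary, max_tilt_deg=2.0, min_votes=50):
    """Return (rho, theta) pairs for near-horizontal/vertical lines with >= min_votes pixels."""
    h, w = binary.shape
    tilts = np.deg2rad(np.arange(-max_tilt_deg, max_tilt_deg + 0.5, 0.5))
    # theta is the angle of the line's normal: ~0 for vertical lines, ~pi/2 for horizontal.
    thetas = np.concatenate([tilts, np.pi / 2 + tilts])
    diag = int(np.ceil(np.hypot(h, w)))
    ys, xs = np.nonzero(binary)
    acc = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int64)
    for j, t in enumerate(thetas):
        rhos = np.round(xs * np.cos(t) + ys * np.sin(t)).astype(int) + diag  # shift to >= 0
        acc[:, j] = np.bincount(rhos, minlength=2 * diag + 1)
    # No non-maximum suppression: one physical line may yield several nearby peaks,
    # which is harmless when the goal is only to erase the lines.
    return [(int(r) - diag, float(thetas[j])) for r, j in np.argwhere(acc >= min_votes)]


def remove_lines(binary, lines, half_width=1.0):
    """Clear every foreground pixel lying within half_width of any detected line."""
    h, w = binary.shape
    ys, xs = np.mgrid[0:h, 0:w]
    out = binary.copy()
    for rho, theta in lines:
        dist = np.abs(xs * np.cos(theta) + ys * np.sin(theta) - rho)
        out[dist <= half_width] = False
    return out
```

On a small page containing one horizontal rule, one vertical rule, and a small blob, the two rules are detected and erased while the blob survives.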
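The Logo detection and Text detection steps both rely on connected components. The sketch below labels 8-connected components with a breadth-first search, keeps components above a size threshold as logos, and, for text, first smears the remaining image with a horizontal box filter (a row of ones of a hypothetical `width`) so neighbouring characters merge into word-sized blobs. All names, the box-filter choice, and the parameters are assumptions for illustration; `detect_text` expects an image from which lines and logos have already been removed, and each row must be at least `width` pixels long.

```python
from collections import deque

import numpy as np


def connected_components(binary):
    """Label 8-connected foreground components; return (pixel_count, bbox) per component."""
    h, w = binary.shape
    seen = np.zeros(binary.shape, dtype=bool)
    comps = []
    for r0, c0 in zip(*np.nonzero(binary)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue = deque([(r0, c0)])
        count, rmin, cmin, rmax, cmax = 0, r0, c0, r0, c0
        while queue:
            r, c = queue.popleft()
            count += 1
            rmin, rmax = min(rmin, r), max(rmax, r)
            cmin, cmax = min(cmin, c), max(cmax, c)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < h and 0 <= nc < w and binary[nr, nc] and not seen[nr, nc]:
                        seen[nr, nc] = True
                        queue.append((nr, nc))
        # bbox is (top, left, bottom, right), inclusive.
        comps.append((count, (int(rmin), int(cmin), int(rmax), int(cmax))))
    return comps


def detect_logos(binary, min_size):
    """Components with more than min_size pixels are treated as logos."""
    return [bbox for size, bbox in connected_components(binary) if size > min_size]


def smear_horizontal(binary, width=7):
    """Convolve each row with a box filter of the given width and re-binarize."""
    kernel = np.ones(width)
    rows = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, binary.astype(float))
    return rows > 0


def detect_text(binary, width=7):
    """Bounding boxes of the word-like blobs left after horizontal smearing."""
    return [bbox for _, bbox in connected_components(smear_horizontal(binary, width))]
```

With `width=5`, three small "characters" separated by 2-pixel gaps merge into a single text box, while a distant mark forms its own box.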