OCR Water Meter

Since the task does not include a dataset, I suggest using the Kaggle water meter dataset.

P.S. Because some packages are not simple to install, there are no .py files. The program can be tested step by step with the notebooks. The .zip archive contains requirements.txt, but pipreqs cannot detect detectron2, and it may not install with pip either.

The OCR task consists of 3 steps:

  1. A semantic segmentation model locates the digits of the water meter, and that region is cropped.
  2. An OCR model predicts the exact digits on the cropped image.
  3. The two models are combined.

The data is not ready for segmentation: the images have different shapes and are not normalized. It is also useful to check whether there are empty masks or image ids that do not match the mask ids. Seg_train.ipynb contains some visualizations of the provided data for each step. After checking, we resize everything to shape (256, 256) with OpenCV and, as usual, we should not forget to apply augmentations with albumentations.
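The two sanity checks mentioned above can be sketched as follows. This is only an illustration, not the code from Seg_train.ipynb: the helper names are hypothetical, and it assumes that an image and its mask share the same file name stem and that masks are loaded as NumPy arrays.

```python
from pathlib import Path

import numpy as np


def unmatched_ids(image_names, mask_names):
    """Return (ids with no mask, ids with no image).

    Assumption: an image and its mask share the same file name stem.
    """
    image_ids = {Path(name).stem for name in image_names}
    mask_ids = {Path(name).stem for name in mask_names}
    return sorted(image_ids - mask_ids), sorted(mask_ids - image_ids)


def is_empty_mask(mask):
    """True when the mask contains no foreground (non-zero) pixel."""
    return not np.any(mask)
```

Pairs flagged by either check can be dropped or inspected by hand before resizing and augmentation.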

Now we split the dataset into training and validation parts, and we are ready to train our model.

Results & architecture

Architecture

  • Architecture: UNet & EfficientNetB0
  • Loss function: FocalLoss (DiceLoss or DiceBCELoss might be better)
  • Metric: Dice coefficient
  • Optimizer: Adam (lr=1e-3, decay=1e-6)
  • Learning rate scheduler: ReduceLROnPlateau(factor=0.5, patience=5)
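For reference, the Dice coefficient used as the metric can be written in a few lines of NumPy. This is a minimal sketch rather than the metric code from the notebook; the `smooth` term is an assumption, a common way to avoid division by zero on empty masks.

```python
import numpy as np


def dice_coef(y_true, y_pred, smooth=1.0):
    """Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), with optional smoothing.

    `smooth` is an assumed stabilizer, not a value taken from the notebook.
    """
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return float((2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth))
```

A value of 1 means the predicted mask and the ground truth overlap perfectly; a value near 0 means they barely overlap.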

Results:

Architecture            dice_coef   Input & mask resolution   Epochs   Steps per epoch
UNet & EfficientNetB0   0.6283      256x256                   50       200

Preparing dataset for OCR model

Before cropping the dataset, I should say that I will use the masks from the dataset rather than the output of the segmentation model, because it is better to train the OCR model on the best available data. Naturally, I will also create a prediction program for a single image that uses the mask from the segmentation model.

After applying a bitwise OpenCV operation to each image with the help of its mask, we need to take care of the rotation of the newly cropped image:

[image: cropped region before rotating]

After rotating:

[image: cropped region after rotating]

Our objective is to feed cropped photos into the OCR model, using the masks generated by the segmentation model and the associated images. Then, using a sample of 200 manually labeled meter images, we train a Faster RCNN model that can accurately identify the digits on a meter and predict their values. Using the model's output on test photos, we parse the data and reorder the predictions so that they appear from left to right. The digits are then combined to give the final meter reading.

The dataset, already prepared for the OCR model, can be downloaded via the link: download

Examples:

[image: cropped meter photos with a label for each digit]

Our photos are split and labeled with the appropriate label for each digit, as shown above. The dataset is divided into training (70%), validation (20%), and testing (10%) sets to train a custom Detectron2 Faster RCNN model, according to the article.
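A 70/20/10 split like the one described can be sketched in plain Python. The function name, the fixed seed, and the use of rounding for the set sizes are assumptions made for illustration; the notebooks may split the data differently.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle items and split them into train / validation / test lists.

    The test set receives whatever remains after the first two parts.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

For 200 labeled images this gives 140 training, 40 validation, and 20 test images.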

To be specific, we will use faster_rcnn_X_101_32x8d_FPN, but Detectron2 offers many options for the model architecture, which you can see in the Detectron2 model zoo.

After training, the model produces raw output. Here are the steps to extract predictions:

  1. Read image and get output information.
  2. Find predicted boxes and labels.
  3. Obtain a list of all predictions with the leftmost x-coordinate of each bounding box.
  4. Sort the list by x-coordinate to get the proper order of the meter reading.
  5. Get final order of identified classes, and map them to class value.
  6. Add decimal point to list of digits depending on number of bounding boxes.
  7. Combine digits and convert them into a float.
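Steps 3 to 7 can be sketched in plain Python once the boxes and class indices have been pulled out of the detector output. The names below are hypothetical, and the decimal-point rule is an assumption: the README only says the position depends on the number of bounding boxes, so this sketch treats the last `n_decimals` digits as the fractional part whenever more boxes than that are found.

```python
def assemble_reading(detections, class_to_digit, n_decimals=3):
    """Turn unordered digit detections into a float meter reading.

    detections: list of ((x_min, y_min, x_max, y_max), class_index) pairs.
    class_to_digit: mapping from class index to a digit character.
    n_decimals: assumed number of fractional digits (not from the README).
    """
    if not detections:
        raise ValueError("no digits detected")
    # Sort by the leftmost x-coordinate so digits are read left to right.
    ordered = sorted(detections, key=lambda det: det[0][0])
    digits = [class_to_digit[cls] for _, cls in ordered]
    if len(digits) > n_decimals:
        split = len(digits) - n_decimals
        text = "".join(digits[:split]) + "." + "".join(digits[split:])
    else:
        # Too few boxes to place a decimal point: read them as an integer.
        text = "".join(digits)
    return float(text)
```

With an identity mapping such as `{i: str(i) for i in range(10)}`, seven detections whose classes read 1, 1, 3, 7, 0, 7, 5 from left to right are combined into 1137.075.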

Prediction program

To try this model yourself (UNet & EfficientNetB0 + faster_rcnn_X_101_32x8d_FPN + post-processing logic), we need 3 files: water_meter.h5, model_final.pth, Water counter.jpg

The prediction program then performs the following steps:

  1. Resize the image to (256, 256).
  2. Normalize the data (/255.).
  3. Load the model (do not forget the custom loss and metrics).
  4. Each output pixel is a mask probability in [0, 1]; threshold it: pixel value > 0.5.
  5. Crop using a bitwise operation.
  6. Use model_final.pth and repeat the steps to extract predictions.
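Steps 4 and 5 can be illustrated with NumPy alone. This sketch is a stand-in for the OpenCV bitwise operation mentioned above, not the notebook's code: it assumes the image has shape (H, W, 3), the probability mask has shape (H, W), and both have already been resized to the same size. The function name is hypothetical.

```python
import numpy as np


def crop_with_mask(image, prob_mask, threshold=0.5):
    """Zero out pixels outside the mask, then crop to the mask's bounding box."""
    binary = prob_mask > threshold  # step 4: probabilities -> boolean mask
    if not binary.any():
        raise ValueError("mask is empty after thresholding")
    # Step 5: keep only masked pixels (same effect as a bitwise AND with the mask).
    masked = image * binary[..., None].astype(image.dtype)
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    return masked[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

The cropped array is what would then be passed to the digit detector in step 6.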

OCR Results

Example:

Meter reading: 1137.075

Task 1.zip contains:

Seg_train.ipynb: notebook for training the segmentation model (part 1).

Colab_links.txt: links for parts 2 and 3.

ocr-water-meter's People

Contributors

andrii-radyhin
