
sanskrit-ocr

Note: This branch contains code for IndicOCR-v2. For IndicOCR-v1, kindly visit this branch.


This repository contains code for various OCR models for classical Sanskrit document images. For a quick overview of how to get IndicOCR and CNN-RNN up and running, continue reading this README. For more detailed instructions, visit our Wiki page.

The IndicOCR and CNN-RNN models are best run on a GPU.

Please cite our paper if you end up using it for your own research.

@InProceedings{Dwivedi_2020_CVPR_Workshops,
author = {Dwivedi, Agam and Saluja, Rohit and Kiran Sarvadevabhatla, Ravi},
title = {An OCR for Classical Indic Documents Containing Arbitrarily Long Words},
booktitle = {The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2020}
}

Results:

The following table shows the comparative results for the IndicOCR-v2 model and other state-of-the-art models.

| Row | Dataset | Model | Training Config | CER (%) | WER (%) |
|-----|---------|-------|-----------------|---------|---------|
| 1 | new | IndicOCR-v2 | C3: mix training + real finetune | 3.86 | 13.86 |
| 2 | new | IndicOCR-v2 | C1: mix training | 4.77 | 16.84 |
| 3 | new | CNN-RNN | C3: mix training + real finetune | 3.77 | 14.38 |
| 4 | new | CNN-RNN | C1: mix training | 3.67 | 13.86 |
| 5 | new | Google-OCR | -- | 6.95 | 34.64 |
| 6 | new | Ind.senz | -- | 20.55 | 57.92 |
| 7 | new | Tesseract (Devanagari) | -- | 13.23 | 52.75 |
| 8 | new | Tesseract (Sanskrit) | -- | 21.06 | 62.34 |

IndicOCR-v2:

Details:

The code is written in the TensorFlow framework.

Pre-Trained Models:

  • Download pre-trained C1 models from here

  • Download pre-trained C3 models from here

Setup:

In the model/attention-lstm directory, run the following commands:

conda create -n indicOCR python=3.6.10
conda activate indicOCR
conda install pip
pip install -r requirements.txt

Installation:

To install the aocr (attention-ocr) library, from the model/attention-lstm directory, run:

python setup.py install

tfrecords creation:

Make sure to have/create a .txt file in which every line follows the format below:

path/to/image<space>annotation

ex: /user/sanskrit-ocr/datasets/train/1.jpg I am the annotated text

aocr dataset /path/to/txt/file/ /path/to/data.tfrecords
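
If your annotations are not already in a file, the following minimal sketch (not part of the repository; the `labels` dict, image paths, and output filename are hypothetical) produces the .txt file consumed by the `aocr dataset` command above:

```python
# Write one "path<space>annotation" line per image, as expected by `aocr dataset`.
# The image paths and labels below are placeholders; substitute your own data.
labels = {
    "/user/sanskrit-ocr/datasets/train/1.jpg": "I am the annotated text",
    "/user/sanskrit-ocr/datasets/train/2.jpg": "another ground-truth line",
}

with open("train_annotations.txt", "w", encoding="utf-8") as f:
    for image_path, text in labels.items():
        f.write(f"{image_path} {text}\n")
```

The resulting train_annotations.txt can then be passed to `aocr dataset` in place of /path/to/txt/file/.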

Train:

To train on the data.tfrecords file created as described above, run the following command:

CUDA_VISIBLE_DEVICES=0 aocr train /path/to/tfrecords/file --batch-size <batch-size> --max-width <max-width> --max-height <max-height> --max-prediction <max-predicted-label-length> --num-epoch <num-epoch>
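
The --max-width and --max-height values generally need to accommodate your largest training image. One quick way to find those bounds is to scan the annotation file created earlier; this is only a sketch, assuming Pillow is installed and a hypothetical train_annotations.txt in the "path<space>annotation" format:

```python
# Report the largest image dimensions referenced in the annotation file,
# to use as upper bounds when choosing --max-width / --max-height.
from PIL import Image

max_w = max_h = 0
with open("train_annotations.txt", encoding="utf-8") as f:
    for line in f:
        if not line.strip():
            continue
        image_path = line.rstrip("\n").split(" ", 1)[0]  # path comes before the first space
        with Image.open(image_path) as img:
            w, h = img.size
            max_w, max_h = max(max_w, w), max(max_h, h)

print(f"largest image: {max_w} x {max_h}")
```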

Validate:

To validate multiple checkpoints, run:

python ./model/evaluate/attention_predictions.py <initial_ckpt_no> <final_ckpt_step> <steps_per_checkpoint>

This will create a val_preds.txt file in the model/attention-lstm/logs folder.

Test:

To test a single checkpoint, run the following command:

CUDA_VISIBLE_DEVICES=0 aocr test /path/to/test.tfrecords --batch-size <batch-size> --max-width <max-width> --max-height <max-height> --max-prediction <max-predicted-label-length> --model-dir ./modelss

Note: If you want to test multiple checkpoints that are evenly spaced (by checkpoint number), use the method described in the Validate section.

Computing Error Rates:

To compute the CER and WER of the predictions, run the following command:

python ./model/evaluate/get_errorrates.py <predicted_file_name>

ex: python model/evaluate/get_errorrates.py val_preds.txt

The error rates will be written to the file output.json in the visualize directory.
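
For reference, CER and WER are typically computed as the character- and word-level edit distances between a prediction and its ground truth, normalised by the ground-truth length. The self-contained sketch below illustrates that computation; the actual get_errorrates.py script may differ in details such as the predictions file format:

```python
# Levenshtein edit distance via dynamic programming; works on strings (characters)
# and on lists of tokens (words) alike.
def edit_distance(ref, hyp):
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    # character error rate: character-level edit distance / reference length
    return edit_distance(ref, hyp) / max(len(ref), 1)

def wer(ref, hyp):
    # word error rate: edit distance over whitespace-separated tokens / reference word count
    return edit_distance(ref.split(), hyp.split()) / max(len(ref.split()), 1)
```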

CNN-RNN:

Details:

The code is written in the TensorFlow framework.

Pre-Trained Models:

To download the best CNN-RNN model, kindly visit this page.

Setup:

In the model/CNN-RNN directory, run the following commands:

conda create -n crnn python=3.6.10
conda activate crnn
conda install pip
pip install -r requirements.txt

tfrecords creation:

Make sure to have/create a .txt file in which every line follows the format below:

path/to/image<space>annotation

ex: /user/sanskrit-ocr/datasets/train/1.jpg I am the annotated text

python model/CRNN/create_tfrecords.py /path/to/.txt/file ./model/CRNN/data/tfReal/data.tfrecords

Train:

To train on the data.tfrecords file created as described above, run the following command:

python model/CRNN/train.py <training tfrecords filename> <train_epochs> <path_to_previous_saved_model> <steps-per_checkpoint>

ex: python ./model/CRNN/train.py train_feature.tfrecords 20 model/CRNN/model/shadownet/shadownet_-40 200

Note: If you are training from scratch, just set the <path_to_previous_saved_model> argument to 0.

ex: python model/CRNN/train.py data.tfrecords 100 0 <steps-per_checkpoint>

Validate:

To validate multiple checkpoints, run:

python ./model/evaluate/crnn_predictions.py <tfrecords_file_name> <initial_step> <final_step> <steps_per_checkpoint> <out_file>

This will create the <out_file> in the model/CRNN/logs folder.

Note: The <tfrecords_file_name> should be relative to the model/CRNN/data/tfReal/ directory.

Test:

Follow the same procedure as for validation above.

Computing Error Rates:

To compute the CER and WER of the predictions, run the following command:

Validation:

python model/evaluate/get_errorrates_crnn.py <path_to_predicted_file>

Test:

python model/evaluate/get_errorrates_crnn.py <path_to_predicted_file>

ex: python model/evaluate/get_errorrates_crnn.py model/CRNN/logs/test_preds_final.txt

Creating Synthetic Data, Obtaining Results for Tesseract and Google-OCR, etc.:

Visit our Wiki page.


Other Analysis:

WA-ECR Plot:

To gain better insight into performance, we compute the word-averaged erroneous character rate (WA-ECR) for each word length L. It is defined as follows (a short computation sketch is given after the definitions):

WA-ECR(L) = E_L / N_L

Where:

  • E_L: number of erroneous characters across all words of length L
  • N_L: number of words of length L in the test set
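
The sketch below (not part of the repository) illustrates this computation from already-aligned (ground-truth word, predicted word) pairs. It approximates the per-word character-error count using difflib; the paper's implementation may count errors differently (e.g. via edit distance):

```python
from collections import defaultdict
from difflib import SequenceMatcher

def char_errors(gt_word, pred_word):
    # Ground-truth characters left unmatched by the prediction (a proxy for character errors).
    matched = sum(b.size for b in SequenceMatcher(None, gt_word, pred_word).get_matching_blocks())
    return len(gt_word) - matched

def wa_ecr(word_pairs):
    # word_pairs: iterable of (ground_truth_word, predicted_word), already aligned.
    E = defaultdict(int)  # E_L: erroneous characters summed over words of length L
    N = defaultdict(int)  # N_L: number of ground-truth words of length L
    for gt, pred in word_pairs:
        E[len(gt)] += char_errors(gt, pred)
        N[len(gt)] += 1
    return {L: E[L] / N[L] for L in sorted(N)}
```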


Figure: Distribution of word-averaged erroneous character rate (WA-ECR) as a function of word length, for different models. The lower the WA-ECR, the better. The histogram of test words by word length is also shown in the plot (red dots, log scale).


Sample Results:


Figure: Qualitative results for different models. Errors relative to the ground truth are highlighted in red. Blue highlighting indicates text missing from at least one of the OCRs; a larger amount of blue within a line for a given OCR indicates better coverage relative to the other OCRs. A smaller amount of red indicates fewer errors.
