
adapting-ocr's Introduction

Adapting-OCR

PyTorch implementation of our paper Adapting OCR with Limited Labels

Qualitative results of our base, self-trained, and hybrid models on English (left) and Hindi (right) datasets. Here ST+FT refers to the model trained with the proposed hybrid approach.

Dependency

  • This work was tested with PyTorch 1.2.0, CUDA 9.0, Python 3.6, and Ubuntu 16.04.
  • The requirements can be found in env.txt.
  • Also run pip install pytorch-pretrained-bert, as one of our kind contributors pointed out :)
  • To create the environment from the file: conda create -n pytorch1.4 --file env.txt
  • To activate the environment: source activate pytorch1.4

Training

  • Supervised training

python -m train --name exp1 --path path/to/data

  • Main arguments

    • --name: creates a directory where checkpoints will be stored
    • --path: path to dataset
    • --imgdir: dir name of dataset
  • Semi-supervised training

python -m train_semi_supervised --name exp1 --path path --source_dir src_dirname --target_dir tgt_dirname --schedule --noise --alpha=1

  • Main arguments
    • --name: creates a directory where checkpoints will be stored
    • --path: path to datasets
    • --source_dir: labeled data directory on which the OCR was trained
    • --target_dir: unlabeled data directory to which we want to adapt the OCR
    • --percent: percentage of unlabeled data to include in self-training
    • --schedule: will include STLR scheduler while training
    • --train_on_pred: will treat top-predictions as targets
    • --noise: will add gaussian noise to images while training
    • --alpha: set to 1 to include the mixup criterion
    • --combine_scoring: will also take into account the scores output by a language model

Note: --combine_scoring works only with line images, not word images.
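The --noise and --alpha options correspond to additive Gaussian input noise and the mixup criterion. A minimal sketch of both, assuming the standard mixup recipe (function names and the noise standard deviation are illustrative, not this repo's actual code):

```python
import numpy as np
import torch

def mixup_data(x, alpha=1.0):
    """Mix a batch of images with a shuffled copy of itself.

    Returns the mixed inputs, the shuffle permutation, and the mixing
    weight lam drawn from Beta(alpha, alpha). The loss is then combined
    as lam * loss(pred, y) + (1 - lam) * loss(pred, y[index]).
    """
    lam = float(np.random.beta(alpha, alpha)) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0))
    mixed_x = lam * x + (1 - lam) * x[index]
    return mixed_x, index, lam

def add_gaussian_noise(x, std=0.05):
    """Additive Gaussian noise, as enabled by --noise (std is assumed)."""
    return x + torch.randn_like(x) * std
```

With --alpha=1 the mixing weight is sampled from Beta(1, 1), i.e. uniformly on [0, 1].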

  • Data
    • Use trdg to generate synthetic data. The script for data generation is included in scripts/generate_data.sh.
    • Download two different fonts and keep the data for each font in the source and target directories.
    • Use one of the fonts to train a model from scratch in a supervised manner.
    • Then fine-tune the trained model on the target data using semi-supervised learning.
    • A sample lexicon is provided in words.txt. Download a different lexicon as needed.

References

  • The OCR architecture is a CNN-LSTM model borrowed from here
  • The mixup criterion code is borrowed from here
  • STLR is borrowed from this paper
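The STLR schedule enabled by --schedule (slanted triangular learning rates, from ULMFiT) ramps the learning rate up linearly for a short fraction of training and then decays it linearly. A sketch with the paper's suggested defaults (these are assumptions, not necessarily this repo's settings):

```python
def stlr(t, max_lr=0.01, num_steps=1000, cut_frac=0.1, ratio=32):
    """Slanted triangular learning rate at step t.

    Linear warm-up for the first cut_frac of num_steps, then a long
    linear decay; ratio controls how far the minimum lr sits below max_lr.
    """
    cut = int(num_steps * cut_frac)
    if t < cut:
        p = t / cut                                   # warm-up phase
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # decay phase
    return max_lr * (1 + p * (ratio - 1)) / ratio
```

In PyTorch this shape can be plugged into training via torch.optim.lr_scheduler.LambdaLR.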


adapting-ocr's Issues

Missing “English.data.pkl” from data/train.

Hi. I am trying to train the model on my custom dataset to read numbers only from ROIs. The dataset is a folder of images named like “ocr_reading_img.jpg”. I am getting a missing-file error for “English.Data.pkl”. Could anyone tell me what this file is and where to get it? @Deepayan137

error while running the data generation script

I am a Windows user. I tried running the following command for data generation in the terminal

trdg -i words.txt -c 1000 --output_dir source_dir -ft fonts/Arimo-Regular.ttf

and got the following error

OSError: [Errno 22] Invalid argument: 'source_dir\\you?_379.jpg'
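The error comes from the ? in the word “you?”: trdg embeds each word in the generated image's filename, and Windows forbids ? in filenames. A possible workaround (an assumption, not something from the repo) is to filter such words out of words.txt before running trdg:

```python
# Characters Windows forbids in filenames; lexicon entries containing
# any of them would produce invalid trdg output paths on Windows.
INVALID = set('<>:"/\\|?*')

def filter_lexicon(words):
    """Drop lexicon entries that contain Windows-invalid filename characters."""
    return [w for w in words if not (set(w) & INVALID)]
```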


Missing dependency: pytorch_pretrained_bert

After setting everything up, training fails because pytorch_pretrained_bert is required but not listed in the dependencies.

How to train for RTL languages?

I'm trying to use the code from this repo to train OCR for Arabic. Additionally, the dataset contains handwritten text, which makes it even more complicated, but this is just for learning purposes.

I trained a model without any modifications to the workflow or the model architecture except for the dataset class. The model is giving me 0 WA (word accuracy) and CA (character accuracy). So I wanted to ask what changes are needed to make the model learn.

So far, I'm trying to reverse the targets, since Arabic is an RTL language unlike English, and to pad the images on the left instead of the right. I'd like to hear your thoughts on this. Thanks.
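The two modifications described here (reversing targets, left-padding images) could be sketched as follows; this is illustrative code, not taken from the repository:

```python
import numpy as np

def reverse_target(label: str) -> str:
    """Reverse an RTL label string so a left-to-right decoder
    reads it in visual order."""
    return label[::-1]

def pad_left(img: np.ndarray, target_w: int, pad_value: int = 255) -> np.ndarray:
    """Pad a (H, W) grayscale image on the left (instead of the right)
    up to target_w columns; crop if the image is already wider."""
    h, w = img.shape
    if w >= target_w:
        return img[:, :target_w]
    pad = np.full((h, target_w - w), pad_value, dtype=img.dtype)
    return np.concatenate([pad, img], axis=1)
```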

How to use this for Custom Dataset?

Hi,
I would like to use your work on my dataset. Please guide me on how to prepare my dataset so that I can use this algorithm on it.
I have a receipt dataset in which amounts and product names are handwritten. How should I prepare my data to extract this handwritten text? Your suggestions would help.

Thanks.
