🚀 Exciting update! We have created a demo for our paper on Hugging Face Spaces, showcasing the capabilities of DocTr. Check it out here!

🔥 Good news! Our new work, DocTr++: Deep Unrestricted Document Image Rectification, is out. It can rectify various distorted document images in the wild.

🔥 Good news! Our new work, DocScanner: Robust Document Image Rectification with Progressive Learning, achieves state-of-the-art performance on the DocUNet Benchmark dataset; see its Repo.

🔥 Good news! A comprehensive list of Awesome Document Image Rectification methods is available.

DocTr

DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction
ACM MM 2021 Oral

Questions and discussions are welcome!

🚀 Demo (Link)

  1. Upload the distorted document image to be rectified in the left box.
  2. Click the "Submit" button.
  3. The rectified image will be displayed in the right box.
  4. The demo runs on CPU infrastructure, and images are transmitted over the network, so some display latency is to be expected.

Training

DocTr consists of two main components: a geometric unwarping transformer (GeoTr) and an illumination correction transformer (IllTr).

  • For geometric unwarping, we train the GeoTr network on the Doc3D and DTD datasets.
  • For illumination correction, we train the IllTr network on the DocProj dataset.

Inference

  1. Download the pretrained models from Google Drive or Baidu Cloud, and put them in $ROOT/model_pretrained/.
  2. Put the distorted images in $ROOT/distorted/.
  3. Geometric unwarping. The rectified images are saved in $ROOT/geo_rec/ by default.
    python inference.py
    
  4. Geometric unwarping and illumination rectification. The rectified images are saved in $ROOT/ill_rec/ by default.
    python inference.py --ill_rec True
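
The steps above can be sketched as a small wrapper script. This is a minimal sketch and not part of the repository: the helper names `build_inference_command`, `list_distorted_images`, and `run_inference`, as well as the set of accepted image extensions, are assumptions. Only `inference.py`, the `--ill_rec True` flag, and the directory names come from the steps above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Assumed set of image extensions; adjust to whatever inference.py accepts.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def build_inference_command(ill_rec: bool = False) -> List[str]:
    """Return the command line for step 3 (geometric only) or step 4."""
    command = [sys.executable, "inference.py"]
    if ill_rec:
        command += ["--ill_rec", "True"]
    return command


def list_distorted_images(root: Path) -> List[Path]:
    """List the images placed in $ROOT/distorted/ (step 2), sorted by name."""
    distorted = root / "distorted"
    if not distorted.is_dir():
        return []
    return sorted(
        p for p in distorted.iterdir() if p.suffix.lower() in IMAGE_EXTENSIONS
    )


def run_inference(root: Path, ill_rec: bool = False) -> Path:
    """Run inference.py from the repository root and return the output directory."""
    if not list_distorted_images(root):
        raise FileNotFoundError(f"no images found in {root / 'distorted'}")
    subprocess.run(build_inference_command(ill_rec), cwd=root, check=True)
    # Default output directories named in steps 3 and 4.
    return root / ("ill_rec" if ill_rec else "geo_rec")
```

Under these assumptions, calling `run_inference(Path("."), ill_rec=True)` from the repository root would correspond to step 4, and omitting `ill_rec` would correspond to step 3.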
    

Evaluation

  • Important. In the DocUNet Benchmark, the distorted images '64_1.png' and '64_2.png' are rotated by 180 degrees and therefore do not match the GT documents. Most existing works ignore this. Please check these two images before running the evaluation.
  • Note that the results in our MM paper were computed with the two mistaken samples in the DocUNet Benchmark. To reproduce the following quantitative results on the corrected DocUNet Benchmark, please use the geometrically rectified images available from Google Drive or Baidu Cloud. For the corrected results of other methods, please refer to our new work DocScanner.
  • Image Metrics: We use the same evaluation code for MS-SSIM and LD as the DocUNet Benchmark dataset, run on Matlab 2019a. Please take your Matlab version into account when comparing scores. We provide our Matlab interface file at $ROOT/ssim_ld_eval.m.
  • OCR Metrics: The index of the 30 documents (60 images) of the DocUNet Benchmark used for our OCR evaluation is in $ROOT/ocr_img.txt (Setting 1). Please refer to DewarpNet for the index of the 25 documents (50 images) of the DocUNet Benchmark used for their OCR evaluation (Setting 2). We provide the OCR evaluation code at $ROOT/OCR_eval.py. We use pytesseract 0.3.8 and, on Windows, Tesseract 5.0.1.20220118. Note that the computed results differ slightly across operating systems.
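
The rotation check from the first bullet can be sketched as follows. This is an illustrative sketch only: it represents an image as a plain list of pixel rows, and the helper names `rotate_180` and `fix_known_rotations` are hypothetical. With NumPy arrays, `np.rot90(img, 2)` performs the same rotation.

```python
from typing import Dict, List, Sequence

# The two DocUNet Benchmark samples reported above as rotated by 180 degrees.
ROTATED_SAMPLES = ("64_1.png", "64_2.png")


def rotate_180(rows: Sequence[Sequence]) -> List[list]:
    """Rotate a 2-D grid of pixels by 180 degrees (reverse rows and columns)."""
    return [list(reversed(row)) for row in reversed(rows)]


def fix_known_rotations(images: Dict[str, Sequence[Sequence]]) -> Dict[str, Sequence[Sequence]]:
    """Return a copy of `images` with the two known samples rotated back."""
    return {
        name: rotate_180(img) if name in ROTATED_SAMPLES else img
        for name, img in images.items()
    }
```

Applying `rotate_180` twice returns the original grid, so running the fix on an already corrected copy of the benchmark would reintroduce the mismatch. Inspect the two images first.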

Method   MS-SSIM   LD     ED (Setting 1)   CER (Setting 1)   ED (Setting 2)   CER (Setting 2)
GeoTr    0.5105    7.76   464.83           0.1746            724.84           0.1832
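
For readers unfamiliar with the ED and CER columns, the sketch below shows the usual definitions: ED is the Levenshtein edit distance between the OCR output and the reference text, and CER is that distance divided by the length of the reference. This is an assumption about the metric definitions, not a copy of $ROOT/OCR_eval.py, and the function names are hypothetical.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # previous[j] holds the distance between reference[:i-1] and hypothesis[:j].
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the number of reference characters."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, `edit_distance("kitten", "sitting")` is 3, so with "kitten" as the reference the CER is 0.5.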

Citation

If you find this code useful for your research, please use the following BibTeX entries.

@inproceedings{feng2021doctr,
  title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
  author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
  booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
  pages={273--281},
  year={2021}
}
@article{feng2021docscanner,
  title={DocScanner: Robust Document Image Rectification with Progressive Learning},
  author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
  journal={arXiv preprint arXiv:2110.14968},
  year={2021}
}
@article{feng2023doctrp,
  title={Deep Unrestricted Document Image Rectification},
  author={Feng, Hao and Liu, Shaokai and Deng, Jiajun and Zhou, Wengang and Li, Houqiang},
  journal={IEEE Transactions on Multimedia},
  year={2023}
}

Acknowledgement

The code is largely based on DocUNet, DewarpNet, and DocProj. Thanks to the authors for their wonderful work.

Contact

For commercial use, please contact us by email ([email protected]).
