SE_ASTER

Introduction

This is the implementation of the paper "SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition". The code is based on aster.pytorch; we sincerely thank ayumiymk for his awesome repo and help.

How to use

Env

PyTorch == 1.1.0
torchvision == 0.3.0
fasttext == 0.9.1

Details can be found in requirements.txt

Train

Prepare your data
  • Download the pretrained language model (bin) from here
  • Update the path in lib/tools/create_all_synth_lmdb.py
  • Run lib/tools/create_all_synth_lmdb.py
  • Note: storing the word embeddings this way may require a large amount of storage space. Alternatively, you can modify datasets/dataset.py to generate the word embeddings online.
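
The online alternative mentioned in the note could look roughly like the sketch below. It is not the repository's datasets/dataset.py: the class name `OnlineEmbeddingDataset`, the `embed_fn` parameter, and the `(image, word)` sample layout are assumptions made for illustration. With fastText, `embed_fn` would typically be the `get_word_vector` method of a model loaded with `fasttext.load_model`.

```python
from functools import lru_cache


class OnlineEmbeddingDataset:
    """Hypothetical wrapper that computes word embeddings on access
    instead of storing them alongside every sample."""

    def __init__(self, samples, embed_fn, cache_size=100_000):
        # samples: iterable of (image, word) pairs (assumed layout).
        self.samples = list(samples)
        # Cache by word so each distinct word is embedded only once.
        self._embed = lru_cache(maxsize=cache_size)(
            lambda word: tuple(embed_fn(word))
        )

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, word = self.samples[index]
        return image, word, list(self._embed(word))


# Assumed usage with fastText (the model path is a placeholder):
# import fasttext
# lm = fasttext.load_model("path/to/model.bin")
# dataset = OnlineEmbeddingDataset(samples, lm.get_word_vector)
```

This trades disk space for some extra computation per batch; the cache keeps the cost low when the same words recur.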
Run
  • Update the path in train.sh, then
sh train.sh

Test

  • Update the path in test.sh, then
sh test.sh

Experiments

Evaluation on benchmarks

  • You can download the benchmark datasets from BaiduYun (key: nphk), shared by clovaai in this repo.

| Checkpoint | IIIT5K | IC13-1015 | IC13-857 | IC15-1811 | IC15-2077 | SVT | SVTP | CUTE |
|---|---|---|---|---|---|---|---|---|
| OneDrive, BaiduYun (key: x54e) | 93.4 | 93.5 | 94.5 | 79.8 | 75.8 | 88.4 | 82.0 | 84.0 |

Evaluation with lexicons

  • Existing methods replace the predicted word with the nearest lexicon word under the edit distance (ED) metric. With the semantic information, we can instead choose the most semantically similar (SS) word among the candidates with the nearest edit distance.

| Methods | IIIT5K-50 | IIIT5K-1K | SVT-50 | IC13 | IC15 |
|---|---|---|---|---|---|
| ED | 99.06 | 97.87 | 96.36 | 97.44 | 87.76 |
| ED + SS | 99.27 | 97.93 | 96.45 | 97.64 | 88.07 |
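
One way to read the ED + SS rule is sketched below: keep the lexicon words that tie for the smallest edit distance, then break the tie by similarity to the predicted semantic vector. This is a sketch under assumptions, not the repository's evaluation code. The function names, the use of cosine similarity, and the tie-breaking behaviour are all illustrative; `semantic_vec` stands for the semantic vector predicted by the model and `embed_fn` for a word-embedding lookup such as the pre-trained LM.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        prev = curr
    return prev[-1]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pick_lexicon_word(predicted, semantic_vec, lexicon, embed_fn):
    """ED step: keep the nearest lexicon words. SS step (assumed):
    among those, return the word most similar to semantic_vec."""
    if not lexicon:
        raise ValueError("lexicon must not be empty")
    scored = [(edit_distance(predicted.lower(), w.lower()), w) for w in lexicon]
    best = min(d for d, _ in scored)
    candidates = [w for d, w in scored if d == best]
    if len(candidates) == 1:
        return candidates[0]
    return max(candidates, key=lambda w: cosine(semantic_vec, embed_fn(w)))
```

With plain ED, a tie between two equally distant lexicon words is resolved arbitrarily; the semantic step only changes the outcome in those tied cases, which is consistent with the small gains in the table above.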

About the word embedding

  • Directly use word embedding from the pre-trained LM during training and inference.

| IIIT5K | IC13 | IC15-1811 | IC15-2077 | SVT | SVTP | CUTE |
|---|---|---|---|---|---|---|
| 94.6 | 93.8 | 85.0 | 79.6 | 90.9 | 84.2 | 85.4 |

Exploration on global information

  • We tried using Aggregation Cross-Entropy as the global information instead of the semantics. This part of the code will be released in the next few days.

| IIIT5K | IC13 | IC15-1811 | IC15-2077 | SVT | SVTP | CUTE |
|---|---|---|---|---|---|---|
| 93.8 | 91.3 | 78.7 | - | 90.1 | 81.6 | 81.9 |
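
For readers unfamiliar with Aggregation Cross-Entropy (ACE), the sketch below shows the loss as it is usually defined: per-step class probabilities are averaged over time and compared with the normalized character counts of the label, so character order is ignored. How SEED wires this signal into the framework is not described here and the sketch does not attempt to reproduce it. The function name, the pure-Python list inputs, and the choice of index 0 for the blank class are assumptions.

```python
import math
from collections import Counter


def ace_loss(probs, label, blank=0, eps=1e-10):
    """Aggregation Cross-Entropy for one sample.

    probs: T rows, each a probability distribution over C classes.
    label: list of class indices (without blanks), with len(label) <= T.
    """
    num_steps = len(probs)
    num_classes = len(probs[0])
    counts = Counter(label)
    # Time steps not accounted for by a character count as blank.
    counts[blank] = num_steps - len(label)
    loss = 0.0
    for k in range(num_classes):
        y_bar = sum(row[k] for row in probs) / num_steps  # aggregated probability
        n_bar = counts.get(k, 0) / num_steps              # normalized target count
        loss -= n_bar * math.log(y_bar + eps)
    return loss
```

Because only counts are supervised, the loss acts on the whole word rather than on individual positions.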

Citation

@inproceedings{qiao2020seed,
  title={{SEED}: Semantics enhanced encoder-decoder framework for scene text recognition},
  author={Qiao, Zhi and Zhou, Yu and Yang, Dongbao and Zhou, Yucan and Wang, Weiping},
  booktitle={CVPR},
  year={2020},
}
@article{shi2018aster,
  title={{ASTER}: An attentional scene text recognizer with flexible rectification},
  author={Shi, Baoguang and Yang, Mingkun and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang},
  journal={TPAMI},
  volume={41},
  number={9},
  pages={2035--2048},
  year={2018},
  publisher={IEEE}
}
