
gtr's Introduction

Overview

Introduction

This is the official implementation of the AAAI 2022 paper: Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition. paper

Abstract

Existing Scene Text Recognition (STR) methods typically use a language model to optimize the joint probability of the 1D character sequence predicted by a visual recognition (VR) model, which ignores the 2D spatial context of visual semantics within and between character instances, making it hard for them to generalize well to arbitrary-shape scene text. To address this issue, we make the first attempt to perform textual reasoning based on visual semantics in this paper. Technically, given the character segmentation maps predicted by a VR model, we construct a subgraph for each instance, where nodes represent the pixels in it and edges are added between nodes based on their spatial similarity. Then, these subgraphs are sequentially connected by their root nodes and merged into a complete graph. Based on this graph, we devise a graph convolutional network for textual reasoning (GTR) by supervising it with a cross-entropy loss. GTR can be easily plugged into representative STR models to improve their performance owing to better textual reasoning. Specifically, we construct our model, namely S-GTR, by paralleling GTR with the language model in a segmentation-based STR baseline, which can effectively exploit the visual-linguistic complementarity via mutual learning. S-GTR sets new state-of-the-art results on six challenging STR benchmarks and generalizes well to multi-linguistic datasets.
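To make the graph construction described above concrete, here is a small, self-contained sketch of the idea. It is not the official implementation: the Gaussian spatial-similarity measure, the distance cutoff, and the choice of the most central pixel as the root node are assumptions made purely for illustration.

import numpy as np

def build_instance_subgraph(mask, sigma=2.0):
    # Nodes are the foreground pixels of one character segmentation mask;
    # edges carry a Gaussian spatial similarity and are kept only between nearby pixels.
    ys, xs = np.nonzero(mask)
    coords = np.stack([ys, xs], axis=1).astype(np.float32)          # (N, 2) pixel coordinates
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    adj = np.exp(-d2 / (2.0 * sigma ** 2))
    adj[d2 > (3.0 * sigma) ** 2] = 0.0                              # drop edges between distant pixels
    root = int(d2.sum(axis=1).argmin())                             # most central pixel as the root (assumption)
    return adj, root

def merge_subgraphs(subgraphs):
    # Sequentially connect the per-character subgraphs through their root nodes
    # to form one graph for the whole text instance.
    n = sum(a.shape[0] for a, _ in subgraphs)
    adj = np.zeros((n, n), dtype=np.float32)
    offset, prev_root = 0, None
    for a, root in subgraphs:
        k = a.shape[0]
        adj[offset:offset + k, offset:offset + k] = a
        if prev_root is not None:
            adj[prev_root, offset + root] = adj[offset + root, prev_root] = 1.0
        prev_root = offset + root
        offset += k
    return adj

A graph convolutional network for textual reasoning can then operate on the merged adjacency matrix together with per-node features.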

Framework

How to use

Env

PyTorch == 1.1.0 
torchvision == 0.3.0
fasttext == 0.9.1

Details can be found in requirements.txt
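A quick sanity check that the installed environment matches these pins (a minimal sketch, not part of the repository):

import torch, torchvision, fasttext   # the fasttext import just confirms the package is installed
print(torch.__version__)              # expected: 1.1.0
print(torchvision.__version__)        # expected: 0.3.0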

Train

Prepare your data
  • Download the training sets: the synthetic training dataset from Baidu (key: c83d) and the real training dataset from Baidu (key: datm)
  • Download the pretrained Seg-baseline visual recognition model from here (to be uploaded soon)
  • Update the paths in lib/tools/create_all_synth_lmdb.py
  • Run lib/tools/create_all_synth_lmdb.py
  • Note: the generated LMDB can take a large amount of storage space; you can modify datasets/dataset.py to generate the word embeddings online instead (see the sketch after this list)
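For the online alternative mentioned in the last note, a minimal sketch of computing word embeddings at data-loading time with fasttext could look like the following. The model file name and the helper function are assumptions for illustration, not the repository's exact code:

import fasttext

# Load a pretrained fastText binary model once (any 300-d .bin model; the file name is a placeholder).
ft = fasttext.load_model('cc.en.300.bin')

def word_embedding(label):
    # Return the 300-d fastText vector for the ground-truth word on the fly,
    # instead of storing pre-computed embeddings in the LMDB.
    return ft.get_word_vector(label.lower())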
Run
  • Update the path in train.sh, then
sh train.sh

Test

  • Update the path in test.sh, then
sh test.sh

Experiments

Evaluation results on benchmarks

  • You can download the benchmark datasets from GoogleDrive.
Methods TrainData model IIIT5K SVT IC13 SVTP IC15 CUTE
SegBaseline ST+MJ GoogleDrive 94.2 90.8 93.6 84.3 82.0 87.6
S-GTR ST+MJ GoogleDrive 95.8 94.1 96.8 87.9 84.6 92.3
S-GTR ST+MJ+R Baidu (key:e95m) 97.5 95.8 97.8 90.6 87.3 94.7

Evaluate S-GTR with different settings

  • Investigate the impact of different modules in S-GTR.
VRM LM GTR IIIT5K SVT IC13 SVTP IC15 CUTE
✓   -  -   91.8 86.6 91.1 79.8 77.7 84.8
✓   ✓  -   94.2 90.8 93.6 84.3 82.0 87.6
✓   -  ✓   94.0 91.2 94.8 85.0 82.8 88.4
✓   ✓  ✓   95.1 93.2 95.9 86.2 84.1 91.3

Plugging GTR in different STR baselines

  • Plug the GTR module into four representative types of STR methods.
Methods model IIIT5K SVT IC13 SVTP IC15 CUTE
GTR+CRNN GoogleDrive 87.6 82.1 90.1 68.1 68.2 78.1
GTR+TRBA GoogleDrive 93.2 90.1 94.0 80.7 76.0 82.1
GTR+SRN GoogleDrive 96.0 93.1 96.1 87.9 83.9 90.7
GTR+PRENBaseline GoogleDrive 96.1 94.1 96.6 88.0 85.3 92.6
GTR+ABINet-LV GoogleDrive 96.8 94.8 97.7 89.6 86.9 93.1
  1. Train GTR + CRNN model
python GTR-plug/GTR-CRNN/train.py \
--train_data data_lmdb_release/training --valid_data data_lmdb_release/validation \
--select_data MJ-ST --batch_ratio 0.5-0.5 \
--Transformation None --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC \
--add_GTR True

Test GTR + CRNN model.

python GTR-plug/GTR-CRNN/test.py \
--eval_data data_lmdb_release/evaluation --benchmark_all_eval \
--Transformation None --FeatureExtraction VGG --SequenceModeling BiLSTM --Prediction CTC \
--add_GTR True --saved_model saved_models/best_accuracy.pth
  2. Train GTR + TRBA model.
python GTR-plug/GTR-TRBA/train.py \
--train_data data_lmdb_release/training --valid_data data_lmdb_release/validation \
--select_data MJ-ST --batch_ratio 0.5-0.5 \
--add_GTR True --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn

Test GTR + TRBA model

 python GTR-plug/GTR-TRBA/test.py \
--eval_data data_lmdb_release/evaluation --benchmark_all_eval \
--Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn \
--saved_model saved_model/best_accuracy.pth --add_GTR True
  3. Train GTR + SRN model
python GTR-plug/GTR-SRN/train.py \
--train_data path-to-train-data --valid_data path-to-valid-data --add_GTR True

Test GTR + SRN model

python GTR-plug/GTR-SRN/test.py \
--train_data path-to-train-data --valid_data path-to-valid-data --add_GTR True
  4. Train GTR + PRENBaseline model
python GTR-plug/GTR-P-Base/train.py \
--train_data path-to-train-data --valid_data path-to-valid-data --add_GTR True

Test GTR + PRENBaseline model.

python GTR-plug/GTR-P-Base/test.py \
--train_data path-to-train-data --valid_data path-to-valid-data --add_GTR True
  5. Train GTR + ABINet-LV model
python GTR-plug/GTR-ABINet/main.py \
--train_data path-to-train-data --valid_data path-to-valid-data --add_GTR True --config=configs/train_abinet.yaml

Test GTR + ABINet-LV model.

python GTR-plug/GTR-ABINet/main.py \
--valid_data path-to-valid-data --add_GTR True --config=configs/train_abinet.yaml

Issue

  1. The training and test datasets have been uploaded. The pretrained model will be uploaded, and the training code for the MT adaptive framework will be updated soon.

  2. This code covers S-GTR and the other GTR plug-in methods; the plug-in models will be updated soon.

  3. To help interested users adapt our model to training on other languages, we will provide guidance in the README for other-language recognition as soon as possible.

  4. We will update the details of the visual recognition model and provide guidance code for generating the relevant feature maps asked about in the issues.

Citation

Please consider citing this paper if you find it useful in your research.

@inproceedings{he2021visual,
  title={Visual Semantics Allow for Textual Reasoning Better in Scene Text Recognition},
  author={He, Yue and Chen, Chen and Zhang, Jing and Liu, Juhua and He, Fengxiang and Wang, Chaoyue and Du, Bo},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={36},
  year={2022}
}

Copyright

For research purposes only.


gtr's Issues

Questions about training on my own dataset

Hello Heyue, your profile picture is pretty cute. Wuhan University is my dream university. The README says: "Download the training set from the synthesis training dataset." I want to use my own dataset; where should I change the path? By the way, could you give me your WeChat? I would like to ask some academic questions. If you are willing, please send your contact information to [email protected]. Best wishes.

wrong file format

Why do I get the error "iter_5000.pth has wrong file format!" when I run lib/tools/create_all_synth_lmdb.py? I have downloaded the model and updated the corresponding paths.

train.py: error: unrecognized arguments: --add_GTR True

Training command:
python GTR-plug/GTR-TRBA/train.py --train_data /home/staqu/projects/anpr/anpr_ocr/deep-text-recognition-benchmark-master/train --valid_data /home/staqu/projects/anpr/anpr_ocr/deep-text-recognition-benchmark-master/val --select_data / --batch_ratio 1 --add_GTR True --Transformation TPS --FeatureExtraction ResNet --SequenceModeling BiLSTM --Prediction Attn

Error:
train.py: error: unrecognized arguments: --add_GTR True

Embedding vector and Language model

Hello,
Thank you for the code release.
I want to try your network for the Korean and Japanese languages. But when I checked your code for data preparation, I saw that we have to prepare a word embedding vector. Could you please let me know how I can prepare an embedding vector and a language model for Korean and Japanese?

which backbone will be better?

I see many backbones in the backbone config file, with Resnet50_FPN_Deform as the default. I also find that dilated and deformable convolutions are embedded in the backbone; will these blocks give better performance for this OCR task? What are the differences between Resnet50_FPN_Deform and ResNet50? I only see nn.ConvTranspose2d() in the FPN_Deform function.

Import Error when trying to run sh test.sh

After setting all the paths for the images and the pretrained model checkpoint correctly, I run sh test.sh

Get the following error:

ImportError: cannot import name 'loss' from partially initialized module 'lib' (most likely due to a circular import) (/media/user/7200rpm Seagate/UGRP-TruckNumberPlate/GTR/liberation22/__init__.py)
