lars-research / autostr

H. Zhang, Q. Yao, M. Yang, Y. Xu, X. Bai. AutoSTR: Efficient Backbone Search for Scene Text Recognition. European Conference on Computer Vision (ECCV). 2020.

Python 100.00%

scene-text-recognition neural-architecture-search

autostr's Introduction

AutoSTR: Efficient Backbone Search for Scene Text Recognition

We investigate how to obtain a strong feature sequence extractor for the scene text recognition task using neural architecture search. The research paper was published at ECCV 2020.

Requirements

python==3.6.7
pytorch==1.4.0
torchvision==0.2.1
lmdb
PyYAML
pillow
editdistance
...

Searching Network Architecture

python3 arch_search_exp.py --config_file configs/search.yaml

Retraining Compact Structure

python3 main.py --config_file configs/retrain.yaml

Logs and Checkpoints

The logs and checkpoints can be found here, with extraction code wp8w.

Citation

If you find this work helpful for your research, please cite the following paper:

@inproceedings{zhang2020efficient,
  title={AutoSTR: Efficient Backbone Search for Scene Text Recognition},
  author={Zhang, Hui and Yao, Quanming and Yang, Mingkun and Xu, Yongchao and Bai, Xiang},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}

@TechReport{yao2018taking,
  author      = {Yao, Quanming and Wang, Mengshuo},
  institution = {arXiv preprint},
  title       = {Taking Human out of Learning Applications: A Survey on Automated Machine Learning},
  year        = {2018},
}

Acknowledgement

We used parts of the code from aster.pytorch (https://github.com/ayumiymk/aster.pytorch) and proxylessnas (https://github.com/mit-han-lab/proxylessnas). Many thanks for their excellent work.

New Opportunities

Intern, research assistant, and researcher positions are available. See the requirements.

autostr's People

Forkers

peternara sapjunior barryzm wuxiaolianggit asmitakhaneja yangyin2016 ustczhouyu shengzhang90 duxiangcheng km1562 cxy86121 chadpieere caizhengqi benjamesbabala dengxiaolei lancercat

autostr's Issues

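Some questions about using the model

This is great work, thank you for the analysis! I had some questions and ran into some problems while searching and training, and I would like to discuss them with you:
1. The paper says that the operations and the downsampling path are searched in two separate steps. When I ran python3 arch_search_exp.py --config_file configs/retrain.yaml, I got the output below, and it looks like only the operations were searched, not the downsampling path.

2. During training I found that the operations and the downsampling path are already written in retrain.yaml, as shown in the figure below. After I finish searching, should I replace them with the architecture I found?

3. During training (running CUDA_VISIBLE_DEVICES=2,3 python3 arch_search_exp.py --config_file configs/retrain.yaml) I get an error saying that tools.flops_counter cannot be found. There is indeed no flops_counter in the tools folder. How should I modify the code?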
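On the third question, the contents of tools.flops_counter are not shown anywhere in this page, so the following is only a minimal sketch of what a hand-written cost estimate could look like, not the repository's module. It assumes the module's purpose was to count multiply-accumulate operations (an assumption based on its name), and it assumes an MBConv block consists of a 1x1 expansion, a k x k depthwise convolution, and a 1x1 projection at stride 1. The function names and the channel and feature-map sizes are hypothetical.

```python
def conv2d_macs(c_in, c_out, kernel, out_h, out_w, groups=1):
    """Multiply-accumulate count of one 2D convolution (bias ignored)."""
    return (c_in // groups) * c_out * kernel * kernel * out_h * out_w


def mbconv_macs(c_in, c_out, kernel, expansion, out_h, out_w):
    """MACs of an MBConv-style block, assuming stride 1.

    Assumed layout: 1x1 expand -> k x k depthwise -> 1x1 project.
    """
    hidden = c_in * expansion
    expand = conv2d_macs(c_in, hidden, 1, out_h, out_w)
    # Depthwise convolution: one filter per channel, so groups == channels.
    depthwise = conv2d_macs(hidden, hidden, kernel, out_h, out_w, groups=hidden)
    project = conv2d_macs(hidden, c_out, 1, out_h, out_w)
    return expand + depthwise + project


if __name__ == "__main__":
    # Hypothetical layer: 16 channels in and out, 4x4 output feature map.
    for kernel, expansion in [(3, 1), (3, 6), (5, 6)]:
        macs = mbconv_macs(16, 16, kernel, expansion, 4, 4)
        print(f"{kernel}x{kernel}_MBConv{expansion}: {macs} MACs")
```

A helper like this only gives a rough per-block estimate. Whether the search script needs the count at all, or whether the import can simply be removed, depends on code that is not visible here.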

Missing files

lmdb.Error: /home/yaoquanming/local_datasets/ALL_REC_DATA:
Where can I download the files at this path?

Use the Compact Structure for recognition

@quanmingyao @huizhang0110

After searching for and retraining a compact structure:
How can I use the model to recognize the images in a folder? Is there an inference script?
Does AutoSTR work on words only, or can it also be trained on text lines?
Can inference (text recognition) be run on a CPU?

The network always chooses a large kernel size, and the expansion factor t is almost always 6

The architecture I found is as follows:

=> saving checkpoint  ./logs/proxyless/IIIT5K/seed_1996_3/checkpoint.pth.tar
0. (Mix(5x5_MBConv6, 0.770), None)      # ['0.000', '0.230', '0.770', '0.000', '0.000', '0.000']
1. (Mix(5x5_MBConv6, 0.998), Identity)  # ['0.000', '0.000', '0.998', '0.000', '0.000', '0.001', '0.000']
2. (Mix(5x5_MBConv6, 0.942), Identity)  # ['0.000', '0.038', '0.942', '0.001', '0.000', '0.015', '0.003']
3. (Mix(5x5_MBConv6, 0.768), None)      # ['0.042', '0.100', '0.768', '0.040', '0.007', '0.043']
4. (Mix(5x5_MBConv6, 0.831), Identity)  # ['0.009', '0.019', '0.831', '0.005', '0.001', '0.066', '0.069']
5. (Mix(5x5_MBConv6, 0.456), Identity)  # ['0.017', '0.020', '0.456', '0.016', '0.110', '0.015', '0.366']
6. (Mix(5x5_MBConv6, 0.583), None)      # ['0.074', '0.261', '0.583', '0.018', '0.008', '0.056']
7. (Mix(5x5_MBConv6, 0.801), Identity)  # ['0.006', '0.151', '0.801', '0.000', '0.006', '0.036', '0.000']
8. (Mix(5x5_MBConv6, 0.384), Identity)  # ['0.000', '0.354', '0.384', '0.000', '0.258', '0.004', '0.000']
9. (Mix(5x5_MBConv6, 0.483), None)      # ['0.025', '0.366', '0.483', '0.009', '0.010', '0.106']
10. (Mix(5x5_MBConv6, 0.500), Identity) # ['0.000', '0.419', '0.500', '0.000', '0.044', '0.037', '0.000']
11. (Mix(5x5_MBConv6, 0.418), Identity) # ['0.000', '0.332', '0.418', '0.000', '0.003', '0.248', '0.000']
12. (Mix(5x5_MBConv6, 0.469), None)     # ['0.002', '0.337', '0.469', '0.004', '0.018', '0.169']
13. (Mix(5x5_MBConv3, 0.421), Identity) # ['0.000', '0.421', '0.412', '0.000', '0.003', '0.163', '0.000']
14. (Mix(5x5_MBConv3, 0.468), Identity) # ['0.000', '0.468', '0.428', '0.000', '0.001', '0.103', '0.000']

We can see that the network always chooses a large kernel size, and in most layers the expansion factor t is 6. Is this normal?

Any suggestions?
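One way to examine a log like the one above is to parse it and look at how decisive each choice is. The sketch below is an illustration, not part of the repository: it assumes the log line format shown in this issue, and the names LINE_RE and parse_log are hypothetical. For each layer it extracts the selected operation, its kernel size and expansion factor, and the margin between the two highest probabilities in the bracketed list.

```python
import re
from collections import Counter

# Three lines copied from the log above (layers 0, 5, and 14).
LOG = """\
0. (Mix(5x5_MBConv6, 0.770), None)      # ['0.000', '0.230', '0.770', '0.000', '0.000', '0.000']
5. (Mix(5x5_MBConv6, 0.456), Identity)  # ['0.017', '0.020', '0.456', '0.016', '0.110', '0.015', '0.366']
14. (Mix(5x5_MBConv3, 0.468), Identity) # ['0.000', '0.468', '0.428', '0.000', '0.001', '0.103', '0.000']
"""

# Assumed line format: "<idx>. (Mix(<op>, <prob>), <shortcut>) # [<probs>]"
LINE_RE = re.compile(
    r"^(\d+)\.\s+\(Mix\((\w+),\s*([\d.]+)\),\s*(\w+)\)\s*#\s*\[(.*)\]\s*$"
)


def parse_log(text):
    """Return one dict per layer with the chosen op and the top-2 margin."""
    layers = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not layer summaries
        idx, op, prob, shortcut, raw = match.groups()
        probs = [float(p) for p in re.findall(r"[\d.]+", raw)]
        top, second = sorted(probs, reverse=True)[:2]
        kernel, expansion = re.match(r"(\d+)x\d+_MBConv(\d+)", op).groups()
        layers.append({
            "layer": int(idx),
            "op": op,
            "kernel": int(kernel),
            "expansion": int(expansion),
            "prob": float(prob),
            "margin": top - second,
        })
    return layers


if __name__ == "__main__":
    layers = parse_log(LOG)
    print(Counter(layer["op"] for layer in layers))
    for layer in layers:
        print(layer["layer"], layer["op"], f"margin={layer['margin']:.3f}")
```

In the log above, the early layers select 5x5_MBConv6 by a wide margin, while in later layers (for example 13 and 14) the top two candidates are nearly tied. A small margin suggests the selection in those layers may be sensitive to the random seed or the length of the search.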