autostr's Introduction

AutoSTR: Efficient Backbone Search for Scene Text Recognition

We investigate how to obtain a strong feature sequence extractor for the scene text recognition task using neural architecture search. The research paper was published at ECCV 2020 and can be found here.

Requirements

python==3.6.7
pytorch==1.4.0
torchvision==0.2.1
lmdb
PyYAML
pillow
editdistance
...
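
The pinned versions above can be compared against a local environment before running the search. The following is a minimal sketch, not part of the repository: `parse_pins` and `find_mismatches` are hypothetical helper names, and the `installed` mapping is hand-written for illustration rather than read from a real environment. The trailing `...` in the list is skipped because the remaining requirements are not specified.

```python
REQUIREMENTS = """\
python==3.6.7
pytorch==1.4.0
torchvision==0.2.1
lmdb
PyYAML
pillow
editdistance
...
"""


def parse_pins(text):
    """Map package name -> pinned version (None when the line has no '==' pin)."""
    pins = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == "...":
            continue  # skip blank lines and the elided remainder of the list
        name, sep, version = line.partition("==")
        pins[name.strip()] = version.strip() if sep else None
    return pins


def find_mismatches(pins, installed):
    """Return {name: (wanted, found)} for pinned packages whose version differs."""
    mismatches = {}
    for name, wanted in pins.items():
        if wanted is None:
            continue  # unpinned packages are not checked
        found = installed.get(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


pins = parse_pins(REQUIREMENTS)

# Hypothetical environment, written by hand for this example.
installed = {"python": "3.6.7", "pytorch": "1.4.0", "torchvision": "0.5.0"}
mismatches = find_mismatches(pins, installed)
print(mismatches)
```

In practice the `installed` mapping would have to be filled in from the actual environment, for example with `platform.python_version()` for the interpreter entry. Note that the names are used exactly as listed above, which may differ from the names a package manager reports.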

Searching Network Architecture

python3 arch_search_exp.py --config_file configs/search.yaml 

Retraining Compact Structure

python3 main.py --config_file configs/retrain.yaml 

Logs and Checkpoints

The logs and checkpoints can be downloaded here (extraction code: wp8w).

Citation

If you find this work helpful for your research, please cite the following paper:

@inproceedings{zhang2020efficient,
  title={AutoSTR: Efficient Backbone Search for Scene Text Recognition},
  author={Zhang, Hui and Yao, Quanming and Yang, Mingkun and Xu, Yongchao and Bai, Xiang},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2020}
}
@TechReport{yao2018taking,
  author      = {Yao, Quanming and Wang, Mengshuo},
  institution = {arXiv preprint},
  title       = {Taking Human out of Learning Applications: A Survey on Automated Machine Learning},
  year        = {2018},
}

Acknowledgement

We used parts of the code from aster.pytorch (https://github.com/ayumiymk/aster.pytorch) and proxylessnas (https://github.com/mit-han-lab/proxylessnas). Many thanks for their excellent work.

New Opportunities

  • Intern, research assistant, and researcher positions are available. See the requirements.

autostr's People

Contributors

huizhang0110, quanmingyao

autostr's Issues

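Some questions about using the model

This is great work, thank you for your analysis! I have some questions and ran into a few problems during search and training that I would like to discuss:
1. The paper says that the operations and the downsampling path are searched in two steps. When I run python3 arch_search_exp.py --config_file configs/retrain.yaml, I get the output below. It looks like only the operations were searched, not the downsampling path.
[image]
2. During training I noticed that the operations and the downsampling path are already written in retrain.yaml, as shown below. After I finish searching, should I replace them with the architecture I found?
[image]
3. Training (running CUDA_VISIBLE_DEVICES=2,3 python3 arch_search_exp.py --config_file configs/retrain.yaml) fails with an error saying tools.flops_counter cannot be found. There is indeed no flops_counter in the tools folder. How should the code be modified?
[image]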

Missing files

lmdb.Error: /home/yaoquanming/local_datasets/ALL_REC_DATA:
Where can the files for this path be downloaded?

Network always chooses a large kernel size and the expansion factor t is always 6

The architecture I found is as follows:

=> saving checkpoint  ./logs/proxyless/IIIT5K/seed_1996_3/checkpoint.pth.tar
0. (Mix(5x5_MBConv6, 0.770), None)      # ['0.000', '0.230', '0.770', '0.000', '0.000', '0.000']
1. (Mix(5x5_MBConv6, 0.998), Identity)  # ['0.000', '0.000', '0.998', '0.000', '0.000', '0.001', '0.000']
2. (Mix(5x5_MBConv6, 0.942), Identity)  # ['0.000', '0.038', '0.942', '0.001', '0.000', '0.015', '0.003']
3. (Mix(5x5_MBConv6, 0.768), None)      # ['0.042', '0.100', '0.768', '0.040', '0.007', '0.043']
4. (Mix(5x5_MBConv6, 0.831), Identity)  # ['0.009', '0.019', '0.831', '0.005', '0.001', '0.066', '0.069']
5. (Mix(5x5_MBConv6, 0.456), Identity)  # ['0.017', '0.020', '0.456', '0.016', '0.110', '0.015', '0.366']
6. (Mix(5x5_MBConv6, 0.583), None)      # ['0.074', '0.261', '0.583', '0.018', '0.008', '0.056']
7. (Mix(5x5_MBConv6, 0.801), Identity)  # ['0.006', '0.151', '0.801', '0.000', '0.006', '0.036', '0.000']
8. (Mix(5x5_MBConv6, 0.384), Identity)  # ['0.000', '0.354', '0.384', '0.000', '0.258', '0.004', '0.000']
9. (Mix(5x5_MBConv6, 0.483), None)      # ['0.025', '0.366', '0.483', '0.009', '0.010', '0.106']
10. (Mix(5x5_MBConv6, 0.500), Identity) # ['0.000', '0.419', '0.500', '0.000', '0.044', '0.037', '0.000']
11. (Mix(5x5_MBConv6, 0.418), Identity) # ['0.000', '0.332', '0.418', '0.000', '0.003', '0.248', '0.000']
12. (Mix(5x5_MBConv6, 0.469), None)     # ['0.002', '0.337', '0.469', '0.004', '0.018', '0.169']
13. (Mix(5x5_MBConv3, 0.421), Identity) # ['0.000', '0.421', '0.412', '0.000', '0.003', '0.163', '0.000']
14. (Mix(5x5_MBConv3, 0.468), Identity) # ['0.000', '0.468', '0.428', '0.000', '0.001', '0.103', '0.000']

We can see that the network always chooses a large kernel size, and the expansion factor t is 6 for most layers. Is this normal?

Any suggestions?
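
The observation in this issue can be checked mechanically by tallying the selected operation per layer from the printed architecture. The sketch below is only an illustration based on the log format pasted above: the regular expression and the helper names `parse_arch_line` and `summarize` are assumptions, not part of the repository. It also reports the argmax index of each probability list. The log does not name the candidate operations behind each index, apart from what the selected entries reveal (index 2 for 5x5_MBConv6 and index 1 for 5x5_MBConv3).

```python
import re
from collections import Counter

# Assumed line format, taken from the pasted log, e.g.:
# 0. (Mix(5x5_MBConv6, 0.770), None)      # ['0.000', '0.230', '0.770', ...]
LINE_RE = re.compile(
    r"^\s*(\d+)\.\s*\(Mix\(([^,]+),\s*([\d.]+)\),\s*(\w+)\)\s*#\s*\[(.*)\]\s*$"
)


def parse_arch_line(line):
    """Return (layer, op_name, reported_prob, probs), or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    layer = int(m.group(1))
    op_name = m.group(2).strip()
    reported = float(m.group(3))
    # Each probability is printed as a quoted string such as '0.770'.
    probs = [float(p.strip().strip("'")) for p in m.group(5).split(",")]
    return layer, op_name, reported, probs


def summarize(log_text):
    """Count the selected op over all layers and record each layer's argmax index."""
    counts = Counter()
    argmax_indices = []
    for line in log_text.splitlines():
        parsed = parse_arch_line(line)
        if parsed is None:
            continue  # e.g. the "=> saving checkpoint" line
        _, op_name, _, probs = parsed
        counts[op_name] += 1
        argmax_indices.append(probs.index(max(probs)))
    return counts, argmax_indices


# Three lines copied from the log above.
SAMPLE = """\
=> saving checkpoint  ./logs/proxyless/IIIT5K/seed_1996_3/checkpoint.pth.tar
0. (Mix(5x5_MBConv6, 0.770), None)      # ['0.000', '0.230', '0.770', '0.000', '0.000', '0.000']
5. (Mix(5x5_MBConv6, 0.456), Identity)  # ['0.017', '0.020', '0.456', '0.016', '0.110', '0.015', '0.366']
13. (Mix(5x5_MBConv3, 0.421), Identity) # ['0.000', '0.421', '0.412', '0.000', '0.003', '0.163', '0.000']
"""

counts, argmax_indices = summarize(SAMPLE)
print(counts)
print(argmax_indices)
```

Running the same function over the full log gives a per-operation count for all 15 layers, which makes it easier to compare runs with different seeds or datasets.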

Download of pretrained model from Baidu

Hello,
Downloading the pretrained model from Baidu is problematic: it requires registration, which is only possible with a Chinese phone number.
Can you suggest an alternative download source?
Thanks!

Multi-GPU search/retrain support?

In the paper, you mention that "All models are trained on 8 NVIDIA 2080 graphics cards."

When I run the search code, I can only use a single GPU. Any suggestions?

Loss

This appears during training:
[image: 企业微信截图_16397157986891]
