TextBoxes: A Fast Text Detector with a Single Deep Neural Network

Introduction

This paper presents an end-to-end trainable, fast scene text detector named TextBoxes. It detects scene text with both high accuracy and high efficiency in a single network forward pass, and requires no post-processing other than standard non-maximum suppression. For more details, please refer to our paper.
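The only post-processing step mentioned above is non-maximum suppression (NMS). Below is a minimal sketch of standard greedy NMS over axis-aligned boxes. It is not taken from the TextBoxes code. The helper names "iou" and "nms" are hypothetical, and the 0.5 IoU threshold is an illustrative assumption.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS (hypothetical helper): indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too strongly.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Because boxes are visited in descending score order, the most confident box in each group of overlapping detections is the one that survives.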

Citing TextBoxes

Please cite TextBoxes in your publications if it helps your research:

@inproceedings{LiaoSBWL17,
  author    = {Minghui Liao and
               Baoguang Shi and
               Xiang Bai and
               Xinggang Wang and
               Wenyu Liu},
  title     = {TextBoxes: {A} Fast Text Detector with a Single Deep Neural Network},
  booktitle = {AAAI},
  year      = {2017}
}

Contents

  1. Installation
  2. Download
  3. Test
  4. Train
  5. Performance

Installation

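  1. Get the code. We will call the directory into which you cloned this repository (a Caffe-based codebase) $CAFFE_ROOT.

    git clone https://github.com/MhLiao/TextBoxes.git
    cd TextBoxes

  2. Build the code. Install the dependencies required by Caffe, then build as you would build upstream Caffe:

    # Adjust Makefile.config to your system before building.
    cp Makefile.config.example Makefile.config
    make -j8
    # Build the Python interface; add $CAFFE_ROOT/python to your PYTHONPATH.
    make py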

Download

  1. Models trained on ICDAR 2013: Dropbox link, BaiduYun link
  2. Fully convolutional reduced (atrous) VGGNet: Dropbox link, BaiduYun link
  3. Compiled MEX file for evaluation (required by evaluation_nms.m for multi-scale test evaluation): Dropbox link, BaiduYun link

Test

  1. Run "python examples/demo.py".
  2. To choose between single-scale and multi-scale testing, set the "use_multi_scale" option in "examples/demo.py".
  3. The results are saved in "examples/results/".
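Multi-scale testing runs the detector on several resized copies of the image and then merges the results. In this repository, multi-scale results are evaluated with "evaluation_nms.m". The Python sketch below only illustrates the general idea. It assumes a hypothetical "detect" callable that returns (box, score) pairs in the resized image's coordinates. The sketch maps every box back to the original resolution and merges the pooled boxes with greedy NMS. The input sizes and the 0.5 IoU threshold are illustrative assumptions, not the settings used in "examples/demo.py".

```python
from typing import Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels
Detection = Tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def multi_scale_detect(
    image_size: Tuple[int, int],
    input_sizes: Iterable[Tuple[int, int]],
    detect: Callable[[Tuple[int, int]], List[Detection]],  # hypothetical detector
    iou_threshold: float = 0.5,
) -> List[Detection]:
    """Run a detector at several (width, height) input sizes and merge the results."""
    width, height = image_size
    pooled: List[Detection] = []
    for in_w, in_h in input_sizes:
        sx, sy = in_w / width, in_h / height
        for (x0, y0, x1, y1), score in detect((in_w, in_h)):
            # Undo the resize so that all boxes share the original image's coordinates.
            pooled.append(((x0 / sx, y0 / sy, x1 / sx, y1 / sy), score))
    # Greedy NMS over the pooled detections, highest score first.
    pooled.sort(key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for box, score in pooled:
        if all(iou(box, k) <= iou_threshold for k, _ in kept):
            kept.append((box, score))
    return kept
```

Merging across scales helps because small text tends to be found at large input sizes and large text at small ones. Duplicate detections of the same word are then removed by the NMS step.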

Train

  1. Train for about 50k iterations on the synthetic data referred to in the paper.
  2. Then fine-tune for about 2k iterations on the training data of the target dataset, such as ICDAR 2013 or SVT.
  3. For more details, such as the learning-rate settings, please refer to the paper.

Performance

  1. With the provided test code, you can achieve an F-measure of about 80% on ICDAR 2013 using a single scale.
  2. With the provided multi-scale test code, you can achieve an F-measure of about 85% on ICDAR 2013 when the multi-scale detections are merged with non-maximum suppression.
  3. For more performance information, please refer to the paper and to Task 1 and Task 4 of Challenge 2 on the ICDAR 2015 website: http://rrc.cvc.uab.es/?ch=2&com=evaluation
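The F-measure quoted above is the harmonic mean of precision P and recall R: F = 2PR / (P + R). The sketch below computes all three values from lists of boxes. It uses a simplified greedy one-to-one matching at IoU >= 0.5, which is an assumption for illustration only. The official ICDAR 2013 evaluation uses its own, more elaborate matching protocol, so numbers from this sketch are not comparable to the figures above. The function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_matches(detections: Sequence[Box], ground_truth: Sequence[Box],
                  iou_threshold: float = 0.5) -> int:
    """Greedy one-to-one matching (simplified; detections assumed sorted by score)."""
    used: Set[int] = set()
    matched = 0
    for det in detections:
        best_j, best_iou = -1, iou_threshold
        for j, gt in enumerate(ground_truth):
            if j in used:
                continue  # each ground-truth box may be matched only once
            overlap = iou(det, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used.add(best_j)
            matched += 1
    return matched


def precision_recall_f(detections: Sequence[Box], ground_truth: Sequence[Box],
                       iou_threshold: float = 0.5) -> Tuple[float, float, float]:
    """Return (precision, recall, F-measure) under the simplified matching above."""
    m = count_matches(detections, ground_truth, iou_threshold)
    precision = m / len(detections) if detections else 0.0
    recall = m / len(ground_truth) if ground_truth else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f
```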

Data preparation for training

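The reference XML annotation file is as follows. There is one <object> element per text region, and xmin, ymin, xmax and ymax give the box corners in pixels: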

    <?xml version="1.0" encoding="utf-8"?>
    <annotation>
        <object>
            <name>text</name>
            <bndbox>
                <xmin>158</xmin>
                <ymin>128</ymin>
                <xmax>411</xmax>
                <ymax>181</ymax>
            </bndbox>
        </object>
        <object>
            <name>text</name>
            <bndbox>
                <xmin>443</xmin>
                <ymin>128</ymin>
                <xmax>501</xmax>
                <ymax>169</ymax>
            </bndbox>
        </object>
        <folder></folder>
        <filename>100.jpg</filename>
        <size>
            <width>640</width>
            <height>480</height>
            <depth>3</depth>
        </size>
    </annotation>
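The following sketch writes and reads annotation files in the reference format above, using Python's standard xml.etree.ElementTree module. The function names "make_annotation" and "read_boxes" are hypothetical. As in the example, the sketch assumes that every object uses the single class name "text" and integer pixel coordinates.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax) in pixels


def make_annotation(filename: str, width: int, height: int,
                    boxes: Sequence[Box], depth: int = 3) -> str:
    """Build an annotation string laid out like the reference file (hypothetical helper)."""
    root = ET.Element("annotation")
    for xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = "text"  # single class, as in the reference
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    ET.SubElement(root, "folder")  # left empty, as in the reference file
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    # Serialized without the XML declaration or indentation.
    return ET.tostring(root, encoding="unicode")


def read_boxes(xml_text: str) -> List[Box]:
    """Extract (xmin, ymin, xmax, ymax) for every <object> in an annotation string."""
    root = ET.fromstring(xml_text)
    boxes: List[Box] = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        coords = [int(float(bndbox.findtext(t))) for t in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((coords[0], coords[1], coords[2], coords[3]))
    return boxes
```

For example, make_annotation("100.jpg", 640, 480, [(158, 128, 411, 181), (443, 128, 501, 169)]) produces the same objects, file name and image size as the reference file.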

Please let me know if you encounter any issues.
