Text Image Augmentation

A general geometric augmentation tool for text images, introduced in the CVPR 2020 paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition". We provide this tool to reduce overfitting and improve the robustness of text recognizers.

Note that this is a general toolkit. Please customize it for your specific task. If the repo benefits your work, please cite the papers.

News

  • 2020-02 The paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition" was accepted to CVPR 2020. It is a preliminary attempt at smart augmentation.

  • 2019-11 The paper "Decoupled Attention Network for Text Recognition" was accepted to AAAI 2020. This augmentation tool was used in its handwritten text recognition experiments.

  • 2019-04 We applied this tool in the ReCTS competition of ICDAR 2019. Our ensemble model won the championship.

  • 2019-01 The similarity transformation was specifically customized for geometric augmentation of text images.

Requirements

We recommend Anaconda to manage the versions of your dependencies. For example:

     conda install boost=1.67.0

Installation

Build library:

    mkdir build
    cd build
    cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..
    make

Copy Augment.so to the target folder and follow demo.py to use the tool.

    cp Augment.so ..
    cd ..
    python demo.py
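Under the hood, the geometric transforms are built on moving-least-squares image deformation (Schaefer et al. 2006, cited below). For intuition, here is a minimal, library-independent NumPy sketch of an affine MLS backward warp. The function name and parameters are illustrative only; they are not the API exported by Augment.so:

```python
import numpy as np

def mls_affine_warp(img, src_pts, dst_pts, alpha=1.0, eps=1e-8):
    """Backward-map every output pixel through an affine moving-least-squares
    fit anchored at the control points (after Schaefer et al., 2006)."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    # Backward mapping: anchor the fit at the *destination* points and solve
    # for where each output pixel should sample the source image.
    p = np.asarray(dst_pts, dtype=float)
    q = np.asarray(src_pts, dtype=float)
    for y in range(h):
        for x in range(w):
            v = np.array([x, y], dtype=float)
            d2 = ((p - v) ** 2).sum(axis=1)
            if d2.min() < eps:                 # pixel sits on a control point
                sx, sy = q[int(d2.argmin())]
            else:
                wts = 1.0 / d2 ** alpha        # w_k = 1 / |p_k - v|^(2*alpha)
                ps = (wts[:, None] * p).sum(0) / wts.sum()  # weighted centroids
                qs = (wts[:, None] * q).sum(0) / wts.sum()
                ph, qh = p - ps, q - qs
                A = np.einsum('k,ki,kj->ij', wts, ph, ph)   # 2x2 normal matrix
                B = np.einsum('k,ki,kj->ij', wts, ph, qh)
                sx, sy = (v - ps) @ np.linalg.solve(A, B) + qs
            out[y, x] = img[min(max(int(round(sy)), 0), h - 1),
                            min(max(int(round(sx)), 0), w - 1)]
    return out
```

With identical source and destination points the warp reduces to the identity, which is a handy sanity check before adding random jitter to the control points.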

Demo

  • Distortion

  • Stretch

  • Perspective

Speed

Transforming an image of size (H: 64, W: 200) takes less than 3 ms on a 2.0 GHz CPU. The process can be accelerated further by running the augmentation inside multi-process batch samplers on the fly, e.g. by setting "num_workers" in a PyTorch DataLoader.
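The batch-parallel idea can be sketched with only the Python standard library. The `augment` function below is a hypothetical stand-in for the real Augment.so call; worker threads (or DataLoader worker processes) overlap per-sample augmentation so it never blocks training:

```python
from concurrent.futures import ThreadPoolExecutor

def augment(sample):
    # Placeholder for the real Augment.so call; here we just tag the sample.
    return ("augmented", sample)

def augment_batch(batch, workers=4):
    # Fan per-sample augmentation out across worker threads, mirroring what a
    # multi-worker batch sampler (e.g. num_workers > 0) does at larger scale.
    # ex.map preserves the input order of the batch.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(augment, batch))
```

A real pipeline would instead put the augmentation call inside a Dataset's per-sample loading code and let the data loader's worker processes provide the parallelism.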

Improvement for Recognition

We compare the accuracies of CRNN trained on only the corresponding small training sets.

Dataset                     IIIT5K   IC13    IC15
Without Data Augmentation   40.8%    6.8%    8.7%
With Data Augmentation      53.4%    9.6%    24.9%

Citation

@inproceedings{luo2020learn,
  author = {Canjie Luo and Yuanzhi Zhu and Lianwen Jin and Yongpan Wang},
  title = {Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition},
  booktitle = {CVPR},
  year = {2020}
}

@inproceedings{wang2020decoupled,
  author = {Tianwei Wang and Yuanzhi Zhu and Lianwen Jin and Canjie Luo and Xiaoxue Chen and Yaqiang Wu and Qianying Wang and Mingxiang Cai},
  title = {Decoupled Attention Network for Text Recognition},
  booktitle = {AAAI},
  year = {2020}
}

@article{schaefer2006image,
  title={Image deformation using moving least squares},
  author={Schaefer, Scott and McPhail, Travis and Warren, Joe},
  journal={ACM Transactions on Graphics (TOG)},
  volume={25},
  number={3},
  pages={533--540},
  year={2006},
  publisher={ACM New York, NY, USA}
}

Acknowledgment

Thanks to the following developers for their contributions.

@keeofkoo

@cxcxcxcx

@Yati Sagade

Attention

The tool is free for academic research purposes only.


Issues

CMake fail

Thanks for your code, but when I compiled it according to the README, I met the following error:

    Could NOT find PythonLibs (missing: PYTHON_LIBRARIES PYTHON_INCLUDE_DIRS)

About the agent updating and initialization

I have two questions about the nice paper "Learn to Augment: Joint Data Augmentation and Network Optimization for Text Recognition":

  1. In line 9 of Algorithm 1, why does the agent network update towards -S'? I don't understand why -S' is a harder moving state.
  2. As for the agent initialization, what is the initialization direction of the 2*(N+1) fiducial points?

running into problems during make

Cloned the repo and tried building it with the following command:

    cmake .. -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF -DPYTHON_INCLUDE_DIR=$(python -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())") -DPYTHON_LIBRARY=$(python -c "import distutils.sysconfig as sysconfig; print(sysconfig.get_config_var('LIBDIR'))")

During make, I get this error log:

    In file included from /usr/include/python2.7/numpy/ndarraytypes.h:1809:0,
                     from /usr/include/python2.7/numpy/ndarrayobject.h:18,
                     from /y/x/Text-Image-Augmentation/include/conversion.h:8,
                     from /y/x/Text-Image-Augmentation/src/conversion.cpp:1:
    /usr/include/python2.7/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
     #warning "Using deprecated NumPy API, disable it by "
     ^~~~~~~
    /y/x/Text-Image-Augmentation/src/conversion.cpp:119:16: error: cannot declare variable 'g_numpyAllocator' to be of abstract type 'NumpyAllocator'
     NumpyAllocator g_numpyAllocator;
     ^~~~~~~~~~~~~~~~
    /y/x/Text-Image-Augmentation/src/conversion.cpp:64:7: note: because the following virtual functions are pure within 'NumpyAllocator':
     class NumpyAllocator : public MatAllocator
     ^~~~~~~~~~~~~~
    In file included from /usr/include/opencv2/core.hpp:59:0,
                     from /usr/include/opencv2/imgproc.hpp:46,
                     from /usr/include/opencv2/imgproc/imgproc.hpp:48,
                     from /y/x/Text-Image-Augmentation/include/conversion.h:5,
                     from /y/x/Text-Image-Augmentation/src/conversion.cpp:1:
    /usr/include/opencv2/core/mat.hpp:417:23: note: virtual cv::UMatData* cv::MatAllocator::allocate(int, const int*, int, void*, size_t*, int, cv::UMatUsageFlags) const
     virtual UMatData* allocate(int dims, const int* sizes, int type,
     ^~~~~~~~
    /usr/include/opencv2/core/mat.hpp:419:18: note: virtual bool cv::MatAllocator::allocate(cv::UMatData*, int, cv::UMatUsageFlags) const
     virtual bool allocate(UMatData* data, int accessflags, UMatUsageFlags usageFlags) const = 0;
     ^~~~~~~~
    /usr/include/opencv2/core/mat.hpp:420:18: note: virtual void cv::MatAllocator::deallocate(cv::UMatData*) const
     virtual void deallocate(UMatData* data) const = 0;
     ^~~~~~~~~~
    /y/x/Text-Image-Augmentation/src/conversion.cpp: In member function 'cv::Mat NDArrayConverter::toMat(const PyObject*)':
    /y/x/Text-Image-Augmentation/src/conversion.cpp:202:11: error: 'class cv::Mat' has no member named 'refcount'
     m.refcount = refcountFromPyObject(o);
     ^~~~~~~~
    /y/x/Text-Image-Augmentation/src/conversion.cpp: In member function 'PyObject* NDArrayConverter::toNDArray(const cv::Mat&)':
    /y/x/Text-Image-Augmentation/src/conversion.cpp:223:12: error: 'class cv::Mat' has no member named 'refcount'
     if(!p->refcount || p->allocator != &g_numpyAllocator)
     ^~~~~~~~
    /y/x/Text-Image-Augmentation/src/conversion.cpp:230:36: error: 'class cv::Mat' has no member named 'refcount'
     return pyObjectFromRefcount(p->refcount);
     ^~~~~~~~
    CMakeFiles/Augment.dir/build.make:75: recipe for target 'CMakeFiles/Augment.dir/src/conversion.cpp.o' failed
    make[2]: *** [CMakeFiles/Augment.dir/src/conversion.cpp.o] Error 1
    CMakeFiles/Makefile2:72: recipe for target 'CMakeFiles/Augment.dir/all' failed
    make[1]: *** [CMakeFiles/Augment.dir/all] Error 2
    Makefile:83: recipe for target 'all' failed
    make: *** [all] Error 2

Environment:

  • python - 2.7.17
  • opencv - 3.3.0
  • numpy - 1.13.3

Why is oldDotL set from the destination points?

Could you help explain the following contradiction?

According to the paper, w_k is defined with respect to the fiducial point (control point) p_k, so oldDotL should represent the fiducial points here:

    w[k] = 1 / ((i - oldDotL[k].x) * (i - oldDotL[k].x) +
                (j - oldDotL[k].y) * (j - oldDotL[k].y));

But oldDotL is instead set to the deformed (destination) positions:

    void ImgWarp_MLS::setDstPoints(const vector<Point_<int> > &qdst) {
        nPoint = qdst.size();
        oldDotL.clear();
        oldDotL.reserve(nPoint);
        for (size_t i = 0; i < qdst.size(); i++) oldDotL.push_back(qdst[i]);
    }

    qdst.push_back(Point(rand()%threshold, rand()%threshold));
    qdst.push_back(Point(img_input.cols-rand()%threshold, rand()%threshold));
    qdst.push_back(Point(img_input.cols-rand()%threshold, img_input.rows-rand()%threshold));
    qdst.push_back(Point(rand()%threshold, img_input.rows-rand()%threshold));
    for (int i = 1; i < segment; i++){
        qsrc.push_back(Point(cut*i, 0));
        qsrc.push_back(Point(cut*i, img_input.rows));
        qdst.push_back(Point(cut*i+rand()%threshold-0.5*threshold, rand()%threshold-0.5*threshold));
        qdst.push_back(Point(cut*i+rand()%threshold-0.5*threshold, img_input.rows+rand()%threshold-0.5*threshold));
    }
    cv::Mat result = trans1.setAllAndGenerate(img_input, qsrc, qdst, img_input.cols, img_input.rows);
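For readers following along, the control-point generation in the snippet above can be transliterated into Python roughly as follows; `distort_points` is an illustrative name, not part of the library:

```python
import random

def distort_points(w, h, segment, threshold):
    """Fixed border/grid control points (qsrc) and randomly jittered targets
    (qdst), mirroring the C++ snippet above."""
    cut = w // segment

    def r():
        return random.randrange(threshold)  # Python analogue of rand() % threshold

    # Four image corners: sources are fixed, targets jittered inward.
    qsrc = [(0, 0), (w, 0), (w, h), (0, h)]
    qdst = [(r(), r()), (w - r(), r()),
            (w - r(), h - r()), (r(), h - r())]
    # Evenly spaced points along the top and bottom edges, jittered both ways.
    for i in range(1, segment):
        qsrc += [(cut * i, 0), (cut * i, h)]
        qdst += [(cut * i + r() - 0.5 * threshold, r() - 0.5 * threshold),
                 (cut * i + r() - 0.5 * threshold, h + r() - 0.5 * threshold)]
    return qsrc, qdst
```

The point pairs would then be fed to the MLS warp (setAllAndGenerate in the C++ code) to produce the distorted image.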

About Joint Training

Thanks for your work, but I am wondering how to do the joint training mentioned in the paper (Algorithm 1, Joint Learning Scheme).

undefined symbol: _ZN2cv6formatB5cxx11EPKcz

Hello! I have built the Augment.so file, but when running the script I hit the following error. Do you know what might be causing it? Thanks!

    ImportError: /home/sun/sunny/projects/Decoupled-attention-network/Scene-Text-Image-Transformer/Augment.so: undefined symbol: _ZN2cv6formatB5cxx11EPKcz

I ran into some trouble during 'make'

    [ 12%] Building CXX object CMakeFiles/Augment.dir/src/conversion.cpp.o
    In file included from /home/fbas/下载/Scene-Text-Image-Transformer-master/src/conversion.cpp:1:0:
    /home/fbas/下载/Scene-Text-Image-Transformer-master/include/conversion.h:8:33: fatal error: numpy/ndarrayobject.h: No such file or directory
    compilation terminated.
    CMakeFiles/Augment.dir/build.make:62: recipe for target 'CMakeFiles/Augment.dir/src/conversion.cpp.o' failed
    make[2]: *** [CMakeFiles/Augment.dir/src/conversion.cpp.o] Error 1
    CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/Augment.dir/all' failed
    make[1]: *** [CMakeFiles/Augment.dir/all] Error 2
    Makefile:83: recipe for target 'all' failed
    make: *** [all] Error 2

CMake problem

My test environment is Ubuntu 16.04. Even without installing boost via Anaconda as the instructions describe, it compiled and successfully produced the Augment.so file.

But on the server running CentOS 7.4, the same approach fails. If it is not a problem with the boost 1.67 installation, then it must be a CMake error.

Boost installation steps:

  1. Download boost_1_67_0.tar.gz
  2. Extract the archive and cd into it
  3. ./bootstrap.sh --with-libraries=all --with-python=/home/kongtianning/anaconda3/envs/python2712/bin/python --with-python-version=2.7 --with-python-root=/home/kongtianning/anaconda3/envs/python2712 --prefix=/home/kongtianning/myboost
  4. ./b2
  5. ./b2 install

Next I followed your instructions. With the system Python 2 on Ubuntu this works fine, but on CentOS it does not:

    mkdir build
    cd build
    cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..

On CentOS I invoked cmake like this:

    cmake -DPYTHON_INCLUDE_DIR=/home/kongtianning/anaconda3/envs/python2712/include/python2.7 -DPYTHON_LIBRARY=/home/kongtianning/anaconda3/envs/python2712/lib/ -DPYTHON_EXECUTABLE=/home/kongtianning/anaconda3/envs/python2712/bin/python -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..

The result is that boost_python cannot be found:

    CMake Error at /usr/share/cmake/Modules/FindBoost.cmake:1138 (message):
      Unable to find the requested Boost libraries.

      Boost version: 1.67.0

      Boost include path: /usr/local/include

      Could not find the following Boost libraries:

              boost_python

      No Boost libraries were found.  You may need to set BOOST_LIBRARYDIR to the
      directory containing Boost libraries or BOOST_ROOT to the location of
      Boost.
    Call Stack (most recent call first):
      CMakeLists.txt:18 (find_package)

    -- Configuring incomplete, errors occurred!
    See also "/home/kongtianning/PycharmProjects/HanWangProJectPython/imageAugment/build/CMakeFiles/CMakeOutput.log".
