
sstd's Introduction


Single Shot Text Detector with Regional Attention

Introduction

SSTD was initially described in our ICCV 2017 spotlight paper.

A third-party implementation of SSTD + Focal Loss is also available. Thanks to Ho taek Han.

If you find it useful in your research, please consider citing:

@inproceedings{panhe17singleshot,
      Title   = {Single Shot Text Detector with Regional Attention},
      Author  = {He, Pan and Huang, Weilin and He, Tong and Zhu, Qile and Qiao, Yu and Li, Xiaolin},
      Note    = {Proceedings of International Conference on Computer Vision (ICCV)},
      Year    = {2017}
      }
@inproceedings{panhe16readText,
      Title   = {Reading Scene Text in Deep Convolutional Sequences},
      Author  = {He, Pan and Huang, Weilin and Qiao, Yu and Loy, Chen Change and Tang, Xiaoou},
      Note    = {Proceedings of AAAI Conference on Artificial Intelligence, (AAAI)},
      Year    = {2016}
      }
@inproceedings{liu16ssd,
      Title   = {{SSD}: Single Shot MultiBox Detector},
      Author  = {Liu, Wei and Anguelov, Dragomir and Erhan, Dumitru and Szegedy, Christian and Reed, Scott and Fu, Cheng-Yang and Berg, Alexander C.},
      Note    = {Proceedings of European Conference on Computer Vision (ECCV)},
      Year    = {2016}
      }

Installation

  1. Get the code. We will call the directory that you cloned SSTD (a Caffe-based repository) into $CAFFE_ROOT.
git clone https://github.com/BestSonny/SSTD.git
cd SSTD
  2. Build the code. Please follow the Caffe instructions to install all necessary packages, then build it.
# Modify Makefile.config according to your Caffe installation.
cp Makefile.config.example Makefile.config
make -j8
# Make sure to add $CAFFE_ROOT/python to your PYTHONPATH.
make py
make test -j8
# (Optional)
make runtest -j8
# Build the NMS module
cd examples/text
make
cd ../..
  3. Run the demo code. Download the model (Google Drive, BaiduYun) and put it in the text/model folder.
cd examples
sh text/download.sh
mkdir text/result
python text/demo_test.py

sstd's People

Contributors

bado-lee, bestsonny


sstd's Issues

I had a cuDNN version error

My CUDA is 8.0 and cuDNN is 5,
but when I compile this code, it fails.

====
CXX .build_release/src/caffe/proto/caffe.pb.cc
CXX src/caffe/blob.cpp
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from src/caffe/blob.cpp:4:
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::setConvolutionDesc(cudnnConvolutionStruct**, cudnnTensorDescriptor_t, cudnnFilterDescriptor_t, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:112:3: error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t, int, int, int, int, int, int, cudnnConvolutionMode_t, cudnnDataType_t)’
CUDNN_CHECK(cudnnSetConvolution2dDescriptor(*conv,
^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
from ./include/caffe/util/device_alternate.hpp:40,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from src/caffe/blob.cpp:4:
/usr/local/cuda/include/cudnn.h:537:27: note: declared here
cudnnStatus_t CUDNNWINAPI cudnnSetConvolution2dDescriptor( cudnnConvolutionDescriptor_t convDesc,
^
Makefile:577: recipe for target '.build_release/src/caffe/blob.o' failed
make: *** [.build_release/src/caffe/blob.o] Error 1

====

Please tell me your cuDNN and CUDA versions.
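The compiler note shows `cudnnSetConvolution2dDescriptor` declared with a trailing `cudnnDataType_t` argument. As far as I know, that compute-type argument was introduced in cuDNN 6, so the header at /usr/local/cuda/include/cudnn.h may be newer than the cuDNN 5 the build expects. The helper below is hypothetical and not part of this repository. It reads the version macros from the header text so you can check which cuDNN is actually installed. In cuDNN 8 and later these macros live in cudnn_version.h instead.

```python
import re


def cudnn_version_from_header(text):
    """Return (major, minor, patchlevel) parsed from cuDNN header text, or None.

    Looks for the CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL #define lines.
    """
    version = []
    for macro in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % macro, text)
        if match is None:
            return None  # macro missing: probably not the right header file
        version.append(int(match.group(1)))
    return tuple(version)
```

For example, call `cudnn_version_from_header(open("/usr/local/cuda/include/cudnn.h").read())`. A major version of 6 or higher would be consistent with the nine-argument signature in the log.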

Train own model

Could you please give brief instructions regarding how one can train their own model?

CPU running

Your algorithm requires OpenCV 3+, so I had to create a VM, because I don't want it to interfere with the OpenCV 2.4 on my regular operating system. However, my GPU does not have enough capacity to run GPU computations inside the VM (or the passthrough does not work properly). So I wanted to ask: is there a way to run the code on my images using only the CPU?

BR
Valentin

OpenCV 3?

Excuse me, I wanted to ask whether I need OpenCV 3 for this repository. I am getting errors that appear to be related to OpenCV 3, and I am only using OpenCV 2.4.9.
Also, do I have to use the Caffe you ship? I already have a Caffe version on my Ubuntu installation. Is yours modified somehow?

Running the 'make' command produces:

~/Desktop/A_M-arbeit/G_Code/G_SSTD$ make -j8
LD -o .build_release/lib/libcaffe.so.1.0.0-rc3
/usr/bin/ld: cannot find -lopencv_imgcodecs
/usr/bin/ld: cannot find -lopencv_videoio
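The missing libraries, opencv_imgcodecs and opencv_videoio, are modules that exist only in OpenCV 3 and later. In 2.4, image and video I/O live in highgui. Caffe-style Makefiles add these two only when `OPENCV_VERSION := 3` is active in Makefile.config. With OpenCV 2.4.9, that line therefore presumably needs to stay commented out.

The sketch below is a hypothetical helper that approximates this logic. The exact library list is an assumption based on the SSD-style Makefile, not read from this repository.

```python
import re

# Assumed to approximate the Caffe/SSD Makefile: always linked when USE_OPENCV is on...
BASE_OPENCV_LIBS = ["opencv_core", "opencv_highgui", "opencv_imgproc"]
# ...plus these, only for OPENCV_VERSION := 3 (they do not exist in OpenCV 2.4).
OPENCV3_EXTRA_LIBS = ["opencv_imgcodecs", "opencv_videoio"]


def active_opencv_version(config_text):
    """Return the value of the last uncommented OPENCV_VERSION line, or None."""
    version = None
    for line in config_text.splitlines():
        match = re.match(r"\s*OPENCV_VERSION\s*:=\s*(\d+)", line)
        if match:  # lines starting with '#' never match
            version = int(match.group(1))
    return version


def expected_opencv_libs(config_text):
    """List the OpenCV libraries the build would try to link, under the assumptions above."""
    libs = list(BASE_OPENCV_LIBS)
    if active_opencv_version(config_text) == 3:
        libs += OPENCV3_EXTRA_LIBS
    return libs
```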

When I run python text/demo_test.py, an error occurred

warn("The default mode, 'constant', will be changed to 'reflect' in "
F1206 10:15:04.766489 447 syncedmem.cpp:56] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
Aborted (core dumped)
What should I do about this?
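A CUDA "out of memory" failure in syncedmem.cpp means a blob allocation did not fit on the GPU. Activation memory in a fully convolutional detector grows roughly with the input's height times width, so one thing to try is feeding a smaller image.

The sketch below computes downscaled dimensions for a chosen pixel budget. `max_pixels` is a hypothetical knob, and whether demo_test.py exposes an input-size setting is not shown here.

```python
import math


def fit_to_pixel_budget(height, width, max_pixels):
    """Return (new_height, new_width) with the same aspect ratio and at most max_pixels pixels."""
    if height * width <= max_pixels:
        return height, width  # already small enough; leave unchanged
    # Area scales with the square of the linear scale factor.
    scale = math.sqrt(max_pixels / float(height * width))
    return max(1, int(height * scale)), max(1, int(width * scale))
```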

When using more than 2 graphics cards, there is a problem.

@BestSonny, hi. When I use more than 2 graphics cards to run, it fails
with errors like the following:
Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered.
Check failed: error == status == CUBLAS_STATUS_SUCCESS (11 vs. 0)CUBLAS_STATUS_MAPPING_ERROR

I guess that maybe something is wrong with "Annotated_mask_data_layer", but I am not sure.
Have you met this problem? Please give me some advice. Thanks in advance.

How to convert data (looking like [image, mask, bbox_label]) into LMDB?

Hi, BestSonny. Thanks for sharing your code. When I use the tool "convert_annoset_mask" to convert the database into LMDB, it works. But when I train my net, I run into a problem: it seems the tool "convert_annoset_mask" converts the mask (single channel) into 3 channels. I don't know where I went wrong.
Could you tell me why, or share your "convert" shell script?
Here is my "convert" command:

./build/tools/convert_annoset_mask --anno_type=detection --label_type=xml --label_map_file=/home/shi/caffe-ssd/data/VOC0712/labelmap_voc.prototxt --check_label=True --min_dim=0 --max_dim=0 --resize_height=0 --resize_width=0 --backend=lmdb --shuffle=False --check_size=False --encode_type=jpg --encoded=True --gray=False /home/shi/data/ ### /home/shi/data/all_train.txt /home/shi/data/VOC12_AUG/lmdb/VOC12_train_lmdb

Here is the training error shown by Caffe:

I0105 14:35:56.477653 16084 net.cpp:100] Creating Layer seg_loss

I0105 14:35:56.477654 16084 net.cpp:434] seg_loss <- upscore
I0105 14:35:56.477658 16084 net.cpp:434] seg_loss <- mask
I0105 14:35:56.477661 16084 net.cpp:408] seg_loss -> seg_loss
I0105 14:35:56.477670 16084 layer_factory.hpp:77] Creating layer seg_loss
I0105 14:35:56.485399 16084 softmax_loss_layer.cpp:47] softmaxwithloss bottom[0] size: 2,21,320,320
I0105 14:35:56.485427 16084 softmax_loss_layer.cpp:50] softmaxwithloss bottom[1] size: 2,3,320,320
F0105 14:35:56.485437 16084 softmax_loss_layer.cpp:53] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (204800 vs. 614400) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be NHW, with integer values in {0, 1, ..., C-1}.
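The failing check is plain arithmetic. SoftmaxWithLoss with softmax axis 1 expects one integer label per spatial position, i.e. N×H×W = 2×320×320 = 204800 labels. The mask blob has shape 2×3×320×320 = 614400, exactly three times too many. That is consistent with the mask being stored or decoded as a 3-channel image.

The sketch below restates the check as a hypothetical helper; it is not Caffe code.

```python
from functools import reduce
from operator import mul


def expected_label_count(pred_shape, axis=1):
    """Labels SoftmaxWithLoss expects: product of all prediction dims except the softmax axis."""
    return reduce(mul, (d for i, d in enumerate(pred_shape) if i != axis), 1)


def label_count_matches(pred_shape, label_shape, axis=1):
    """Mirror of the outer_num_ * inner_num_ == bottom[1]->count() check."""
    return expected_label_count(pred_shape, axis) == reduce(mul, label_shape, 1)
```

With the shapes from the log, `label_count_matches((2, 21, 320, 320), (2, 3, 320, 320))` is False. A single-channel mask of shape (2, 1, 320, 320) would pass.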

'module' object has no attribute 'LabelMap'

I am trying to run demo_test.py, however, I'm getting the following error:

'module' object has no attribute 'LabelMap'

This error happens on the line: labelmap = caffe_pb2.LabelMap()

I have successfully built and installed Caffe from the master branch of https://github.com/BVLC/caffe in /home/ubuntu/caffe/python, and I can successfully execute import caffe and from caffe.proto import caffe_pb2.

My PYTHONPATH is:

ubuntu@ip-10-0-0-14:~/caffe/python$ echo $PYTHONPATH
/home/ubuntu:/home/ubuntu/caffe/python

Can someone please let me know what I'm missing?
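`LabelMap` is not part of BVLC master's caffe.proto. It comes from the SSD branch's proto, which this repository appears to build on. The error therefore suggests Python is importing the BVLC build rather than the one compiled from this repository, whose python/ directory would need to come first on PYTHONPATH.

The minimal Python 3 sketch below shows which build gets imported. The helper name is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not importable.

    This only locates the module; it does not import (execute) it.
    """
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None
```

`module_origin("caffe")` should point into the python/ directory of the SSTD checkout. If it points to /home/ubuntu/caffe/python instead, that build's caffe_pb2 has no `LabelMap`. Once the right build is imported, `hasattr(caffe_pb2, "LabelMap")` should be True.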

Oriented Text detection

Hi, Thank you very much for sharing your code.
Currently I'm trying to reproduce the ICDAR2015 results from your paper, but I cannot find the prototxt for oriented text (specifically for the ICDAR2015 sets).
It would be wonderful if you could share the model and pre-trained model for oriented text.

Training code

Thank you for sharing the code, but I cannot find the training code here. Could you please upload it?

When I add the segmentation result to the feature map used for detection, my model doesn't converge.

@BestSonny Hi.
I am training a net to segment and detect simultaneously on VOC2012. I use MaskResize and MaskPooling to fuse conv feature maps with the segmentation result (1 - segment_result[0]), but the model doesn't converge.
So I added a BN layer to the net and reduced the learning rate to a small value, but the model still doesn't converge.
Have you met this problem when training your model? Could you give some advice?

Following is part of the training log:

I0109 13:39:28.412003 21535 net.cpp:761] Ignoring source layer fc8
I0109 13:39:28.412008 21535 net.cpp:761] Ignoring source layer prob
I0109 13:39:29.777668 21535 solver.cpp:243] Iteration 0, loss = 26.1135
I0109 13:39:29.777715 21535 solver.cpp:259] Train net output #0: mbox_loss = 22.8622 (* 1 = 22.8622 loss)
I0109 13:39:29.777726 21535 solver.cpp:259] Train net output #1: seg_loss = 3.18038 (* 1 = 3.18038 loss)
I0109 13:39:29.777734 21535 sgd_solver.cpp:138] Iteration 0, lr = 0.001
I0109 13:40:46.876418 21535 solver.cpp:243] Iteration 10, loss = 16.8931
I0109 13:40:46.876471 21535 solver.cpp:259] Train net output #0: mbox_loss = 14.9162 (* 1 = 14.9162 loss)
I0109 13:40:46.876482 21535 solver.cpp:259] Train net output #1: seg_loss = 1.84232 (* 1 = 1.84232 loss)
I0109 13:40:46.876493 21535 sgd_solver.cpp:138] Iteration 10, lr = 0.001
I0109 13:41:01.436628 21535 solver.cpp:243] Iteration 20, loss = 16.8339
I0109 13:41:01.436683 21535 solver.cpp:259] Train net output #0: mbox_loss = 15.0192 (* 1 = 15.0192 loss)
I0109 13:41:01.436695 21535 solver.cpp:259] Train net output #1: seg_loss = 2.32088 (* 1 = 2.32088 loss)
I0109 13:41:01.436707 21535 sgd_solver.cpp:138] Iteration 20, lr = 0.001
I0109 13:41:15.868654 21535 solver.cpp:243] Iteration 30, loss = nan
I0109 13:41:15.868707 21535 solver.cpp:259] Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0109 13:41:15.868721 21535 solver.cpp:259] Train net output #1: seg_loss = 87.3365 (* 1 = 87.3365 loss)
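This is not advice from the authors. One common mitigation when a Caffe loss jumps to nan is gradient clipping through the `clip_gradients` field of solver.prototxt, e.g. `clip_gradients: 10` (the value is a guess to tune). When the global L2 norm of all parameter gradients exceeds that threshold, Caffe rescales every gradient by threshold / norm.

A pure-Python sketch of that rescaling:

```python
import math


def clip_by_global_norm(grads, clip_value):
    """Rescale gradient vectors the way Caffe's clip_gradients does.

    grads: list of lists of floats (one list per parameter blob).
    clip_value: threshold on the global L2 norm; a negative value disables
    clipping (Caffe's default for clip_gradients is -1).
    """
    if clip_value < 0:
        return [list(g) for g in grads]
    # Global L2 norm over every element of every parameter blob.
    norm = math.sqrt(sum(x * x for g in grads for x in g))
    if norm <= clip_value:
        return [list(g) for g in grads]
    scale = clip_value / norm
    return [[x * scale for x in g] for g in grads]
```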

Where is your code of auxiliary loss?

The paper says: "We introduce an auxiliary loss which provides a direct and detailed supervision of text via a binary mask that indicates text or non-text at each pixel location."
How is it implemented?
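A per-pixel text/non-text mask loss is commonly implemented as a two-class softmax cross-entropy averaged over pixels. A SoftmaxWithLoss `seg_loss` layer also appears in a user's training log on this page. Whether SSTD computes its auxiliary loss exactly this way is not confirmed here.

The NumPy sketch below only illustrates the loss itself. It assumes logits of shape (N, 2, H, W), integer labels of shape (N, H, W), and a hypothetical class order (0 = non-text, 1 = text).

```python
import numpy as np


def pixelwise_softmax_loss(logits, labels):
    """Mean per-pixel softmax cross-entropy.

    logits: float array (N, 2, H, W) of class scores (assumed 0 = non-text, 1 = text).
    labels: int array (N, H, W) with values in {0, 1}.
    """
    # Subtract the per-pixel max for numerical stability before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Select the log-probability of the ground-truth class at every pixel.
    picked = np.take_along_axis(log_probs, labels[:, np.newaxis, :, :], axis=1)
    return float(-picked.mean())
```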

Model details different from original paper

Hi there,

I read the original paper and this implementation, and they're awesome!

However, I have a question about the details of the SSTD net, and I'm really looking forward to your reply :)

(1) In the deconvolution part, I see that you use groups=64 to upsample. Generally speaking, groups=1 might be more reasonable, so I guess it's to save computation? Or are there other reasons?

(2) The original paper uses deconv3_3 and conv1_1 to build the attention map. I see that you're using deconv16_16 and two conv3_3 instead. Does that mean this implementation is better than the one in the original paper?

It's very nice code, and I really appreciate your comments!

Thanks
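Part of the answer to question (1) is simple arithmetic. In Caffe, a Deconvolution layer's weights have shape (input channels, num_output / group, kernel_h, kernel_w). Setting group equal to the channel count therefore divides the parameter count, and the multiply-adds, by that factor.

Caffe's documentation for the bilinear weight filler also uses group = num_output for fixed per-channel upsampling, which may be another reason. The authors' actual motivation is not stated here. The channel and kernel numbers below are illustrative only.

```python
def deconv_weight_count(in_channels, out_channels, kernel_h, kernel_w, group=1):
    """Number of weights in a Caffe Deconvolution layer (bias excluded).

    Caffe stores deconvolution weights with shape
    (in_channels, out_channels // group, kernel_h, kernel_w).
    """
    if in_channels % group or out_channels % group:
        raise ValueError("channel counts must be divisible by group")
    return in_channels * (out_channels // group) * kernel_h * kernel_w
```

For a hypothetical 64-channel, 16×16-kernel upsampling layer, group=1 gives 1,048,576 weights and group=64 gives 16,384.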

How to run a demo based on the downloaded model?

#9 suggests that demo_test.py cannot practically be run on a CPU because it takes a long time. My understanding was that once the model has been trained, it can be run on a CPU and still be fast.

I'm wondering how someone would put this model into production if, even with a trained model (demo.caffemodel), running the demo on a single image takes a long time.

I can't use Python to create a MaskResize layer

I built the Caffe source code with the MaskResize layer successfully, but I can't use Python to create a MaskResize layer. Could you tell me what I should do to fix it?

Following is my Python code:

name = '{}_mask_resize'.format(from_layers[i])
mask_resize_param = {
    'output_height': 1,
    'output_width': 1,
    'factor_height': factors[i],
    'factor_width': factors[i],
}
net[name] = L.MaskRisize(net.slice1, mask_resize_param=mask_resize_param)

Running the Python code shows:

I0108 23:21:11.590718 20850 layer_factory.hpp:77] Creating layer conv4_3_mask_resize

F0108 23:21:11.590739 20850 layer_factory.hpp:81] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: MaskRisize (known types: AbsVal, Accuracy, AnnotatedData, AnnotatedDataMask, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Crop, Data, Deconvolution, DetectionEvaluate, DetectionOutput, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, Input, LRN, LSTM, LSTMUnit, Log, MVN, MaskPooling, MaskResize, MemoryData, MultiBoxLoss, MultinomialLogisticLoss, Normalize, PReLU, Parameter, Permute, Pooling, Power, PriorBox, Python, RNN, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, VideoData, WindowData)
*** Check failure stack trace: ***
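The log lists `MaskResize` among the known types, while the code requests `MaskRisize`, which looks like a misspelling. The corrected call would be `net[name] = L.MaskResize(net.slice1, mask_resize_param=mask_resize_param)`.

For this kind of "Unknown layer type" failure, a quick way to spot typos is to compare the requested name against the registered types with Python's standard difflib. The helper below is hypothetical and uses only a subset of the types printed in the log.

```python
import difflib

# Subset of the known layer types printed in the Caffe log.
KNOWN_TYPES = [
    "AnnotatedData", "AnnotatedDataMask", "Deconvolution", "MaskPooling",
    "MaskResize", "MultiBoxLoss", "Normalize", "Permute", "PriorBox",
    "Reshape", "Slice", "SoftmaxWithLoss",
]


def suggest_layer_type(name, known_types=KNOWN_TYPES):
    """Return the closest registered layer type to `name`, or None if nothing is similar."""
    matches = difflib.get_close_matches(name, known_types, n=1)
    return matches[0] if matches else None
```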

Attempt to run CPU only version

/home/harsh/mlocr/SSTD
[' step: 8', ' step: 16', ' step: 32', ' step: 64', ' step: 78.2222222222', ' step: 100.571428571', ' step: 140.8']
found at line: 3942
found at line: 4363
found at line: 4784
found at line: 5205
found at line: 5626
found at line: 6047
found at line: 6468
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0303 10:59:26.766597 25916 common.cpp:66] Cannot use GPU in CPU-only Caffe: check mode.
*** Check failure stack trace: ***
Aborted (core dumped)
Any ideas? I can't figure out why
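`Cannot use GPU in CPU-only Caffe: check mode.` is raised when a Caffe build compiled with CPU_ONLY is asked to run in GPU mode. The demo script therefore presumably selects GPU mode somewhere; that is an assumption, since the script is not reproduced here.

pycaffe provides `caffe.set_mode_cpu()`, `caffe.set_mode_gpu()` and `caffe.set_device()`. A small hypothetical wrapper that chooses between them is sketched below. Another issue on this page reports `mask_resize_layer.cpp:42] Not Implemented Yet` for the CPU path, so switching modes may only move the failure to the MaskResize layer.

```python
def configure_caffe_mode(caffe_module, use_gpu, device_id=0):
    """Select CPU or GPU execution on an already-imported pycaffe module.

    caffe_module: the imported `caffe` package (passed in so this can be tested without Caffe).
    """
    if use_gpu:
        caffe_module.set_device(device_id)  # choose the GPU before switching modes
        caffe_module.set_mode_gpu()
    else:
        caffe_module.set_mode_cpu()  # required for a CPU_ONLY build
```

Usage, assuming pycaffe is importable: `import caffe; configure_caffe_mode(caffe, use_gpu=False)`, called before the network is constructed.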

Error with nms?

I am getting the following error when running demo_test.py:

File "text/demo_test.py", line 21, in <module>
from nms.gpu_nms import gpu_nms
ImportError: No module named gpu_nm

The files contained in nms are:

cpu_nms.pyx gpu_nms.pyx nms_kernel.cu
gpu_nms.hpp __init__.py py_cpu_nms.py

I also executed sh text/download.sh before starting demo_test.py.
Best!

Valentin
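`gpu_nms` is a compiled extension, so it only exists after the `make` step in examples/text succeeds, which requires nvcc. The nms folder also ships py_cpu_nms.py.

As a fallback, greedy non-maximum suppression on axis-aligned boxes can be written in NumPy. The sketch below follows the classic Fast R-CNN style, with rows of [x1, y1, x2, y2, score] and inclusive pixel coordinates. It is an assumption, not a drop-in replacement: it does not handle oriented boxes, and the signature of this repository's gpu_nms is not shown here.

```python
import numpy as np


def greedy_nms(dets, thresh):
    """Greedy NMS. dets: float array (K, 5) of [x1, y1, x2, y2, score]; returns kept row indices."""
    x1, y1, x2, y2, scores = (dets[:, i] for i in range(5))
    # +1 because coordinates are treated as inclusive pixel indices.
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # Intersection of the current best box with all remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1 + 1) * np.maximum(0.0, yy2 - yy1 + 1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Keep only boxes whose overlap with the current box is at most thresh.
        order = order[np.where(iou <= thresh)[0] + 1]
    return keep
```

One way to wire it in, assuming the demo's call site can be adapted: wrap `from nms.gpu_nms import gpu_nms` in `try/except ImportError` and fall back to a CPU function.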

About GPU memory Usage

Hello @BestSonny .
Thank you for your contribution.
I want to port your software to mobile, so I would like to know:
how many MB of GPU memory are used at test time?
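`nvidia-smi` shows the actual usage while the demo runs. For a rough estimate, you can sum the activations and weights of a pycaffe net loaded for testing, e.g. `caffe.Net(prototxt, weights, caffe.TEST)`. The hypothetical helper below assumes float32 blobs (4 bytes per element). It ignores gradient buffers, cuDNN workspaces and the CUDA context, so real usage will be higher.

```python
def blob_bytes(net, bytes_per_element=4):
    """Approximate bytes held by a pycaffe Net's activation blobs and learnable parameters.

    Uses net.blobs (name -> Blob) and net.params (layer name -> list of Blobs);
    assumes float32 storage and excludes diffs, cuDNN workspace and CUDA context.
    """
    elements = sum(blob.data.size for blob in net.blobs.values())
    elements += sum(p.data.size for params in net.params.values() for p in params)
    return elements * bytes_per_element
```

Dividing the result by 1024 ** 2 gives megabytes.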

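How can I use your application without GPU support?

At the make test -j8 step, compilation already fails for us because we have no CUDA support...
After skipping this step, compiling nms also fails because there is no nvcc environment. Do you have a version that supports CPU-only use?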

maximal image size

Dear @BestSonny, what is the maximum image size I can feed in without it being resized? Can the resizing be disabled? I tried, but it gave me an error:
Check failed: shape[i] <= 0x7fffffff / count_ (3448 vs. 811) blob size exceeds INT_MAX
When resizing is disabled, what is the maximum image size I can feed?

BR

Valentin
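The failing check is Caffe's guard that a blob's element count stays within INT_MAX (2^31 - 1). While reshaping, each dimension must satisfy shape[i] <= INT_MAX / count, where count is the product of the dimensions before it. In the log, a dimension of 3448 met a limit of 811, so some blob was far too large for a full-resolution input.

Which layer and which input sizes trigger the check depends on the network. The hypothetical sketch below just replays the check for a given shape, so it can be applied to the largest intermediate blob shapes of a candidate input size.

```python
INT_MAX = 2**31 - 1  # 0x7fffffff, the limit used by Caffe's Blob::Reshape check


def first_overflowing_dim(shape):
    """Return the index of the first dimension failing Caffe's
    'shape[i] <= INT_MAX / count' check, or None if the shape fits."""
    count = 1
    for i, dim in enumerate(shape):
        # Caffe only performs the check while the running count is non-zero.
        if count != 0 and dim > INT_MAX // count:
            return i
        count *= dim
    return None
```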

CPU-only mode

Hello,

How can I run demo_test.py in CPU-only mode with Python 3? I also changed the solver mode in caffe.proto to CPU.

Here is my Makefile.config file:

# Refer to http://caffe.berkeleyvision.org/installation.html
# Contributions simplifying and improving our build system are welcome!

# cuDNN acceleration switch (uncomment to build with cuDNN).
USE_CUDNN := 1

# CPU-only switch (uncomment to build without GPU support).
CPU_ONLY := 1

# uncomment to disable IO dependencies and corresponding data layers
USE_OPENCV := 1
USE_LEVELDB := 1
USE_LMDB := 1

# uncomment to allow MDB_NOLOCK when reading LMDB files (only if necessary)
# You should not set this flag if you will be reading LMDBs with any
# possibility of simultaneous read and write
ALLOW_LMDB_NOLOCK := 1

# Uncomment if you're using OpenCV 3
OPENCV_VERSION := 3

# To customize your choice of compiler, uncomment and set the following.
# N.B. the default for Linux is g++ and the default for OSX is clang++
CUSTOM_CXX := g++

# CUDA directory contains bin/ and lib/ directories that we need.
#CUDA_DIR := /usr/local/cuda
# On Ubuntu 14.04, if cuda tools are installed via
# "sudo apt-get install nvidia-cuda-toolkit" then use this instead:
CUDA_DIR := /usr

# CUDA architecture setting: going with all of them.
# For CUDA < 6.0, comment the *_35 lines for compatibility.
#CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
		-gencode arch=compute_20,code=sm_21 \
		-gencode arch=compute_30,code=sm_30 \
		-gencode arch=compute_35,code=sm_35 \
		-gencode arch=compute_50,code=sm_50 \
		-gencode arch=compute_52,code=sm_52 \
		-gencode arch=compute_61,code=sm_61

# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := atlas
BLAS := open
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
BLAS_INCLUDE := /path/to/your/blas
BLAS_LIB := /path/to/your/blas

# Homebrew puts openblas in a directory that is not on the standard search path
BLAS_INCLUDE := $(shell brew --prefix openblas)/include
BLAS_LIB := $(shell brew --prefix openblas)/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
MATLAB_DIR := /usr/local
MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
#PYTHON_INCLUDE := /usr/include/python2.7 \
		/usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it's in root.
ANACONDA_HOME := $(HOME)/anaconda2
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
		$(ANACONDA_HOME)/include/python2.7 \
		$(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

# Uncomment to use Python 3 (default is Python 2)
PYTHON_LIBRARIES := boost_python3 python3.6m
PYTHON_INCLUDE := /usr/include/python3.6m \
		/home/bora/.local/lib/python3.6/site-packages/numpy/core/include

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
PYTHON_LIB := $(ANACONDA_HOME)/lib

# Homebrew installs numpy in a non standard path (keg only)
PYTHON_INCLUDE += $(dir $(shell python -c 'import numpy.core; print(numpy.core.__file__)'))/include
PYTHON_LIB += $(shell brew --prefix numpy)/lib

# Uncomment to support layers written in Python (will link against Python libs)
WITH_PYTHON_LAYER := 1

# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial/
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu/hdf5/serial/

# If Homebrew is installed at a non standard location (for example your home directory) and you use it for general dependencies
INCLUDE_DIRS += $(shell brew --prefix)/include
LIBRARY_DIRS += $(shell brew --prefix)/lib

# Uncomment to use pkg-config to specify OpenCV library paths.
# (Usually not necessary -- OpenCV libraries are normally installed in one of the above $LIBRARY_DIRS.)
USE_PKG_CONFIG := 1

# N.B. both build and distribute dirs are cleared on make clean
BUILD_DIR := build
DISTRIBUTE_DIR := distribute

# Uncomment for debugging. Does not work on OSX due to BVLC/caffe#171
DEBUG := 1

# The ID of the GPU that 'make runtest' will use to run unit tests.
#TEST_GPUID := 0

# enable pretty build (comment to see full commands)
Q ?= @

EXCLUDES_FOLDERS := examples/text/nms/

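Model cannot be downloaded, download.sh file missing

Hello, the demo program compiled successfully.
However, the links you provided never open for me, so I cannot download the model file, and I also could not find the text/download.sh file... Could you provide another link, such as Baidu Netdisk?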

Orientation in the CUDA implementation

SSTD article says:
"we use a softmax function for binary classification of text or non-text, and apply the smooth-l1 loss for regressing 5 parameters for each word bounding box, including a parameter for box orientation"

I can find the box orientation parameter in the .cpp implementation,
but I cannot find it in the .cu implementation.

The .cpp implementation (on CPU) cannot run because "mask_resize_layer.cpp:42] Not Implemented Yet"

Is there a way to get orientation from .cu implementation?

Have you already implemented the mask_resize_layer.cpp?
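A 5-parameter box is commonly read as a center, a size and a rotation angle. The exact parameterization SSTD uses (angle units, sign convention, how the angle is regressed) is not stated here. The sketch below therefore assumes (cx, cy, w, h, theta), with theta in radians applied as a standard 2-D rotation. It converts such a box into its four corner points, which is the usual last step before drawing or evaluating oriented detections.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Return the 4 corners of a w x h box centered at (cx, cy), rotated by theta radians."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Corner offsets from the center before rotation, in order around the box.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Rotate the offset, then translate it back to the center.
        corners.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return corners
```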
