mhliao / textboxes
This project forked from weiliu89/caffe
TextBoxes: A Fast Text Detector with a Single Deep Neural Network
Home Page: https://github.com/MhLiao/TextBoxes
License: Other
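Hello, TextBoxes performs very well on horizontal text, but when I run the TextBoxes model from my SSD directory, I get an error. Compared with the SSD network, TextBoxes is missing distort_param in the data layer, and the ignore_cross_boundary_bbox and mining_type parameters in the loss. I suspect TextBoxes is based on an old version of SSD.
The error is: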
Check failed:num_priors_*num_loc_class_*4==bottom[0]->channels()(295112 vs.590224) Number of priors must match number of location predictions
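Note that 590224 is exactly 2 x 295112. The check compares the number of default boxes produced by the PriorBox layers with the channels of the location predictions (4 per box, times `loc_classes`, which is 1 when locations are shared). A minimal sketch of that arithmetic, assuming SSD-style PriorBox counting; the `vertical_offset` flag is a hypothetical switch for a fork that also emits a shifted copy of every box. It illustrates how a model built for one prior layout, run with layer code implementing the other, could differ by exactly a factor of 2.

```python
def priors_per_location(aspect_ratios, num_min_sizes=1, num_max_sizes=1,
                        flip=False, vertical_offset=False):
    """Default boxes per feature-map cell, following SSD's PriorBox counting.

    vertical_offset is a hypothetical switch modelling a fork that emits a
    second, vertically shifted copy of every default box.
    """
    ratios = [1.0]  # SSD always includes aspect ratio 1
    for ar in aspect_ratios:
        if any(abs(ar - r) < 1e-6 for r in ratios):
            continue  # duplicates are skipped
        ratios.append(float(ar))
        if flip:
            ratios.append(1.0 / ar)
    count = len(ratios) * num_min_sizes + num_max_sizes
    return 2 * count if vertical_offset else count


def loc_channels(feature_map_sizes, per_location, loc_classes=1):
    """Total channels MultiBoxLoss expects: num_priors * loc_classes * 4."""
    num_priors = sum(h * w * per_location for h, w in feature_map_sizes)
    return num_priors * loc_classes * 4
```

For example, with an illustrative SSD300-like list of feature maps, `loc_channels(maps, priors_per_location([2, 3, 5, 7, 10], vertical_offset=True))` is twice the value obtained without the flag.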
nd@nd-Z97X-UD3H:~/yao/TextBoxes$ make -j8
CXX src/caffe/data_transformer.cpp
CXX src/caffe/util/sampler.cpp
CXX src/caffe/util/upgrade_proto.cpp
CXX src/caffe/util/bbox_util.cpp
CXX src/caffe/util/im2col.cpp
CXX src/caffe/util/im_transforms.cpp
CXX src/caffe/util/blocking_queue.cpp
CXX src/caffe/util/db.cpp
CXX src/caffe/util/io.cpp
CXX src/caffe/util/insert_splits.cpp
In file included from src/caffe/util/im_transforms.cpp:20:0:
./include/caffe/util/im_transforms.hpp:44:37: error: ‘ResizeParameter’ does not name a type
void UpdateBBoxByResizePolicy(const ResizeParameter& param,
^
./include/caffe/util/im_transforms.hpp:46:31: error: ‘NormalizedBBox’ has not been declared
NormalizedBBox* bbox);
^
./include/caffe/util/im_transforms.hpp:48:50: error: ‘ResizeParameter’ does not name a type
cv::Mat ApplyResize(const cv::Mat& in_img, const ResizeParameter& param);
^
./include/caffe/util/im_transforms.hpp:50:49: error: ‘NoiseParameter’ does not name a type
cv::Mat ApplyNoise(const cv::Mat& in_img, const NoiseParameter& param);
^
src/caffe/util/im_transforms.cpp:251:37: error: ‘ResizeParameter’ does not name a type
void UpdateBBoxByResizePolicy(const ResizeParameter& param
.
.
.
make: *** [.build_release/src/caffe/util/upgrade_proto.o] Error 1
^Cmake: *** [.build_release/src/caffe/util/io.o] Interrupt
make: *** [.build_release/src/caffe/data_transformer.o] Interrupt
make: *** [.build_release/src/caffe/util/sampler.o] Interrupt
make: *** [.build_release/src/caffe/util/blocking_queue.o] Interrupt
make: *** [.build_release/src/caffe/util/insert_splits.o] Interrupt
make: *** wait: No child processes. Stop.
Then I added caffe.pb.h to 'include/caffe/proto/', but this error still appears.
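`ResizeParameter`, `NoiseParameter` and `NormalizedBBox` are protobuf messages defined in this fork's SSD-derived `src/caffe/proto/caffe.proto`, and `caffe.pb.h` is generated from that file during the build. "does not name a type" therefore suggests the compiler picked up a `caffe.pb.h` generated from a different `caffe.proto`, for example a header from another Caffe installation earlier on the include path, or a copy placed in `include/caffe/proto/` by hand. Running `make clean` before rebuilding regenerates the header. A small diagnostic sketch (the helper is hypothetical) that lists every `caffe.pb.h` under some directories and whether it declares one of the missing messages:

```python
import os


def find_caffe_pb_headers(search_dirs, symbol="ResizeParameter"):
    """Return (path, declares_symbol) for every caffe.pb.h under search_dirs."""
    results = []
    for top in search_dirs:
        # os.walk silently yields nothing for directories that don't exist.
        for dirpath, _dirnames, filenames in os.walk(top):
            if "caffe.pb.h" not in filenames:
                continue
            path = os.path.join(dirpath, "caffe.pb.h")
            with open(path, encoding="utf-8", errors="ignore") as f:
                declares = ("class " + symbol) in f.read()
            results.append((path, declares))
    return results
```

For instance, `find_caffe_pb_headers(["/usr/include", "/usr/local/include", "."])` run from the TextBoxes directory; any header reporting `False` is a candidate for shadowing the freshly generated one.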
I followed your guide, and when I run
python demo.py
an error occurs. The error message is as follows:
Traceback (most recent call last):
File "demo.py", line 8, in <module>
from nms import nms
File "/home/wangjianbo_i/TextBoxes/examples/TextBoxes/nms.py", line 3, in <module>
import shapely
ImportError: No module named shapely
Also, the Test section of your guide says:
run "python examples/demo.py".
This should probably be:
python examples/TextBoxes/demo.py
What is the difference between TextBoxes and SSD? Is it just a modification of the default boxes?
Dear @MhLiao
I tried to run test_icdar13_multi_scale.py but ran into the following issue.
Could you please help me resolve it?
Hi,
I am trying to train TextBoxes on my custom dataset. The annotations are in Pascal VOC format. While training TextBoxes, I get the following log for every iteration so far (it is currently at iteration 830):
Train net output #0: mbox_loss = 0 (* 1 = 0 loss)
Somehow it does not look right to me. Can someone suggest what might be wrong here? I have done a code walkthrough, and things seem right as far as the annotations are concerned. I am debugging it further, but any quick help would be appreciated. Thanks.
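A loss of exactly 0 at every iteration suggests the loss layer is finding no positive matches, so it is worth confirming that the annotations actually contain usable boxes: objects present, not all flagged `difficult` (SSD-style training can drop difficult boxes), and names matching the label map used to build the LMDB. A minimal sketch that summarizes one Pascal VOC XML file; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_voc_objects(xml_text):
    """Count objects, difficult objects and objects lacking a <bndbox>."""
    root = ET.fromstring(xml_text)
    summary = {"objects": 0, "difficult": 0, "no_bndbox": 0, "names": set()}
    for obj in root.iter("object"):
        summary["objects"] += 1
        summary["names"].add((obj.findtext("name") or "").strip())
        if obj.find("bndbox") is None:
            summary["no_bndbox"] += 1
        if (obj.findtext("difficult") or "0").strip() == "1":
            summary["difficult"] += 1
    return summary
```

Running it over every XML file and comparing the collected `names` with the label map quickly shows whether any usable text boxes reach the LMDB.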
@MhLiao ,hello
I am training the TextBoxes model to reproduce the results of your paper on the SynthText dataset.
So I have two questions:
Thanks in advance!
Every time, I run into this problem saying something can't be found. It's driving me crazy; please help.
How can I change the GPU usage when running the test (if my GPU runs out of memory)?
Hi, I am new to Caffe and only know its basic workings. For single-character recognition, we just provide the image path and label in the train.txt and val.txt files. Can anyone please tell me the format of train.txt and val.txt for creating the LMDB in TextBoxes?
My train.txt file contains the image path and the annotation file path for the COCO dataset; is that right? And how can we open an LMDB file that contains annotations?
I used the following code to read the LMDB file:
import caffe
import lmdb
import numpy as np
import cv2
import matplotlib.pyplot as plt
from caffe.proto import caffe_pb2

lmdb_file = '/home/arha/workarea/ocr_project/cocotext/data/lmdb/coco_val_lmdb'
lmdb_env = lmdb.open(lmdb_file)
lmdb_txn = lmdb_env.begin()
lmdb_cursor = lmdb_txn.cursor()
datum = caffe_pb2.Datum()

for key, value in lmdb_cursor:
    datum.ParseFromString(value)
    label = datum.label
    print 'label=', label
    data = caffe.io.datum_to_array(datum)
    print data.shape
    print 'data=', data
    image = np.transpose(data, (1, 2, 0))
    cv2.imshow('cv2', image)
    cv2.waitKey(0)
    print('{},{}'.format(key, label))
It gives the following output:
label= 0
(0, 0, 0)
data= []
OpenCV Error: Assertion failed (size.width>0 && size.height>0) in imshow, file /home/arha/softwares/opencv-2.4.13/modules/highgui/src/window.cpp, line 261
Traceback (most recent call last):
File "lm.py", line 28, in
cv2.imshow('cv2', image)
cv2.error: /home/arha/softwares/opencv-2.4.13/modules/highgui/src/window.cpp:261: error: (-215) size.width>0 && size.height>0 in function imshow
Please clear my doubts.
Thank you
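A likely explanation for the `(0, 0, 0)` shape: LMDBs built for SSD-style training store `AnnotatedDatum` messages (an image `Datum` plus groups of normalized boxes), not plain `Datum`s, so parsing a value as `Datum` leaves its fields empty. Images are also usually stored encoded (e.g. JPEG), in which case `caffe.io.datum_to_array` can't be applied directly. A sketch under those assumptions; it needs pycaffe built from this fork (for `caffe_pb2.AnnotatedDatum`), plus the `lmdb`, NumPy and OpenCV packages. The helper names are hypothetical.

```python
import numpy as np


def denormalize_bbox(xmin, ymin, xmax, ymax, width, height):
    """SSD-style annotations store box coordinates normalized to [0, 1]."""
    return (int(round(xmin * width)), int(round(ymin * height)),
            int(round(xmax * width)), int(round(ymax * height)))


def dump_annotated_lmdb(lmdb_path, max_items=5):
    """Print the boxes of the first few records of an AnnotatedDatum LMDB."""
    # Imported here so the helper above works without Caffe; these require
    # pycaffe built from this fork, plus the lmdb and OpenCV Python packages.
    import cv2
    import lmdb
    from caffe.proto import caffe_pb2

    env = lmdb.open(lmdb_path, readonly=True, lock=False)
    with env.begin() as txn:
        for i, (key, value) in enumerate(txn.cursor()):
            if i >= max_items:
                break
            anno_datum = caffe_pb2.AnnotatedDatum()
            anno_datum.ParseFromString(value)
            datum = anno_datum.datum
            buf = np.frombuffer(datum.data, dtype=np.uint8)
            if datum.encoded:
                image = cv2.imdecode(buf, cv2.IMREAD_COLOR)  # e.g. JPEG bytes
            else:
                # Raw bytes laid out as channels x height x width.
                image = buf.reshape(datum.channels, datum.height,
                                    datum.width).transpose(1, 2, 0)
            height, width = image.shape[:2]
            for group in anno_datum.annotation_group:
                for anno in group.annotation:
                    b = anno.bbox
                    print(key, group.group_label,
                          denormalize_bbox(b.xmin, b.ymin, b.xmax, b.ymax,
                                           width, height))
    env.close()
```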
@MhLiao, Table 1 of your article presents the time consumption of several methods. How did you measure these times? My GPU is weaker than a Titan X, and I'd like to know the times on my machine. I'm particularly interested in your method, and I assume I should add some tic/toc timing to test_icdar13.py, but I'd like to know your opinion.
Best regards
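How the paper's times were measured isn't stated here. A common approach is to time repeated forward passes after a few warm-up runs, so that one-time setup (memory allocation, cuDNN initialization) isn't counted. A minimal sketch with a hypothetical helper; whether preprocessing, NMS or multiple scales should be included is a choice you would need to match to the paper.

```python
import time


def time_forward(forward, warmup=3, runs=20):
    """Average wall-clock seconds per call of `forward` after warm-up calls."""
    for _ in range(warmup):
        forward()  # first calls include allocation/initialization; excluded
    start = time.perf_counter()
    for _ in range(runs):
        forward()
    return (time.perf_counter() - start) / runs
```

With the `net` from test_icdar13.py and its input blob already filled, `time_forward(net.forward)` gives seconds per image at that input size. pycaffe's `net.forward()` returns the output blobs as NumPy arrays, which should also ensure the GPU work for each pass has finished before the clock is read.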
Hi, @MhLiao
I'm trying to re-implement your work in TensorFlow. So far I have made good progress, but detection of small objects is quite bad because small objects can't match the default boxes well, so they never get trained.
I tried tuning the anchor sizes, but it doesn't work well. Could you give me some advice?
Thanks a lot.
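A diagnostic that may help: for each ground-truth box, compute the best IoU it reaches against any default box. Boxes whose best IoU stays far below the matching threshold (0.5 is SSD's usual `overlap_threshold`) are, as I understand SSD's matching, only made positive by the step that first assigns every ground truth to its single best default box; if a re-implementation skips that step, such boxes may never be trained. A sketch with hypothetical helpers; boxes are `(xmin, ymin, xmax, ymax)`.

```python
def iou(a, b):
    """Jaccard overlap of two (xmin, ymin, xmax, ymax) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_overlaps(gt_boxes, default_boxes):
    """Best IoU each ground-truth box reaches against any default box."""
    return [max(iou(g, d) for d in default_boxes) for g in gt_boxes]
```

A histogram of `best_overlaps` over the training set, before and after changing the anchor sizes, shows directly whether small text is being covered.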
Hello,
I got the following error while running:
python demo.py
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0709 17:08:07.867985 6370 _caffe.cpp:135] DEPRECATION WARNING - deprecated use of Python interface
W0709 17:08:07.868007 6370 _caffe.cpp:136] Use this instead (with the named "weights" parameter):
W0709 17:08:07.868028 6370 _caffe.cpp:138] Net('/home/ahmed/TextBoxes/examples/TextBoxes/deploy.prototxt', 1, weights='/home/ahmed/TextBoxes/examples/TextBoxes/TextBoxes_icdar13.caffemodel')
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 758:14: Message type "caffe.LayerParameter" has no field named "norm_param".
F0709 17:08:07.915925 6370 upgrade_proto.cpp:88] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /home/ahmed/TextBoxes/examples/TextBoxes/deploy.prototxt
*** Check failure stack trace: ***
Aborted (core dumped)
Thanks a lot
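The `Normalize` layer and its `norm_param` come from the SSD fork this repository is based on; stock BVLC Caffe's `caffe.proto` has no such field. So this parse error usually means Python imported a different Caffe build than the one compiled in the TextBoxes directory (for example, another Caffe found on `PYTHONPATH`). A minimal sketch of putting this fork's `python/` directory first, assuming pycaffe has been built there with `make pycaffe`; the helper name is hypothetical.

```python
import os
import sys


def prefer_caffe_python(caffe_root):
    """Put <caffe_root>/python at the front of sys.path and return that path."""
    python_dir = os.path.join(os.path.abspath(caffe_root), "python")
    while python_dir in sys.path:
        sys.path.remove(python_dir)  # avoid duplicates further down the list
    sys.path.insert(0, python_dir)
    return python_dir
```

Call it, e.g. `prefer_caffe_python("/home/ahmed/TextBoxes")`, before the first `import caffe` in the process, then print `caffe.__file__` to confirm the module was loaded from that directory.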
I am running the evaluation_nms code and get the error "Undefined function or variable 'polygon_intersect'". How do you actually implement this function?
Thanks a lot.
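How the original MATLAB `polygon_intersect` is implemented isn't shown here. As a stand-in, here is a Python sketch of intersection area and IoU for convex quadrilaterals (which ICDAR-style boxes normally are) using Sutherland-Hodgman clipping. This is a swapped-in technique, not the repository's code, and it assumes both polygons are convex.

```python
def signed_area(poly):
    """Shoelace formula; positive for counter-clockwise vertex order."""
    s = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return s / 2.0


def _clip_by_edge(subject, a, b):
    """Keep the part of `subject` on the left of the directed edge a -> b."""
    def inside(p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0

    def cross_point(p, q):
        # Intersection of segment p-q with the infinite line through a and b.
        den = (p[0] - q[0]) * (a[1] - b[1]) - (p[1] - q[1]) * (a[0] - b[0])
        t = ((p[0] - a[0]) * (a[1] - b[1]) - (p[1] - a[1]) * (a[0] - b[0])) / den
        return (p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1]))

    out = []
    for i in range(len(subject)):
        cur, prev = subject[i], subject[i - 1]
        if inside(cur):
            if not inside(prev):
                out.append(cross_point(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(cross_point(prev, cur))
    return out


def convex_intersection_area(poly1, poly2):
    """Clip poly1 by every edge of poly2 (made counter-clockwise first)."""
    clip = list(poly2) if signed_area(poly2) > 0 else list(poly2)[::-1]
    subject = list(poly1)
    for i in range(len(clip)):
        if len(subject) < 3:
            return 0.0  # nothing left: the polygons don't overlap
        subject = _clip_by_edge(subject, clip[i], clip[(i + 1) % len(clip)])
    return abs(signed_area(subject)) if len(subject) >= 3 else 0.0


def polygon_iou(poly1, poly2):
    inter = convex_intersection_area(poly1, poly2)
    union = abs(signed_area(poly1)) + abs(signed_area(poly2)) - inter
    return inter / union if union > 0 else 0.0
```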
Hello,
let me first thank you for these excellent articles: TextBoxes + CRNN.
On the first page of the TextBoxes paper, the following is mentioned: "we use the confidence scores of CRNN to regularize the detection outputs of TextBoxes"
However, I'm stuck at getting the probability of the sequence output by CRNN,
for example:
--h-e-l-l-oo- => 'hello' with a probability = 0.89
How can I get that? I'm using the PyTorch version.
In the code, I can't find these probabilities in CTCLoss.
In __init__.py, the CTC class is defined as follows, but I can't find where to print the output probabilities:
class _CTC(Function):
    def forward(self, acts, labels, act_lens, label_lens):
        is_cuda = True if acts.is_cuda else False
        acts = acts.contiguous()
        loss_func = warp_ctc.gpu_ctc if is_cuda else warp_ctc.cpu_ctc
        grads = torch.zeros(acts.size()).type_as(acts)
        minibatch_size = acts.size(1)
        costs = torch.zeros(minibatch_size)
        loss_func(acts,
                  grads,
                  labels,
                  label_lens,
                  act_lens,
                  minibatch_size,
                  costs)
        self.grads = grads
        self.costs = torch.FloatTensor([costs.sum()])
        return self.costs

    def backward(self, grad_output):
        return self.grads, None, None, None


class CTCLoss(Module):
    def __init__(self):
        super(CTCLoss, self).__init__()

    def forward(self, acts, labels, act_lens, label_lens):
        """
        acts: Tensor of (seqLength x batch x outputDim) containing output from network
        labels: 1 dimensional Tensor containing all the targets of the batch in one sequence
        act_lens: Tensor of size (batch) containing size of each output sequence from the network
        label_lens: Tensor of (batch) containing label length of each example
        """
        _assert_no_grad(labels)
        _assert_no_grad(act_lens)
        _assert_no_grad(label_lens)
        return _CTC()(acts, labels, act_lens, label_lens)
Thank you
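The network's raw outputs are unnormalized scores (warp-ctc applies the softmax internally), and `CTCLoss` only returns a summed cost, so per-sequence probabilities have to be computed from the outputs separately. A common approximation is the best-path probability: apply a softmax over the class dimension, take the argmax at each time step, multiply those maxima, then collapse repeats and drop blanks. This sketch assumes class 0 is the CTC blank and class k maps to `alphabet[k - 1]`; that mapping, and whether the paper's confidence score is computed this way, are assumptions. The exact sequence probability would require the CTC forward algorithm summed over all paths.

```python
import numpy as np


def greedy_ctc_decode(probs, alphabet, blank=0):
    """Best-path CTC decoding.

    probs: (T, C) per-frame softmax probabilities for one sequence.
    Returns (decoded string, product of the per-frame maximum probabilities).
    """
    probs = np.asarray(probs, dtype=np.float64)
    best = probs.argmax(axis=1)
    confidence = float(np.prod(probs[np.arange(len(best)), best]))
    chars = []
    prev = blank
    for k in best:
        if k != blank and k != prev:  # collapse repeats, skip blanks
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars), confidence
```

With outputs `preds` shaped (T, batch, C), something like `torch.nn.functional.softmax(preds, dim=2)[:, 0, :].detach().cpu().numpy()` gives `probs` for the first image in the batch.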
May I ask for the LMDB creation script for training? The path in the comment seems to be missing. Thank you.
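If this fork follows upstream SSD, annotated LMDBs are built by its `scripts/create_annoset.py` from a list file in which each line pairs an image path with its annotation path, both relative to a root directory. A sketch that writes such a list; the directory names and extensions are hypothetical, and the annotation format (e.g. Pascal VOC XML) must match what the conversion script is told to expect.

```python
import os


def write_list_file(root, image_dir, anno_dir, out_path,
                    image_ext=".jpg", anno_ext=".xml"):
    """Write '<image path> <annotation path>' lines (relative to root).

    Images without a matching annotation file are skipped. Returns the
    number of lines written.
    """
    lines = []
    for name in sorted(os.listdir(os.path.join(root, image_dir))):
        stem, ext = os.path.splitext(name)
        if ext.lower() != image_ext:
            continue
        anno_rel = os.path.join(anno_dir, stem + anno_ext)
        if not os.path.isfile(os.path.join(root, anno_rel)):
            continue
        lines.append(os.path.join(image_dir, name) + " " + anno_rel)
    with open(out_path, "w") as f:
        f.write("".join(line + "\n" for line in lines))
    return len(lines)
```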
Hi @MhLiao! Can you help me convert TextBoxes_icdar13.caffemodel to TextBoxes_icdar13.mlmodel with the coremltools tool?
See: https://pypi.python.org/pypi/coremltools. I'd be extremely grateful!
I tried this:
import coremltools
coreml_model = coremltools.converters.caffe.convert(('TextBoxes_icdar13.caffemodel', 'deploy.prototxt'))
coreml_model.save('TextBoxes_icdar13.mlmodel')
But I get this error:
[libprotobuf ERROR /git/coreml/deps/protobuf/src/google/protobuf/text_format.cc:298] Error parsing text-format caffe.NetParameter: 758:14: Message type "caffe.LayerParameter" has no field named "norm_param".
Traceback (most recent call last):
File "convert.py", line 5, in <module>
coreml_model = coremltools.converters.caffe.convert(('TextBoxes_icdar13.caffemodel', 'deploy.prototxt'))
File "/Users/mambaxie/anaconda2/lib/python2.7/site-packages/coremltools/converters/caffe/_caffe_converter.py", line 142, in convert
predicted_feature_name)
File "/Users/mambaxie/anaconda2/lib/python2.7/site-packages/coremltools/converters/caffe/_caffe_converter.py", line 187, in _export
predicted_feature_name
RuntimeError: Unable to load caffe network Prototxt file: deploy.prototxt
How do i fix it?
After I ran make -j8,
I got this error:
/usr/include/google/protobuf/arenastring.h:219:31: note: candidate expects 1 argument, 0 provided Makefile:575: recipe for target '.build_release/src/caffe/data_transformer.o' failed make: *** [.build_release/src/caffe/data_transformer.o] Error 1
I'm using Ubuntu 16.04. Protobuf 3.2
Downgrading Protobuf to 3.1 also didn't work
Hi @MhLiao
Could you please clarify whether the model icdar_2013.caffemodel is pretrained on the SynthText dataset and then trained on the ICDAR 2013 Text Localization dataset (as mentioned in the paper), or only trained on the ICDAR 2013 Text Localization dataset?
Dear All,
Could someone please send me the ICDAR 2013 dataset and the test_list.txt needed to run test_icdar13.py? I couldn't find them on the Internet.
Thanks in advance.
When I run
python demo.py
an error occurs, and the error message is as follows:
[root@ml-gpu-ser167 TextBoxes]# python demo.py
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0712 06:58:42.744659 24402 _caffe.cpp:135] DEPRECATION WARNING - deprecated use of Python interface
W0712 06:58:42.744719 24402 _caffe.cpp:136] Use this instead (with the named "weights" parameter):
W0712 06:58:42.744726 24402 _caffe.cpp:138] Net('./deploy.prototxt', 1, weights='./TextBoxes_icdar13.caffemodel')
Traceback (most recent call last):
File "demo.py", line 36, in <module>
caffe.TEST) # use test mode (e.g., don't perform dropout)
RuntimeError: Could not open file ./deploy.prototxt
What's wrong with it?
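`./deploy.prototxt` is resolved against the current working directory, not the directory containing demo.py, so the same script can work from one directory and fail from another. A small sketch (the helper is hypothetical) that anchors file names to the script's own location:

```python
import os


def path_next_to(script_file, name):
    """Resolve `name` relative to the directory containing `script_file`."""
    return os.path.join(os.path.dirname(os.path.abspath(script_file)), name)
```

Inside demo.py, `model_def = path_next_to(__file__, "deploy.prototxt")` (and likewise for the `.caffemodel`) then works regardless of where the script is launched from.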
Text in SynthText is oriented and its labels have 4 points. How can these data be used for training?
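SynthText's `gt.mat` stores, per image, a `wordBB` array of shape 2 x 4 x N (x/y, four corners, N words; a single word may come as 2 x 4). Since TextBoxes regresses horizontal rectangles, one simple option is to take the axis-aligned box enclosing each quadrilateral. A sketch under that assumption; loading `gt.mat` (e.g. with `scipy.io.loadmat`) and writing the result to an annotation file are not shown.

```python
import numpy as np


def quads_to_aabbs(word_bb):
    """Convert SynthText-style wordBB (2 x 4 x N, or 2 x 4 for one word) into
    an N x 4 array of (xmin, ymin, xmax, ymax) axis-aligned boxes."""
    bb = np.asarray(word_bb, dtype=np.float64)
    if bb.ndim == 2:
        bb = bb[:, :, np.newaxis]  # single word: add the N axis
    xs, ys = bb[0], bb[1]  # each 4 x N: one column per word
    return np.stack([xs.min(axis=0), ys.min(axis=0),
                     xs.max(axis=0), ys.max(axis=0)], axis=1)
```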
Hi, @MhLiao
I found that you set the parameter 'clip' to 'true' in your train script.
Could you please explain why you did this?
Does this parameter have an effect on performance?
Thanks a lot.
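In SSD's PriorBox layer, which this fork inherits, `clip: true` clamps every default box's normalized coordinates to [0, 1], so boxes near the image border (especially wide ones with large aspect ratios) don't extend past it. A sketch of that behaviour for a single box, assuming SSD's convention width = size * sqrt(ar), height = size / sqrt(ar); the function and values are illustrative, and this says nothing about the effect on accuracy.

```python
import math


def default_box(cx, cy, size, aspect_ratio, clip=True):
    """One normalized default box (xmin, ymin, xmax, ymax), SSD-style:
    width = size * sqrt(ar), height = size / sqrt(ar)."""
    w = size * math.sqrt(aspect_ratio)
    h = size / math.sqrt(aspect_ratio)
    box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    if clip:
        box = tuple(min(max(v, 0.0), 1.0) for v in box)  # what clip: true does
    return box
```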
Hi !
I was experimenting with the parameters and thought of changing the aspect ratios from [2,3,5,7,10] to [1,2,3,5,7] and got this error while training :
F0615 09:42:51.943936 18711 multibox_loss_layer.cpp:143] Check failed: num_priors_ * loc_classes_ * 4 == bottom[0]->channels() (224088 vs. 266728) Number of priors must match number of location predictions.
I am getting the same error if I try to add any other integer to [2,3,5,7,10].
Can you please help me resolve this issue?
Thanks in advance!
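The check compares the default boxes produced by the PriorBox layers with the channels of the `mbox_loc` predictions, so changing `aspect_ratio` in the PriorBox parameters without also changing the `num_output` of the matching location (and confidence) convolution layers will trigger it. Note also that SSD's PriorBox always includes ratio 1 and skips duplicates, so listing `1` explicitly yields one box fewer than the list length suggests. A sketch of that bookkeeping for one source layer, assuming SSD's counting rules; `copies` is a hypothetical multiplier for any per-box duplication (e.g. offset copies) the fork may add.

```python
def unique_aspect_ratios(aspect_ratios, flip=False):
    """Aspect ratios SSD's PriorBox actually uses: 1 first, duplicates dropped."""
    ratios = [1.0]
    for ar in aspect_ratios:
        if any(abs(ar - r) < 1e-6 for r in ratios):
            continue
        ratios.append(float(ar))
        if flip:
            ratios.append(1.0 / ar)
    return ratios


def loc_num_output(aspect_ratios, has_max_size=True, flip=False, copies=1):
    """num_output the mbox_loc convolution needs for one source layer:
    4 coordinates per default box at each feature-map location."""
    per_location = len(unique_aspect_ratios(aspect_ratios, flip))
    per_location += 1 if has_max_size else 0
    return per_location * copies * 4
```

The confidence layer needs `per_location * copies * num_classes` outputs in the same way.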
Hello,
I installed Caffe successfully and tested it with the Jupyter notebook examples provided by Caffe.
Caffe is installed in /home/ahmed/caffe
TextBoxes is installed in /home/ahmed/TextBoxes
When I came to install TextBoxes, I got stuck at the following:
cd /home/ahmed/TextBoxes
works correctly, but when I run
make -j8
I get
Makefile:6: *** Makefile.config not found. See Makefile.config.example.. Stop.
Thank you
Hello @MhLiao, the following error occurs when I run python test_icdar13.py. I only compiled the CPU version of Caffe. Would you mind giving me a hand? Thanks in advance.
jsj@jsj:~/TextBoxes/examples/TextBoxes$ python test_icdar13.py
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0527 19:02:48.273658 10427 common.cpp:66] Cannot use GPU in CPU-only Caffe: check mode.
*** Check failure stack trace: ***
Aborted (core dumped)
Hi. I am not clear on the process of generating the LMDB from the ICDAR2013 dataset. Could you explain it in detail? Thank you.
In the ICDAR2017 dataset, the images are annotated with ground-truth bounding boxes like 86,191,142,191,139,214,84,214,Latin,Flame. So how can I feed the data into the TextBoxes model?
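Each ground-truth line appears to be `x1,y1,x2,y2,x3,y3,x4,y4,script,transcription`, i.e. a quadrilateral plus a script name and the text. One simple way to use it with a horizontal-box detector is to take the enclosing axis-aligned rectangle, then write it out in whatever annotation format the LMDB conversion expects (e.g. Pascal VOC XML). A sketch; the field names are my own, and treating `###` as a "don't care" transcription follows the usual ICDAR convention, which you should confirm for this dataset.

```python
def parse_mlt_line(line):
    """Parse 'x1,y1,...,x4,y4,script,transcription' into a dict.

    maxsplit=9 keeps commas inside the transcription intact.
    """
    parts = line.strip().split(",", 9)
    coords = [int(float(v)) for v in parts[:8]]
    xs, ys = coords[0::2], coords[1::2]
    text = parts[9] if len(parts) > 9 else ""
    return {
        "quad": list(zip(xs, ys)),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
        "script": parts[8],
        "text": text,
        "ignore": text == "###",  # assumed ICDAR "don't care" marker
    }
```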
@MhLiao There is an error when running test_icdar13.py at this line:
net = caffe.Net(model_def, # defines the structure of the model
model_weights, # contains the trained weights
caffe.TEST) # use test mode (e.g., don't perform dropout)
The console shows that:
[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 758:14: Message type "caffe.LayerParameter" has no field named "norm_param".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0404 18:15:56.743150 22335 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: /home/scw4750/TextBoxes/examples/TextBoxes/deploy.prototxt
*** Check failure stack trace: ***
It seems that it cannot parse the parameter field in Layer: type: "Normalize" in deploy.prototxt. How can I resolve this issue?
Hi,
I wanted to know if we can implement TextBoxes such that there are more than two possible classes? (currently the classes are text and background)
Thanks!
@MhLiao hello,
Could you please tell me how much GPU memory is needed to train or test your proposed method?
And how much free memory is required if I use cuDNN?
Thank you very much!
Hi,
Is there any plan to release TextBoxes + CRNN combined module or parts of it?
Please let me know. Thanks.
Hi, MhLiao.
I have implemented your TextBoxes on SynthText and ICDAR 2013, but some problems bother me. I got recall = 0.488, precision = 0.938 and f-measure = 0.64 at 700x700 single scale on the ICDAR test set. With multiple scales, the f-measure is 0.73. That seems worse than your results.
The details of my training are below:
step 1: pretrain on the SynthText data.
pretrained model: VGG_ILSVRC_16_layers_fc_reduced.caffemodel
train data: about 850k SynthText images, train size: 700x700, batch size: 8 (GPU limit)
lr: 0.0001 for 60k iterations, 0.00001 for the remaining 120k iterations; 180k iterations in total (loss about 2.0).
step 2: train on the ICDAR 2013 train data
pretrained model: the model from step 1.
train size: 700x700, batch_size = 4.
lr: 0.00001 for 3k iterations (loss about 1.5).
I have tried other settings: resizing the training data to 500x500 still gave a low recall (about 0.46). By the way, the final loss only goes down to about 2.0 when training on the synthetic data; I don't know whether there is something I have not taken into account.
Looking forward to your reply.
@MhLiao Hi,
I downloaded the code and ran test_icdar13.py with TextBoxes_icdar13.caffemodel on the ICDAR13 dataset (which contains 233 pictures), but some results (e.g. img_14, img_19, img_34) are worse than the pictures presented in your paper (Figure 3).
Could you help me find the reason? Did you use multi-scale testing to produce the pictures shown in Figure 3?
Thanks in advance!
Hi, @MhLiao
In the training phase, I saw the test net output "detection_eval = ***". What is the meaning of this value? I wonder whether I can judge the network's performance by this value.
Thanks!
(I found this value isn't equal to the actual F-measure.)
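In SSD-based solvers with `eval_type: "detection"`, `detection_eval` is, as far as I know, the mean average precision over the foreground classes (here just text), computed on the test net's detections using the solver's `ap_version` (e.g. "11point"). ICDAR 2013 precision/recall/F-measure use different matching rules, so the two numbers aren't expected to agree, although they usually move in the same direction. A sketch of both quantities for comparison; the inputs are illustrative.

```python
def ap_11point(recalls, precisions):
    """PASCAL VOC 2007-style 11-point interpolated average precision."""
    ap = 0.0
    for t in [i / 10.0 for i in range(11)]:
        # Highest precision achieved at any recall >= t (0 if none).
        best = max((prec for rec, prec in zip(recalls, precisions) if rec >= t),
                   default=0.0)
        ap += best / 11.0
    return ap


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the ICDAR-style summary score)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

For example, precision 0.938 and recall 0.488 give an F-measure of about 0.642.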
Hi MhLiao
I downloaded the code and am ready to run it.
But I don't know where to find this file: [TextBoxes_icdar13.caffemodel]
Where can I find it?
Thanks
@MhLiao There is an error when running train_icdar13.py.
The console shows that:
I0614 14:06:31.362638 9607 layer_factory.hpp:77] Creating layer data
I0614 14:06:31.371596 9607 net.cpp:100] Creating Layer data
I0614 14:06:31.371624 9607 net.cpp:408] data -> data
I0614 14:06:31.371659 9607 net.cpp:408] data -> label
F0614 14:06:31.371739 9610 db_lmdb.hpp:15] Check failed: mdb_status == 0 (2 vs. 0) No such file or directory
*** Check failure stack trace: ***
@ 0x7f2c227e95cd google::LogMessage::Fail()
@ 0x7f2c227eb433 google::LogMessage::SendToLog()
@ 0x7f2c227e915b google::LogMessage::Flush()
@ 0x7f2c227ebe1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f2c22df1350 caffe::db::LMDB::Open()
@ 0x7f2c22e2d276 caffe::DataReader<>::Body::InternalThreadEntry()
@ 0x7f2c1f4225d5 (unknown)
@ 0x7f2c1ecd06ba start_thread
@ 0x7f2c2128782d clone
@ (nil) (unknown)
Aborted (core dumped)
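`mdb_status == 0 (2 vs. 0)` means LMDB returned error 2, ENOENT (no such file or directory), when opening the `source` of the data layer. `source` paths in the prototxt are often relative, in which case they are resolved from the directory training is launched from. A quick check (hypothetical helper), assuming the default LMDB layout with a `data.mdb` file inside the directory:

```python
import os


def check_lmdb_dir(path):
    """Return a list of problems with an LMDB directory (empty if it looks OK)."""
    if not os.path.isdir(path):
        return ["directory does not exist: " + path]
    if not os.path.isfile(os.path.join(path, "data.mdb")):
        return ["no data.mdb in " + path]
    return []
```

Run it from the same working directory as train_icdar13.py, passing the `source` path from the generated train prototxt.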
@MhLiao hello
I want to reproduce the results in your paper, but I got a much lower f-measure of 55% (single scale). Below is the solver, created using your default code:
+++++++++++++++++++++++++++++++++++++++++
train_net: "models/TextBoxes/train.prototxt"
test_net: "models/TextBoxes/test.prototxt"
test_iter: 233
test_interval: 500
base_lr: 0.0001
display: 10
max_iter: 120000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 60000
snapshot: 500
snapshot_prefix: "models/TextBoxes/snapshots2/"
solver_mode: GPU
device_id: 0
debug_info: false
snapshot_after_train: true
test_initialization: false
average_loss: 10
iter_size: 1
type: "SGD"
eval_type: "detection"
ap_version: "11point"
+++++++++++++++++++++++++++++++++++++++++
But the stepsize is 40000 in your paper.
And below are the transform parameters in my train.prototxt:
+++++++++++++++++++++++++++++++++++++++++
transform_param {
  mirror: false
  mean_value: 104
  mean_value: 117
  mean_value: 123
  resize_param {
    prob: 1
    resize_mode: WARP
    height: 300
    width: 300
    interp_mode: LINEAR
    interp_mode: AREA
    interp_mode: NEAREST
    interp_mode: CUBIC
    interp_mode: LANCZOS4
  }
  emit_constraint {
    emit_type: CENTER
  }
}
Could you give details of your solver.prototxt and transform parameters, or some advice?
Thanks a lot~
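With `lr_policy: "step"`, Caffe computes the learning rate as base_lr * gamma ^ floor(iter / stepsize). With the solver above (base_lr 0.0001, gamma 0.1, stepsize 60000), the rate drops to 1e-5 at iteration 60000 and stays there for the rest of the 120000 iterations; with stepsize 40000 it would drop at 40000 and again at 80000. A small sketch for tabulating such schedules:

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Caffe's "step" policy: base_lr * gamma ^ floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)
```

For example, `[step_lr(1e-4, 0.1, 40000, i) for i in (0, 40000, 80000)]` lists the three learning-rate stages of a 40000-step schedule.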
I have tried the parameter settings in your paper to train the model, but the detection performance is bad. The f-measure on ICDAR2013-test is 0.72 (all of the following results are tested at 700*700 single scale; according to the paper, the f-measure should be 0.80). Here is my parameter setting for training:
Step 1. pretrain on the synthetic data
Step 2. train on the ICDAR2013-train data
Could you give me some advice on training or more information about the parameter setting for training? @MhLiao Thanks very much.
Hi, I am totally new to this field and am struggling to figure out how to run the roughly 50k training iterations on the synthetic data mentioned in the paper. I hope there's a guide on how to set up the training. Thanks.
Hi,
I've downloaded the ICDAR2013 dataset (Task 2.1: Text Localization, 233 images and ground truth text files), but how can I generate test_list.txt?
Thanks for the excellent work. But I have a concern about your multi-scale input. The PriorBox layers need a specific input size to construct the default boxes, but your multi-scale test file just uses the deploy.prototxt generated by the SSD300 training file for all scales. Does that mean that at a larger input scale (e.g. 700x700), the prior boxes have the same size as at 300x300 and the only difference is the number of boxes? Thank you.
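If this fork's PriorBox works like upstream SSD's, it reads the input image size from its data bottom whenever the net is reshaped, sets the step to image size / feature-map size, and takes `min_size`/`max_size` in pixels before normalizing them by the image size. Under that assumption, reshaping the same deploy net to a larger input keeps the default boxes' pixel sizes, so they shrink relative to the image, while the number of boxes grows with the feature map. A sketch for one square source layer; the 88x88 feature map for a 700x700 input used below is purely illustrative, not computed from the network.

```python
def prior_layout(img_size, feat_size, min_size, offset=0.5):
    """Square-input sketch of SSD-style PriorBox geometry for one source layer.

    Returns (number of cells, normalized side of an aspect-ratio-1 box,
    normalized centers along one axis).
    """
    step = img_size / float(feat_size)  # pixels per feature-map cell
    centers = [(i + offset) * step / img_size for i in range(feat_size)]
    return feat_size * feat_size, min_size / float(img_size), centers
```

Comparing `prior_layout(300, 38, 30)` with `prior_layout(700, 88, 30)` shows 1444 versus 7744 cells, with the normalized box side falling from 0.1 to about 0.043.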
I am trying to train on the COCO-Text data, and my loss stops decreasing significantly once it reaches 4.1.
Also, how is the inclusion of vertical-offset default boxes done in the provided code?
Regards,
Aakriti
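The TextBoxes paper describes adding default boxes with vertical offsets so that densely stacked text lines are still covered. How this fork implements it isn't confirmed here; the sketch below simply assumes each cell gets a second center shifted down by half a cell, which doubles the number of default boxes per location. The helper and the half-cell shift are assumptions.

```python
def default_box_centers(feat_w, feat_h, vertical_offset=True):
    """Normalized (cx, cy) centers of default boxes on a feat_w x feat_h map.

    With vertical_offset, each cell also gets a center shifted down by half
    a cell (an assumed scheme, not taken from this repository's code).
    """
    centers = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx = (j + 0.5) / feat_w
            cy = (i + 0.5) / feat_h
            centers.append((cx, cy))
            if vertical_offset:
                centers.append((cx, cy + 0.5 / feat_h))
    return centers
```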
I ran make successfully,
but when I run make runtest, I get this error:
[----------] 3 tests from MultiBoxLossLayerTest/3, where TypeParam = caffe::GPUDevice
[ RUN ] MultiBoxLossLayerTest/3.TestConfGradient
F0122 10:41:19.059348 24720 multibox_loss_layer.cpp:143] Check failed: num_priors_ * loc_classes_ * 4 == bottom[0]->channels() (128 vs. 64) Number of priors must match number of location predictions.
*** Check failure stack trace: ***
@ 0x7f488e9d6b6d google::LogMessage::Fail()
@ 0x7f488e9dab87 google::LogMessage::SendToLog()
@ 0x7f488e9d8a09 google::LogMessage::Flush()
@ 0x7f488e9d8d0d google::LogMessageFatal::~LogMessageFatal()
@ 0x7f488a7b6dc0 caffe::MultiBoxLossLayer<>::Reshape()
@ 0x55d82f caffe::Layer<>::SetUp()
@ 0x55ee3f caffe::GradientChecker<>::CheckGradientExhaustive()
@ 0x9d12d1 caffe::MultiBoxLossLayerTest_TestConfGradient_Test<>::TestBody()
@ 0xa804b3 testing::internal::HandleExceptionsInMethodIfSupported<>()
@ 0xa782b7 testing::Test::Run()
@ 0xa7835e testing::TestInfo::Run()
@ 0xa78465 testing::TestCase::Run()
@ 0xa7a6f8 testing::internal::UnitTestImpl::RunAllTests()
@ 0xa7a987 testing::UnitTest::Run()
@ 0x54942f main
@ 0x7f4889a50bd5 __libc_start_main
@ 0x552149 (unknown)
make: *** [runtest] Aborted