backtime92 / craft-reimplementation

CRAFT-Pytorch: Character Region Awareness for Text Detection Reimplementation for Pytorch

Python 100.00%

craft text-detection pytorch

craft-reimplementation's Introduction

CRAFT-Reimplementation

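Note: If you have any problems, please leave a comment, or join our WeChat group. The QR code will be updated in issue #49.

I am very sorry that I have not kept maintaining this project. Recently I noticed that quite a few people are following it, so I expect to resume maintenance at the end of November, or early December at the latest. Because my engineering skills were limited during my internship, the project is relatively messy; this round of maintenance will clean it up. The maintenance will take about two weeks: I will reorganize the code, retrain, and upload the trained pretrain model. I will also add comments on the key points and ideas behind some of the experiments. Stay tuned.

Reimplementation: Character Region Awareness for Text Detection Reimplementation based on Pytorch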

Character Region Awareness for Text Detection

Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee (Submitted on 3 Apr 2019)

The full paper is available at: https://arxiv.org/pdf/1904.01941.pdf

Install Requirements:

1. PyTorch>=0.4.1
2. torchvision>=0.2.1
3. opencv-python>=3.4.2
4. Check requirements.txt
5. 4 NVIDIA GPUs (we use 4 NVIDIA Titan X)

to do list

Release strong supervision training part in early December


craft-reimplementation's Issues

Is this safe to delete?

I couldn't figure out exactly what this branch does, but is it safe for me to delete?

Excuse me, can I train my own data set?

Why `copyStateDict`?

Just out of curiosity, is there a reason why you make a copy of state_dict before loading to CRAFT.load_state_dict with copyStateDict?
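
For context, a likely reason, stated as an assumption rather than the author's confirmed answer: checkpoints saved from a model wrapped in torch.nn.DataParallel carry a `module.` prefix on every key, and a helper like this usually strips that prefix so the weights load into an unwrapped model. The sketch below uses plain values in place of tensors so it runs without PyTorch; `strip_module_prefix` and the key names are hypothetical.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict):
    """Return a copy of state_dict with a leading 'module.' removed from keys.

    Hypothetical stand-in for copyStateDict; assumes the prefix comes from
    saving a model that was wrapped in torch.nn.DataParallel.
    """
    prefix = "module."
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Plain integers stand in for tensors; the key names are made up.
saved = OrderedDict([("module.layer.weight", 1), ("module.layer.bias", 2)])
cleaned = strip_module_prefix(saved)
print(list(cleaned.keys()))
```

Returning a new dictionary instead of renaming keys in place leaves the loaded checkpoint untouched, which is presumably why the helper is described as making a copy.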

Train error: no augmentation.py, gaussianmap.py, generateheatmap.py, mseloss.py

Thank you very much.
When I train with this code, some errors arise.
The files augmentation.py, gaussianmap.py, generateheatmap.py, and mseloss.py are missing.

RandomScale in data pre-processing

Hi, I am confused by the random scale stage of the data pre-processing.

def random_scale(img, bboxes, min_size):
    h, w = img.shape[0:2]
    if max(h, w) > 1280:
        scale = 1280.0 / max(h, w)
        img = cv2.resize(img, dsize=None, fx=scale, fy=scale)
        bboxes *= scale

    h, w = img.shape[0:2]
    random_scale = np.array([1.0, 1.3, 1.5])
    scale = np.random.choice(random_scale)
    if min(h, w) * scale <= min_size:
        scale = (min_size + 10) * 1.0 / min(h, w)
    bboxes *= scale
    img = cv2.resize(img, dsize=None, fx=scale, fy=scale)
    return img

In this stage, you randomly enlarge the image and the bounding boxes but return only the image. I am not sure whether this causes a label mismatch.
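
One detail worth noting: `bboxes *= scale` modifies a NumPy array in place, so the caller's boxes may already be scaled even though only `img` is returned. That holds only if `bboxes` is a float array the caller still references. A more explicit variant returns both values. The sketch below is a hypothetical rewrite, not the repository's code: `pick_scale` and `scale_boxes` are made-up helpers that reproduce the arithmetic of the snippet above (up to the integer rounding done by cv2.resize) so the logic can be checked without OpenCV.

```python
import numpy as np


def pick_scale(h, w, min_size, max_side=1280, choices=(1.0, 1.3, 1.5), rng=None):
    """Return the combined scale factor that the quoted random_scale applies.

    Hypothetical helper: same arithmetic as the snippet above, but no resize.
    """
    rng = np.random.default_rng() if rng is None else rng
    scale = 1.0
    if max(h, w) > max_side:
        scale = max_side / max(h, w)  # shrink so the longer side is max_side
        h, w = h * scale, w * scale
    factor = float(rng.choice(choices))
    if min(h, w) * factor <= min_size:
        factor = (min_size + 10) / min(h, w)  # keep the shorter side above min_size
    return scale * factor


def scale_boxes(bboxes, scale):
    """Return a scaled copy so the caller's array is never mutated."""
    return np.asarray(bboxes, dtype=np.float32) * scale


scale = pick_scale(2560, 1280, min_size=768, rng=np.random.default_rng(0))
boxes = scale_boxes([[[100, 200], [300, 200], [300, 260], [100, 260]]], scale)
# The image would then be resized once with the same factor, e.g.
# img = cv2.resize(img, dsize=None, fx=scale, fy=scale), and both
# img and boxes returned to the caller.
print(scale, boxes.shape)
```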

when and when not to freeze

You freeze vgg16_bn weights in trainic15data.py but not in trainSyndata.py.

Why do you freeze in trainic15data.py and not in trainSyndata.py?
What is the benefit of freezing? I assumed we should ALWAYS let gradients flow through both vgg16 and CRAFT.
Have you trained vgg16 with CRAFT starting from Pytorch's pretrained weights? If so, how was the performance?
Saving the model like this:

net = CRAFT()
net.load_state_dict(copyStateDict(torch.load('CRAFT-Reimplemetation/pretrain/SynthText.pth')))
torch.save(net.state_dict(), os.path.join("pretrain","test.pth"))

only saves weights of CRAFT and not vgg16_bn. If I were to train both CRAFT and vgg16, both unfrozen, how would I save both CRAFT and vgg16's network together in a single model?

Have you ever tested other backbones?

Have you ever tested other backbones, like ResNet or SENet? VGG16-BN is a structure proposed several years ago.

can't load `Syndata.pth`: `load_state_dict Missing Keys`

~~How is Syndata.pth different from vgg16_bn-6c64b313.pth? My guess is vgg16_bn-6c64b313.pth is just for vgg16 and Syndata.pth is for the rest?~~

When I use Syndata.pth in vgg16_bn.py, I get the error: load_state_dict Missing keys ...
But when I use vgg16_bn-6c64b313.pth, the model loads properly.

Is Syndata.pth for the same model as vgg16_bn-6c64b313.pth?
Why am I getting this bug?

[BUG] Unexpected 'CUDA out of memory'

I tried to run test.py, but the following error occurred.

RuntimeError: CUDA out of memory. Tried to allocate 644.00 MiB (GPU 0; 10.92 GiB total capacity; 9.81 GiB already allocated; 567.00 MiB free; 53.70 MiB cached)

I ran nvidia-smi and found that the memory is actually sufficient (12 GB remaining).

How can I fix it?

Bug report: random crop

If the thread goes into this branch, incorrect images like below occur:

If it goes to the else branch, the results are fine:

I think this is because the coordinates of character boxes are not adjusted properly in the first branch.

About the loss function

I am confused about the loss function in the paper. When the estimated character bounding boxes are poor, which means the model's accuracy is bad, the confidence score is low, and so the loss is low too.

I get the following error when training on the Synth and IC15 data after updating to the new data_loader.py. It was working before the update.
I tried installing a different version of OpenCV, but that did not solve it.

cv2.error: OpenCV(3.4.2) /io/opencv/modules/imgproc/src/color.hpp:253: error: (-215:Assertion failed) VScn::contains(scn) && VDcn::contains(dcn) && VDepth::contains(depth) in function 'CvtHelper

Have you sorted boundary boxes?

Weakly supervised problem

Hi author, thanks for open-sourcing your reimplementation of CRAFT. I hope to get the same results as claimed in the original paper, but I still have some problems (only 81+% performance on the ICDAR15 dataset). Do you have any suggestions? I guess the gaussian map generation or the watershed step may contain some problems.

If possible, could I contact you through email, wechat or so on for convenient communication?

train error

Good job! The code can be trained on SynthText. However, the following error occurred when I ran trainic15data.py on 4 GPUs. I tried to address the problem and found a possible solution at https://blog.csdn.net/loopun/article/details/89295454. I then trained on 1 GPU, but that does not work either and the error still occurs. Can anyone give me a suggestion? Thank you.

Traceback (most recent call last):
  File "trainic15data.py", line 183, in <module>
    loss.backward()
  File "/root/userfolder/anaconda3/envs/craft_pytorch/lib/python3.5/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/userfolder/anaconda3/envs/craft_pytorch/lib/python3.5/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: invalid argument 3: divide by zero at /pytorch/aten/src/THC/generic/THCTensorMathPairwise.cu:88

ESFNet

@backtime92
After doing a lot of testing, I found that ESFNet has great potential for text-detection tasks.

ESFNet has:

Very fast training: 4 hours
Very fast inference: 100-142 fps
Very few parameters: 0.09-0.177 M
Very good IoU: 82-85

The original implementation by the developer is in PyTorch: source on GitHub, paper.

I hope that you will seriously consider ESFNet.
Waiting for your reply.

Addressing the training speed when training on the ICDAR2015 dataset

Hi, I am trying to train on the ICDAR2015 dataset. Thanks for your work! I found that training is quite slow. The bottleneck is the data loader (the number of workers is set to 0). However, if I set it higher, it hits the problem below when generating the pseudo labels.

  File "/home/craft_reimplementation/data_loader.py", line 125, in inference_pursedo_bboxes
    img_torch = img_torch.type(torch.FloatTensor).cuda()
RuntimeError: CUDA error: initialization error

Do you have any good ideas for solving this issue?

Affinity Score Problem

Hi! @backtime92
I have found a difference in the test predictions between your pretrained model, ICDAR13,17 and CRAFT.
I am not sure about my interpretation, and I was just wondering about the part below.

def add_affinity(self, image, bbox_1, bbox_2):
    center_1, center_2 = np.mean(bbox_1, axis=0), np.mean(bbox_2, axis=0)
    tl = np.mean([bbox_1[0], bbox_1[1], center_1], axis=0)
    bl = np.mean([bbox_1[2], bbox_1[3], center_1], axis=0)
    tr = np.mean([bbox_2[0], bbox_2[1], center_2], axis=0)
    br = np.mean([bbox_2[2], bbox_2[3], center_2], axis=0)

If you create an affinity like this code, it will fit the size of the box created by the center of the four triangles. However, the affinity of the paper seems to have been created by slightly reducing the length of the left and right rather than the size of the box.
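
To make the construction concrete, the fragment above can be completed into a standalone function. This is a sketch, assuming each box is a 4x2 array ordered top-left, top-right, bottom-right, bottom-left; `affinity_quad` is a hypothetical name and the order of the returned corners is chosen for illustration.

```python
import numpy as np


def affinity_quad(bbox_1, bbox_2):
    """Return the affinity box between two adjacent character boxes.

    Assumes each box is a 4x2 array ordered top-left, top-right,
    bottom-right, bottom-left. The corners are the centroids of the upper
    and lower triangles of each box, as in the fragment above.
    """
    bbox_1 = np.asarray(bbox_1, dtype=np.float32)
    bbox_2 = np.asarray(bbox_2, dtype=np.float32)
    center_1, center_2 = bbox_1.mean(axis=0), bbox_2.mean(axis=0)
    tl = np.mean([bbox_1[0], bbox_1[1], center_1], axis=0)
    bl = np.mean([bbox_1[2], bbox_1[3], center_1], axis=0)
    tr = np.mean([bbox_2[0], bbox_2[1], center_2], axis=0)
    br = np.mean([bbox_2[2], bbox_2[3], center_2], axis=0)
    return np.array([tl, tr, br, bl])


# Two 2x2 character boxes sitting side by side.
left = [[0, 0], [2, 0], [2, 2], [0, 2]]
right = [[2, 0], [4, 0], [4, 2], [2, 2]]
quad = affinity_quad(left, right)
print(quad)
```

With these inputs the affinity box spans the full distance between the two character centers horizontally, which is the width the question is concerned about.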

After examining the CRAFT model and your model for the same image, I noticed the difference that the number of segmentation chunks was different. So the affinity GT's left and right boxes are so long that the linkage did not go away in the post-processing phase, so I thought that the segmentation chunks would not divide well.

Is there any chance that this problem affects performance? I just wonder!
Thank You!

The WeChat QR code no longer works. Could you post it again?

Where is the synthdata?

I am very glad to see that you have uploaded your training code. I want to reproduce your results, because the paper's authors did not release their training code. Where does your synthdata come from? Can you share the link? Thanks again.

Why is recall very low during training compared to precision?

I am training the ICDAR15 dataset, but after 350 epochs I reached precision=0.75 and recall=0.1...

training documentation

@backtime92 Will you include documentation on how to prepare the data & train?

question on wechat

I want to join the WeChat group, but the QR code has expired. Could you resend it or add me to the group?

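I found some mistakes in the way you unwarp the character bounding boxes

I don't think the code uses the right logic to map each character box back to its real location in the original image.
Here is my code:

import numpy as np
from numpy.linalg import inv

# MM is assumed to be the 3x3 perspective matrix used to warp the crop;
# bboxes is a list of character boxes, each a sequence of (x, y) points.
for j in range(len(bboxes)):
    real_bboxes = []
    for pts in bboxes[j]:
        point = np.append(pts, 1)  # homogeneous coordinates (x, y, 1)
        assert len(point) == 3
        tmp = np.matmul(inv(MM), point)
        print('tmp', tmp)
        real_point = tmp / tmp[-1]  # normalize by the last coordinate
        real_bboxes.append(real_point)
    bboxes[j] = np.array(real_bboxes)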

Bad detection samples

@backtime92
Are the bad detection samples related to clovaai/CRAFT-pytorch#36?

Have you sorted boundary boxes?

Hi,

I need your help sorting the boundary boxes. Currently they are just shuffled.

Operands could not be broadcast together with shapes (512,512) (306,220)

Hi author, thanks for your open source.

When I train on the icdar15 dataset, I hit this problem:

"operands could not be broadcast together with shapes (512,512) (306,220). "

Have you also seen this problem? And do you know which line of code causes this error?
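
The message itself comes from NumPy: two arrays can only be combined element-wise when each pair of trailing dimensions is equal or one of them is 1. Below is a minimal sketch that reproduces it with the shapes from the report and shows a shape check that would help localize the failing line. `apply_mask` is a hypothetical helper, not a function from this repository.

```python
import numpy as np


def apply_mask(score_map, mask):
    """Multiply a score map by a mask, failing early with a readable message."""
    if score_map.shape != mask.shape:
        raise ValueError(
            f"shape mismatch: score_map {score_map.shape} vs mask {mask.shape}"
        )
    return score_map * mask


target = np.zeros((512, 512), dtype=np.float32)
patch = np.ones((306, 220), dtype=np.float32)

try:
    target + patch  # neither dimension matches or equals 1, so this raises
except ValueError as err:
    message = str(err)
    print(message)
```

Placing a check like this where ground-truth maps of the target size meet arrays derived from a crop would point at the line that produces the mismatch.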

How many epochs do you train when fine-tuning on IC15?

Some questions.
1. How many epochs do you train when fine-tuning on IC15?
2. At which epoch did the loss value begin to decline significantly?

Progress / help needed

Hi there,

I am working on a project that is going to use the CRAFT text detector and was wondering what the current state of this project is. I saw you mention in an issue that the results aren't yet as good as the original paper's. Is it stated anywhere how far off they are and what needs to be done to bring them up to that level?

I am keen to contribute. Is there a roadmap anywhere that I can follow? Find my email on my GH profile if you want to get in contact.

Cheers.

pre-trained model?

When do you plan to upload the improved trained models?

train problem

In the file trainic15data.py:

from augmentation import random_rot, crop_img_bboxes
from gaussianmap import gaussion_transform, four_point_transform
from generateheatmap import add_character, generate_target, add_affinity, generate_affinity, sort_box, real_affinity, generate_affinity_box
from mseloss import Maploss

I can't find the files augmentation and generateheatmap.
Could you tell me how to get them? Thank you.

Have you sorted boundary boxes?

Can you please let me know whether boundary box sorting is done? If not, could you please help me with it? I am working on something and want to get the boundary boxes sorted. I would really appreciate it.

Synth data training problem

Hi, I tried your code trainSyndata.py on the data from http://www.robots.ox.ac.uk/~vgg/data/scenetext/. However, after 16 epochs, the hmean on IC2013 is still around 65%, which is far below your result (76.33%). Could you suggest what the possible problems are, please?

Which versions of Python and CUDA are used?

Question on Maploss

While reading your loss function, I came across a few questions:

I think def single_image_loss defined in mseloss.py seems to be a misnomer. The function iterates through outputs for the entire batch of images, not just a single image. It should be something like def batch_cumulative_loss.
The only thing I read from the paper was that it used MSE loss. So I was expecting a loss function to be something like:

loss_fn = torch.nn.MSELoss(reduction='none')  # per-pixel loss; replaces the deprecated reduce/size_average flags

loss1 = torch.mean(loss_fn(p_gh, gh_label))
loss2 = torch.mean(loss_fn(p_gah, gah_label))

# multiply by mask

return loss1/batch_size + loss2/batch_size

However, in your loss function, you cap the number of negative pixel losses depending on the number of positive pixels. Why do you do this? If you have some outside source that led you to use this, I would really appreciate if you share with me!
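
Capping the negatives in proportion to the positives is commonly known as online hard negative mining: only the highest-loss background pixels are kept so that the far more numerous easy negatives do not dominate the average. Whether that was the author's motivation is not stated here. Below is a minimal NumPy sketch of the idea, assuming a 3:1 negative-to-positive cap and a 0.1 positive threshold. Both are illustrative values rather than numbers taken from mseloss.py, and `capped_negative_loss` is a hypothetical name.

```python
import numpy as np


def capped_negative_loss(pixel_loss, label, pos_thresh=0.1, neg_ratio=3):
    """Mean loss over positives plus mean loss over the hardest negatives.

    pixel_loss and label are arrays of the same shape. pos_thresh and
    neg_ratio are assumed values for illustration.
    """
    pixel_loss = np.asarray(pixel_loss, dtype=np.float64).ravel()
    label = np.asarray(label, dtype=np.float64).ravel()

    pos = pixel_loss[label >= pos_thresh]
    neg = pixel_loss[label < pos_thresh]

    if pos.size == 0:
        # Assumption: with no positives, average over every negative pixel.
        return float(neg.mean()) if neg.size else 0.0

    # Keep at most neg_ratio negatives per positive, hardest first.
    k = min(neg.size, neg_ratio * pos.size)
    hardest = np.sort(neg)[::-1][:k]
    neg_loss = hardest.mean() if k else 0.0
    return float(pos.mean() + neg_loss)


label = [1, 0, 0, 0, 0, 0]
loss = [0.4, 0.9, 0.1, 0.5, 0.2, 0.3]
print(capped_negative_loss(loss, label))
```

In the example, one positive pixel allows three negatives, so only the losses 0.9, 0.5, and 0.3 contribute to the negative term.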

Can't Train road sign

Hi, thanks for open-sourcing your work on CRAFT. We are conducting a study to recognize the text on road signs. We used CRAFT, but we cannot get it to train. Our road sign dataset looks like this (5662 images):

00601058.jpg

This is how the labelling is done:

gt_00601058.txt
144,18,103,3,100,30,142,38,증평
136,106,86,95,86,125,134,133,증평
138,54,86,42,83,66,134,82,괴산
59,20,59,0,86,1,85,24,31
28,93,27,69,75,76,74,101,510
154,134,154,117,185,118,185,136,1km
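
Each line of the ground-truth file above carries eight comma-separated coordinates followed by the transcription. Here is a small parser sketch, assuming the four points are stored as x1,y1,...,x4,y4 and that the text itself may contain commas. `parse_gt_line` is a hypothetical helper, not part of train-MLT_data.py.

```python
def parse_gt_line(line):
    """Split 'x1,y1,x2,y2,x3,y3,x4,y4,text' into (points, text)."""
    parts = line.strip().split(",", 8)  # at most 8 splits keeps commas in text
    if len(parts) != 9:
        raise ValueError(f"expected 8 coordinates and a label, got: {line!r}")
    coords = [int(value) for value in parts[:8]]
    points = [(coords[i], coords[i + 1]) for i in range(0, 8, 2)]
    return points, parts[8]


points, text = parse_gt_line("154,134,154,117,185,118,185,136,1km")
print(points, text)
```

Running every label file through a check like this before training is a cheap way to rule out malformed annotations as the cause of empty predictions.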

This is the result after more than 88 epochs. As you can see, nothing comes out.
What is our problem? Are there too few epochs, or is it the non-English labeling? Unfortunately, we cannot obtain English-labelled data.

We use train-MLT_data.py for training. There are no modifications other than the path to the folder.

Operands could not be broadcast together with shapes (512,512) (306,220)?

Hi author, thanks for open-sourcing this. Could you fix the error "operands could not be broadcast together with shapes (512,512) (306,220)"?

Train weakly supervised on my own dataset

Hi author,
Thank you for the re-implementation.

I have a question about weakly supervised training on my own dataset. It only has word-level annotations, in a rectangular format (unlike the quadrilateral boxes in ICDAR 2015).
Below is an example of the annotation.

If the confidence is <= 0.5, the character bboxes obtained by dividing the cropped image by the number of characters are not accurate.
In these cases, should I drop these images from the training data, or is there another way to deal with this problem?
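
For reference, the CRAFT paper defines the word confidence as s_conf(w) = (l(w) - min(l(w), |l(w) - l_c(w)|)) / l(w), where l(w) is the annotated word length and l_c(w) is the number of characters found by splitting. When the score falls below 0.5, the paper divides the word box evenly by l(w) and sets the confidence to 0.5 rather than dropping the sample. A sketch of that rule follows. The function names are hypothetical, and it assumes an axis-aligned word box given as (x_min, y_min, x_max, y_max), matching the rectangular annotations described above.

```python
def word_confidence(word_length, num_estimated_chars):
    """s_conf(w) as defined in the CRAFT paper."""
    if word_length <= 0:
        raise ValueError("word_length must be positive")
    diff = abs(word_length - num_estimated_chars)
    return (word_length - min(word_length, diff)) / word_length


def split_evenly(box, word_length):
    """Divide an axis-aligned word box into word_length equal-width boxes."""
    x_min, y_min, x_max, y_max = box
    step = (x_max - x_min) / word_length
    return [
        (x_min + i * step, y_min, x_min + (i + 1) * step, y_max)
        for i in range(word_length)
    ]


def pseudo_chars(box, word_length, estimated_boxes):
    """Return (character boxes, confidence), using the 0.5 fallback."""
    conf = word_confidence(word_length, len(estimated_boxes))
    if conf < 0.5:
        # Estimate is unreliable: fall back to an even split at confidence 0.5.
        return split_evenly(box, word_length), 0.5
    return list(estimated_boxes), conf


# A 6-letter word for which the splitting step found only one character.
boxes, conf = pseudo_chars((0, 0, 60, 10), 6, [(0, 0, 60, 10)])
print(len(boxes), conf)
```

Under this rule the low-confidence words still contribute to training, only with a reduced weight in the loss.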

LinkRefiner

@backtime92
clovaai just released the LinkRefiner code in clovaai/CRAFT-pytorch@3cd65f5. It would be good to implement it, along with an option to train it,
so that we can detect text lines.

Critical issues with training code

The training script (trainSyndata.py) imports test.py, and test.py has some code that is not nested in if __name__ == "__main__":. For example, the code below, which parses arguments:

    parser = argparse.ArgumentParser(description='CRAFT Text Detection')
    parser.add_argument('--trained_model', default='weights/craft_mlt_25k.pth', type=str, help='pretrained model')
    parser.add_argument('--text_threshold', default=0.7, type=float, help='text confidence threshold')
    parser.add_argument('--low_text', default=0.4, type=float, help='text low-bound score')
    parser.add_argument('--link_threshold', default=0.4, type=float, help='link confidence threshold')
    parser.add_argument('--cuda', default=True, type=str2bool, help='Use cuda to train model')
    parser.add_argument('--canvas_size', default=2240, type=int, help='image size for inference')
    parser.add_argument('--mag_ratio', default=2, type=float, help='image magnification ratio')
    parser.add_argument('--poly', default=False, action='store_true', help='enable polygon type')
    parser.add_argument('--show_time', default=False, action='store_true', help='show processing time')
    parser.add_argument('--test_folder', default='/data/', type=str, help='folder path to input images')

is part of test.py and it is not within if __name__ == "__main__":. This means whenever test.py is imported, this code runs. You can see the error with:

python trainSyndata.py --help

which prints

usage: trainSyndata.py [-h] [--trained_model TRAINED_MODEL]
                       [--text_threshold TEXT_THRESHOLD] [--low_text LOW_TEXT]
                       [--link_threshold LINK_THRESHOLD] [--cuda CUDA]
                       [--canvas_size CANVAS_SIZE] [--mag_ratio MAG_RATIO]
                       [--poly] [--show_time] [--test_folder TEST_FOLDER]

CRAFT Text Detection

optional arguments:
  -h, --help            show this help message and exit
  --trained_model TRAINED_MODEL
                        pretrained model
  --text_threshold TEXT_THRESHOLD
                        text confidence threshold
  --low_text LOW_TEXT   text low-bound score
  --link_threshold LINK_THRESHOLD
                        link confidence threshold
  --cuda CUDA           Use cuda to train model
  --canvas_size CANVAS_SIZE
                        image size for inference
  --mag_ratio MAG_RATIO
                        image magnification ratio
  --poly                enable polygon type
  --show_time           show processing time
  --test_folder TEST_FOLDER
                        folder path to input images

The above print statement is for args written in test.py, not for trainSyndata.py. Nesting the arguments within the if statement in test.py fixed the error.
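
One way to arrange this, shown as a sketch rather than the repository's actual fix: build the parser inside a function and only parse the command line under the `__main__` guard, so importing the module has no side effects. The option names and defaults are copied from the listing above (a subset, for brevity); `build_parser` and `main` are hypothetical names.

```python
import argparse


def build_parser():
    """Create the test-time argument parser without touching sys.argv."""
    parser = argparse.ArgumentParser(description='CRAFT Text Detection')
    parser.add_argument('--trained_model', default='weights/craft_mlt_25k.pth',
                        type=str, help='pretrained model')
    parser.add_argument('--text_threshold', default=0.7, type=float,
                        help='text confidence threshold')
    parser.add_argument('--low_text', default=0.4, type=float,
                        help='text low-bound score')
    parser.add_argument('--link_threshold', default=0.4, type=float,
                        help='link confidence threshold')
    parser.add_argument('--poly', default=False, action='store_true',
                        help='enable polygon type')
    return parser


def main(argv=None):
    """Parse arguments; argv=None means 'use sys.argv[1:]'."""
    args = build_parser().parse_args(argv)
    return args


if __name__ == "__main__":
    # Runs only when the file is executed directly, never on import.
    print(main())
```

A training script can then import the module freely and define its own parser, and `--help` will show only the training options.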

watershed.py uses Polygon library, but I can't seem to find how to install this library. Any hints?

The format of the training data

Hi, this is good work. I would like to know the format of the training data: word-level or character-level bounding boxes? Thank you.

New Trained Model?

Could you please share the newly trained weights?

What languages were included in SynthText?

What languages were included in SynthText?
Were there any words that were laid out vertically (as in traditional Chinese)?

Questions on scaling images and GT masks

In data_loader.py , pull_item function:

        region_scores = self.resizeGt(region_scores)
        affinity_scores = self.resizeGt(affinity_scores)
        confidence_mask = self.resizeGt(confidence_mask)

and the function definition of resizeGt is:

    def resizeGt(self, gtmask):
        return cv2.resize(gtmask, (self.target_size // 2, self.target_size // 2))

Why do you resize the scales to half the target size?

In the same function, you perform element-wise division on region_scores and affinity_scores:

region_scores_torch = torch.from_numpy(region_scores / 255).float()
affinity_scores_torch = torch.from_numpy(affinity_scores / 255).float()

why?
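
A plausible reading, offered as an assumption rather than the author's answer: the ground-truth heatmaps are rendered with values from 0 to 255, so dividing by 255 maps them into [0, 1], the range used for the score targets. The halving matches the CRAFT architecture, whose output score maps are half the input resolution. Below is a NumPy sketch of both steps, using 2x2 block averaging as a stand-in for cv2.resize and the target size of 768 mentioned in this issue; `to_half_resolution` is a hypothetical helper.

```python
import numpy as np


def to_half_resolution(gt_map):
    """Downsample by 2 with 2x2 block averaging (stand-in for cv2.resize)."""
    h, w = gt_map.shape
    return gt_map.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


target_size = 768  # assumed crop size

# Random values in 0..255 stand in for a rendered region-score heatmap.
region_scores = np.random.default_rng(0).integers(
    0, 256, size=(target_size, target_size)
).astype(np.float32)

half = to_half_resolution(region_scores)  # shape (384, 384)
normalized = half / 255                   # values now lie in [0, 1]
print(half.shape, float(normalized.min()), float(normalized.max()))
```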

random_scale uses self.target_size as the minimum dimension size and 1280 as the maximum. This means the image and char boxes can end up anywhere between self.target_size and 1280. So what happens if the image is larger than 768? How do you guarantee that it will be 768? You don't seem to rescale the image after random_scale.
