craft-reimplementation's Introduction

CRAFT-Reimplementation

Note: If you have any problems, please leave a comment, or join our WeChat group. The QR code will be updated in issue #49.

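I am very sorry that I have not kept maintaining this project. I have recently seen that quite a few people are following it, so I expect to resume maintenance by the end of November, or early December at the latest. Because my overall engineering skills were lacking during my internship at the time, the project is relatively messy; this round of maintenance will clean it up. The maintenance period will be about two weeks: I will reorganize the code, retrain, and upload the trained pretrained model. I will also add comments on the key points of some experiments and the ideas behind them. You are welcome to follow along then.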

Reimplementation: a PyTorch-based reimplementation of Character Region Awareness for Text Detection

Character Region Awareness for Text Detection

Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee (Submitted on 3 Apr 2019)

The full paper is available at: https://arxiv.org/pdf/1904.01941.pdf

Install Requirements:

1. PyTorch>=0.4.1
2. torchvision>=0.2.1
3. opencv-python>=3.4.2
4. See requirements.txt
5. 4 NVIDIA GPUs (we use 4 NVIDIA Titan X)

to do list

Release strong supervision training part in early December

craft-reimplementation's People

Contributors

backtime92, thisisisaac

craft-reimplementation's Issues

Why `copyStateDict`?

Just out of curiosity, is there a reason why you make a copy of state_dict before loading to CRAFT.load_state_dict with copyStateDict?
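
The helper itself is not shown in this issue, so the following is only a guess at its purpose. A common reason for such a copy is that checkpoints saved from a model wrapped in torch.nn.DataParallel carry a `module.` prefix on every key, which a bare model cannot load. A minimal sketch, assuming that is what copyStateDict is for; strip_module_prefix is a hypothetical name, and a plain dict stands in for a real state_dict:

```python
from collections import OrderedDict

def strip_module_prefix(state_dict):
    """Return a copy of state_dict with a leading "module." removed from each key."""
    prefix = "module."
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Keys without the prefix are copied unchanged.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned

# Stand-in for a checkpoint saved from a DataParallel-wrapped model.
checkpoint = OrderedDict([("module.conv.weight", 1), ("module.conv.bias", 2)])
cleaned = strip_module_prefix(checkpoint)
print(list(cleaned.keys()))
```

If the checkpoint was saved from an unwrapped model, the copy is a no-op apart from creating a new dictionary.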

RandomScale in data pre-processing

Hi, I am confused in the data pre-processing -- random scale stage.

def random_scale(img, bboxes, min_size):
    h, w = img.shape[0:2]
    if max(h, w) > 1280:
        scale = 1280.0 / max(h, w)
        img = cv2.resize(img, dsize=None, fx=scale, fy=scale)
        bboxes *= scale

    h, w = img.shape[0:2]
    random_scale = np.array([1.0, 1.3, 1.5])
    scale = np.random.choice(random_scale)
    if min(h, w) * scale <= min_size:
        scale = (min_size + 10) * 1.0 / min(h, w)
    bboxes *= scale
    img = cv2.resize(img, dsize=None, fx=scale, fy=scale)
    return img

In this stage, you randomly enlarge the image and bounding boxes but just return the image. I am not sure if it would cause the label mismatch problem.
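
One way to rule out a mismatch is to return the scaled boxes explicitly instead of relying on the in-place `bboxes *= scale`. (If bboxes is a NumPy float array, the in-place multiply already changes the caller's array, so whether labels actually drift depends on how the caller passes them.) The sketch below is dependency-free and only models the scale arithmetic; pick_scale and scale_boxes are hypothetical helpers, and the real function also resizes the image and works on the rounded size after the first resize:

```python
import random

def pick_scale(h, w, min_size, max_side=1280, choices=(1.0, 1.3, 1.5), rng=random):
    """Return the overall factor applied to an h x w image (hypothetical helper).

    Mirrors the two steps of the quoted random_scale: shrink so the long side
    is at most max_side, then apply a random enlargement, which is raised when
    the short side would otherwise end up at or below min_size.
    """
    base = 1.0
    if max(h, w) > max_side:
        base = max_side / max(h, w)
    short_side = min(h, w) * base
    aug = rng.choice(choices)
    if short_side * aug <= min_size:
        aug = (min_size + 10) / short_side
    return base * aug

def scale_boxes(bboxes, scale):
    """Return new boxes with every (x, y) point multiplied by scale."""
    return [[(x * scale, y * scale) for (x, y) in box] for box in bboxes]

boxes = [[(10, 10), (50, 10), (50, 30), (10, 30)]]
scale = pick_scale(h=100, w=200, min_size=768)
scaled = scale_boxes(boxes, scale)  # the caller receives the boxes explicitly
```

Returning `(img, bboxes)` from the real function in the same way would make the contract visible at the call site.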

when and when not to freeze

You freeze vgg16_bn weights in trainic15data.py but not in trainSyndata.py.

  1. Why do you freeze in trainic15data.py and not in trainSyndata.py?
  2. What is the benefit of freezing? I assumed we should ALWAYS let gradients flow through both vgg16 and CRAFT.
  3. Have you trained vgg16 with CRAFT starting from Pytorch's pretrained weights? If so, how was the performance?
  4. saving model like this:
net = CRAFT()
net.load_state_dict(copyStateDict(torch.load('CRAFT-Reimplemetation/pretrain/SynthText.pth')))
torch.save(net.state_dict(), os.path.join("pretrain","test.pth"))

only saves weights of CRAFT and not vgg16_bn. If I were to train both CRAFT and vgg16, both unfrozen, how would I save both CRAFT and vgg16's network together in a single model?

can't load `Syndata.pth`: `load_state_dict Missing Keys`

How is Syndata.pth different from vgg16_bn-6c64b313.pth? My guess is that vgg16_bn-6c64b313.pth is just for vgg16 and Syndata.pth is for the rest.

When I use Syndata.pth in vgg16_bn.py, I get the error: load_state_dict Missing keys ...
But when I use vgg16_bn-6c64b313.pth, the model loads properly.

  1. Is Syndata.pth for the same model as vgg16_bn-6c64b313.pth?
  2. Why am I getting this bug?

[BUG] Unexpected 'CUDA out of memory'

I tried to run test.py, but this error occurred:

RuntimeError: CUDA out of memory. Tried to allocate 644.00 MiB (GPU 0; 10.92 GiB total capacity; 9.81 GiB already allocated; 567.00 MiB free; 53.70 MiB cached)

I ran nvidia-smi and found that the memory is actually sufficient (12 GB remaining).

How can I fix it?

Bug report: random crop

If the thread goes into this branch, incorrect images like below occur:

ballet_106_102_after_processing

If it goes to the else branch, the results are fine:

ballet_106_102_after_processing

I think this is because the coordinates of character boxes are not adjusted properly in the first branch.

about loss function

I am confused about the loss function in the paper. When the estimated character bounding boxes are poor, which means the model's accuracy is bad, the confidence score is low and so the loss is low too.
0_MwnYwgkbZc2Onnd0
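
For reference, this is the confidence-weighted objective as I read it in the paper (Baek et al., 2019). Here $l(w)$ is the annotated length of word $w$, $l^{c}(w)$ the number of characters found when splitting it, $S_c$ the pixel-wise confidence map, $S_r$ and $S_a$ the predicted region and affinity maps, and the starred maps the pseudo ground truth:

```latex
\[
s_{conf}(w) = \frac{l(w) - \min\bigl(l(w),\, \lvert l(w) - l^{c}(w) \rvert\bigr)}{l(w)}
\]
\[
L = \sum_{p} S_c(p) \cdot \Bigl( \lVert S_r(p) - S_r^{*}(p) \rVert_2^2 + \lVert S_a(p) - S_a^{*}(p) \rVert_2^2 \Bigr)
\]
```

Read this way, a low confidence down-weights a word region because its pseudo label is unreliable. The smaller loss there reflects less trust in the label rather than a better prediction.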

Opencv Error!

I get the following error when training on Synth and IC15 after updating to the new data_loader.py. It was working before the update.
I tried installing different versions of OpenCV, but that didn't solve it.

cv2.error: OpenCV(3.4.2) /io/opencv/modules/imgproc/src/color.hpp:253: error: (-215:Assertion failed) VScn::contains(scn) && VDcn::contains(dcn) && VDepth::contains(depth) in function 'CvtHelper

Weakly supervised problem

Hi author, thanks for open-sourcing your reimplementation of CRAFT. I hope to get the same results as claimed in the original paper, but I still have some problems (only 81+% performance on the ICDAR15 dataset). Do you have any suggestions? I suspect the Gaussian map generation or the watershed step may contain some problems.

If possible, could I contact you through email, WeChat, or similar for more convenient communication?

train error

Good job! The code trains fine on SynthText. However, the following error occurred when I ran trainic15data.py on 4 GPUs. I tried to address the problem and found a possible solution at https://blog.csdn.net/loopun/article/details/89295454. I also trained on 1 GPU, but that does not work either and the error persists. Can anyone give me a suggestion? Thank you.

Traceback (most recent call last):
  File "trainic15data.py", line 183, in <module>
    loss.backward()
  File "/root/userfolder/anaconda3/envs/craft_pytorch/lib/python3.5/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/userfolder/anaconda3/envs/craft_pytorch/lib/python3.5/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: invalid argument 3: divide by zero at /pytorch/aten/src/THC/generic/THCTensorMathPairwise.cu:88

ESFNet

@backtime92
After doing a lot of testing, I found ESFNet to have great potential at text-detection tasks.

ESFNet has:

  • Very fast training: 4 hours.
  • Very fast inference: 100-142 fps.
  • Very few parameters: 0.09-0.177 M.
  • Very good IoU: 82-85.

The original implementation by the developer is in PyTorch: source on GitHub, paper.

I hope that you will seriously consider ESFNet.
Waiting for your reply.

Address the training speed when training on the ICDAR2015 dataset

Hi, I am trying to train on the ICDAR2015 dataset. Thanks for your work! I found that training is quite slow. The bottleneck is the data loader (the number of workers is set to 0). However, if I increase it, I hit the problem below when generating the pseudo labels.

  File "/home/craft_reimplementation/data_loader.py", line 125, in inference_pursedo_bboxes
    img_torch = img_torch.type(torch.FloatTensor).cuda()
RuntimeError: CUDA error: initialization error

Do you have any good ideas for solving this issue?

Affinity Score Problem

Hi! @backtime92
I found a difference in the test predictions between your pretrained model, ICDAR13,17 and CRAFT.
I am not sure my interpretation is right, so I was wondering about the part below.

def add_affinity(self, image, bbox_1, bbox_2):
    center_1, center_2 = np.mean(bbox_1, axis=0), np.mean(bbox_2, axis=0)
    tl = np.mean([bbox_1[0], bbox_1[1], center_1], axis=0)
    bl = np.mean([bbox_1[2], bbox_1[3], center_1], axis=0)
    tr = np.mean([bbox_2[0], bbox_2[1], center_2], axis=0)
    br = np.mean([bbox_2[2], bbox_2[3], center_2], axis=0)

If you create the affinity box like this, it matches the size of the box formed by the centers of the four triangles. However, the affinity box in the paper seems to be made slightly shorter on the left and right than that box.

image

After running the CRAFT model and your model on the same image, I noticed that the number of segmentation chunks was different. My guess is that the affinity GT boxes extend so far to the left and right that the links are not removed in the post-processing phase, so the segmentation chunks do not separate well.

Is there any chance that this problem affects performance? I just wonder!
Thank You!
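
To make the geometry in the quoted add_affinity concrete, here is a dependency-free sketch of the same computation on two axis-aligned character boxes. It assumes each box is ordered top-left, top-right, bottom-right, bottom-left; centroid and affinity_quad are hypothetical names, and the part of the original function that is not quoted above is not reproduced:

```python
def centroid(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def affinity_quad(bbox_1, bbox_2):
    """Corners of the affinity box, computed as in the quoted snippet.

    Assumed point order per box: top-left, top-right, bottom-right, bottom-left.
    """
    center_1, center_2 = centroid(bbox_1), centroid(bbox_2)
    tl = centroid([bbox_1[0], bbox_1[1], center_1])  # upper triangle of box 1
    bl = centroid([bbox_1[2], bbox_1[3], center_1])  # lower triangle of box 1
    tr = centroid([bbox_2[0], bbox_2[1], center_2])  # upper triangle of box 2
    br = centroid([bbox_2[2], bbox_2[3], center_2])  # lower triangle of box 2
    return [tl, tr, br, bl]

# Two 2x2 character boxes, one unit apart horizontally.
box_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
box_b = [(3, 0), (5, 0), (5, 2), (3, 2)]
quad = affinity_quad(box_a, box_b)
print(quad)
```

With these inputs the quad runs horizontally from the center of the first box (x = 1) to the center of the second (x = 4), which is the full center-to-center width that the issue contrasts with the narrower box it expects from the paper.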

Where is the synthdata?

I am very glad to see that you have uploaded your training code. I want to reproduce your result, because the paper's authors did not release their training code, so I would like to know where your synthdata comes from. Can you share the link? Thanks again.

question on wechat

I want to join the WeChat group, but the QR code has expired. Could you post it again or add me to the group?

I found some mistakes in the way you unwarp the character bounding boxes

image
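I don't think this is the right logic for mapping each character box back to its real location in the original image.
Here is my code:
import numpy as np
from numpy.linalg import inv

# MM is the 3x3 perspective matrix used to warp the word crop (defined elsewhere).
for j in range(len(bboxes)):
    real_bboxes = []
    for pts in bboxes[j]:
        # Lift the 2D point to homogeneous coordinates (x, y, 1).
        point = np.append(pts, 1)
        assert len(point) == 3
        # Map the point back through the inverse transform.
        tmp = np.matmul(inv(MM), point)
        print('tmp', tmp)
        # Normalize so that the last (homogeneous) component is 1.
        real_point = tmp / tmp[-1]
        real_bboxes.append(real_point)
    bboxes[j] = np.array(real_bboxes)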

Progress / help needed

Hi there,

I am working on a project that is going to use the CRAFT text detector and was wondering what the current state of this project is. I saw you mention in an issue that the results aren't yet as good as the original paper. Is it stated anywhere how far off they are and what needs to be done to bring them up to that level?

I am keen to contribute. Is there a roadmap anywhere that I can follow? You can find my email on my GitHub profile if you want to get in contact.

Cheers.

train problem

In the file trainic15data.py

from augmentation import random_rot, crop_img_bboxes
from gaussianmap import gaussion_transform, four_point_transform
from generateheatmap import add_character, generate_target, add_affinity, generate_affinity, sort_box, real_affinity, generate_affinity_box
from mseloss import Maploss

I can't find the files augmentation and generateheatmap.
Could you tell me how to get them? Thank you.

Have you sorted boundary boxes?

Can you please let me know whether boundary box sorting is done? If not, could you please help me with it? I am working on something and need the boundary boxes sorted. I would really appreciate it.

Question on Maploss

While reading your loss function, I came across a few questions:

  1. I think def single_image_loss defined in mseloss.py seems to be a misnomer. The function iterates through outputs for the entire batch of images, not just a single image. It should be something like def batch_cumulative_loss.

  2. The only thing I read from the paper was that it used MSE loss. So I was expecting a loss function to be something like:

loss_fn = torch.nn.MSELoss(reduce=False, size_average=False)

loss1 = torch.mean(loss_fn(p_gh, gh_label))
loss2 = torch.mean(loss_fn(p_gah, gah_label))

# multiply by mask

return loss1/batch_size + loss2/batch_size

However, in your loss function, you cap the number of negative pixel losses depending on the number of positive pixels. Why do you do this? If you have some outside source that led you to use this, I would really appreciate if you share with me!
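
Capping the negatives relative to the positives is usually called hard negative mining: background pixels vastly outnumber text pixels, so only the highest-loss negatives are kept. Whether that is the author's motivation is an assumption, and the sketch below is not the repository's Maploss. capped_pixel_loss and neg_ratio are hypothetical, and plain lists stand in for tensors:

```python
def capped_pixel_loss(pixel_losses, is_positive, neg_ratio=3):
    """Mean loss over all positive pixels plus only the hardest negatives.

    neg_ratio is a hypothetical cap: keep at most neg_ratio negative pixels
    per positive pixel, chosen by largest loss.
    """
    pos = [loss for loss, flag in zip(pixel_losses, is_positive) if flag]
    neg = sorted(
        (loss for loss, flag in zip(pixel_losses, is_positive) if not flag),
        reverse=True,
    )
    if pos:
        neg = neg[: neg_ratio * len(pos)]
    kept = pos + neg
    return sum(kept) / len(kept) if kept else 0.0

losses = [0.9, 0.1, 0.2, 0.3, 0.4, 0.05]
mask = [True, False, False, False, False, False]
result = capped_pixel_loss(losses, mask)
```

Here one positive pixel admits the three hardest negatives (0.4, 0.3, 0.2), so the two easiest background pixels no longer dilute the average.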

Can't Train road sign

Hi, thanks for open-sourcing your CRAFT reimplementation. We are conducting a study on recognizing the text on road signs. We used CRAFT, but we can't get it to train. Our road sign dataset looks like this (5662 images):

00601058.jpg
00601058
This is how the labelling is done:

gt_00601058.txt
144,18,103,3,100,30,142,38,증평
136,106,86,95,86,125,134,133,증평
138,54,86,42,83,66,134,82,괴산
59,20,59,0,86,1,85,24,31
28,93,27,69,75,76,74,101,510
154,134,154,117,185,118,185,136,1km
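
The sample lines look like eight comma-separated integers (four x,y corner points) followed by the transcription. A minimal parsing sketch under that assumption, useful for checking that every annotation file parses before training; parse_gt_line is a hypothetical helper, not part of the repository:

```python
def parse_gt_line(line):
    """Split one annotation line into four (x, y) corner points and the text."""
    # Split at most 8 times so commas inside the transcription are preserved.
    parts = line.strip().split(",", 8)
    if len(parts) != 9:
        raise ValueError("expected 8 coordinates and a transcription: %r" % line)
    coords = [int(value) for value in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    return points, parts[8]

points, text = parse_gt_line("154,134,154,117,185,118,185,136,1km")
print(points, text)
```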

This is the result after more than 88 epochs. As you can see, nothing is detected.
What is our problem? Are there too few epochs, or is it because the labels are not in English? Unfortunately, we cannot obtain English-labelled data.
res_3_mask

We train with train-MLT_data.py. There are no other modifications except the path to the folder.

Train weakly supervised on my own dataset

Hi author,
Thank you for the re-implementation.

I have a question related to training weakly supervised on my own data set. It only has word level annotation, with a rectangular format (not like quadrilateral boxes in ICDAR 2015).
Below is the example of the annotation.
sample

If "confidence <=0.5", the character bboxes obtained by dividing the cropped image by the number of characters are not accurate.
In these cases, should I drop these images from training data, or is there any other way to deal with this problem?
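
As I understand the paper, such words are not dropped: when the confidence falls below 0.5, the word box is split evenly into character boxes and the confidence is fixed at 0.5, so the region still contributes with reduced weight. A minimal sketch for axis-aligned rectangles like the ones described above; all function names are hypothetical, and predicted_boxes stands for whatever the watershed step returns:

```python
def word_confidence(word_len, n_pred_chars):
    """Confidence of a pseudo label; 1.0 when the predicted count matches."""
    return (word_len - min(word_len, abs(word_len - n_pred_chars))) / word_len

def split_rect_evenly(x0, y0, x1, y1, n_chars):
    """Divide an axis-aligned word box into n_chars equal-width boxes."""
    step = (x1 - x0) / n_chars
    return [(x0 + i * step, y0, x0 + (i + 1) * step, y1) for i in range(n_chars)]

def pseudo_char_boxes(word_rect, word_len, predicted_boxes, threshold=0.5):
    """Return (character boxes, confidence) for one word."""
    conf = word_confidence(word_len, len(predicted_boxes))
    if conf < threshold:
        # Unreliable split: fall back to equal division with a floor confidence.
        return split_rect_evenly(*word_rect, word_len), threshold
    return predicted_boxes, conf

# A 4-character word for which only one character box was predicted.
boxes, conf = pseudo_char_boxes((0, 0, 100, 20), 4, [(0, 0, 100, 20)])
```

The evenly split boxes are inaccurate by construction; the fixed 0.5 weight is what limits their influence on the loss.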

Critical issues with training code

  1. training scripts (trainSyndata.py) imports test.py. test.py has some code not nested in if __name__ == "__main__":. For example, this below code that parses argument:
    parser = argparse.ArgumentParser(description='CRAFT Text Detection')
    parser.add_argument('--trained_model', default='weights/craft_mlt_25k.pth', type=str, help='pretrained model')
    parser.add_argument('--text_threshold', default=0.7, type=float, help='text confidence threshold')
    parser.add_argument('--low_text', default=0.4, type=float, help='text low-bound score')
    parser.add_argument('--link_threshold', default=0.4, type=float, help='link confidence threshold')
    parser.add_argument('--cuda', default=True, type=str2bool, help='Use cuda to train model')
    parser.add_argument('--canvas_size', default=2240, type=int, help='image size for inference')
    parser.add_argument('--mag_ratio', default=2, type=float, help='image magnification ratio')
    parser.add_argument('--poly', default=False, action='store_true', help='enable polygon type')
    parser.add_argument('--show_time', default=False, action='store_true', help='show processing time')
    parser.add_argument('--test_folder', default='/data/', type=str, help='folder path to input images')

is part of test.py and it is not within if __name__ == "__main__":. This means whenever test.py is imported, this code runs. You can see the error with:

python trainSyndata.py --help

which prints

usage: trainSyndata.py [-h] [--trained_model TRAINED_MODEL]
                       [--text_threshold TEXT_THRESHOLD] [--low_text LOW_TEXT]
                       [--link_threshold LINK_THRESHOLD] [--cuda CUDA]
                       [--canvas_size CANVAS_SIZE] [--mag_ratio MAG_RATIO]
                       [--poly] [--show_time] [--test_folder TEST_FOLDER]

CRAFT Text Detection

optional arguments:
  -h, --help            show this help message and exit
  --trained_model TRAINED_MODEL
                        pretrained model
  --text_threshold TEXT_THRESHOLD
                        text confidence threshold
  --low_text LOW_TEXT   text low-bound score
  --link_threshold LINK_THRESHOLD
                        link confidence threshold
  --cuda CUDA           Use cuda to train model
  --canvas_size CANVAS_SIZE
                        image size for inference
  --mag_ratio MAG_RATIO
                        image magnification ratio
  --poly                enable polygon type
  --show_time           show processing time
  --test_folder TEST_FOLDER
                        folder path to input images

The above print statement is for args written in test.py, not for trainSyndata.py. Nesting the arguments within the if statement in test.py fixed the error.

  2. watershed.py uses the Polygon library, but I can't seem to find how to install this library. Any hints?
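
The fix described in the first point can be sketched as follows. Only a few of the arguments are repeated, build_parser and main are hypothetical names, and str2bool is a stand-in for the helper that test.py references:

```python
import argparse

def str2bool(value):
    """Hypothetical stand-in for the str2bool helper referenced in test.py."""
    return str(value).lower() in ("yes", "y", "true", "t", "1")

def build_parser():
    """Create the parser without parsing anything, so importing is side-effect free."""
    parser = argparse.ArgumentParser(description='CRAFT Text Detection')
    parser.add_argument('--trained_model', default='weights/craft_mlt_25k.pth', type=str, help='pretrained model')
    parser.add_argument('--text_threshold', default=0.7, type=float, help='text confidence threshold')
    parser.add_argument('--cuda', default=True, type=str2bool, help='Use cuda to train model')
    return parser

def main(argv=None):
    # argv=None makes argparse read sys.argv[1:]; passing a list helps testing.
    args = build_parser().parse_args(argv)
    print(args)
    return args

if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```

With this layout, a training script that imports the module parses only its own arguments, and `--help` shows the training options.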

the format of training data

Hi, this is good work. I would like to know the format of the training data: word-level or character-level bounding boxes? Thank you.

Questions on scaling images and GT masks

  1. In data_loader.py , pull_item function:
        region_scores = self.resizeGt(region_scores)
        affinity_scores = self.resizeGt(affinity_scores)
        confidence_mask = self.resizeGt(confidence_mask)

and the function definition of resizeGt is:

    def resizeGt(self, gtmask):
        return cv2.resize(gtmask, (self.target_size // 2, self.target_size // 2))

Why do you resize the scales to half the target size?


  2. In the same function, you perform element-wise division on region_scores and affinity_scores:
region_scores_torch = torch.from_numpy(region_scores / 255).float()
affinity_scores_torch = torch.from_numpy(affinity_scores / 255).float()

why?


  3. random_scale uses self.target_size as the minimum dimension size and uses 1280 as the maximum. This means the image and char boxes can fit anywhere between 1280 and self.target_size. So what happens if the image is larger than 768? How do you guarantee that it will be 768? You don't seem to rescale the image after random_scale.

Current result

Hi author, have you gotten better results with the new dataloader?

training IC15 with pretrained model

Hi,

I observed that your pretrained model already achieves good performance on IC15. But when I tried to fine-tune your pretrained model on IC15, it performs terribly after just 1 epoch. Is this normal?
