
textfusenet's People

Contributors

alwinator, hotaru-ishibashi, real-yej, ying09, ymy-k

textfusenet's Issues

how to make inferences

By running the demo, we can get a visualization image with bounding boxes and characters. However, is there any inference command which can return the words or phrases instead of only characters? Thanks! @Real-YeJ
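
One way to approach this outside the repository is to post-process the detections. The sketch below is not part of TextFuseNet; it assumes you have already extracted axis-aligned word boxes and labeled character boxes from the model output (the tuple layout and the function name `group_chars_into_words` are hypothetical), and that the text reads left to right.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned


def group_chars_into_words(
    word_boxes: Sequence[Box],
    char_dets: Sequence[Tuple[Box, str]],
) -> List[str]:
    """Assign each character to the first word box containing its center,
    then read each word's characters from left to right."""
    groups: List[List[Tuple[float, str]]] = [[] for _ in word_boxes]
    for (x1, y1, x2, y2), label in char_dets:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        for i, (wx1, wy1, wx2, wy2) in enumerate(word_boxes):
            if wx1 <= cx <= wx2 and wy1 <= cy <= wy2:
                groups[i].append((cx, label))
                break  # stop at the first matching word box
    # Sort by center x so characters come out in reading order.
    return ["".join(ch for _, ch in sorted(g, key=lambda t: t[0])) for g in groups]


words = [(0, 0, 100, 20), (0, 30, 100, 50)]
chars = [((60, 2, 80, 18), "i"), ((10, 2, 30, 18), "H"),
         ((5, 32, 25, 48), "o"), ((30, 32, 50, 48), "k")]
print(group_chars_into_words(words, chars))
```

Curved or vertical text would need a different ordering rule than sorting by center x.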

Demo for large images with multiple text instances

I observed that when running the demo, the model uses a large share of the GPU memory, which makes it difficult to test on large images with multiple text instances (it usually crashes on the memory limit). Is there any way to work around this, e.g., by resizing the image before testing?
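
As a rough illustration of the resizing idea, the sketch below computes the output size for a "shortest edge" resize, the same kind of rule that detectron2's INPUT.MIN_SIZE_TEST and INPUT.MAX_SIZE_TEST settings control. The function name `shortest_edge_size` is hypothetical, and how much memory a given size saves for this model is not something the sketch can tell you.

```python
from typing import Tuple


def shortest_edge_size(h: int, w: int, min_size: int, max_size: int) -> Tuple[int, int]:
    """Scale (h, w) so the shorter side equals min_size, then shrink further
    if the longer side would exceed max_size. Aspect ratio is preserved."""
    scale = min_size / min(h, w)
    if h < w:
        new_h, new_w = float(min_size), scale * w
    else:
        new_h, new_w = scale * h, float(min_size)
    if max(new_h, new_w) > max_size:
        shrink = max_size / max(new_h, new_w)
        new_h, new_w = new_h * shrink, new_w * shrink
    return int(new_h + 0.5), int(new_w + 0.5)


# A 3000x4000 image with the test sizes used in configs from this repository.
print(shortest_edge_size(3000, 4000, min_size=800, max_size=1333))
```

Lowering the two limits in the config file is the least invasive way to try this; detections on very small text may degrade as the image shrinks.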

Tool for creating training data?

I have custom data. Can you suggest a tool for annotating it in the same format as your example training data?

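How well does the model perform with the char features removed?

@Real-YeJ Hi Ye, in real-world work it is not easy to obtain character-level annotations or to learn at the character level. If character-level features are not used, roughly what level does your model reach on the various metrics?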

Training a new model

@Real-YeJ @ying09

I have a folder containing ".jpg" images together with ".txt" files holding their text-line annotations.
I want to train a new CTW1500-based model. How can I do that?

ImportError: cannot import name '_C' from 'detectron2'

Hi, I'm trying to run your demo.

I installed the pytorch environment following your 'step-by-step installation.txt'.
But when I use python demo/icdar2015_detection.py to run the demo, I come across this problem:


Traceback (most recent call last):
  File "demo/icdar2015_detection.py", line 12, in <module>
    from detectron2.data.detection_utils import read_image
  File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\data\__init__.py", line 4, in <module>
    from .build import (
  File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\data\build.py", line 13, in <module>
    from detectron2.structures import BoxMode
  File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\structures\__init__.py", line 2, in <module>
    from .boxes import Boxes, BoxMode, pairwise_iou
  File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\structures\boxes.py", line 7, in <module>
    from detectron2.layers import cat
  File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\layers\__init__.py", line 3, in <module>
    from .deform_conv import DeformConv, ModulatedDeformConv
  File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\layers\deform_conv.py", line 10, in <module>
    from detectron2 import _C
ImportError: cannot import name '_C' from 'detectron2' (C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\__init__.py)

Do you know why? Thanks!

Unable to compile if pytorch version is >1.4?

Hi,

I'm trying to pair the detection model with a recognition model that I have already trained while working on the character annotations. However, since I've trained the model in 1.6 and pytorch doesn't have forward compatibility in a certain case, I need to use pytorch 1.6 to compile the detectron for textfusenet. There seems to be an issue with older versions of detectron when trying to compile using pytorch >1.4.

I've tried compiling the new detectron on my own and using the fvcore that was provided, but I was only met with the error AttributeError: module 'fvcore' has no attribute 'version'

I've also tried using pip's fvcore, but that just produced another error about a missing textfusenet key, which I assume means the provided detectron2 is modified.

Is there any way to use textfusenet with a newer version of pytorch?

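About the purpose of summing the features of different characters in Fig. 3

Hello author, for the character-level features in Fig. 3, the paper says that the features corresponding to each character are resized to 14×14 and then summed. But these are features of different characters, for example the features of B added to the features of A. What is the purpose of this?
Looking forward to your reply, thanks.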

Use pre-trained model for prediction without GPU

Hi,
I'd like to try the TextFuseNet architecture without training on new data, only to assess the performance of the model. Is it possible to do this without a GPU?
I followed the step-by-step installation guide and placed the detection model in a newly created folder, as the Python file expects, but the demo requires a GPU when run.

Is there something missing here?

Question of NUM_CLASSES

I have a question about training on a Korean dataset.

I followed the steps below:

  1. Write the config file.
  2. Register the dataset (my dataset is named AISL).
  3. Train with the command below:
$ python tools/train_net.py --num-gpus 4 --config-file

Below is the config file (only the dataset name is changed from the Total-Text config file):

_BASE_: "./Base-RCNN-FPN.yaml"
MODEL:
  MASK_ON: True
  TEXTFUSENET_MUTIL_PATH_FUSE_ON: True
  WEIGHTS: "./out_dir_r101/totaltext_model/model_tt_r101.pth"
  PIXEL_STD: [57.375, 57.120, 58.395]
  RESNETS:
    STRIDE_IN_1X1: False  # this is a C2 model
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
    DEPTH: 101
  ROI_HEADS:
    NMS_THRESH_TEST: 0.4
  TEXTFUSENET_SEG_HEAD:
    FPN_FEATURES_FUSED_LEVEL: 1
    POOLER_SCALES: (0.125,)

DATASETS:
  TRAIN: ("AISLText",)
  TEST: ("AISLText",)
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.001
  STEPS: (40000,80000,)
  MAX_ITER: 120000
  CHECKPOINT_PERIOD: 2500

INPUT:
  MIN_SIZE_TRAIN: (800,1000,1200)
  MAX_SIZE_TRAIN: 1500
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333


OUTPUT_DIR: "./out_dir_r101/at_model/"

The dataset is registered with register_coco_instances in detectron2/data/datasets/builtin.py:

image_path = "/home/ensa/JYB/TextFuseNet/datasets/AISLText/train_images"
json_path = "/home/ensa/JYB/TextFuseNet/datasets/AISLText/trainval.json"
register_coco_instances("AISLText", {},json_path, image_path)

An error occurs during training:

[01/19 18:35:50 d2.data.datasets.coco]: Loaded 3 images in COCO format from /home/ensa/JYB/TextFuseNet/datasets/AISLText/trainval.json
[01/19 18:35:50 d2.data.build]: Removed 0 images with no usable annotations. 3 images left.
[01/19 18:35:50 d2.data.build]: Distribution of training instances among all 31 categories:
|  category  | #instances   |  category  | #instances   |  category  | #instances   |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
|     -      | 2            |     0      | 2            |     1      | 2            |
|     3      | 3            |     5      | 1            |     7      | 2            |
|     A      | 2            |     B      | 2            |     E      | 4            |
|     K      | 2            |     L      | 2            |     R      | 1            |
|     a      | 1            |     b      | 1            |     c      | 1            |
|     e      | 2            |     i      | 1            |     m      | 1            |
|     o      | 2            |     r      | 3            |     t      | 1            |
|    text    | 7            |     u      | 1            |     y      | 1            |
|     강      | 1            |     료      | 1            |     실      | 3            |
|     의      | 1            |     자      | 1            |     장      | 1            |
|     화      | 1            |            |              |            |              |
|   total    | 56           |            |              |            |              |
[01/19 18:35:50 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(800, 1000, 1200), max_size=1500, sample_style='choice'), RandomFlip(), RandomContrast(intensity_min=0.5, intensity_max=1.5), RandomBrightness(intensity_min=0.5, intensity_max=1.5), RandomSaturation(intensity_min=0.5, intensity_max=1.5), RandomLighting(scale=1.1931034212737668)]
[01/19 18:35:50 d2.data.build]: Using training sampler TrainingSampler
[01/19 18:35:51 fvcore.common.checkpoint]: Loading checkpoint from ./out_dir_r101/totaltext_model/model_tt_r101.pth
[01/19 18:35:51 d2.engine.train_loop]: Starting training from iteration 0
[01/19 18:35:53 d2.engine.hooks]: Total training time: 0:00:01 (0:00:00 on hooks)
Traceback (most recent call last):
  File "tools/train_net.py", line 161, in <module>
    args=(args,),
  File "/home/ensa/JYB/TextFuseNet/detectron2/engine/launch.py", line 49, in launch
    daemon=False,
  File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/home/ensa/JYB/TextFuseNet/detectron2/engine/launch.py", line 84, in _distributed_worker
    main_func(*args)
  File "/home/ensa/JYB/TextFuseNet/tools/train_net.py", line 149, in main
    return trainer.train()
  File "/home/ensa/JYB/TextFuseNet/detectron2/engine/defaults.py", line 356, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/ensa/JYB/TextFuseNet/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/ensa/JYB/TextFuseNet/detectron2/engine/train_loop.py", line 212, in run_step
    loss_dict = self.model(data)
  File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/meta_arch/rcnn.py", line 88, in forward
    _, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
  File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/roi_heads/roi_heads.py", line 584, in forward
    losses.update(self._forward_mask(features_list, proposals, targets))
  File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/roi_heads/roi_heads.py", line 684, in _forward_mask
    mask_features = self.mutil_path_fuse_module(mask_features, global_context, proposals)
  File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/roi_heads/mutil_path_fuse_module.py", line 110, in forward
    feature_fuse = char_context + x + global_context
RuntimeError: The size of tensor a (19) must match the size of tensor b (145) at non-singleton dimension 0

To check whether training works at all, I tested with just 3 images, and this error occurred.

I compared your sample COCO format with mine, and they are the same.

I need to train on at least 1000 characters. Is this error related to the number of characters, or to the input size?

Thank you for reading.
Please help...
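
The sketch below does not diagnose the error above. It only shows one way to check how many categories a COCO-style annotation file declares and actually uses, which is the number usually compared against a NUM_CLASSES setting. The helper name `summarize_categories` and the tiny inline dataset are hypothetical; a real file would be loaded with `json.load` first.

```python
from collections import Counter
from typing import Any, Dict


def summarize_categories(coco: Dict[str, Any]) -> Dict[str, Any]:
    """Compare the categories declared in a COCO dict with those used by annotations."""
    declared = {c["id"]: c["name"] for c in coco.get("categories", [])}
    used = Counter(a["category_id"] for a in coco.get("annotations", []))
    return {
        "num_declared": len(declared),
        "num_used": len(used),
        "undeclared_ids": sorted(set(used) - set(declared)),  # used but never declared
        "unused_ids": sorted(set(declared) - set(used)),      # declared but never used
    }


# Hypothetical miniature dataset: "text" plus two character classes.
example = {
    "categories": [{"id": 1, "name": "text"}, {"id": 2, "name": "A"}, {"id": 3, "name": "B"}],
    "annotations": [{"category_id": 1}, {"category_id": 2}, {"category_id": 2}],
}
print(summarize_categories(example))
```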

code for gt generation

Hi,
Thanks for the fantastic research. Is there code for running the pretrained model on a new dataset and generating ground truth (a COCO JSON file) that contains character-level annotations?

RuntimeError: CUDA out of memory

I'm aware that this is more of a hardware issue on my side, but I was wondering if there is any way to make the model a little smaller to save GPU memory. Thank you in advance!

IMS_PER_BATCH: 2 raises an error

Running with batch size 1 works, but batch size 2 fails:
Traceback (most recent call last):
  File "tools/train_net.py", line 161, in <module>
    args=(args,),
  File "/media/data/bachtuan/TextFuseNet/detectron2/engine/launch.py", line 52, in launch
    main_func(*args)
  File "tools/train_net.py", line 149, in main
    return trainer.train()
  File "/media/data/bachtuan/TextFuseNet/detectron2/engine/defaults.py", line 356, in train
    super().train(self.start_iter, self.max_iter)
  File "/media/data/bachtuan/TextFuseNet/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/media/data/bachtuan/TextFuseNet/detectron2/engine/train_loop.py", line 212, in run_step
    loss_dict = self.model(data)
  File "/home/asilla/miniconda3/envs/textfusenet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/data/bachtuan/TextFuseNet/detectron2/modeling/meta_arch/rcnn.py", line 88, in forward
    _, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
  File "/home/asilla/miniconda3/envs/textfusenet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/data/bachtuan/TextFuseNet/detectron2/modeling/roi_heads/roi_heads.py", line 584, in forward
    losses.update(self._forward_mask(features_list, proposals, targets))
  File "/media/data/bachtuan/TextFuseNet/detectron2/modeling/roi_heads/roi_heads.py", line 684, in _forward_mask
    mask_features = self.mutil_path_fuse_module(mask_features, global_context, proposals)
  File "/home/asilla/miniconda3/envs/textfusenet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/data/bachtuan/TextFuseNet/detectron2/modeling/roi_heads/mutil_path_fuse_module.py", line 94, in forward
    text = x[char_pos[i]]
IndexError: The shape of the mask [2] at index 0does not match the shape of the indexed tensor [9, 256, 14, 14] at index 0
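
In upstream detectron2, SOLVER.IMS_PER_BATCH is the total batch size across all GPUs, and it must be divisible by the number of GPUs. Another issue in this list reports that the fusion module only reads proposals[0], so keeping the per-GPU batch at 1 may be a workaround worth trying; that is an assumption, not something the authors confirm here. The helper below (the name `images_per_gpu` is hypothetical) just makes the arithmetic explicit.

```python
def images_per_gpu(ims_per_batch: int, num_gpus: int) -> int:
    """Per-GPU batch size when the total batch is split evenly across GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if ims_per_batch % num_gpus != 0:
        raise ValueError("ims_per_batch must be divisible by num_gpus")
    return ims_per_batch // num_gpus


# IMS_PER_BATCH: 2 on one GPU puts two images in each forward pass.
print(images_per_gpu(2, 1))
# IMS_PER_BATCH: 8 with --num-gpus 4 also gives two images per GPU.
print(images_per_gpu(8, 4))
```

Under this reading, setting IMS_PER_BATCH equal to the number of GPUs keeps one image per GPU.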

multi-path fusion in detection branch ?

Hi, thank you for the interesting work.

I'm confused about the multi-path fusion in the detection branch.
The paper describes multi-path fusion in the detection branch, which fuses "word level features" and "global level features (from Semantic segmentation branch)". This is depicted in Figure 2 and explained in sections 3.1 and 3.2 of the paper.

But in the code, there is no multi-path fusion in the detection branch.
The class method "_forward_box" in class "StandardROIHeads" of /detectron2/modeling/roi_heads.py does not use multi-path fusion, unlike the class method "_forward_mask" in the same file, right?
Moreover, "mutil_path_fuse_module.py" documents its argument as mask RoI features.

Is there anything i missed? Thank you.

installation error

When I follow exactly the step-by-step installation instructions, I got the error message like this when I run the demo code:

image

However, when I used a different version of pytorch, installed with pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html, and then ran python setup.py build develop, I got this error:
image
I tried to exactly copy your environment and changed my cuda version from 10.2 to 10.1 and followed your step-by-step instructions, but it still doesn't work. Can you give a hint on what I need to do? Thanks! @Real-YeJ

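About mutil_path_fuse_module

Hello author, the flowchart in the paper shows that both the detection branch and the mask branch have a mutil_path_fuse_module, but in the code only the _forward_mask function in roi_heads.py calls mutil_path_fuse_module, and the detection branch does not seem to call it. In the final implementation, does the detection branch use mutil_path_fuse_module?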

question about train synthetic data and funsd data

  1. I tried to train the model on synthetic data (keras-ocr https://keras-ocr.readthedocs.io/en/latest/examples/end_to_end_training.html#generating-synthetic-data). I have 10000 background images. So far I have trained for 25000 iterations starting from the pre-trained weights of the synth text model, but I cannot see any results. Can you tell me how many iterations I need to train the model for?

  2. I also tried training on the https://guillaumejaume.github.io/FUNSD/download/ dataset, which consists of documents annotated at word level. I modified your code slightly to train only at word level, starting from the pre-trained ctw model weights. My metrics file is attached below. Does it look fine, or have I done something wrong?
    metrics.txt

Does the code currently only support batch_size == 1?

Mutil_Path_Fuse_Module::forward
    if self.training:
        proposal_boxes = proposals[0].proposal_boxes
        classes = proposals[0].gt_classes
    else:
        proposal_boxes = proposals[0].pred_boxes
        classes = proposals[0].pred_classes

    if len(proposal_boxes) == 0:
        return x

The code only takes proposals[0], so when batch_size > 1, text = x[char_pos[i]] raises an error.

The meaning of each tag in the JSON file

"annotations": [
  {
    "area": 14902.5,
    "bbox": [
      817,
      431,
      164,
      162
    ],..
What is "area"?
I assume bbox is [xmin, ymin, xmax, ymax].
Please help me create a custom train.json.
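
For reference, in the standard COCO annotation format bbox is [x, y, width, height] (top-left corner plus size), and area is the area of the annotated region rather than of the box. The sample above is consistent with that reading: 164 × 162 = 26568, which is larger than the listed area of 14902.5. The sketch below builds one annotation from a polygon under that convention; the helper names and the ids are hypothetical, and whether this repository needs additional fields is not covered here.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(points: Sequence[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_bbox_xywh(points: Sequence[Point]) -> List[float]:
    """Axis-aligned bounding box as [x, y, width, height]."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


def make_annotation(ann_id: int, image_id: int, category_id: int,
                    points: Sequence[Point]) -> Dict[str, object]:
    """One COCO-style annotation dict for a polygon region."""
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,  # hypothetical id; use your own category table
        "segmentation": [[c for p in points for c in p]],  # flat [x1, y1, x2, y2, ...]
        "area": polygon_area(points),
        "bbox": polygon_bbox_xywh(points),
        "iscrowd": 0,
    }


rect = [(0, 0), (10, 0), (10, 5), (0, 5)]
print(make_annotation(1, 1, 1, rect))
```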

train the model on Icdar2015 dataset

Good day! I want to train the model on the ICDAR 2015 dataset. Is there any way to convert the data into a form that the loader can understand? I have already read the README file in the datasets folder, but I am looking for some conversion code that helps. Thank you.
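
As a starting point, the sketch below parses one line of an ICDAR 2015 ground-truth file (eight comma-separated corner coordinates followed by the transcription, often with a UTF-8 byte-order mark on the first line) and turns it into a COCO-style word annotation. It is not the authors' converter: the function names and the category id are hypothetical, and it produces word-level entries only, with no character-level annotations.

```python
from typing import Dict, List, Tuple


def parse_icdar2015_line(line: str) -> Tuple[List[int], str]:
    """Split 'x1,y1,x2,y2,x3,y3,x4,y4,transcription' into coordinates and text."""
    parts = line.lstrip("\ufeff").strip().split(",", 8)  # text may itself contain commas
    if len(parts) < 9:
        raise ValueError("expected 8 coordinates and a transcription")
    return [int(v) for v in parts[:8]], parts[8]


def quad_to_coco(coords: List[int], ann_id: int, image_id: int,
                 category_id: int = 1) -> Dict[str, object]:
    """COCO-style annotation for one quadrilateral; bbox is [x, y, width, height]."""
    xs, ys = coords[0::2], coords[1::2]
    # Shoelace formula over the four corners.
    twice_area = sum(xs[i] * ys[(i + 1) % 4] - xs[(i + 1) % 4] * ys[i] for i in range(4))
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,  # hypothetical id for the word-level class
        "segmentation": [list(coords)],
        "area": abs(twice_area) / 2.0,
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "iscrowd": 0,
    }


coords, text = parse_icdar2015_line("\ufeff377,117,463,117,465,130,378,130,Genaxis Theatre")
print(text, quad_to_coco(coords, ann_id=1, image_id=1))
```

Regions transcribed as "###" are marked as "do not care" in ICDAR 2015; how to treat them during training is left open here.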

Question about training custom dataset

Hi I have 2 questions about training,

  1. Are both bbox and segmentation used?
  2. Do we have to label character by character or would word/sentences be fine as well? In the sample json it does seem to have labels only by character( e.g.annotation[0]= category_id: 1, segmentation:[[...]]) but since it's a detection model I'm not sure if the label(category_id) matters.

Thanks for your time and the model!

Textfusenet model pth to mar torchserve?

Has anyone done this yet?
Is it possible to share the code?
I tried torch-model-archiver --model-name textfusenet --version 1.0 --model-file model.py --serialized-file model.pth --export-path model_store --extra-files config.yaml, with model.py = model_zoo.py, and it did not succeed.

demo error. _C.cpython-37m-x86_64-linux-gnu.so: undefined symbol:

I followed the step-by-step installation. (https://github.com/ying09/TextFuseNet/blob/master/step-by-step%20installation.txt)
I got an error when running the demo.

Traceback (most recent call last):
  File "demo/icdar2013_detection.py", line 12, in <module>
    from detectron2.data.detection_utils import read_image
  File "/home/ubuntu/source/TextFuseNet/detectron2/data/__init__.py", line 4, in <module>
    from .build import (
  File "/home/ubuntu/source/TextFuseNet/detectron2/data/build.py", line 13, in <module>
    from detectron2.structures import BoxMode
  File "/home/ubuntu/source/TextFuseNet/detectron2/structures/__init__.py", line 2, in <module>
    from .boxes import Boxes, BoxMode, pairwise_iou
  File "/home/ubuntu/source/TextFuseNet/detectron2/structures/boxes.py", line 7, in <module>
    from detectron2.layers import cat
  File "/home/ubuntu/source/TextFuseNet/detectron2/layers/__init__.py", line 3, in <module>
    from .deform_conv import DeformConv, ModulatedDeformConv
  File "/home/ubuntu/source/TextFuseNet/detectron2/layers/deform_conv.py", line 10, in <module>
    from detectron2 import _C
ImportError: /home/ubuntu/source/TextFuseNet/detectron2/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

Runtime error

Hi I am getting the following runtime error:

(textfusenet) mickey@MICKEY-2080TI:/mnt/d/download/GitHub/Examples/2020-09-28 TextFuseNet/TextFuseNet-master$ python demo/icdar2015_detection.py --input one-frame.jpg 
Config './configs/ocr/icdar2015_101_FPN.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Traceback (most recent call last):
  File "demo/icdar2015_detection.py", line 128, in <module>
    for i in glob.glob(test_images_path):
  File "/home/mickey/miniconda3/envs/textfusenet/lib/python3.7/glob.py", line 20, in glob
    return list(iglob(pathname, recursive=recursive))
  File "/home/mickey/miniconda3/envs/textfusenet/lib/python3.7/glob.py", line 40, in _iglob
    dirname, basename = os.path.split(pathname)
  File "/home/mickey/miniconda3/envs/textfusenet/lib/python3.7/posixpath.py", line 107, in split
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not list

Best,
Mickey
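
The traceback shows glob.glob receiving a list where it expects a single pattern string. One plausible cause, which is an assumption since the demo's argument parsing is not shown here, is that --input is collected as a list of values. A minimal sketch of a helper that accepts either form (the name `expand_inputs` is hypothetical):

```python
import glob
from typing import Iterable, List, Union


def expand_inputs(patterns: Union[str, Iterable[str]]) -> List[str]:
    """Expand one glob pattern, or a list of them, into a flat list of paths."""
    if isinstance(patterns, str):
        patterns = [patterns]
    paths: List[str] = []
    for pattern in patterns:
        # glob.glob takes exactly one pattern string per call.
        paths.extend(sorted(glob.glob(pattern)))
    return paths


# Works for a single string and for a list such as ["one-frame.jpg"].
print(expand_inputs(["one-frame.jpg"]))
```

In the demo script, the loop over glob.glob(test_images_path) would then iterate over expand_inputs(test_images_path) instead.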

datasets format

hi, what is the format of training-datasets ground truth ? Is it similar to the ground truth of detection or semantic segmentation? Should the location of each word be labeled? Can this model be used to do semantic segmentation tasks only? Thank you very much!

RuntimeError: Not compiled with GPU support

Hi @ying09 ,

I followed the instructions in the step-by-step installation.txt and was able to go through with no issues.

However, when I try to run the demo\icdar2013_detection.py along with the required options, I get an error RuntimeError: Not compiled with GPU support.

Both the input options and the error are shown in the screenshot below:

image

Let me know if you need any other information.

results when use ResNet50 as backbone

Hello, existing text detection methods generally use ResNet50 as the backbone, but the results given in the paper are for ResNet101. What are the results of TextFuseNet on the various datasets when using ResNet50 as the backbone?
