ying09 / textfusenet
A PyTorch implementation of "TextFuseNet: Scene Text Detection with Richer Fused Features".
License: MIT License
Facing this issue when running the icdar2015_detection.py file.
Not able to figure out, need help.
Note: I have changed the path for the respective weights, input, output, and config directories.
ModuleNotFoundError: No module named 'main.register_coco'; 'main' is not a package
This happens when I run python detectron2/data/datasets/builtin.py to register a custom dataset.
It seems this will always generate an inter_percent of all 1, since boxes1 and boxes2 are the same. Is this the expected behavior of the model illustrated in the paper?
By running the demo, we can get a visualization image with bounding boxes and characters. However, is there any inference command which can return the words or phrases instead of only characters? Thanks! @Real-YeJ
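One way to get words rather than isolated characters is to post-process the detections. The sketch below is not part of the repository. It assumes you have already extracted word-level boxes and character-level boxes with their predicted labels as plain (x1, y1, x2, y2) tuples. It assigns each character to the word box that contains its center and then reads left to right, so it only suits roughly horizontal text. All function and variable names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def group_chars_into_words(
    word_boxes: List[Box],
    chars: List[Tuple[Box, str]],
) -> List[str]:
    """Assign each character to the first word box containing its center."""
    buckets: List[List[Tuple[float, str]]] = [[] for _ in word_boxes]
    for (x1, y1, x2, y2), label in chars:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        for idx, (wx1, wy1, wx2, wy2) in enumerate(word_boxes):
            if wx1 <= cx <= wx2 and wy1 <= cy <= wy2:
                buckets[idx].append((cx, label))
                break  # a character belongs to at most one word
    # Read each word left to right by character center.
    return ["".join(label for _, label in sorted(bucket)) for bucket in buckets]


# Made-up boxes: two word regions and three detected characters.
words = group_chars_into_words(
    word_boxes=[(0, 0, 50, 20), (60, 0, 120, 20)],
    chars=[((32, 2, 42, 18), "T"), ((2, 2, 12, 18), "A"), ((62, 2, 72, 18), "B")],
)
print(words)
```

Curved or vertical text would need a different reading order than sorting by the x coordinate.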
I observed that when running the demo, the model takes up much of the GPU memory, making it difficult to test large images with multiple text instances (it usually crashes when the memory limit is hit). Is there any way to work around this, e.g., resizing the image before testing?
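One thing to try is lowering the test resolution. The training config quoted later on this page sets INPUT.MIN_SIZE_TEST and INPUT.MAX_SIZE_TEST, and the log shows a shortest-edge resize. The sketch below only does the arithmetic of such a resize so you can see how smaller limits shrink the network input. It is an approximation written for illustration, not the repository's code, and the function name is hypothetical.

```python
def shortest_edge_size(height, width, min_size, max_size):
    """Scale so the short side equals min_size, capped so the long side <= max_size."""
    scale = min_size / min(height, width)
    if max(height, width) * scale > max_size:
        scale = max_size / max(height, width)
    # Round to the nearest integer pixel size.
    return int(height * scale + 0.5), int(width * scale + 0.5)


# A 3000x4000 photo with the limits from the quoted config, then with smaller ones.
default_size = shortest_edge_size(3000, 4000, min_size=800, max_size=1333)
reduced_size = shortest_edge_size(3000, 4000, min_size=512, max_size=853)
# A very wide image where the long-side cap takes over.
capped_size = shortest_edge_size(1000, 4000, min_size=800, max_size=1333)
print(default_size, reduced_size, capped_size)
```

Smaller inputs reduce activation memory but can also cost accuracy on small text, so the limits are worth tuning per dataset.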
Please provide a code snippet for registering the ICDAR 2013 dataset for training.
I need a tool to annotate custom data.
Please provide one.
Thanks.
Hi, how can I get the word itself for a word-level instance instead of the probability? Furthermore, could you give me some advice on improving performance on vertical or even inverted text using a new training dataset?
Looking forward to your help. Thanks.
I have custom data. Can you suggest a tool for annotating it in the same format as your example training data?
I have some problems when I try to run python demo/icdar2013_detection.py on PyTorch 1.4 with CUDA 10.0.
Step-by-step installation at https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md
Step-by-step installation at https://github.com/ying09/TextFuseNet/blob/master/step-by-step%20installation.txt
I want to recognize Chinese characters. How should I train the model?
@Real-YeJ Hello. In practice it is hard to obtain character-level annotations or to train on them. If character-level features are not used, roughly how does your model perform on each metric?
Hello, thank you for sharing this very good work. Could you please provide a trained ResNet50 model? It would be very helpful.
Looking forward to your reply, thank you.
Hi, I'm trying to run your demo.
I installed the PyTorch environment following your 'step-by-step installation.txt'.
But when I use python demo/icdar2015_detection.py to run the demo, I come across this problem:
Traceback (most recent call last):
File "demo/icdar2015_detection.py", line 12, in <module>
from detectron2.data.detection_utils import read_image
File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\data\__init__.py", line 4, in <module>
from .build import (
File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\data\build.py", line 13, in <module>
from detectron2.structures import BoxMode
File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\structures\__init__.py", line 2, in <module>
from .boxes import Boxes, BoxMode, pairwise_iou
File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\structures\boxes.py", line 7, in <module>
from detectron2.layers import cat
File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\layers\__init__.py", line 3, in <module>
from .deform_conv import DeformConv, ModulatedDeformConv
File "C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\layers\deform_conv.py", line 10, in <module>
from detectron2 import _C
ImportError: cannot import name '_C' from 'detectron2' (C:\Users\Tianh\Desktop\1-detect\TextFuseNet-master\demo\detectron2\__init__.py)
Do you know why? Thanks!
Hi,
I'm trying to pair the detection model with a recognition model that I have already trained while working on the character annotations. However, since I trained that model with PyTorch 1.6 and PyTorch is not forward compatible in this case, I need to use PyTorch 1.6 to compile the detectron2 used by TextFuseNet. There seems to be an issue when compiling older versions of detectron2 with PyTorch > 1.4.
I've tried compiling a newer detectron2 myself with the provided fvcore, but I only got the error AttributeError: module 'fvcore' has no attribute 'version'
I've also tried pip's fvcore, but that produced another error about a missing textfusenet key, which I assume means the provided detectron2 is modified.
Is there any way to use TextFuseNet with a newer version of PyTorch?
Does the tool detect languages other than English?
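Hello author, regarding the character-level features in Fig. 3: the paper says that the features of each character are resized to 14×14 and then summed. But these features belong to different characters; for example, the features of B are added to the features of A. What is the purpose of this?
Looking forward to your reply, thank you.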
Hi,
I'd like to try the TextFuseNet architecture without training on new data, only to assess the performance of the model. Is it possible to do this without a GPU?
I've followed the step-by-step installation guide and placed the detection model in a folder created according to the Python file, but running the demo requires a GPU.
Is there something missing here?
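Upstream detectron2 selects the device through the MODEL.DEVICE config key. Assuming the detectron2 copy bundled with this repository keeps that key (not verified here), a CPU fallback could be sketched as below. The helper name is hypothetical, and the sketch does not check whether every layer the model uses has a CPU implementation in your build.

```python
def device_overrides(cuda_available):
    """Return a yacs-style override list for the MODEL.DEVICE key."""
    return ["MODEL.DEVICE", "cuda" if cuda_available else "cpu"]


# With detectron2 and torch installed, the list would be applied roughly as:
#   import torch
#   cfg.merge_from_list(device_overrides(torch.cuda.is_available()))
cpu_override = device_overrides(False)
print(cpu_override)
```

Expect CPU inference with a ResNet-101 backbone to be slow; it is mainly useful for checking that the pipeline runs.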
I have a question about training on a Korean dataset.
$ python tools/train_net.py --num-gpus 4 --config-file
_BASE_: "./Base-RCNN-FPN.yaml"
MODEL:
  MASK_ON: True
  TEXTFUSENET_MUTIL_PATH_FUSE_ON: True
  WEIGHTS: "./out_dir_r101/totaltext_model/model_tt_r101.pth"
  PIXEL_STD: [57.375, 57.120, 58.395]
  RESNETS:
    STRIDE_IN_1X1: False  # this is a C2 model
    NUM_GROUPS: 32
    WIDTH_PER_GROUP: 8
    DEPTH: 101
  ROI_HEADS:
    NMS_THRESH_TEST: 0.4
  TEXTFUSENET_SEG_HEAD:
    FPN_FEATURES_FUSED_LEVEL: 1
    POOLER_SCALES: (0.125,)
DATASETS:
  TRAIN: ("AISLText",)
  TEST: ("AISLText",)
SOLVER:
  IMS_PER_BATCH: 8
  BASE_LR: 0.001
  STEPS: (40000,80000,)
  MAX_ITER: 120000
  CHECKPOINT_PERIOD: 2500
INPUT:
  MIN_SIZE_TRAIN: (800,1000,1200)
  MAX_SIZE_TRAIN: 1500
  MIN_SIZE_TEST: 800
  MAX_SIZE_TEST: 1333
OUTPUT_DIR: "./out_dir_r101/at_model/"
from detectron2.data.datasets import register_coco_instances

image_path = "/home/ensa/JYB/TextFuseNet/datasets/AISLText/train_images"
json_path = "/home/ensa/JYB/TextFuseNet/datasets/AISLText/trainval.json"
register_coco_instances("AISLText", {}, json_path, image_path)
[01/19 18:35:50 d2.data.datasets.coco]: Loaded 3 images in COCO format from /home/ensa/JYB/TextFuseNet/datasets/AISLText/trainval.json
[01/19 18:35:50 d2.data.build]: Removed 0 images with no usable annotations. 3 images left.
[01/19 18:35:50 d2.data.build]: Distribution of training instances among all 31 categories:
| category | #instances | category | #instances | category | #instances |
|:----------:|:-------------|:----------:|:-------------|:----------:|:-------------|
| - | 2 | 0 | 2 | 1 | 2 |
| 3 | 3 | 5 | 1 | 7 | 2 |
| A | 2 | B | 2 | E | 4 |
| K | 2 | L | 2 | R | 1 |
| a | 1 | b | 1 | c | 1 |
| e | 2 | i | 1 | m | 1 |
| o | 2 | r | 3 | t | 1 |
| text | 7 | u | 1 | y | 1 |
| 강 | 1 | 료 | 1 | 실 | 3 |
| 의 | 1 | 자 | 1 | 장 | 1 |
| 화 | 1 | | | | |
| total | 56 | | | | |
[01/19 18:35:50 d2.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(800, 1000, 1200), max_size=1500, sample_style='choice'), RandomFlip(), RandomContrast(intensity_min=0.5, intensity_max=1.5), RandomBrightness(intensity_min=0.5, intensity_max=1.5), RandomSaturation(intensity_min=0.5, intensity_max=1.5), RandomLighting(scale=1.1931034212737668)]
[01/19 18:35:50 d2.data.build]: Using training sampler TrainingSampler
[01/19 18:35:51 fvcore.common.checkpoint]: Loading checkpoint from ./out_dir_r101/totaltext_model/model_tt_r101.pth
[01/19 18:35:51 d2.engine.train_loop]: Starting training from iteration 0
[01/19 18:35:53 d2.engine.hooks]: Total training time: 0:00:01 (0:00:00 on hooks)
Traceback (most recent call last):
File "tools/train_net.py", line 161, in <module>
args=(args,),
File "/home/ensa/JYB/TextFuseNet/detectron2/engine/launch.py", line 49, in launch
daemon=False,
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
while not spawn_context.join():
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/ensa/JYB/TextFuseNet/detectron2/engine/launch.py", line 84, in _distributed_worker
main_func(*args)
File "/home/ensa/JYB/TextFuseNet/tools/train_net.py", line 149, in main
return trainer.train()
File "/home/ensa/JYB/TextFuseNet/detectron2/engine/defaults.py", line 356, in train
super().train(self.start_iter, self.max_iter)
File "/home/ensa/JYB/TextFuseNet/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/home/ensa/JYB/TextFuseNet/detectron2/engine/train_loop.py", line 212, in run_step
loss_dict = self.model(data)
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 442, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/meta_arch/rcnn.py", line 88, in forward
_, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/roi_heads/roi_heads.py", line 584, in forward
losses.update(self._forward_mask(features_list, proposals, targets))
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/roi_heads/roi_heads.py", line 684, in _forward_mask
mask_features = self.mutil_path_fuse_module(mask_features, global_context, proposals)
File "/home/ensa/anaconda3/envs/textfusenet2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/ensa/JYB/TextFuseNet/detectron2/modeling/roi_heads/mutil_path_fuse_module.py", line 110, in forward
feature_fuse = char_context + x + global_context
RuntimeError: The size of tensor a (19) must match the size of tensor b (145) at non-singleton dimension 0
To test whether training works at all, I tried with just 3 images, and this error occurred.
I compared your sample COCO-format file with mine, and the format was the same.
I need to train on at least 1000 characters. Is this error related to the number of characters, or to the input size?
Thank you for reading.
Please help.
Hi,
thanks for the fantastic research. Is there code for running the pretrained model on a new dataset and generating ground truth (a COCO JSON file) containing character-level annotations?
Will you share the details of the weakly supervised part?
Thanks.
I'm aware that this is more of a hardware issue on my side, but I was wondering if there is any way to make the model a little smaller to save GPU memory. Thank you in advance!
Training with a batch size of 1 works, but a batch size of 2 fails with this error:
Traceback (most recent call last):
File "tools/train_net.py", line 161, in <module>
args=(args,),
File "/media/data/bachtuan/TextFuseNet/detectron2/engine/launch.py", line 52, in launch
main_func(*args)
File "tools/train_net.py", line 149, in main
return trainer.train()
File "/media/data/bachtuan/TextFuseNet/detectron2/engine/defaults.py", line 356, in train
super().train(self.start_iter, self.max_iter)
File "/media/data/bachtuan/TextFuseNet/detectron2/engine/train_loop.py", line 132, in train
self.run_step()
File "/media/data/bachtuan/TextFuseNet/detectron2/engine/train_loop.py", line 212, in run_step
loss_dict = self.model(data)
File "/home/asilla/miniconda3/envs/textfusenet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/media/data/bachtuan/TextFuseNet/detectron2/modeling/meta_arch/rcnn.py", line 88, in forward
_, detector_losses = self.roi_heads(images, features, proposals, gt_instances)
File "/home/asilla/miniconda3/envs/textfusenet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/media/data/bachtuan/TextFuseNet/detectron2/modeling/roi_heads/roi_heads.py", line 584, in forward
losses.update(self._forward_mask(features_list, proposals, targets))
File "/media/data/bachtuan/TextFuseNet/detectron2/modeling/roi_heads/roi_heads.py", line 684, in _forward_mask
mask_features = self.mutil_path_fuse_module(mask_features, global_context, proposals)
File "/home/asilla/miniconda3/envs/textfusenet/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/media/data/bachtuan/TextFuseNet/detectron2/modeling/roi_heads/mutil_path_fuse_module.py", line 94, in forward
text = x[char_pos[i]]
IndexError: The shape of the mask [2] at index 0does not match the shape of the indexed tensor [9, 256, 14, 14] at index 0
Hi, thank you for the interesting work.
I'm confused about the multi-path fusion in the detection branch.
The paper describes multi-path fusion in the detection branch, which fuses word-level features with global-level features (from the semantic segmentation branch). This is depicted in Figure 2 and explained in Sections 3.1 and 3.2.
But in the code, the detection branch has no multi-path fusion.
The method "_forward_box" of class "StandardROIHeads" in /detectron2/modeling/roi_heads.py does not use multi-path fusion, unlike the method "_forward_mask" in the same file, right?
Moreover, "mutil_path_fuse_module.py" states that its argument is the mask ROI features.
Is there anything I missed? Thank you.
When I follow the step-by-step installation instructions exactly, I get an error message like this when I run the demo code:
However, when I used a different version of PyTorch, installed with pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html and then ran python setup.py build develop, this error occurred:
I tried to copy your environment exactly, changed my CUDA version from 10.2 to 10.1, and followed your step-by-step instructions, but it still doesn't work. Can you give a hint on what I need to do? Thanks! @Real-YeJ
Hello author, the flowchart in the paper shows that both the detection branch and the mask branch have a mutil_path_fuse_module, but in the code only the _forward_mask function in roi_heads.py calls mutil_path_fuse_module; the detection branch does not seem to call it. Does the final implementation use mutil_path_fuse_module in the detection branch?
I have tried to train the model on synthetic data (keras-ocr https://keras-ocr.readthedocs.io/en/latest/examples/end_to_end_training.html#generating-synthetic-data). I have 10000 background images. So far I have trained for 25000 iterations starting from the pre-trained SynthText model weights, but I do not see any results. Can you tell me how many iterations I need to train the model for?
I have also tried training on the https://guillaumejaume.github.io/FUNSD/download/ dataset, which consists of documents annotated at the word level. I modified your code slightly to train only at the word level, starting from the pretrained CTW model weights. My metrics file follows; can you tell me whether it looks fine or whether I have done something wrong?
metrics.txt
hi there,
Hello author! Could you provide the corresponding JSON file? Thank you.
Hello, in the def _forward_mask() of roi_heads.py, only the mask probability of each roi is provided in the inference phase. How can I get the mask of each text instance in roi_heads.py in the inference phase (that is, get the coordinates of the contour of each mask)? I have tried for a long time, but still can't get the mask area of each text instance. Please help me, thanks.
Mutil_Path_Fuse_Module::forward
    if self.training:
        proposal_boxes = proposals[0].proposal_boxes
        classes = proposals[0].gt_classes
    else:
        proposal_boxes = proposals[0].pred_boxes
        classes = proposals[0].pred_classes
    if len(proposal_boxes) == 0:
        return x
The code only uses proposals[0], so when batch_size > 1, text = x[char_pos[i]] raises an error.
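The mismatch can be illustrated without the model. ROI features for a whole batch are concatenated along dimension 0, so a mask built from proposals[0] alone is shorter than the feature tensor whenever the batch holds more than one image. That is consistent with the shapes [2] and [9, 256, 14, 14] in the IndexError reported earlier on this page. The sketch below uses plain lists in place of tensors and hypothetical names. It shows per-image splitting by proposal counts, which is one possible direction for a fix and not the authors' implementation.

```python
def split_by_image(roi_features, proposals_per_image):
    """Split batch-concatenated ROI features back into per-image chunks."""
    if sum(proposals_per_image) != len(roi_features):
        raise ValueError("proposal counts do not add up to the number of ROIs")
    chunks, start = [], 0
    for count in proposals_per_image:
        chunks.append(roi_features[start:start + count])
        start += count
    return chunks


# Two images with 2 and 7 proposals give 9 pooled ROI features in total.
features = [f"roi{i}" for i in range(9)]
per_image = split_by_image(features, [2, 7])
chunk_sizes = [len(chunk) for chunk in per_image]
print(chunk_sizes)
```

With tensors, each chunk could then be indexed with the character and text masks of its own image instead of the masks of image 0.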
"annotations":[
{
"area":14902.5,
"bbox":[
817,
431,
164,
162
],..
What is "area"?
And is bbox [xmin, ymin, xmax, ymax]?
Please help me create a custom train.json.
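In the COCO annotation format, bbox is [x, y, width, height] with (x, y) the top-left corner, not [xmin, ymin, xmax, ymax], and area is the pixel area of the instance's segmentation, which is why it can be smaller than width × height. A minimal sketch, assuming each instance is annotated as a single polygon; the helper names and the sample polygon are made up for illustration.

```python
def polygon_area(points):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_to_coco_fields(points):
    """Derive the COCO 'segmentation', 'bbox', and 'area' fields from a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "segmentation": [[coord for point in points for coord in point]],
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": polygon_area(points),
    }


# A made-up 100x40 rectangle with its top-left corner at (10, 20).
fields = polygon_to_coco_fields([(10, 20), (110, 20), (110, 60), (10, 60)])
print(fields)
```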
Good day! I want to train the model on the ICDAR 2015 dataset. Is there any way to convert the data into a form that the loader can understand? I have already read the README file in the datasets folder, but I am looking for conversion code. Thank you.
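ICDAR 2015 ground-truth files list one word per line as x1,y1,x2,y2,x3,y3,x4,y4,transcription, with ### marking "do not care" regions. A rough converter to COCO-style annotation entries might look like the sketch below. It covers word-level boxes only; the sample data for this repository also carries character categories, which ICDAR 2015 does not provide. The function names, the category id, and the choice to skip ### regions are assumptions for illustration.

```python
def parse_icdar2015_line(line):
    """Split 'x1,y1,x2,y2,x3,y3,x4,y4,transcription' into (points, text)."""
    parts = line.strip().lstrip("\ufeff").split(",")  # files may start with a BOM
    coords = [int(value) for value in parts[:8]]
    text = ",".join(parts[8:])  # a transcription may itself contain commas
    return list(zip(coords[0::2], coords[1::2])), text


def shoelace_area(points):
    n = len(points)
    twice = sum(
        points[i][0] * points[(i + 1) % n][1] - points[(i + 1) % n][0] * points[i][1]
        for i in range(n)
    )
    return abs(twice) / 2.0


def icdar2015_to_coco_annotations(gt_lines, image_id, text_category_id=1):
    annotations = []
    for line in gt_lines:
        if not line.strip():
            continue
        points, text = parse_icdar2015_line(line)
        if text == "###":  # assumption: drop "do not care" regions
            continue
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        annotations.append({
            # Ids here are unique per image only; offset them for a full file.
            "id": len(annotations) + 1,
            "image_id": image_id,
            "category_id": text_category_id,  # assumed id of the "text" class
            "segmentation": [[coord for point in points for coord in point]],
            "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
            "area": shoelace_area(points),
            "iscrowd": 0,
        })
    return annotations


# Made-up ground-truth lines in the ICDAR 2015 layout.
sample = ["10,20,110,20,110,60,10,60,HELLO", "0,0,5,0,5,5,0,5,###"]
anns = icdar2015_to_coco_annotations(sample, image_id=1)
print(anns[0]["bbox"], anns[0]["area"])
```

A complete file also needs the "images" and "categories" lists, which depend on your image sizes and label set.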
Hi I have 2 questions about training,
Thanks for your time and the model!
Has anyone done this yet?
Is it possible to share the code?
I tried torch-model-archiver --model-name textfusenet --version 1.0 --model-file model.py --serialized-file model.pth --export-path model_store --extra-files config.yaml with model.py = model_zoo.py, and it did not succeed.
I followed the step-by-step installation. (https://github.com/ying09/TextFuseNet/blob/master/step-by-step%20installation.txt)
I got an error for running the demo.
Traceback (most recent call last):
File "demo/icdar2013_detection.py", line 12, in <module>
from detectron2.data.detection_utils import read_image
File "/home/ubuntu/source/TextFuseNet/detectron2/data/__init__.py", line 4, in <module>
from .build import (
File "/home/ubuntu/source/TextFuseNet/detectron2/data/build.py", line 13, in <module>
from detectron2.structures import BoxMode
File "/home/ubuntu/source/TextFuseNet/detectron2/structures/__init__.py", line 2, in <module>
from .boxes import Boxes, BoxMode, pairwise_iou
File "/home/ubuntu/source/TextFuseNet/detectron2/structures/boxes.py", line 7, in <module>
from detectron2.layers import cat
File "/home/ubuntu/source/TextFuseNet/detectron2/layers/__init__.py", line 3, in <module>
from .deform_conv import DeformConv, ModulatedDeformConv
File "/home/ubuntu/source/TextFuseNet/detectron2/layers/deform_conv.py", line 10, in <module>
from detectron2 import _C
ImportError: /home/ubuntu/source/TextFuseNet/detectron2/_C.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Hi I am getting the following runtime error:
(textfusenet) mickey@MICKEY-2080TI:/mnt/d/download/GitHub/Examples/2020-09-28 TextFuseNet/TextFuseNet-master$ python demo/icdar2015_detection.py --input one-frame.jpg
Config './configs/ocr/icdar2015_101_FPN.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Traceback (most recent call last):
File "demo/icdar2015_detection.py", line 128, in <module>
for i in glob.glob(test_images_path):
File "/home/mickey/miniconda3/envs/textfusenet/lib/python3.7/glob.py", line 20, in glob
return list(iglob(pathname, recursive=recursive))
File "/home/mickey/miniconda3/envs/textfusenet/lib/python3.7/glob.py", line 40, in _iglob
dirname, basename = os.path.split(pathname)
File "/home/mickey/miniconda3/envs/textfusenet/lib/python3.7/posixpath.py", line 107, in split
p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not list
Best,
Mickey
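The TypeError means glob.glob received a list rather than a single pattern string. One plausible cause, not confirmed against the demo script, is that the --input option is declared so that argparse returns a list. A defensive sketch that accepts either form is shown below; the helper name is hypothetical.

```python
import glob
import os
import tempfile


def expand_inputs(patterns):
    """Accept one pattern or a list of patterns and return the matching paths."""
    if isinstance(patterns, (str, os.PathLike)):
        patterns = [patterns]
    paths = []
    for pattern in patterns:
        paths.extend(sorted(glob.glob(os.fspath(pattern))))
    return paths


# Demonstrate with a temporary directory holding one empty file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "one-frame.jpg"), "w").close()
    found_from_list = expand_inputs([os.path.join(tmp, "*.jpg")])
    found_from_str = expand_inputs(os.path.join(tmp, "*.jpg"))
print(len(found_from_list), len(found_from_str))
```

In the demo, the loop over glob.glob(test_images_path) could iterate over the result of such a helper instead.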
Can you guide me on how to calculate F-measure, recall, and precision using this code? Do we need to implement this ourselves?
Hi, what is the format of the training datasets' ground truth? Is it similar to the ground truth for detection or for semantic segmentation? Should the location of each word be labeled? Can this model be used for semantic segmentation only? Thank you very much!
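Detection benchmarks report precision, recall, and their harmonic mean (F-measure, also called H-mean). Given the counts of matched detections, the arithmetic is short. The matching rule itself (for example an IoU threshold) is defined by each benchmark's evaluation protocol and is not shown here. The function name is hypothetical and the counts in the usage line are invented.

```python
def detection_scores(num_matched, num_detections, num_ground_truth):
    """Return (precision, recall, f_measure) from match counts."""
    precision = num_matched / num_detections if num_detections else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Invented counts: 80 correct detections out of 100, with 160 ground-truth words.
p, r, f = detection_scores(num_matched=80, num_detections=100, num_ground_truth=160)
print(p, r, f)
```

For published numbers, the official evaluation scripts of each benchmark should be used so that the matching rule agrees with other reported results.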
Hi @ying09,
I followed the instructions in step-by-step installation.txt and got through them with no issues.
However, when I try to run demo\icdar2013_detection.py with the required options, I get the error RuntimeError: Not compiled with GPU support.
Both the input options and the error are shown in the screenshot below.
Let me know if you need any other information.
Hello, existing text detectors generally use ResNet50 as the backbone, but the results given in the paper are for ResNet101. What are the results of TextFuseNet on the various datasets when using ResNet50 as the backbone?