breezedeus / cnstd

CnSTD: A Python 3 package, based on PyTorch/MXNet, for Chinese/English Scene Text Detection (STD), Mathematical Formula Detection (MFD), and Layout Analysis

Home Page: https://www.breezedeus.com/article/cnocr

License: Apache License 2.0

Python 99.68% Makefile 0.32%

object-detection pytorch text-detection deep-learning math-formula-detection ocr python scene-text-detection


cnstd's Issues

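🔥 Width and height are sometimes swapped in the returned box

I noticed that the width and height in the returned box are sometimes swapped. I simply draw the output box on the image without any adjustment, and the swap happens fairly often; I tested many images and all of them show it. Am I using the API incorrectly? (Left image: boxes drawn directly. Right image: boxes drawn after applying my own check.)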

from cnstd import CnStd
import cv2

std = CnStd()
img_path = 'imgs/4.jpg'
img = cv2.imread(img_path)
box_infos = std.detect(img_path)

for box_info in box_infos['detected_texts']:
    box = box_info['box']
    center_x = box[0]
    center_y = box[1]
    w = box[2]  
    h = box[3]

    first_point = (int(center_x - w / 2), int(center_y - h / 2))
    last_point = (int(center_x + w / 2), int(center_y + h / 2))
    cv2.rectangle(img, first_point, last_point, (0, 255, 0), 2)
cv2.imwrite('test.jpg', img)
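
One possible explanation, offered as an assumption rather than a confirmed diagnosis: if box is a rotated rectangle in the style of OpenCV's cv2.minAreaRect (center, size, and a rotation angle as a fifth element), then width and height are measured along the rotated axes and look swapped whenever the angle is close to 90 degrees. The sketch below computes the four corners of such a box with the standard library only. The function names, and the idea that box[4] holds an angle in degrees, are illustrative and should be checked against the installed cnstd version.

```python
import math


def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a rectangle centered at (cx, cy) with
    side lengths w and h, rotated by angle_deg degrees around its center."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Corner offsets in the rectangle's own (unrotated) coordinate frame.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((int(round(x)), int(round(y))))
    return corners


def bounding_size(corners):
    """Width and height of the axis-aligned extent covered by the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return max(xs) - min(xs), max(ys) - min(ys)


if __name__ == '__main__':
    # Hypothetical box: 40 wide, 10 high, reported with a 90-degree angle.
    print(bounding_size(rotated_box_corners(100, 50, 40, 10, 90)))
```

For a box of width 40 and height 10 reported with a 90-degree angle, the corners span 10 pixels horizontally and 40 vertically, which looks like a swap when the box is drawn as an axis-aligned rectangle. If the angle assumption holds, drawing the corner polygon (for example with cv2.polylines) instead of cv2.rectangle avoids the problem.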

ImportError: cannot import name 'xyxy24p' from 'cnstd.yolov7.general'

Running cnstd evaluate -i examples/taobao.jpg -o outputs on Windows raises an error

Traceback (most recent call last):
  File "d:\anaconda3\envs\ocr\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\anaconda3\envs\ocr\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Anaconda3\envs\ocr\Scripts\cnstd.exe\__main__.py", line 4, in <module>
  File "d:\anaconda3\envs\ocr\lib\site-packages\cnstd\cli.py", line 23, in <module>
    from .train import train
  File "d:\anaconda3\envs\ocr\lib\site-packages\cnstd\train.py", line 28, in <module>
    from .datasets.dataloader import StdDataset
  File "d:\anaconda3\envs\ocr\lib\site-packages\cnstd\datasets\dataloader.py", line 28, in <module>
    from .util import parse_lines, process_data
  File "d:\anaconda3\envs\ocr\lib\site-packages\cnstd\datasets\util.py", line 22, in <module>
    from shapely.geometry import Polygon
  File "d:\anaconda3\envs\ocr\lib\site-packages\shapely\geometry\__init__.py", line 4, in <module>
    from .base import CAP_STYLE, JOIN_STYLE
  File "d:\anaconda3\envs\ocr\lib\site-packages\shapely\geometry\base.py", line 18, in <module>
    from shapely.coords import CoordinateSequence
  File "d:\anaconda3\envs\ocr\lib\site-packages\shapely\coords.py", line 8, in <module>
    from shapely.geos import lgeos
  File "d:\anaconda3\envs\ocr\lib\site-packages\shapely\geos.py", line 145, in <module>
    lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
  File "d:\anaconda3\envs\ocr\lib\ctypes\__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] 找不到指定的模块。 (The specified module could not be found.)
The command fails as shown above when run in cmd, but it runs fine in PyCharm's terminal.
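
The traceback shows Shapely trying to load geos_c.dll from Library\bin under sys.prefix. A minimal diagnostic sketch, assuming that same lookup path, is to print where the current interpreter expects the DLL and whether the file is there. The helper names are hypothetical. Running it in both cmd and the PyCharm terminal may show that the two shells use different interpreters or environments.

```python
import os
import sys


def expected_geos_dll(prefix=None):
    """Build the path that the Shapely version in the traceback tries to load."""
    prefix = sys.prefix if prefix is None else prefix
    return os.path.join(prefix, 'Library', 'bin', 'geos_c.dll')


def report_geos_dll(prefix=None):
    """Summarize which interpreter is running and whether the DLL file exists."""
    path = expected_geos_dll(prefix)
    return {
        'interpreter': sys.executable,
        'dll_path': path,
        'exists': os.path.isfile(path),
    }


if __name__ == '__main__':
    print(report_geos_dll())
```

Note that a file that exists can still fail to load if one of its own dependencies cannot be found, so 'exists': True narrows the problem down but does not rule it out.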

The Baidu Cloud link for the trained models is dead; could you please repost it?

Please repost the link, thanks.

Over resource limits on Streamlit Cloud

Hey there 👋 Just wanted to let you know that your app on Streamlit Cloud deployed from this repo has gone over its resource limits. Access to the app is temporarily limited. Visit the app to see more details and possible solutions.

An error in the README.md documentation

from cnstd import CnStd
from cnocr import CnOcr

std = CnStd()
cn_ocr = CnOcr()

box_infos = std.detect('examples/taobao.jpg')

for box_info in box_infos['detected_texts']:
    cropped_img = box_info['cropped_img']
    ocr_res = cn_ocr.ocr_for_single_line(cropped_img)
    print('ocr result: %s' % str(ocr_out))

The variable ocr_out is never defined; the print call should presumably use ocr_res instead.

Detection Benchmark

@breezedeus Hi there,
What are the current detection results on SROIE, ICDAR2019-LSVT, and ICDAR-RCTW-17?
Recall | Precision | Hmean
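
For readers unfamiliar with the three columns: Hmean is the harmonic mean of precision and recall (the F1 score). A minimal sketch of how the three values relate follows. The function name and the counts in the usage line are illustrative only and are not benchmark results for cnstd.

```python
def detection_scores(num_matched, num_detected, num_ground_truth):
    """Return (recall, precision, hmean) from match counts.

    num_matched: detected boxes that match a ground-truth box
    num_detected: all boxes produced by the detector
    num_ground_truth: all annotated boxes
    """
    precision = num_matched / num_detected if num_detected else 0.0
    recall = num_matched / num_ground_truth if num_ground_truth else 0.0
    denom = precision + recall
    hmean = 2 * precision * recall / denom if denom else 0.0
    return recall, precision, hmean


if __name__ == '__main__':
    # Made-up counts, purely to show the arithmetic.
    print(detection_scores(num_matched=8, num_detected=10, num_ground_truth=16))
```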

The Baidu Cloud link is dead

Potential bugs? Hard code of the context

The code contains a hard-coded assignment: self._model_backend = 'onnx'.
It seems that the det_model_backend setting gets overridden: even if you set det_model_backend='pytorch', the 'onnx' backend is still used.
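
The pattern being reported can be illustrated with a pair of hypothetical classes. This is not cnstd's actual code, only a sketch of how a hard-coded assignment silently discards a constructor argument, and of one way to honor the argument while still validating it.

```python
class BuggyDetector:
    """Hypothetical class showing the reported pattern."""

    def __init__(self, model_backend='onnx'):
        self._model_backend = model_backend
        # A later hard-coded assignment silently discards the caller's choice.
        self._model_backend = 'onnx'


class FixedDetector:
    """Hypothetical fix: keep the argument and validate it instead."""

    SUPPORTED_BACKENDS = ('pytorch', 'onnx')

    def __init__(self, model_backend='onnx'):
        if model_backend not in self.SUPPORTED_BACKENDS:
            raise ValueError('unsupported backend: %r' % (model_backend,))
        self._model_backend = model_backend


if __name__ == '__main__':
    print(BuggyDetector('pytorch')._model_backend)  # the argument is ignored
    print(FixedDetector('pytorch')._model_backend)  # the argument is kept
```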

Memory consumption keeps growing across consecutive calls

I profiled the memory consumption with memory_profiler:

   169   3045.3 MiB      0.0 MiB           h, w, _ = resize_img.shape
   170   3054.2 MiB      8.9 MiB           resize_img = normalize_img_array(resize_img)
   171   3054.2 MiB      0.0 MiB           im_res = mx.nd.array(resize_img)
   172   3054.5 MiB      0.3 MiB           im_res = self._trans(im_res)
   173
   174   3054.7 MiB      0.2 MiB           t1 = time.time()
   175   3064.7 MiB     10.0 MiB           seg_maps = self._model(im_res.expand_dims(axis=0).as_in_context(self._context))
   176   4056.1 MiB    991.4 MiB           mx.nd.waitall()
   177   4056.1 MiB      0.0 MiB           seg_maps = seg_maps.asnumpy()
   178   4056.1 MiB      0.0 MiB           t2 = time.time()
   179   4056.1 MiB      0.0 MiB           boxes, scores, rects = detect_pse(
   180   4056.1 MiB      0.0 MiB               seg_maps,
   181   4056.1 MiB      0.0 MiB               threshold=pse_threshold,
   182   4056.1 MiB      0.0 MiB               threshold_k=pse_threshold,

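It turns out that mx.nd.waitall() consumes a substantial amount of memory for every image predicted, and this memory does not seem to be fully released, so memory usage keeps growing.

I searched online, and this appears to be an unresolved MXNet issue: Memory leak when running cpu inference - Gluon - MXNet Forum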

TypeError when passing **kwargs to detect()

Code:
kwargs = {'height_border': 0.05, 'width_border': 0.05}
box_info_list = std.detect(img, **kwargs)

Runtime error:
TypeError: detect() got an unexpected keyword argument 'height_border'
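
The TypeError means the installed detect() does not declare a height_border parameter. One way to check which keyword arguments a given version accepts, sketched here with the standard inspect module, is to compare the kwargs against the function's signature before calling it. fake_detect below is a stand-in with made-up parameters, not cnstd's real signature.

```python
import inspect


def split_supported_kwargs(func, kwargs):
    """Split kwargs into (supported, rejected) based on func's signature."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter means the function accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    accepted = {name for name, p in params.items() if p.kind in keyword_kinds}
    supported = {k: v for k, v in kwargs.items() if k in accepted}
    rejected = {k: v for k, v in kwargs.items() if k not in accepted}
    return supported, rejected


def fake_detect(img, max_size=768, box_score_thresh=0.3):
    """Hypothetical stand-in for a detect() method."""
    return {'img': img, 'max_size': max_size, 'box_score_thresh': box_score_thresh}


if __name__ == '__main__':
    wanted = {'height_border': 0.05, 'max_size': 1024}
    supported, rejected = split_supported_kwargs(fake_detect, wanted)
    print('passing:', supported, 'dropping:', rejected)
    print(fake_detect('image.jpg', **supported))
```

With a real instance, split_supported_kwargs(std.detect, kwargs) would report which names the installed version rejects. Dropping them silently is rarely the right fix, but the rejected list shows what needs to be renamed or removed.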

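How can I use an fp16 model? Is there a parameter for it?

I built an API with Django that uses cnocr, but after serving parallel calls for a while MXNet raised the error below. Is this a shape mismatch inside the model?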

terminate called after throwing an instance of 'dmlc::Error'
what(): [16:24:07] src/operator/contrib/./../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [1,1,32,115], got [1,1,32,131]
Stack trace:
[bt] (0) /home/user/anaconda3/envs/summer/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x307d3b) [0x7f50ccd31d3b]
[bt] (1) /home/user/anaconda3/envs/summer/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x4b4cb9) [0x7f50ccedecb9]
[bt] (2) /home/user/anaconda3/envs/summer/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x4b5572) [0x7f50ccedf572]
[bt] (3) /home/user/anaconda3/envs/summer/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x60f04c) [0x7f50cd03904c]
[bt] (4) /home/user/anaconda3/envs/summer/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0x1d27) [0x7f50cfecbaa7]
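
The error shows two different input shapes meeting in one operation. That would be consistent with, though it does not prove, concurrent requests sharing a single model instance that is not thread-safe. Under that assumption, a minimal sketch of one mitigation is to serialize calls to the shared model with a lock. SerializedModel and the predict_fn argument are hypothetical names; in a Django view, predict_fn would be whatever callable runs the OCR.

```python
import threading


class SerializedModel:
    """Wrap a prediction callable so only one thread runs it at a time."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Other threads block here until the current prediction finishes.
        with self._lock:
            return self._predict_fn(*args, **kwargs)


if __name__ == '__main__':
    # Stand-in for a real model call.
    model = SerializedModel(lambda x: x * 2)
    results = []
    threads = [
        threading.Thread(target=lambda i=i: results.append(model(i)))
        for i in range(4)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(results))
```

This trades throughput for safety. It only helps within one process, so a multi-process deployment would need one model instance per worker.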

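I want to join the group but don't know the verification answer; running the example gives box_infos a length of only 2

The command line raises an error because the source code does not convert coordinates to integers when drawing boxes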
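
OpenCV's drawing functions expect integer pixel coordinates, and recent opencv-python releases reject float values. A minimal sketch of the conversion is shown below. The helper names are hypothetical and this is not the patch applied in cnstd.

```python
def to_int_point(point):
    """Convert an (x, y) pair of floats to integer pixel coordinates."""
    x, y = point
    return int(round(x)), int(round(y))


def to_int_points(points):
    """Convert a sequence of (x, y) pairs, e.g. the corners of a box."""
    return [to_int_point(p) for p in points]


if __name__ == '__main__':
    corners = [(10.6, 3.2), (99.7, 3.1), (99.9, 40.8), (10.2, 41.3)]
    print(to_int_points(corners))
```

The converted points can then be passed to calls such as cv2.rectangle or cv2.line.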

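The PSE post-processing is implemented in Python; when will a C++ version be available?

The pretrained model download link points to a missing resource

The download link for the pretrained models is dead. Where can I download them?

cropped_image is completely flipped relative to the original text, causing recognition to fail

Among the results returned by detect, some cropped_image arrays are recognized as garbage by cnocr. Converting these arrays to images shows that the text has been flipped completely.

Since resnet50_v1b is slightly more accurate than mobilenetv3, why is mobilenetv3 the default? Is it because mobilenetv3 is faster?

Text rectification

I'd like to ask: the model you use is PSE. Do you apply any rectification to the PSE detection results, or do you use them directly?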

Over resource limits on Streamlit Cloud

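Can height and width be set separately via --resized-shape when predicting with the layout model?

The --resized-shape option of cnstd analyze accepts only a single integer that is a multiple of 32, so height and width always share the same value. Could it accept a different value for each?

cnstd analyze -m layout --resized-shape 2336,1632 -i table_image1.png -o std_table_image1.png

Error: Invalid value for '--resized-shape': '2336,1632' is not a valid integer.

With a single value, the command works:
cnstd analyze -m layout --resized-shape 2336 -i table_image1.png -o std_table_image1.png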
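
As a sketch of what the request would involve, the function below parses either a single integer or a "height,width" pair and enforces the multiple-of-32 constraint mentioned in the issue. It is a hypothetical parser, not the cnstd CLI's implementation, and whether the layout model itself supports non-square shapes is a separate question.

```python
def parse_resized_shape(value, multiple=32):
    """Parse '2336' or '2336,1632' into a (height, width) tuple.

    A single value is used for both dimensions. Every dimension must be
    a positive multiple of `multiple`.
    """
    parts = [p.strip() for p in str(value).split(',')]
    if len(parts) not in (1, 2) or not all(p.isdigit() for p in parts):
        raise ValueError('expected INT or INT,INT but got %r' % (value,))
    dims = [int(p) for p in parts]
    if len(dims) == 1:
        dims = dims * 2
    for d in dims:
        if d <= 0 or d % multiple != 0:
            raise ValueError('%d is not a positive multiple of %d' % (d, multiple))
    return tuple(dims)


if __name__ == '__main__':
    print(parse_resized_shape('2336,1632'))
    print(parse_resized_shape('2336'))
```

A function like this could, for example, be attached as an option callback in a click-based CLI, which would let the existing single-integer usage keep working.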

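Long images are not detected accurately

The text boxes in the body text are positioned incorrectly

Error when setting GPU for inference

The error message is: CachedOp requires all inputs to live on the same context. But data is on gpu(0) while _mobilenetv30_first-3x3-conv-conv2d_weight is on cpu(0)

Vertical text is not detected. Do I need to train my own text detection model for this?

For example, these 3 images: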

densenet_lite_136-fc-onnx.zip

Could someone share this model? The download is blocked for me, so I can't use the OCR.

Could you provide a training example?

fresh install error FileNotFoundError: Could not find module 'D:\Program Files\anaconda3\Library\bin\geos_c.dll' (or one of its dependencies). Try using the full path with constructor syntax.

python 3.8.3
windows 10

pip install cnocr

@breezedeus




from cnstd import CnStd
from cnocr import CnOcr

std = CnStd()
cn_ocr = CnOcr()

box_infos = std.detect('output_dir/t1-Scene-002-01.jpg')

for box_info in box_infos['detected_texts']:
    cropped_img = box_info['cropped_img']
    ocr_res = cn_ocr.ocr_for_single_line(cropped_img)
    print('ocr result: %s' % str(ocr_res))


dubo@LAPTOP-I2R9VG58 MINGW64 /d/Download/audio-visual/make-compilation-videos/auto_video_splitter (master)
$ python  cnst.py
Traceback (most recent call last):
  File "cnst.py", line 3, in <module>
    from cnstd import CnStd
  File "D:\Program Files\anaconda3\lib\site-packages\cnstd\__init__.py", line 20, in <module>
    from .cn_std import CnStd
  File "D:\Program Files\anaconda3\lib\site-packages\cnstd\cn_std.py", line 32, in <module>
    from .model import gen_model
  File "D:\Program Files\anaconda3\lib\site-packages\cnstd\model\__init__.py", line 22, in <module>
    from .dbnet import gen_dbnet, DBNet
  File "D:\Program Files\anaconda3\lib\site-packages\cnstd\model\dbnet.py", line 31, in <module>
    from .base import DBPostProcessor, _DBNet
  File "D:\Program Files\anaconda3\lib\site-packages\cnstd\model\base.py", line 23, in <module>
    from shapely.geometry import Polygon
  File "D:\Program Files\anaconda3\lib\site-packages\shapely\geometry\__init__.py", line 4, in <module>
    from .base import CAP_STYLE, JOIN_STYLE
  File "D:\Program Files\anaconda3\lib\site-packages\shapely\geometry\base.py", line 19, in <module>
    from shapely.coords import CoordinateSequence
  File "D:\Program Files\anaconda3\lib\site-packages\shapely\coords.py", line 8, in <module>
    from shapely.geos import lgeos
  File "D:\Program Files\anaconda3\lib\site-packages\shapely\geos.py", line 154, in <module>
    _lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
  File "D:\Program Files\anaconda3\lib\ctypes\__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'D:\Program Files\anaconda3\Library\bin\geos_c.dll' (or one of its dependencies). Try using the full path with constructor syntax.
(base)

Trying to run free MFD model

I am new to running such models. On your website, there is a code snippet for running the model:

from pix2text import Pix2Text, merge_line_texts

img_fp = './docs/examples/formula.jpg'
p2t = Pix2Text(analyzer_config=dict(model_name='mfd'))
outs = p2t(img_fp, resized_shape=608)

print(outs)
only_text = merge_line_texts(outs, auto_line_break=True)

I installed the pix2text library and tried to run this code, but got this error.

Please guide me in running the model.
By the way, I am using Google Colab.

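Why does text detection on vertically concatenated images differ so much from detection on the individual images?

I converted a 3-page PDF into 3 images. Running cnstd on each of the 3 images separately detects all of the text.
But after concatenating the 3 images vertically and running the same program, only a small fraction of the text is detected.
What could be the cause?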
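
One plausible cause, stated as an assumption: if the detector resizes every input to a bounded size before inference, stacking three pages makes each page roughly three times smaller after resizing, and the text may become too small to detect. Under that assumption, a workaround is to cut a tall image into vertical tiles, detect on each tile, and shift the resulting boxes back. The sketch below covers only the tiling arithmetic. The helper names are hypothetical and boxes are assumed to be (x_min, y_min, x_max, y_max).

```python
def vertical_tiles(total_height, tile_height, overlap=0):
    """Return (top, bottom) row ranges covering an image of total_height.

    Consecutive tiles share `overlap` rows so text on a seam is seen whole
    in at least one tile.
    """
    if total_height <= 0 or tile_height <= 0:
        raise ValueError('heights must be positive')
    if not 0 <= overlap < tile_height:
        raise ValueError('overlap must be in [0, tile_height)')
    tiles = []
    top = 0
    while True:
        bottom = min(top + tile_height, total_height)
        tiles.append((top, bottom))
        if bottom >= total_height:
            break
        top = bottom - overlap
    return tiles


def shift_box(box, y_offset):
    """Map a box detected inside a tile back to full-image coordinates."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min + y_offset, x_max, y_max + y_offset)


if __name__ == '__main__':
    for top, bottom in vertical_tiles(3000, 1000, overlap=100):
        # With a NumPy image, the tile would be img[top:bottom].
        print(top, bottom)
```

With a nonzero overlap, the same text line can be detected in two neighboring tiles, so duplicates near the seams would still need to be merged.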

Detected text boxes are out of order

Image after detection:

Original image:
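
If the detector does not guarantee any particular output order, the boxes can be sorted afterwards. The sketch below is a simple heuristic, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max): it groups boxes into rows by vertical center and then sorts each row from left to right. The function name and the tolerance parameter are illustrative. Rotated text or multi-column layouts would need something more elaborate.

```python
def sort_reading_order(boxes, row_tolerance=0.5):
    """Sort boxes top-to-bottom, then left-to-right within each row.

    A box joins the current row when its vertical center lies within
    row_tolerance * (height of the row's first box) of that box's center.
    """
    def center_y(box):
        return (box[1] + box[3]) / 2

    def height(box):
        return box[3] - box[1]

    rows = []
    for box in sorted(boxes, key=center_y):
        if rows:
            ref = rows[-1][0]
            if abs(center_y(box) - center_y(ref)) <= row_tolerance * height(ref):
                rows[-1].append(box)
                continue
        rows.append([box])

    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda b: b[0]))
    return ordered


if __name__ == '__main__':
    boxes = [(200, 10, 300, 30), (10, 50, 100, 70), (10, 12, 100, 32), (150, 48, 250, 68)]
    for box in sort_reading_order(boxes):
        print(box)
```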

How can the cnstd and cnocr MXNet models be converted to PyTorch models? Thanks

The CLI is version 0.1.0, but the package is version 1.2.0

Error installing cnstd on Ubuntu

Collecting cnstd
Downloading cnstd-0.1.1-py3-none-any.whl (50 kB)
|████████████████████████████████| 50 kB 167 kB/s
Requirement already satisfied: shapely in /usr/local/lib/python3.6/dist-packages (from cnstd) (1.7.0)
Requirement already satisfied: Polygon3 in /usr/local/lib/python3.6/dist-packages (from cnstd) (3.0.8)
Requirement already satisfied: numpy<1.20.0,>=1.14.0 in /usr/local/lib/python3.6/dist-packages (from cnstd) (1.19.1)
Requirement already satisfied: pyclipper in /usr/local/lib/python3.6/dist-packages (from cnstd) (1.2.0)
ERROR: Could not find a version that satisfies the requirement opencv-python (from cnstd) (from versions: none)
ERROR: No matching distribution found for opencv-python (from cnstd)

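How can the model be converted to ONNX?

Input data type error?

Could you publish performance metrics for the layout analysis model?

Thank you for the excellent work. I would like to use the layout analysis model you trained; could you share its score on the CDLA test set?

For comparison, these are PaddleOCR's layout analysis metrics: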
https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/ppstructure/docs/PP-StructureV2_introduction.md#412-%E5%9C%BA%E6%99%AF%E9%80%82%E9%85%8D

Ran into a Python dependency problem

My Python version is 3.6.8 and the operating system is Windows Server 2012, in a completely clean virtual environment.
After running pip install cnstd,
opencv_python 4.2.0.34 was installed by default.
When I ran a test script, loading cv2 reported a missing DLL.
After manually downgrading cv2 to version 3.4.0.12, the following error appeared:
[WARNING 2020-06-04 21:28:49,308 _showwarnmsg:99] C:\Soft\source\cnstd\venv\lib\site-packages\mxnet\gluon\block.py:1389: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
data: None
input_sym_arg_type = in_param.infer_type()[0]
What went wrong here?

The full pip list output is:
Package Version

certifi 2020.4.5.1
chardet 3.0.4
click 7.1.2
cnstd 0.1.0
cycler 0.10.0
gluoncv 0.6.0
graphviz 0.8.4
idna 2.6
kiwisolver 1.2.0
matplotlib 3.2.1
mxnet 1.6.0
numpy 1.18.5
opencv-python 3.4.1.15
Pillow 7.1.2
pip 20.1.1
Polygon3 3.0.8
portalocker 1.7.0
protobuf 3.12.2
pyclipper 1.1.0.post3
pyparsing 2.4.7
python-dateutil 2.8.1
pywin32 227
requests 2.18.4
scipy 1.4.1
setuptools 47.1.1
Shapely 1.7.0
six 1.15.0
tensorboardX 2.0
tqdm 4.46.1
urllib3 1.22

This project's model cannot be loaded by yolov7

I tried to load the yolov7 model provided by this project with yolov7 and got the following error:


Namespace(weights='layout-yolov7_tiny.pt', source='inference/images', img_size=640, conf_thres=0.25, iou_thres=0.45, device='', view_img=False, save_txt=False, save_conf=False, nosave=False, classes=None, agnostic_nms=False, augment=False, update=False, project='runs/detect', name='exp', exist_ok=False, no_trace=False)
YOLOR  2023-5-30 torch 2.1.0+cpu CPU

Traceback (most recent call last):
  File "C:\Users\10706\Desktop\LaTeX-OCR-main\yolov7-main\detect.py", line 196, in <module>
    detect()
  File "C:\Users\10706\Desktop\LaTeX-OCR-main\yolov7-main\detect.py", line 34, in detect
    model = attempt_load(weights, map_location=device)  # load FP32 model
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\10706\Desktop\LaTeX-OCR-main\yolov7-main\models\experimental.py", line 253, in attempt_load
    model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())  # FP32 model
                 ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'model'

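I would like to know how this project's model differs from the standard yolov7 pretrained model, and why the two are not interchangeable. Thank you very much!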
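
Reading the traceback: ckpt.get('ema') succeeded and ckpt['model'] raised KeyError, so the loaded object behaves like a dict that has no 'model' entry. One possibility, which is an inference and not confirmed by the project, is that the file holds a bare state_dict of weights rather than a full training checkpoint. The sketch below classifies a loaded object along those lines. It uses plain dicts as stand-ins for the result of torch.load, and the function name is hypothetical.

```python
def describe_checkpoint(ckpt):
    """Give a rough description of what a loaded checkpoint object contains."""
    if not isinstance(ckpt, dict):
        return 'pickled object of type %s' % type(ckpt).__name__
    if 'ema' in ckpt or 'model' in ckpt:
        return 'training checkpoint with keys: %s' % sorted(str(k) for k in ckpt)
    sample = sorted(str(k) for k in ckpt)[:3]
    return 'dict without "model"/"ema" (possibly a bare state_dict); sample keys: %s' % sample


if __name__ == '__main__':
    # Stand-ins for torch.load(...) results; real values would be tensors or modules.
    print(describe_checkpoint({'model': object(), 'epoch': 3}))
    print(describe_checkpoint({'model.0.conv.weight': 0, 'model.0.conv.bias': 0}))
```

In practice the object would come from torch.load(path, map_location='cpu'). If it does turn out to be a bare state_dict, it has to be loaded with load_state_dict into a model built from the matching architecture definition, which the stock attempt_load does not do.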

Error when instantiating std = CnStd()

[WARNING 2020-07-06 16:49:15,183 _showwarnmsg:110] /home/env/rfh_01/lib/python3.7/site-packages/mxnet/gluon/block.py:1389: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
data: None
input_sym_arg_type = in_param.infer_type()[0]

terminate called after throwing an instance of 'dmlc::Error'
what(): [16:49:15] src/ndarray/ndarray.cc:1851: Check failed: fi->Read(data): Invalid NDArray file format
Stack trace:
[bt] (0) /home/env/rfh_01/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x307d3b) [0x7ff22424cd3b]
[bt] (1) /home/env/rfh_01/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Load(dmlc::Stream*, std::vector<mxnet::NDArray, std::allocator<mxnet::NDArray> >*, std::vector<std::string, std::allocator<std::string> >*)+0x1d6) [0x7ff22752b146]

Aborted (core dumped)

Any guidance would be appreciated.

I want to retrain the detection model; how should I go about it?

Can I train with my own dataset?

Error importing the package

  File "D:/Users/PycharmProjects/data_analysis/ocr_detection/image_detection.py", line 10, in <module>
    from cnstd import CnStd
  File "D:\Anaconda3\envs\python36\lib\site-packages\cnstd\__init__.py", line 20, in <module>
    from .cn_std import CnStd
  File "D:\Anaconda3\envs\python36\lib\site-packages\cnstd\cn_std.py", line 32, in <module>
    from .model import gen_model
  File "D:\Anaconda3\envs\python36\lib\site-packages\cnstd\model\__init__.py", line 22, in <module>
    from .dbnet import gen_dbnet, DBNet
  File "D:\Anaconda3\envs\python36\lib\site-packages\cnstd\model\dbnet.py", line 31, in <module>
    from .base import DBPostProcessor, _DBNet
  File "D:\Anaconda3\envs\python36\lib\site-packages\cnstd\model\base.py", line 23, in <module>
    from shapely.geometry import Polygon
  File "D:\Anaconda3\envs\python36\lib\site-packages\shapely\geometry\__init__.py", line 4, in <module>
    from .base import CAP_STYLE, JOIN_STYLE
  File "D:\Anaconda3\envs\python36\lib\site-packages\shapely\geometry\base.py", line 19, in <module>
    from shapely.coords import CoordinateSequence
  File "D:\Anaconda3\envs\python36\lib\site-packages\shapely\coords.py", line 8, in <module>
    from shapely.geos import lgeos
  File "D:\Anaconda3\envs\python36\lib\site-packages\shapely\geos.py", line 154, in <module>
    _lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
  File "D:\Anaconda3\envs\python36\lib\ctypes\__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] 找不到指定的模块。 (The specified module could not be found.)

It looks like the installed package is missing LoadLibrary and FreeLibrary.

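Error calling MXNet on ARM Ubuntu 18.04

I installed it successfully on Ubuntu on an ARM machine. At runtime I get OSError: /<directory>/python3.7/site-packages/mxnet/libmxnet.so: cannot open shared object file: No such file or directory. I installed mxnet=1.6.0, the folder does contain libmxnet.so, and I have already added it to the shared library search path, but it still fails. Any idea why?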

layout yolov7 support

I am excited to share that I have trained a yolov7 model on the CDLA dataset, adding a small amount of self-labeled documents. I have adjusted the default training input to 1280x1280, as opposed to the standard 640. This modification has improved the model's performance on standard documents, particularly when the document's output dpi is set above 150. I am thrilled to contribute this model to your open-source community and hope it will be a valuable addition to your model zoo.