breezedeus / cnstd

CnSTD: a Python 3 package, based on PyTorch/MXNet, for Chinese/English scene text detection, mathematical formula detection (MFD), and layout analysis.

Home Page: https://www.breezedeus.com/article/cnocr

License: Apache License 2.0

Python 99.68% Makefile 0.32%
object-detection pytorch text-detection deep-learning math-formula-detection ocr python scene-text-detection

Contributors: breezedeus, vasissualiyp

cnstd's Issues

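🔥 Width and height are sometimes swapped in the returned boxes

I noticed that the width and height in the returned boxes are sometimes swapped. I simply draw the output boxes on the image without any adjustment, and the swap happens fairly often; I have tested many images and it shows up on all of them. Am I using it wrong? (In the attached images, the left one draws the boxes directly and the right one draws them after my own check.)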

from cnstd import CnStd
import cv2

std = CnStd()
img_path = 'imgs/4.jpg'
img = cv2.imread(img_path)
box_infos = std.detect(img_path)

for box_info in box_infos['detected_texts']:
    box = box_info['box']
    center_x = box[0]
    center_y = box[1]
    w = box[2]  
    h = box[3]

    first_point = (int(center_x - w / 2), int(center_y - h / 2))
    last_point = (int(center_x + w / 2), int(center_y + h / 2))
    cv2.rectangle(img, first_point, last_point, (0, 255, 0), 2)
cv2.imwrite('test.jpg', img)
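
If the box in your version carries a rotation angle as a fifth element (an assumption here, following the center, size, angle convention of OpenCV's cv2.minAreaRect), an axis-aligned rectangle drawn from w and h alone will look as if width and height were swapped whenever the angle is close to 90 degrees. A minimal sketch that computes the four rotated corners instead; rotated_corners and bounding_size are hypothetical helpers, not part of cnstd:

```python
import math


def rotated_corners(cx, cy, w, h, angle_deg):
    """Return the four corners of a w x h rectangle rotated by angle_deg around (cx, cy)."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Offsets of the unrotated corners relative to the center.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners


def bounding_size(corners):
    """Width and height of the axis-aligned extent of the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return max(xs) - min(xs), max(ys) - min(ys)


# A box reported as 20 wide and 80 tall, but rotated by 90 degrees (assumed input).
corners = rotated_corners(100, 50, 20, 80, 90)
extent_w, extent_h = bounding_size(corners)
print(extent_w, extent_h)  # on screen the box is wide, not tall
```

The corners can then be drawn as a polygon (for example with cv2.polylines after converting them to an integer array) rather than with cv2.rectangle.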

Running cnstd evaluate -i examples/taobao.jpg -o outputs on Windows raises an error

Traceback (most recent call last):
  File "d:\anaconda3\envs\ocr\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "d:\anaconda3\envs\ocr\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Anaconda3\envs\ocr\Scripts\cnstd.exe\__main__.py", line 4, in <module>
  File "d:\anaconda3\envs\ocr\lib\site-packages\cnstd\cli.py", line 23, in <module>
    from .train import train
  File "d:\anaconda3\envs\ocr\lib\site-packages\cnstd\train.py", line 28, in <module>
    from .datasets.dataloader import StdDataset
  File "d:\anaconda3\envs\ocr\lib\site-packages\cnstd\datasets\dataloader.py", line 28, in <module>
    from .util import parse_lines, process_data
  File "d:\anaconda3\envs\ocr\lib\site-packages\cnstd\datasets\util.py", line 22, in <module>
    from shapely.geometry import Polygon
  File "d:\anaconda3\envs\ocr\lib\site-packages\shapely\geometry\__init__.py", line 4, in <module>
    from .base import CAP_STYLE, JOIN_STYLE
  File "d:\anaconda3\envs\ocr\lib\site-packages\shapely\geometry\base.py", line 18, in <module>
    from shapely.coords import CoordinateSequence
  File "d:\anaconda3\envs\ocr\lib\site-packages\shapely\coords.py", line 8, in <module>
    from shapely.geos import lgeos
  File "d:\anaconda3\envs\ocr\lib\site-packages\shapely\geos.py", line 145, in <module>
    lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
  File "d:\anaconda3\envs\ocr\lib\ctypes\__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found.
The command that fails in cmd is shown above. It runs normally in PyCharm's terminal, however.
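
The last frames of the traceback show shapely loading geos_c.dll from the Library\bin folder under sys.prefix. A small diagnostic sketch that can be run in both cmd and the PyCharm terminal to compare which interpreter is active and whether the DLL exists at that location; geos_dll_path is a hypothetical helper name introduced only for this example:

```python
import os
import sys


def geos_dll_path(prefix=None):
    """Build the path that the traceback shows shapely trying to load."""
    prefix = sys.prefix if prefix is None else prefix
    return os.path.join(prefix, 'Library', 'bin', 'geos_c.dll')


path = geos_dll_path()
status = 'found' if os.path.exists(path) else 'missing'
print('interpreter:', sys.executable)
print('sys.prefix :', sys.prefix)
print('geos_c.dll :', path, '(%s)' % status)
```

If the two terminals print different prefixes, they are likely running different environments, which would be consistent with the command working in one and failing in the other.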

An error in the README.md documentation

from cnstd import CnStd
from cnocr import CnOcr

std = CnStd()
cn_ocr = CnOcr()

box_infos = std.detect('examples/taobao.jpg')

for box_info in box_infos['detected_texts']:
    cropped_img = box_info['cropped_img']
    ocr_res = cn_ocr.ocr_for_single_line(cropped_img)
    print('ocr result: %s' % str(ocr_out))

The variable ocr_out does not exist; the print call should be using ocr_res, right?

Potential bug? Hard-coded context

self._model_backend = 'onnx' is hard-coded in the source.
It seems that the det_model_backend setting gets overridden: even if you set det_model_backend='pytorch', it still uses 'onnx'.
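
The reported pattern, where a hard-coded assignment overrides the caller's choice, and one way to avoid it can be sketched as follows. Detector, its default value, and SUPPORTED_BACKENDS are hypothetical stand-ins for illustration, not cnstd's actual code:

```python
class Detector:
    """Hypothetical stand-in class; not the cnstd implementation."""

    SUPPORTED_BACKENDS = ('pytorch', 'onnx')

    def __init__(self, model_backend='onnx'):
        if model_backend not in self.SUPPORTED_BACKENDS:
            raise ValueError('unsupported backend: %r' % (model_backend,))
        # Store the caller's value instead of assigning a literal such as 'onnx'.
        self._model_backend = model_backend


det = Detector(model_backend='pytorch')
print(det._model_backend)
```

Validating the argument up front keeps a typo from silently falling back to a default backend.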

Memory usage keeps growing over consecutive calls

I profiled the memory usage with memory_profiler:

   169   3045.3 MiB      0.0 MiB           h, w, _ = resize_img.shape
   170   3054.2 MiB      8.9 MiB           resize_img = normalize_img_array(resize_img)
   171   3054.2 MiB      0.0 MiB           im_res = mx.nd.array(resize_img)
   172   3054.5 MiB      0.3 MiB           im_res = self._trans(im_res)
   173
   174   3054.7 MiB      0.2 MiB           t1 = time.time()
   175   3064.7 MiB     10.0 MiB           seg_maps = self._model(im_res.expand_dims(axis=0).as_in_context(self._context))
   176   4056.1 MiB    991.4 MiB           mx.nd.waitall()
   177   4056.1 MiB      0.0 MiB           seg_maps = seg_maps.asnumpy()
   178   4056.1 MiB      0.0 MiB           t2 = time.time()
   179   4056.1 MiB      0.0 MiB           boxes, scores, rects = detect_pse(
   180   4056.1 MiB      0.0 MiB               seg_maps,
   181   4056.1 MiB      0.0 MiB               threshold=pse_threshold,
   182   4056.1 MiB      0.0 MiB               threshold_k=pse_threshold,

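It turns out that mx.nd.waitall() consumes a fair amount of memory for every image predicted, and that memory does not seem to be fully released, so usage keeps growing.

A web search suggests this is an unresolved MXNet issue: Memory leak when running cpu inference - Gluon - MXNet Forum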

Passing **kwargs to detect() raises a TypeError

Code:
kwargs = {'height_border': 0.05, 'width_border': 0.05}
box_info_list = std.detect(img, **kwargs)

Error at runtime:
TypeError: detect() got an unexpected keyword argument 'height_border'
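
The TypeError means the installed detect() does not declare these parameter names. A minimal sketch for checking which keyword arguments a callable accepts before passing them, using the standard inspect module; supported_kwargs is a hypothetical helper and the detect function below is a stand-in with an assumed signature, not cnstd's:

```python
import inspect


def supported_kwargs(func, kwargs):
    """Keep only the keyword arguments that func declares by name."""
    params = inspect.signature(func).parameters
    # A function that declares **kwargs accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    by_name = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    return {k: v for k, v in kwargs.items() if k in params and params[k].kind in by_name}


def detect(img, resized_shape=768):
    """Stand-in with an assumed signature, used only to demonstrate the helper."""
    return {'img': img, 'resized_shape': resized_shape}


kwargs = {'height_border': 0.05, 'width_border': 0.05, 'resized_shape': 512}
accepted = supported_kwargs(detect, kwargs)
dropped = sorted(set(kwargs) - set(accepted))
print('accepted:', accepted)
print('dropped :', dropped)
result = detect('img.jpg', **accepted)
```

Filtering only avoids the exception; options that are dropped have no effect, so the list of dropped names is worth checking against the documentation for the installed version.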

I built an API with Django that uses cnocr, but after a period of parallel calls MXNet raised the following error. Is it a size mismatch in the model?

terminate called after throwing an instance of 'dmlc::Error'
what(): [16:24:07] src/operator/contrib/./../elemwise_op_common.h:135: Check failed: assign(&dattr, vec.at(i)): Incompatible attr in node at 0-th output: expected [1,1,32,115], got [1,1,32,131]
Stack trace:
[bt] (0) /home/user/anaconda3/envs/summer/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x307d3b) [0x7f50ccd31d3b]
[bt] (1) /home/user/anaconda3/envs/summer/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x4b4cb9) [0x7f50ccedecb9]
[bt] (2) /home/user/anaconda3/envs/summer/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x4b5572) [0x7f50ccedf572]
[bt] (3) /home/user/anaconda3/envs/summer/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x60f04c) [0x7f50cd03904c]
[bt] (4) /home/user/anaconda3/envs/summer/lib/python3.6/site-packages/mxnet/libmxnet.so(mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*)+0x1d27) [0x7f50cfecbaa7]
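
A shape mismatch that only appears under parallel calls may indicate that several requests are using one shared model object at the same time; this is an assumption, not something the trace confirms. A minimal sketch that serializes access to a shared predictor with a lock; SerializedModel and fake_ocr are hypothetical names, and fake_ocr merely stands in for the real OCR call:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerializedModel:
    """Wrap a predictor so that only one thread runs it at a time."""

    def __init__(self, predict_fn):
        self._predict_fn = predict_fn
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        with self._lock:
            return self._predict_fn(*args, **kwargs)


def fake_ocr(width):
    """Stand-in for a model call on an image of the given width."""
    return 'width=%d' % width


safe_ocr = SerializedModel(fake_ocr)

# Simulate four concurrent requests with inputs of different widths.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(safe_ocr, [115, 131, 115, 131]))
print(results)
```

The lock trades throughput for safety; running one model instance per worker process is another option when requests must truly run in parallel.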

Text rectification

A question: the model you use is PSE. Is any rectification applied to the PSE detection results, or are the PSE results used directly?

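Can --resized-shape take separate height and width values when predicting with the layout model?

The --resized-shape option of cnstd analyze only accepts a single integer that is a multiple of 32, so the height and width of the shape must be the same value. Could different values be allowed for each?
cnstd analyze -m layout --resized-shape 2336,1632 -i table_image1.png -o std_table_image1.png

Error: Invalid value for '--resized-shape': '2336,1632' is not a valid integer.

With the following setting there is no problem:
cnstd analyze -m layout --resized-shape 2336 -i table_image1.png -o std_table_image1.png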
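
The behavior being requested can be sketched as a parser that accepts either one integer or a comma-separated pair. parse_resized_shape is a hypothetical function, the (height, width) order is an assumption taken from the wording of the request, and none of this reflects how the current CLI works:

```python
def parse_resized_shape(value):
    """Parse '2336' or '2336,1632' into a (height, width) tuple of multiples of 32."""
    parts = [int(p) for p in str(value).split(',')]
    if len(parts) == 1:
        parts = parts * 2  # a single value applies to both sides
    if len(parts) != 2:
        raise ValueError('expected one integer or "height,width", got %r' % (value,))
    for side in parts:
        if side <= 0 or side % 32 != 0:
            raise ValueError('%d is not a positive multiple of 32' % side)
    return tuple(parts)


print(parse_resized_shape('2336,1632'))
print(parse_resized_shape('2336'))
```

Accepting the single-integer form as well keeps the existing command line working unchanged.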

Error when running inference with the GPU enabled

The error message is: CachedOp requires all inputs to live on the same context. But data is on gpu(0) while _mobilenetv30_first-3x3-conv-conv2d_weight is on cpu(0)

fresh install error FileNotFoundError: Could not find module 'D:\Program Files\anaconda3\Library\bin\geos_c.dll' (or one of its dependencies). Try using the full path with constructor syntax.

python 3.8.3
windows 10

pip install cnocr

@breezedeus




from cnstd import CnStd
from cnocr import CnOcr

std = CnStd()
cn_ocr = CnOcr()

box_infos = std.detect('output_dir/t1-Scene-002-01.jpg')

for box_info in box_infos['detected_texts']:
    cropped_img = box_info['cropped_img']
    ocr_res = cn_ocr.ocr_for_single_line(cropped_img)
    print('ocr result: %s' % str(ocr_res))


dubo@LAPTOP-I2R9VG58 MINGW64 /d/Download/audio-visual/make-compilation-videos/auto_video_splitter (master)
$ python  cnst.py
Traceback (most recent call last):
  File "cnst.py", line 3, in <module>
    from cnstd import CnStd
  File "D:\Program Files\anaconda3\lib\site-packages\cnstd\__init__.py", line 20, in <module>
    from .cn_std import CnStd
  File "D:\Program Files\anaconda3\lib\site-packages\cnstd\cn_std.py", line 32, in <module>
    from .model import gen_model
  File "D:\Program Files\anaconda3\lib\site-packages\cnstd\model\__init__.py", line 22, in <module>
    from .dbnet import gen_dbnet, DBNet
  File "D:\Program Files\anaconda3\lib\site-packages\cnstd\model\dbnet.py", line 31, in <module>
    from .base import DBPostProcessor, _DBNet
  File "D:\Program Files\anaconda3\lib\site-packages\cnstd\model\base.py", line 23, in <module>
    from shapely.geometry import Polygon
  File "D:\Program Files\anaconda3\lib\site-packages\shapely\geometry\__init__.py", line 4, in <module>
    from .base import CAP_STYLE, JOIN_STYLE
  File "D:\Program Files\anaconda3\lib\site-packages\shapely\geometry\base.py", line 19, in <module>
    from shapely.coords import CoordinateSequence
  File "D:\Program Files\anaconda3\lib\site-packages\shapely\coords.py", line 8, in <module>
    from shapely.geos import lgeos
  File "D:\Program Files\anaconda3\lib\site-packages\shapely\geos.py", line 154, in <module>
    _lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
  File "D:\Program Files\anaconda3\lib\ctypes\__init__.py", line 381, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'D:\Program Files\anaconda3\Library\bin\geos_c.dll' (or one of its dependencies). Try using the full path with constructor syntax.
(base) 

Trying to run the free MFD model

I am new to running such models. Your website gives a code snippet for running the model:
from pix2text import Pix2Text, merge_line_texts

img_fp = './docs/examples/formula.jpg'
p2t = Pix2Text(analyzer_config=dict(model_name='mfd'))
outs = p2t(img_fp, resized_shape=608)

print(outs)
only_text = merge_line_texts(outs, auto_line_break=True)

I installed the pix2text library and tried to run this code, but got the error shown in the attached screenshot.
Please guide me in running the model.
By the way, I am using Google Colab.

Error installing cnstd on Ubuntu

Collecting cnstd
Downloading cnstd-0.1.1-py3-none-any.whl (50 kB)
|████████████████████████████████| 50 kB 167 kB/s
Requirement already satisfied: shapely in /usr/local/lib/python3.6/dist-packages (from cnstd) (1.7.0)
Requirement already satisfied: Polygon3 in /usr/local/lib/python3.6/dist-packages (from cnstd) (3.0.8)
Requirement already satisfied: numpy<1.20.0,>=1.14.0 in /usr/local/lib/python3.6/dist-packages (from cnstd) (1.19.1)
Requirement already satisfied: pyclipper in /usr/local/lib/python3.6/dist-packages (from cnstd) (1.2.0)
ERROR: Could not find a version that satisfies the requirement opencv-python (from cnstd) (from versions: none)
ERROR: No matching distribution found for opencv-python (from cnstd)

A Python dependency/environment problem

My Python version is 3.6.8 and the operating system is Server 2012, in a completely clean virtual environment.
After running pip install cnstd,
opencv_python 4.2.0.34 was installed by default.
When I ran a test script, importing cv2 reported a missing DLL.
After manually downgrading cv2 to 3.4.0.12, the following error appeared:
[WARNING 2020-06-04 21:28:49,308 _showwarnmsg:99] C:\Soft\source\cnstd\venv\lib\site-packages\mxnet\gluon\block.py:1389: UserWarning: Cannot decide type for the following arguments. Consider providing them as input: data: None input_sym_arg_type = in_param.infer_type()[0]
What went wrong here?

The full pip list is as follows:
Package Version


certifi 2020.4.5.1
chardet 3.0.4
click 7.1.2
cnstd 0.1.0
cycler 0.10.0
gluoncv 0.6.0
graphviz 0.8.4
idna 2.6
kiwisolver 1.2.0
matplotlib 3.2.1
mxnet 1.6.0
numpy 1.18.5
opencv-python 3.4.1.15
Pillow 7.1.2
pip 20.1.1
Polygon3 3.0.8
portalocker 1.7.0
protobuf 3.12.2
pyclipper 1.1.0.post3
pyparsing 2.4.7
python-dateutil 2.8.1
pywin32 227
requests 2.18.4
scipy 1.4.1
setuptools 47.1.1
Shapely 1.7.0
six 1.15.0
tensorboardX 2.0
tqdm 4.46.1
urllib3 1.22

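The model from this project cannot be loaded by yolov7

I tried to load the yolov7 model provided by this project with yolov7 and got the following error: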


Namespace(weights='layout-yolov7_tiny.pt', source='inference/images', img_size=640, conf_thres=0.25, iou_thres=0.45, device='', view_img=False, save_txt=False, save_conf=False, nosave=False, classes=None, agnostic_nms=False, augment=False, update=False, project='runs/detect', name='exp', exist_ok=False, no_trace=False)
YOLOR  2023-5-30 torch 2.1.0+cpu CPU

Traceback (most recent call last):
  File "C:\Users\10706\Desktop\LaTeX-OCR-main\yolov7-main\detect.py", line 196, in <module>
    detect()
  File "C:\Users\10706\Desktop\LaTeX-OCR-main\yolov7-main\detect.py", line 34, in detect
    model = attempt_load(weights, map_location=device)  # load FP32 model
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\10706\Desktop\LaTeX-OCR-main\yolov7-main\models\experimental.py", line 253, in attempt_load
    model.append(ckpt['ema' if ckpt.get('ema') else 'model'].float().fuse().eval())  # FP32 model
                 ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'model'

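I would like to know how this project's model differs from the standard yolov7 pre-trained model, and why the two are not interchangeable. Many thanks!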
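
The failing line indexes the loaded checkpoint with 'ema' or 'model', so the KeyError suggests the file does not contain those keys in the form that loader expects. A minimal sketch for inspecting what a loaded checkpoint object holds; describe_checkpoint is a hypothetical helper, and the plain dictionaries below stand in for real checkpoints:

```python
def describe_checkpoint(ckpt, max_keys=5):
    """Summarize the top-level layout of a loaded checkpoint object."""
    if not isinstance(ckpt, dict):
        return 'non-dict object of type %s' % type(ckpt).__name__
    if ckpt.get('ema'):
        return "full checkpoint: has 'ema'"
    if 'model' in ckpt:
        return "full checkpoint: has 'model'"
    keys = sorted(str(k) for k in ckpt)[:max_keys]
    return "no 'ema' or 'model' entry; first keys: %s" % ', '.join(keys)


# Stand-in inputs; a real call would pass the object returned by the loader.
print(describe_checkpoint({'model': object(), 'epoch': 3}))
print(describe_checkpoint({'backbone.0.weight': 0, 'backbone.0.bias': 0}))
```

With PyTorch installed, the object to inspect would come from torch.load('layout-yolov7_tiny.pt', map_location='cpu'). If the keys look like layer names, the file may hold only weights rather than a full training checkpoint, though that is a guess until the file is inspected.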

Error when instantiating std = CnStd()

[WARNING 2020-07-06 16:49:15,183 _showwarnmsg:110] /home/env/rfh_01/lib/python3.7/site-packages/mxnet/gluon/block.py:1389: UserWarning: Cannot decide type for the following arguments. Consider providing them as input:
data: None
input_sym_arg_type = in_param.infer_type()[0]

terminate called after throwing an instance of 'dmlc::Error'
what(): [16:49:15] src/ndarray/ndarray.cc:1851: Check failed: fi->Read(data): Invalid NDArray file format
Stack trace:
[bt] (0) /home/env/rfh_01/lib/python3.7/site-packages/mxnet/libmxnet.so(+0x307d3b) [0x7ff22424cd3b]
[bt] (1) /home/env/rfh_01/lib/python3.7/site-packages/mxnet/libmxnet.so(mxnet::NDArray::Load(dmlc::Stream*, std::vector<mxnet::NDArray, std::allocatormxnet::NDArray >, std::vector<std::string, std::allocatorstd::string >)+0x1d6) [0x7ff22752b146]

Aborted (core dumped)

Any guidance would be appreciated.

Error loading the package

  File "D:/Users/PycharmProjects/data_analysis/ocr_detection/image_detection.py", line 10, in <module>
    from cnstd import CnStd
  File "D:\Anaconda3\envs\python36\lib\site-packages\cnstd\__init__.py", line 20, in <module>
    from .cn_std import CnStd
  File "D:\Anaconda3\envs\python36\lib\site-packages\cnstd\cn_std.py", line 32, in <module>
    from .model import gen_model
  File "D:\Anaconda3\envs\python36\lib\site-packages\cnstd\model\__init__.py", line 22, in <module>
    from .dbnet import gen_dbnet, DBNet
  File "D:\Anaconda3\envs\python36\lib\site-packages\cnstd\model\dbnet.py", line 31, in <module>
    from .base import DBPostProcessor, _DBNet
  File "D:\Anaconda3\envs\python36\lib\site-packages\cnstd\model\base.py", line 23, in <module>
    from shapely.geometry import Polygon
  File "D:\Anaconda3\envs\python36\lib\site-packages\shapely\geometry\__init__.py", line 4, in <module>
    from .base import CAP_STYLE, JOIN_STYLE
  File "D:\Anaconda3\envs\python36\lib\site-packages\shapely\geometry\base.py", line 19, in <module>
    from shapely.coords import CoordinateSequence
  File "D:\Anaconda3\envs\python36\lib\site-packages\shapely\coords.py", line 8, in <module>
    from shapely.geos import lgeos
  File "D:\Anaconda3\envs\python36\lib\site-packages\shapely\geos.py", line 154, in <module>
    _lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
  File "D:\Anaconda3\envs\python36\lib\ctypes\__init__.py", line 348, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found.

It looks like the installed package is missing LoadLibrary and FreeLibrary.

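Error calling mxnet on ARM Ubuntu 18.04

I installed it successfully on Ubuntu on the ARM architecture. At runtime I get OSError: /<directory>/python3.7/site-packages/mxnet/libmxnet.so: cannot open shared object file: No such file or directory. The installed version is mxnet=1.6.0, libmxnet.so is present in that folder, and I have already added the folder to the shared-library search path, but it still fails. Any idea why?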

layout yolov7 support

I am excited to share that I have trained a yolov7 model on the CDLA dataset, adding a small amount of self-labeled documents. I have adjusted the default training input to 1280x1280, as opposed to the standard 640. This modification has improved the model's performance on standard documents, particularly when the document's output dpi is set above 150. I am thrilled to contribute this model to your open-source community and hope it will be a valuable addition to your model zoo.

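Vertically typeset text cannot be recognized by CnOcr after rotation

For example, the text in this image:
[image]

The cropped image returned by CnStd is:

[image]

The returned result cannot be used with CnOcr.
Is there a good way to handle this? Could rotation-related information be added to the returned dict?

Also, thanks to the author for all the hard work!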
