fh2019ustc / doctr
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM 2021 (Oral).
License: MIT License
Hello there,
I am using DocTr to enhance the quality of a few images in my project, and I am finding that it introduces distortions in the output. Please let me know if I am using it incorrectly.
These are the steps I followed:
The original file that has been used is this
The image output from DocTr is this
Comparison for ease of reference
Please let me know whether this is the expected behavior or whether I am doing something wrong.
I would appreciate a prompt response, as I need to conclude my research and submit my project as part of my MS program.
Thank you in advance
I'm a little confused about the use of coords0, coords1, and mask.
Why not simply use upsample or ConvTranspose?
Lines 226 to 233 in 729fcb8
How do I convert the pretrained model to ONNX, and how do I use the resulting ONNX model?
Hello, can I train the model myself with this code?
Hi, splendid work you've done! I've been considering reproducing it, but I don't know what hardware is needed to successfully reproduce the training of the Geo and Illu networks, so I would like to know how much GPU memory your training process consumed. Did you use four 1080 Ti GPUs? Thanks, I really need to know this!
Hi,
when will you release the training code?
Thank you in advance for any help you can provide.
Traceback (most recent call last):
File "E:\jiaozheng\DocTr-main\inference.py", line 138, in <module>
main()
File "E:\jiaozheng\DocTr-main\inference.py", line 134, in main
rec(opt)
File "E:\jiaozheng\DocTr-main\inference.py", line 76, in rec
GeoTr_Seg_model = GeoTr_Seg().cuda()
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in cuda
return self._apply(lambda t: t.cuda(device))
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
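This AssertionError means the installed PyTorch wheel was built without CUDA support. Either install a CUDA-enabled build, or fall back to the CPU. A generic sketch of the fallback pattern; the nn.Linear is a hypothetical stand-in for GeoTr_Seg, which inference.py currently moves to the GPU unconditionally via .cuda():

```python
import torch
import torch.nn as nn

# Pick CUDA only when the installed PyTorch build actually supports it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model; in inference.py, replacing the hard-coded
# GeoTr_Seg().cuda() with GeoTr_Seg().to(device) lets the script run on CPU.
model = nn.Linear(4, 2).to(device)
x = torch.randn(1, 4, device=device)
y = model(x)
print(device, y.shape)
```

The same `.to(device)` change is needed anywhere else the script calls `.cuda()` on tensors or modules.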
Hi,
I tried multiple times with different versions of Python to install the requirements, but could not because of an internal error. Can you please confirm that all the packages are listed with the correct versions? Also, which versions of Python did you work with?
RUN pip install --no-cache-dir -r ./DocTr/requirements.txt
Collecting numpy==1.19.0 (from -r ./DocTr/requirements.txt (line 1))
Downloading numpy-1.19.0.zip (7.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.3/7.3 MB 39.2 MB/s eta 0:00:00
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [54 lines of output]
Running from numpy source directory.
<string>:460: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py:73: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
required_version = LooseVersion('0.29.14')
/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py:75: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(cython_version) < required_version:
Error compiling Cython file:
------------------------------------------------------------
...
cdef sfc64_state rng_state
def __init__(self, seed=None):
BitGenerator.__init__(self, seed)
self._bitgen.state = <void *>&self.rng_state
self._bitgen.next_uint64 = &sfc64_uint64
^
------------------------------------------------------------
_sfc64.pyx:90:35: Cannot assign type 'uint64_t (*)(void *) except? -1 nogil' to 'uint64_t (*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to type 'uint64_t (void *) except? -1 nogil'.
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_sfc64.pyx
Traceback (most recent call last):
File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 235, in <module>
main()
File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 231, in main
find_process_files(root_dir)
File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 222, in find_process_files
process(root_dir, fromfile, tofile, function, hash_db)
File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 188, in process
processor_function(fromfile, tofile)
File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 77, in process_pyx
subprocess.check_call(
File "/usr/local/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/local/bin/python', '-m', 'cython', '-3', '--fast-fail', '-o', '_sfc64.c', '_sfc64.pyx']' returned non-zero exit status 1.
Cythonizing sources
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/tmp/pip-build-env-246aljwk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 396, in prepare_metadata_for_build_wheel
self.run_setup()
File "/tmp/pip-build-env-246aljwk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 507, in run_setup
super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
File "/tmp/pip-build-env-246aljwk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 341, in run_setup
exec(code, locals())
File "<string>", line 489, in <module>
File "<string>", line 469, in setup_package
File "<string>", line 274, in generate_cython
RuntimeError: Running cythonize failed!
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Thank you
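For what it's worth, the failure above is numpy 1.19.0 being built from source under Cython 3.x, whose stricter 'noexcept' rules reject numpy's old generated code. Two possible workarounds, assuming the exact numpy pin must be kept (paths and interpreter versions below are illustrative):

```shell
# Option 1: numpy 1.19.0 ships prebuilt wheels only for CPython 3.6-3.8,
# so on Python 3.9 pip falls back to a source build; using 3.8 avoids it.
python3.8 -m pip install numpy==1.19.0

# Option 2: build from source, but with a pre-3.0 Cython. Build isolation
# must be disabled so pip uses the pinned Cython instead of the latest.
pip install "cython<3" setuptools wheel
pip install numpy==1.19.0 --no-build-isolation
```

Without `--no-build-isolation`, pip builds in an isolated environment and pulls the newest Cython, reproducing the same error.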
Hi, thanks for your great work! When will you release the training code?
Thanks for your great work. When testing, it seems the process runs on the CPU. Is it possible to use the GPU when running inference_ill.py?
Q1: Hello, in Section 5.1 of your paper, I notice you used Pytesseract v3.02.02, as shown in the picture above ↑
But on the homepage of pytesseract I can only find versions 0.3.x or 0.2.x, so could you please tell me the exact version you used? By the way, the DewarpNet paper specifies Pytesseract version 0.2.9. Are there big differences caused by the version of the OCR engine?
Q2: Calculating the CER metric requires the ground truth of each character in the images. I also notice your repository provides an index of 60 images for the OCR metric test, while DewarpNet provided an index of 25 images together with the ground truth in JSON form. Can you tell me how you annotated the ground truth? And, if possible, can you share your ground-truth file?
In addition, I noticed that the 25 ground truths in DewarpNet contain several label errors, so I guess they also used an OCR engine for labeling. If you also used an OCR engine to label the ground truth, can you share some more details about how you annotated it?
Q3: In fact, I also tried to measure the OCR performance of your model's output. However, neither Pytesseract 0.3.x nor 0.2.x achieves the result reported in the paper.
Here is my OCR test code:
from PIL import Image
import pytesseract
import json
import os
from os.path import join as pjoin
from pathlib import Path
import numpy as np
def edit_distance(str1, str2):
    """Compute the edit (Levenshtein) distance between two strings.

    Args:
        str1: the first string.
        str2: the second string.
    Returns:
        dist: the edit distance.
    """
    matrix = [[i + j for j in range(len(str2) + 1)] for i in range(len(str1) + 1)]
    for i in range(1, len(str1) + 1):
        for j in range(1, len(str2) + 1):
            if str1[i - 1] == str2[j - 1]:
                d = 0
            else:
                d = 1
            matrix[i][j] = min(matrix[i - 1][j] + 1, matrix[i][j - 1] + 1, matrix[i - 1][j - 1] + d)
    dist = matrix[len(str1)][len(str2)]
    return dist

def get_cer(src, trg):
    """Character error rate of editing the source string src into the target string trg.

    Args:
        src: the source (OCR output) string.
        trg: the target (ground-truth) string.
    Returns:
        cer: the character error rate.
    """
    dist = edit_distance(src, trg)
    cer = dist / len(trg)
    return cer

if __name__ == "__main__":
    reference_list = []
    reference_index = []
    img_dirList = []
    cer_list = []
    r_path = pjoin('./doctr/')
    result_file = open('result1.log', 'w')
    print(pytesseract.get_languages(config=''))
    with open('ocr_files.txt', 'r') as fr:
        for l, line in enumerate(fr):
            reference_list.append(line)
            reference_index.append(l)
            print(len(line), line)
            print(len(line), line, file=result_file)
            h1str = "./doctr/" + line[7:-1] + "_1 copy.png"
            h2str = "./doctr/" + line[7:-1] + "_2 copy.png"
            print(h1str, h2str)
            h1 = pytesseract.image_to_string(Image.open(h1str), lang='eng')
            h2 = pytesseract.image_to_string(Image.open(h2str), lang='eng')
            with open('tess_gt.json', 'r') as file:
                gt = json.loads(file.read()).get(line[:-1])  # renamed from `str` to avoid shadowing the built-in
            cer_value1 = get_cer(h1, gt)
            cer_value2 = get_cer(h2, gt)
            print(cer_value1, cer_value2)
            print(cer_value1, cer_value2, file=result_file)
            cer_list.append(cer_value1)
            cer_list.append(cer_value2)
    print(np.mean(cer_list))
    print(np.mean(cer_list), file=result_file)
    result_file.close()
In brief, the core OCR call is h1 = pytesseract.image_to_string(Image.open(h1str), lang='eng'), with which I only get a CER of 0.6. This result is far from the 0.2 to 0.3 CER of previous models.
Could you share your OCR version and code for the OCR metric? Many thanks for your generous response!
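As a sanity check on the metric itself, independent of any OCR engine, the edit-distance/CER pair can be verified on a known example. This is a standalone re-statement of the same algorithm, not the authors' evaluation code; note too that pytesseract output often carries trailing newlines and form-feed characters, so differing text normalization alone can shift CER noticeably:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    m = [[i + j for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = 0 if a[i - 1] == b[j - 1] else 1
            m[i][j] = min(m[i - 1][j] + 1, m[i][j - 1] + 1, m[i - 1][j - 1] + d)
    return m[len(a)][len(b)]

def get_cer(src, trg):
    # Character error rate: edits needed to turn src into trg, per target character.
    return edit_distance(src, trg) / len(trg)

print(edit_distance("kitten", "sitting"))        # 3 (two substitutions, one insertion)
print(round(get_cer("kitten", "sitting"), 3))    # 0.429, i.e. 3/7
```

If this known pair checks out, the discrepancy is more likely in the OCR engine version or text normalization than in the metric code.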
I want to run the pretrained model to see its effect.
Does anyone have a way to get the Doc3D dataset? I have not received a reply after submitting the application.
Hi, "The DRIC dataset [18] consists of 2700 distorted document images, each at 2400 × 1800 resolution. For each distorted document image, there are corresponding backward mapping map and scanned PDF image." But I cannot find the scanned PDF images of the DRIC dataset.
@fh2019ustc I have a few questions:
1) I have reproduced GeoTr; my CER metric is lower than the repo's, but SSIM and LD are worse. What might cause this?
2) I am now preparing to reproduce IllTr. How do I obtain the rectified images for the DocProj data? This issue mentions using resampling.rectification, but running it does not yield rectified images.
3) Only 550 ground-truth scanned images are available for download, yet the img and flow folders contain 2750 images. So is IllTr trained only on the images rectified from 000_0.png?
4) The paper mentions 128*128 crops with overlap=12.5%, but the details say "randomly crop". How is this random cropping performed?
Hello! Thank you so much for your excellent work! I have the following question: what is the best way to split one image containing several pages (say, loose-leaf notes or a book opening) into smaller images, each containing a single page, so that your code can be used in a later step? Thank you very much in advance!
How to do light correction only?
Hello, 1: when are you going to open-source the training code? 2: How soon do you plan to do so? 3: What is used for document detection in your scheme, and how well does it work?
Hi @fh2019ustc ,
Thank you for the great repo!
Do you have a plan to release the training code of this paper?
Hope to see your response, thanks in advance!
Hi, thank you for sharing your work.
I want to try inference.py, but I'm getting stuck on the setup.
Could you please share the environment or the requirements.txt (CUDA version, PyTorch version, other packages, etc.)?
If I want to train the model with my own data, is there any way to generate the backward mapping?
Runtime error
failed to create containerd task: failed to create shim task: context canceled: unknown
Container logs:
===== Application Startup at 2023-11-16 13:17:28 =====
Caching examples at: '/home/user/app/gradio_cached_examples/12'
Hello Hao,
I have read your pioneering work on using ViT for document unwarping.
In the paper, the design of the geometry tail is marvellous: it proposes a learnable module to perform upsampling on the decoded features,
as shown in the following figure:
For this part, I have four questions:
Q1: Based on my understanding, such a design is essentially a local dot product of two feature maps.
So, I wonder what your motivation for this tail design is. Is there any similar design in other reference papers?
Q2: In the following code block, why does the flow need to be multiplied by 8?
Line 211 in bbb1af9
Q3: In the following code block, why is the mask passed through a softmax? Does the softmax operation have some special significance here?
Line 209 in bbb1af9
Q4: In the following code block, why should coodslar be added to the predicted backward mapping? Is this operation important? My guess is that this operation is similar to a kind of positional encoding. But since this is the final operation in the network, why not add this positional encoding to a previous layer?
Line 231 in bbb1af9
Many thanks for your explanation.
Best wishes,
Weiguang Zhang
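The code questions above all touch the RAFT-style "convex upsampling" tail: the mask is softmaxed so that each fine-resolution value becomes a convex combination of its 3x3 coarse neighborhood, and the flow is multiplied by 8 because a displacement of one coarse pixel corresponds to eight fine pixels after 8x upsampling. A minimal sketch of that mechanism, my own re-implementation rather than the repo's code:

```python
import torch
import torch.nn.functional as F

def convex_upsample(flow, mask, scale=8):
    """Upsample a coarse flow field by `scale` using predicted convex weights.

    flow: (N, 2, H, W) coarse flow / backward map.
    mask: (N, 9*scale*scale, H, W) unnormalized weights over each 3x3 neighborhood.
    """
    n, _, h, w = flow.shape
    mask = mask.view(n, 1, 9, scale, scale, h, w)
    # softmax over the 9 neighbors makes the weights a convex combination,
    # so each upsampled value is a weighted average of its coarse 3x3 patch
    mask = torch.softmax(mask, dim=2)
    # multiply by `scale`: one coarse-pixel displacement equals `scale`
    # fine-pixel displacements at the upsampled resolution
    up = F.unfold(scale * flow, kernel_size=3, padding=1)  # (N, 2*9, H*W)
    up = up.view(n, 2, 9, 1, 1, h, w)
    up = torch.sum(mask * up, dim=2)                       # (N, 2, s, s, H, W)
    up = up.permute(0, 1, 4, 2, 5, 3)                      # (N, 2, H, s, W, s)
    return up.reshape(n, 2, scale * h, scale * w)

flow = torch.randn(1, 2, 36, 36)       # coarse prediction at 1/8 resolution
mask = torch.randn(1, 9 * 64, 36, 36)  # in DocTr/RAFT this is predicted by the network
print(convex_upsample(flow, mask).shape)  # torch.Size([1, 2, 288, 288])
```

Unlike a fixed bilinear upsample or a ConvTranspose, the combination weights here are predicted per pixel, which lets the network keep the upsampled map sharp at document boundaries. Adding coodslar at the end converts the predicted relative displacement into absolute sampling coordinates, which is consistent with the positional-encoding intuition in Q4.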
Great work. Do you have a schedule for releasing the training code?
Thank you for your work! I have a question about the rectified images you provided on Google Drive and Baidu Cloud: were they rectified by your illumination rectification model?