doctr's People

Contributors

fh2019ustc


doctr's Issues

DocTr is introducing distortions. Am I doing something wrong?

Hello there,

I am using DocTr to enhance the quality of a few images in my project, and I am finding that it introduces distortions in the output files. Please let me know if I am using it incorrectly.

Steps I followed:

  1. Cloned the DocTr project to my Google Drive.
  2. Copied the 3 .pth files and the image to the desired location.
  3. Ran the following commands in Colab:
    %cd /content/drive/MyDrive/MSProject/DocTr-cloned-repo/DocTr/
    ! python inference.py --ill_rec True --distorrted_path '/content/drive/MyDrive/MSProject/temp/' --isave_path '/content/drive/MyDrive/MSProject/DocTr_output_images/'

The original file that was used: [image]

The image output from DocTr: [image]

Comparison for ease of reference: [image]

Please let me know whether this is the expected behavior or I am doing something wrong.

I would appreciate a prompt response, as I have to conclude my research and submit my project as part of my MS program.

Thank you in advance

Some questions about convex upsample in GeoTr

I'm a little confused about the use of coords0, coords1, and mask.
Why not simply use plain upsampling or ConvTranspose?

DocTr/GeoTr.py

Lines 226 to 233 in 729fcb8

# convex upsample baesd on fmap
coodslar, coords0, coords1 = self.initialize_flow(image1)
coords1 = coords1.detach()
mask, coords1 = self.update_block(fmap, coords1)
flow_up = self.upsample_flow(coords1 - coords0, mask)
bm_up = coodslar + flow_up
return bm_up

Training code

Hello, can I train the model myself with this code?

GPU memory for training

Hi, splendid work! I've been considering reproducing it, but I don't know what hardware is recommended to successfully reproduce the training of the Geo and Ill networks, so I would like to know how much GPU memory your training process consumed. Did you use four 1080 Ti GPUs? Thanks. I really need to know this!

Training Code

Hi,
when will you release the training code?
Thank you in advance for any help you can provide.

AssertionError: Torch not compiled with CUDA enabled

Traceback (most recent call last):
File "E:\jiaozheng\DocTr-main\inference.py", line 138, in <module>
main()
File "E:\jiaozheng\DocTr-main\inference.py", line 134, in main
rec(opt)
File "E:\jiaozheng\DocTr-main\inference.py", line 76, in rec
GeoTr_Seg_model = GeoTr_Seg().cuda()
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in cuda
return self._apply(lambda t: t.cuda(device))
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
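The assertion means this PyTorch build was installed without CUDA support, so `GeoTr_Seg().cuda()` cannot work. Besides installing a CUDA-enabled wheel, a common workaround is to select the device dynamically. A minimal sketch of that pattern (shown without importing torch so it stays self-contained; in `inference.py` the flag would come from `torch.cuda.is_available()`):

```python
def select_device(cuda_available: bool) -> str:
    """Pick 'cuda' when a CUDA runtime is present, else fall back to 'cpu'."""
    return "cuda" if cuda_available else "cpu"

# In inference.py this would become something like (hypothetical edit):
#   device = select_device(torch.cuda.is_available())
#   GeoTr_Seg_model = GeoTr_Seg().to(device)
print(select_device(False))  # a CPU-only build selects "cpu"
```

Note that inference on CPU will be much slower, but it avoids the crash on builds without CUDA.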

Requirements cannot be installed

Hi,
I tried multiple times with different Python versions to install the requirements, but could not because of an internal error. Can you please confirm that all the packages are listed with the correct versions? Also, which Python versions did you work with?

 RUN pip install --no-cache-dir -r ./DocTr/requirements.txt
Collecting numpy==1.19.0 (from -r ./DocTr/requirements.txt (line 1))
  Downloading numpy-1.19.0.zip (7.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.3/7.3 MB 39.2 MB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'error'
  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [54 lines of output]
      Running from numpy source directory.
      <string>:460: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
      /tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py:73: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
        required_version = LooseVersion('0.29.14')
      /tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py:75: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
        if LooseVersion(cython_version) < required_version:
      
      Error compiling Cython file:
      ------------------------------------------------------------
      ...
          cdef sfc64_state rng_state
      
          def __init__(self, seed=None):
              BitGenerator.__init__(self, seed)
              self._bitgen.state = <void *>&self.rng_state
              self._bitgen.next_uint64 = &sfc64_uint64
                                         ^
      ------------------------------------------------------------
      
      _sfc64.pyx:90:35: Cannot assign type 'uint64_t (*)(void *) except? -1 nogil' to 'uint64_t (*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to type 'uint64_t (void *) except? -1 nogil'.
      Processing numpy/random/_bounded_integers.pxd.in
      Processing numpy/random/_sfc64.pyx
      Traceback (most recent call last):
        File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 235, in <module>
          main()
        File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 231, in main
          find_process_files(root_dir)
        File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 222, in find_process_files
          process(root_dir, fromfile, tofile, function, hash_db)
        File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 188, in process
          processor_function(fromfile, tofile)
        File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 77, in process_pyx
          subprocess.check_call(
        File "/usr/local/lib/python3.9/subprocess.py", line 373, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/usr/local/bin/python', '-m', 'cython', '-3', '--fast-fail', '-o', '_sfc64.c', '_sfc64.pyx']' returned non-zero exit status 1.
      Cythonizing sources
      Traceback (most recent call last):
        File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
          return hook(metadata_directory, config_settings)
        File "/tmp/pip-build-env-246aljwk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 396, in prepare_metadata_for_build_wheel
          self.run_setup()
        File "/tmp/pip-build-env-246aljwk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 507, in run_setup
          super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
        File "/tmp/pip-build-env-246aljwk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 341, in run_setup
          exec(code, locals())
        File "<string>", line 489, in <module>
        File "<string>", line 469, in setup_package
        File "<string>", line 274, in generate_cython
      RuntimeError: Running cythonize failed!
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Thank you
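For anyone hitting the same wall: the log shows pip building numpy 1.19.0 from source on Python 3.9 and failing because a modern Cython rejects the old generated code (the `noexcept` mismatch). A hedged workaround, assuming exact pins are not required, is to move to a 1.19.x release that ships Python 3.9 wheels so no source build happens:

```
# requirements.txt fragment (hypothetical relaxed pin, not the authors' tested set):
numpy>=1.19.3,<1.20   # 1.19.3+ publish cp39 wheels, avoiding the Cython build
```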

training code

Hi, thanks for your great work. When will you release the training code?

Does the code support GPU?

Thanks for your great work. When testing, it seems the process runs on the CPU. Can the GPU be used when running inference_ill.py?

About the OCR engine that you use, three questions need your help

[image]
Q1: Hello, in Section 5.1 of your paper, I notice you used Pytesseract V3.02.02, as shown in the picture above.
But on the homepage of pytesseract I can only find versions 0.3.x or 0.2.x; could you please tell me the exact version you used? By the way, the DewarpNet paper specifies pytesseract version 0.2.9. Are there big differences caused by the version of the OCR engine?

Q2: Computing the CER metric requires the ground truth for each character in the images. I notice your repository provides an index of 60 images for the OCR metric test, while DewarpNet provided an index of 25 images together with ground truth in JSON form. Can you tell me how you annotated the ground truth? And, if possible, can you share your ground-truth file?

In addition, I noticed the 25 ground truths in DewarpNet contain several label errors, so I guess they also used an OCR engine for labeling. If you also used an OCR engine to label the ground truth, can you share more details about how you annotated?

Q3: In fact, I also tried to test the OCR performance on your model's output. However, neither pytesseract version 0.3.x nor 0.2.x achieves the same result as in the paper.
Here is my OCR test code:

from PIL import Image
import pytesseract

import json
import numpy as np


def edit_distance(str1, str2):
    """Compute the edit (Levenshtein) distance between two strings.
    Args:
        str1: first string.
        str2: second string.
    Returns:
        dist: the edit distance.
    """
    matrix = [[i + j for j in range(len(str2) + 1)] for i in range(len(str1) + 1)]
    for i in range(1, len(str1) + 1):
        for j in range(1, len(str2) + 1):
            d = 0 if str1[i - 1] == str2[j - 1] else 1
            matrix[i][j] = min(matrix[i - 1][j] + 1, matrix[i][j - 1] + 1, matrix[i - 1][j - 1] + d)
    return matrix[len(str1)][len(str2)]


def get_cer(src, trg):
    """Character error rate of editing the source string src into the target trg.
    Args:
        src: source (OCR output) string.
        trg: target (ground-truth) string.
    Returns:
        cer: character error rate.
    """
    return edit_distance(src, trg) / len(trg)


if __name__ == "__main__":
    cer_list = []
    result_file = open('result1.log', 'w')
    print(pytesseract.get_languages(config=''))
    with open('tess_gt.json', 'r') as f:   # load the ground truth once, not per line
        gt = json.load(f)
    with open('ocr_files.txt', 'r') as fr:
        for line in fr:
            print(len(line), line)
            print(len(line), line, file=result_file)
            h1str = "./doctr/" + line[7:-1] + "_1 copy.png"
            h2str = "./doctr/" + line[7:-1] + "_2 copy.png"
            print(h1str, h2str)
            h1 = pytesseract.image_to_string(Image.open(h1str), lang='eng')
            h2 = pytesseract.image_to_string(Image.open(h2str), lang='eng')
            r = gt.get(line[:-1])
            cer_value1 = get_cer(h1, r)
            cer_value2 = get_cer(h2, r)
            print(cer_value1, cer_value2)
            print(cer_value1, cer_value2, file=result_file)
            cer_list.append(cer_value1)
            cer_list.append(cer_value2)

    print(np.mean(cer_list))
    print(np.mean(cer_list), file=result_file)
    result_file.close()

In brief, the core OCR call is h1 = pytesseract.image_to_string(Image.open(h1str), lang='eng'), with which I only get a CER of 0.6. This is far from the 0.2–0.3 CER reported for previous models.

Could you share your OCR version and code for the OCR metric? Many thanks for your generous response!
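One frequent cause of CER gaps this large is text normalization: raw `image_to_string` output contains newlines and form-feed characters, and whether those are collapsed before computing the edit distance changes the score substantially. A minimal normalization sketch to experiment with (an assumption, not the authors' confirmed protocol):

```python
def normalize(s: str) -> str:
    """Collapse all runs of whitespace (newlines, form feeds, spaces)
    into single spaces and strip the ends before scoring."""
    return " ".join(s.split())

print(normalize("two\ncolumns\x0c of text "))  # -> "two columns of text"
```

Applying the same normalization to both the OCR output and the ground truth before `get_cer` makes the metric insensitive to line-breaking differences.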

About the Doc3d dataset.

Does anyone have a way to get the Doc3D dataset? I have not received a reply after submitting the application.

DRIC dataset

Hi, the paper says: "The DRIC dataset [18] consists of 2700 distorted document images, each at 2400 × 1800 resolution. For each distorted document image, there are a corresponding backward mapping map and scanned PDF image." But I cannot find the scanned PDF images of the DRIC dataset. Where are they?

IllTr training

@fh2019ustc I have a few questions:
1) I have reproduced GeoTr; my CER is lower than this repo's, but SSIM and LD are worse. What could cause this?
2) I am now preparing to reproduce IllTr. How do I obtain the rectified images for the DocProj data? An earlier issue mentions using resampling.rectification, but running it does not produce rectified images.
3) The downloaded ground-truth scanned images number only 550, while the img and flow folders contain 2750 items. So is IllTr trained only on the images rectified from 000_0.png?
4) The paper crops 128×128 patches with 12.5% overlap, but the details say "randomly crop". How is the random cropping done?
[image]
[image]

Several pages on one image?

Hello! Thank you so much for your excellent work! I have the following question: what is the best way to split one image containing several pages (say, loose-leaf notes or an open book) into smaller images, one per page, before running your code on them? Thank you very much in advance!
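Lacking an official answer, a pragmatic starting point is a naive vertical split at the middle of the spread, refined later by detecting the gutter (the dark fold between pages). A minimal sketch, under the assumption that the two pages sit roughly symmetrically in the frame:

```python
import numpy as np

def split_two_pages(img: np.ndarray):
    """Split an H x W (or H x W x C) two-page spread into left/right halves.
    Real book openings may need a gutter detector; this is only the baseline."""
    w = img.shape[1]
    return img[:, : w // 2], img[:, w // 2 :]

left, right = split_two_pages(np.zeros((600, 800, 3)))
print(left.shape, right.shape)  # -> (600, 400, 3) (600, 400, 3)
```

Each half can then be passed to DocTr independently.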

About training

Hello. 1: When are you going to open-source the training code? 2: What is your planned timeline? 3: What is used for document detection in your scheme, and how well does it work?

Please add the environment to the README

Hi, thank you for sharing your work.
I want to try inference.py, but I'm stuck on the setup.
Could you please share the environment or a requirements.txt (CUDA version, PyTorch version, other packages, etc.)?

The demo site is not working properly

Runtime error
failed to create containerd task: failed to create shim task: context canceled: unknown
Container logs:

===== Application Startup at 2023-11-16 13:17:28 =====

Caching examples at: '/home/user/app/gradio_cached_examples/12'

Four questions about GeoTr.py

Hello Hao,
I have read your pioneering work on using ViT for document unwarping.
In the paper, the design of the geometry tail is marvellous: it proposes a learnable module that performs upsampling on the decoded features $f_{d}$,
as shown in the following figure:
[image]
For this part I have four questions:
Q1: Based on my understanding, this design is essentially a local dot product of two feature maps (I mean $f_{o}$ and $f_m$). Do I understand correctly? I would never have imagined this design myself.
So, I wonder what your motivation for this tail design was. Is there a similar design in other reference papers?

Q2: In the following code block, why does the flow need to be multiplied by 8?

DocTr/GeoTr.py

Line 211 in bbb1af9

up_flow = F.unfold(8 * flow, [3, 3], padding=1)

Q3: In the following code block, why is the mask passed through a softmax? Does the softmax have some special significance here?

DocTr/GeoTr.py

Line 209 in bbb1af9

mask = torch.softmax(mask, dim=2)

Q4: In the following code block, why should coodslar be added to the predicted backward mapping? Is this operation important? My guess is that it is similar to a kind of positional encoding. But since this is the final operation in the network, why not add this positional encoding to an earlier layer?

DocTr/GeoTr.py

Line 231 in bbb1af9

bm_up = coodslar + flow_up

Many thanks for your explanation.

Best wishes,
Weiguang Zhang
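For readers with the same questions, the tail appears to follow RAFT-style convex upsampling (an observation from the code, not an authoritative answer): the softmax turns the 9 mask logits per fine pixel into nonnegative weights summing to 1, i.e. a convex combination of the 3×3 coarse neighborhood (Q3), and the ×8 rescales a displacement measured in 1/8-resolution grid cells into full-resolution pixel units (Q2). A tiny numerical check:

```python
import numpy as np

# Q3: softmax produces convex-combination weights (nonnegative, sum to 1).
logits = np.array([0.3, -1.2, 2.0, 0.0, 0.5, -0.7, 1.1, 0.0, -2.0])
w = np.exp(logits - logits.max())
w /= w.sum()
assert w.min() >= 0 and abs(w.sum() - 1.0) < 1e-12

# Q2: a displacement of 2 cells on the 1/8-resolution grid is 16 pixels
# at full resolution.
coarse_cells = 2.0
print(8 * coarse_cells)  # -> 16.0
```

As for Q4, coodslar comes from initialize_flow and looks like the identity coordinate grid, so adding the upsampled flow to it converts a residual displacement field into an absolute backward map — again matching the RAFT recipe, though the authors would have to confirm the motivation.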

training code

Great work! Do you have a schedule to release the training code?

Rectified Images

Thank you for your work! I have a question about the rectified images you provide on Google Drive and Baidu Cloud: are they rectified by your illumination rectification model?
