fh2019ustc / doctr
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM 2021 (Oral).
License: MIT License
Hello there,
I am using DocTr to enhance the quality of a few images in my project, and I am finding that it introduces distortions in the output. Please let me know if I am using it incorrectly.
These are the steps I followed:
The original file that has been used is this
The image output from DocTr is this
Comparison for ease of reference
Please let me know whether this is the expected behavior or whether I am doing something wrong.
I would appreciate a prompt response, as I need to conclude my research and submit my project as part of my MS program.
Thank you in advance
I'm a little confused about the use of coords0, coords1, and mask.
Why not simply use upsample or ConvTranspose?
Lines 226 to 233 in 729fcb8
How do I convert the pretrained model to ONNX, and how do I use the resulting ONNX model?
Hello, can I train the model myself with this code?
Hi, splendid work you've done! I've been considering reproducing it, but I don't know what hardware is needed to successfully reproduce the training of the Geo and Illu networks, so I would like to know how much GPU memory your training process consumed. Did you use four 1080 Ti GPUs? Thanks, I really need to know this!
Hi,
when will you release the training code?
Thank you in advance for any help you can provide.
Traceback (most recent call last):
File "E:\jiaozheng\DocTr-main\inference.py", line 138, in <module>
main()
File "E:\jiaozheng\DocTr-main\inference.py", line 134, in main
rec(opt)
File "E:\jiaozheng\DocTr-main\inference.py", line 76, in rec
GeoTr_Seg_model = GeoTr_Seg().cuda()
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in cuda
return self._apply(lambda t: t.cuda(device))
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in <lambda>
return self._apply(lambda t: t.cuda(device))
File "E:\jiaozheng\DocTr-main\venv\lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
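This AssertionError means the installed PyTorch wheel was built without CUDA support. Either install a CUDA-enabled build, or fall back to the CPU. A generic sketch of the fallback pattern; the nn.Linear is a hypothetical stand-in for GeoTr_Seg, which inference.py currently moves to the GPU unconditionally via .cuda():

```python
import torch
import torch.nn as nn

# Pick CUDA only when the installed PyTorch build actually supports it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in model; in inference.py, replacing the hard-coded
# GeoTr_Seg().cuda() with GeoTr_Seg().to(device) lets the script run on CPU.
model = nn.Linear(4, 2).to(device)
x = torch.randn(1, 4, device=device)
y = model(x)
print(device, y.shape)
```

The same `.to(device)` change is needed anywhere else the script calls `.cuda()` on tensors or modules.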
Hi,
I tried multiple times with different versions of Python to install the requirements, but could not because of an internal error. Can you please confirm that all the packages are listed with the correct versions? Also, which versions of Python did you work with?
RUN pip install --no-cache-dir -r ./DocTr/requirements.txt
Collecting numpy==1.19.0 (from -r ./DocTr/requirements.txt (line 1))
Downloading numpy-1.19.0.zip (7.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.3/7.3 MB 39.2 MB/s eta 0:00:00
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Preparing metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [54 lines of output]
Running from numpy source directory.
<string>:460: UserWarning: Unrecognized setuptools command, proceeding with generating Cython sources and expanding templates
/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py:73: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
required_version = LooseVersion('0.29.14')
/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py:75: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
if LooseVersion(cython_version) < required_version:
Error compiling Cython file:
------------------------------------------------------------
...
cdef sfc64_state rng_state
def __init__(self, seed=None):
BitGenerator.__init__(self, seed)
self._bitgen.state = <void *>&self.rng_state
self._bitgen.next_uint64 = &sfc64_uint64
^
------------------------------------------------------------
_sfc64.pyx:90:35: Cannot assign type 'uint64_t (*)(void *) except? -1 nogil' to 'uint64_t (*)(void *) noexcept nogil'. Exception values are incompatible. Suggest adding 'noexcept' to type 'uint64_t (void *) except? -1 nogil'.
Processing numpy/random/_bounded_integers.pxd.in
Processing numpy/random/_sfc64.pyx
Traceback (most recent call last):
File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 235, in <module>
main()
File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 231, in main
find_process_files(root_dir)
File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 222, in find_process_files
process(root_dir, fromfile, tofile, function, hash_db)
File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 188, in process
processor_function(fromfile, tofile)
File "/tmp/pip-install-wolm3m1r/numpy_c4a1088a8ada41f3a3061abbffdf7ccc/tools/cythonize.py", line 77, in process_pyx
subprocess.check_call(
File "/usr/local/lib/python3.9/subprocess.py", line 373, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/local/bin/python', '-m', 'cython', '-3', '--fast-fail', '-o', '_sfc64.c', '_sfc64.pyx']' returned non-zero exit status 1.
Cythonizing sources
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
return hook(metadata_directory, config_settings)
File "/tmp/pip-build-env-246aljwk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 396, in prepare_metadata_for_build_wheel
self.run_setup()
File "/tmp/pip-build-env-246aljwk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 507, in run_setup
super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
File "/tmp/pip-build-env-246aljwk/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 341, in run_setup
exec(code, locals())
File "<string>", line 489, in <module>
File "<string>", line 469, in setup_package
File "<string>", line 274, in generate_cython
RuntimeError: Running cythonize failed!
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Thank you
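For what it's worth, the failure above is numpy 1.19.0 being built from source under Cython 3.x, whose stricter 'noexcept' rules reject numpy's old generated code. Two possible workarounds, assuming the exact numpy pin must be kept (paths and interpreter versions below are illustrative):

```shell
# Option 1: numpy 1.19.0 ships prebuilt wheels only for CPython 3.6-3.8,
# so on Python 3.9 pip falls back to a source build; using 3.8 avoids it.
python3.8 -m pip install numpy==1.19.0

# Option 2: build from source, but with a pre-3.0 Cython. Build isolation
# must be disabled so pip uses the pinned Cython instead of the latest.
pip install "cython<3" setuptools wheel
pip install numpy==1.19.0 --no-build-isolation
```

Without `--no-build-isolation`, pip builds in an isolated environment and pulls the newest Cython, reproducing the same error.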
Hi, thanks for your great work! When will you release the training code?
Thanks for your great work. When testing, it seems the process runs on the CPU. Is it possible to use the GPU when running inference_ill.py?
Q1: Hello, in Section 5.1 of your paper, I notice you used Pytesseract v3.02.02, as shown in the picture above ↑
But on the homepage of pytesseract I can only find versions 0.3.x or 0.2.x, so could you please tell me the exact version you used? By the way, the DewarpNet paper specifies Pytesseract version 0.2.9. Are there big differences caused by the version of the OCR engine?
Q2: Calculating the CER metric requires the ground truth of each character in the images. I also notice your repository provides an index of 60 images for the OCR metric test, while DewarpNet provided an index of 25 images together with the ground truth in JSON form. Can you tell me how you annotated the ground truth? And, if possible, can you share your ground-truth file?
In addition, I noticed that the 25 ground truths in DewarpNet contain several label errors, so I guess they also used an OCR engine for labeling. If you also used an OCR engine to label the ground truth, can you share some more details about how you annotated it?
Q3: In fact, I also tried to measure the OCR performance of your model's output. However, neither Pytesseract 0.3.x nor 0.2.x achieves the result reported in the paper.
Here is my OCR test code:
from PIL import Image
import pytesseract
import json
import os
from os.path import join as pjoin
from pathlib import Path
import numpy as np
def edit_distance(str1, str2):
    """Compute the edit (Levenshtein) distance between two strings.

    Args:
        str1: the first string.
        str2: the second string.
    Returns:
        dist: the edit distance.
    """
    matrix = [[i + j for j in range(len(str2) + 1)] for i in range(len(str1) + 1)]
    for i in range(1, len(str1) + 1):
        for j in range(1, len(str2) + 1):
            if str1[i - 1] == str2[j - 1]:
                d = 0
            else:
                d = 1
            matrix[i][j] = min(matrix[i - 1][j] + 1, matrix[i][j - 1] + 1, matrix[i - 1][j - 1] + d)
    dist = matrix[len(str1)][len(str2)]
    return dist

def get_cer(src, trg):
    """Character error rate of editing the source string src into the target string trg.

    Args:
        src: the source (OCR output) string.
        trg: the target (ground-truth) string.
    Returns:
        cer: the character error rate.
    """
    dist = edit_distance(src, trg)
    cer = dist / len(trg)
    return cer

if __name__ == "__main__":
    reference_list = []
    reference_index = []
    img_dirList = []
    cer_list = []
    r_path = pjoin('./doctr/')
    result_file = open('result1.log', 'w')
    print(pytesseract.get_languages(config=''))
    with open('ocr_files.txt', 'r') as fr:
        for l, line in enumerate(fr):
            reference_list.append(line)
            reference_index.append(l)
            print(len(line), line)
            print(len(line), line, file=result_file)
            h1str = "./doctr/" + line[7:-1] + "_1 copy.png"
            h2str = "./doctr/" + line[7:-1] + "_2 copy.png"
            print(h1str, h2str)
            h1 = pytesseract.image_to_string(Image.open(h1str), lang='eng')
            h2 = pytesseract.image_to_string(Image.open(h2str), lang='eng')
            with open('tess_gt.json', 'r') as file:
                gt = json.loads(file.read()).get(line[:-1])  # renamed from `str` to avoid shadowing the built-in
            cer_value1 = get_cer(h1, gt)
            cer_value2 = get_cer(h2, gt)
            print(cer_value1, cer_value2)
            print(cer_value1, cer_value2, file=result_file)
            cer_list.append(cer_value1)
            cer_list.append(cer_value2)
    print(np.mean(cer_list))
    print(np.mean(cer_list), file=result_file)
    result_file.close()
In brief, the core OCR call is h1 = pytesseract.image_to_string(Image.open(h1str), lang='eng'), with which I only get a CER of 0.6. This result is far from the 0.2 to 0.3 CER of previous models.
Could you share your OCR version and code for the OCR metric? Many thanks for your generous response!
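As a sanity check on the metric itself, independent of any OCR engine, the edit-distance/CER pair can be verified on a known example. This is a standalone re-statement of the same algorithm, not the authors' evaluation code; note too that pytesseract output often carries trailing newlines and form-feed characters, so differing text normalization alone can shift CER noticeably:

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    m = [[i + j for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = 0 if a[i - 1] == b[j - 1] else 1
            m[i][j] = min(m[i - 1][j] + 1, m[i][j - 1] + 1, m[i - 1][j - 1] + d)
    return m[len(a)][len(b)]

def get_cer(src, trg):
    # Character error rate: edits needed to turn src into trg, per target character.
    return edit_distance(src, trg) / len(trg)

print(edit_distance("kitten", "sitting"))        # 3 (two substitutions, one insertion)
print(round(get_cer("kitten", "sitting"), 3))    # 0.429, i.e. 3/7
```

If this known pair checks out, the discrepancy is more likely in the OCR engine version or text normalization than in the metric code.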
I want to run the pretrained model to see its effect.
Does anyone have a way to get the Doc3D dataset? I have not received a reply after submitting the application.
Hi, "The DRIC dataset [18] consists of 2700 distorted document images, each at 2400 × 1800 resolution. For each distorted document image, there are corresponding backward mapping map and scanned PDF image." But I cannot find the scanned PDF images of the DRIC dataset.
@fh2019ustc I have a few questions:
1) I have reproduced GeoTr; my CER metric is lower than the repo's, but SSIM and LD are worse. What might cause this?
2) I am now preparing to reproduce IllTr. How do I obtain the rectified images for the DocProj data? This issue mentions using resampling.rectification, but running it does not yield rectified images.
3) Only 550 ground-truth scanned images are available for download, yet the img and flow folders contain 2750 images. So is IllTr trained only on the images rectified from 000_0.png?
4) The paper mentions 128*128 crops with overlap=12.5%, but the details say "randomly crop". How is this random cropping performed?
Hello! Thank you so much for your excellent work! I have the following question: what is the best way to split one image containing several pages (say, loose-leaf notes or a book opening) into smaller images, each containing a single page, so that your code can be used in a later step? Thank you very much in advance!
How to do light correction only?
Hello, 1: when are you going to open-source the training code? 2: How soon do you plan to do so? 3: What is used for document detection in your scheme, and how well does it work?
Hi @fh2019ustc ,
Thank you for the great repo!
Do you have a plan to release the training code of this paper?
Hope to see your response, thanks in advance!
Hi, thank you for sharing your work.
I want to try inference.py, but I'm getting stuck on the setup.
Could you please share the environment or the requirements.txt (CUDA version, PyTorch version, other packages, etc.)?
If I want to train the model with my own data, is there any way to generate the backward mapping?
Runtime error
failed to create containerd task: failed to create shim task: context canceled: unknown
Container logs:
===== Application Startup at 2023-11-16 13:17:28 =====
Caching examples at: '/home/user/app/gradio_cached_examples/12'
Hello Hao,
I have read your pioneering work on using ViT for document unwarping.
In the paper, the design of the geometry tail is marvellous: it proposes a learnable module to perform upsampling on the decoded features,
as shown in the following figure:
For this part, I have four questions:
Q1: Based on my understanding, such a design is essentially a local dot product of two feature maps.
So, I wonder what your motivation for this tail design is. Is there any similar design in other reference papers?
Q2: In the following code block, why does the flow need to be multiplied by 8?
Line 211 in bbb1af9
Q3: In the following code block, why is the mask passed through a softmax? Does the softmax operation have some special significance here?
Line 209 in bbb1af9
Q4: In the following code block, why should coodslar be added to the predicted backward mapping? Is this operation important? My guess is that this operation is similar to a kind of positional encoding. But since this is the final operation in the network, why not add this positional encoding to a previous layer?
Line 231 in bbb1af9
Many thanks for your explanation.
Best wishes,
Weiguang Zhang
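The code questions above all touch the RAFT-style "convex upsampling" tail: the mask is softmaxed so that each fine-resolution value becomes a convex combination of its 3x3 coarse neighborhood, and the flow is multiplied by 8 because a displacement of one coarse pixel corresponds to eight fine pixels after 8x upsampling. A minimal sketch of that mechanism, my own re-implementation rather than the repo's code:

```python
import torch
import torch.nn.functional as F

def convex_upsample(flow, mask, scale=8):
    """Upsample a coarse flow field by `scale` using predicted convex weights.

    flow: (N, 2, H, W) coarse flow / backward map.
    mask: (N, 9*scale*scale, H, W) unnormalized weights over each 3x3 neighborhood.
    """
    n, _, h, w = flow.shape
    mask = mask.view(n, 1, 9, scale, scale, h, w)
    # softmax over the 9 neighbors makes the weights a convex combination,
    # so each upsampled value is a weighted average of its coarse 3x3 patch
    mask = torch.softmax(mask, dim=2)
    # multiply by `scale`: one coarse-pixel displacement equals `scale`
    # fine-pixel displacements at the upsampled resolution
    up = F.unfold(scale * flow, kernel_size=3, padding=1)  # (N, 2*9, H*W)
    up = up.view(n, 2, 9, 1, 1, h, w)
    up = torch.sum(mask * up, dim=2)                       # (N, 2, s, s, H, W)
    up = up.permute(0, 1, 4, 2, 5, 3)                      # (N, 2, H, s, W, s)
    return up.reshape(n, 2, scale * h, scale * w)

flow = torch.randn(1, 2, 36, 36)       # coarse prediction at 1/8 resolution
mask = torch.randn(1, 9 * 64, 36, 36)  # in DocTr/RAFT this is predicted by the network
print(convex_upsample(flow, mask).shape)  # torch.Size([1, 2, 288, 288])
```

Unlike a fixed bilinear upsample or a ConvTranspose, the combination weights here are predicted per pixel, which lets the network keep the upsampled map sharp at document boundaries. Adding coodslar at the end converts the predicted relative displacement into absolute sampling coordinates, which is consistent with the positional-encoding intuition in Q4.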
Great work. Do you have a schedule for releasing the training code?
Thank you for your work! I have a question about the rectified images you provided on Google Drive and Baidu Cloud: were they rectified by your illumination rectification model?