lenubolim / textdiff

Official code implementation of "TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution"

Home Page: https://www.aibupt.com/

diffusion-models scene-text-image-super-resolution datasets deep-learning deep-neural-networks image-to-image image-translation img2img low-level-vision pytorch-implementation

textdiff's Introduction

TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution

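This is the official implementation repository for the paper TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution. TextDiff is a super-resolution model for scene text images (see the paper for details).

Network architecture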

News

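Pinned: our lab has developed a versatile, multi-platform OCR application. It covers the common OCR functions, such as PDF-to-Word and PDF-to-Excel conversion, formula recognition, table recognition, and automatic watermark removal. You are welcome to try it!
See the To-do lists for the latest updates.

Usage guide

Environment setup

Deep learning environment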

python >= 3.7
pytorch >= 1.7.0
torchvision >= 0.8.0
lmdb >= 0.98
pillow >= 7.1.2
numpy
six
tqdm
opencv-python
easydict
pyyaml
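
The minimum versions listed above can be checked with a small helper. The following is a minimal sketch, not part of the repository: the names `MINIMUM_VERSIONS`, `version_tuple`, and `meets_minimum` are hypothetical, and only the packages that carry a version constraint in the list are covered.

```python
import re

# Minimum versions copied from the list above. Packages listed without a
# version constraint are left out. The dictionary keys are illustrative labels.
MINIMUM_VERSIONS = {
    "python": "3.7",
    "pytorch": "1.7.0",
    "torchvision": "0.8.0",
    "lmdb": "0.98",
    "pillow": "7.1.2",
}


def version_tuple(version):
    """Turn a string such as '1.13.1+cu117' into (1, 13, 1).

    Parsing stops at the first dot-separated piece that does not start
    with a digit, so local build suffixes are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, name):
    """Return True if the installed version string satisfies the listed minimum."""
    required = version_tuple(MINIMUM_VERSIONS[name])
    found = version_tuple(installed)
    # Pad the shorter tuple with zeros so that '3.7' and '3.7.0' compare equal.
    width = max(len(required), len(found))
    required += (0,) * (width - len(required))
    found += (0,) * (width - len(found))
    return found >= required
```

In an installed environment the helper could be called as `meets_minimum(torch.__version__, "pytorch")` or `meets_minimum(platform.python_version(), "python")`.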

Datasets

Download the TextZoom dataset

Pretrained weights

Training

Installation

git clone https://github.com/Lenubolim/TextDiff.git

Configuration
See the config.yaml file
Training

python train.py

Inference

python test.py

To-do lists

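Add training code (to be released soon)
Add inference code (to be released soon)
Use DPM_solver to reduce the number of inference steps

Results

Thanks

If you find TextDiff helpful, please give it a star. Thank you!
If you have any questions, feel free to open an issue (issue notifications are linked to my email, and I will reply as soon as I see them).
If you would like to use TextDiff as a baseline for your project, please cite our paper.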

References

[1] Scene Text Telescope: Text-Focused Scene Image Super-Resolution
[2] Activating More Pixels in Image Super-Resolution Transformer
[3] SRDiff: Single Image Super-Resolution with Diffusion Probabilistic Models
[4] DocDiff: Document Enhancement via Residual Diffusion Models
[5] Improving Scene Text Image Super-Resolution via Dual Prior Modulation Network

📖 Citation

If you use (part of) my code or find my work helpful, please consider citing:

@article{liu2023textdiff,
  title={TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution},
  author={Liu, Baolin and Yang, Zongyuan and Wang, Pengfei and Zhou, Junjie and Liu, Ziqi and Song, Ziyi and Liu, Yan and Xiong, Yongping},
  journal={arXiv preprint arXiv:2308.06743},
  year={2023}
}

Acknowledgement

This code builds on DocDiff and TATT. Thanks to the authors of these great projects. DocDiff is the main research topic of my classmate, and I took part in some of that research.

textdiff's Issues

Fantastic work! When will the inference code be published?

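Open-source the code?

On the inputs to the diffusion model during training and inference

Hello! I have read your paper, but some parts of it still confuse me, so I would like to ask you a few questions. Thank you very much for your time and help!
I noticed that in the training process of the diffusion model, x_t in the given formula is the image obtained by adding noise to x_res. Why can information from the GT (GT minus SR) be used as a condition for training the model? During inference, is this part still x_res, or is it something else?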
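
For readers following this question, the training-time setup it describes (a residual x_res = GT - SR that is noised to obtain x_t) can be sketched as below. This is a minimal illustration that assumes the standard DDPM forward process and a linear beta schedule. The function names, the schedule values, and the use of flat Python lists in place of image tensors are assumptions made for the example and are not taken from the paper or the repository.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, as in the original DDPM paper)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for s up to and including t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noisy_residual(gt, sr, t, alpha_bars, rng):
    """Noise the residual x_res = gt - sr to step t.

    Returns (x_t, eps) with
        x_t = sqrt(alpha_bar_t) * x_res + sqrt(1 - alpha_bar_t) * eps,
    where eps is standard Gaussian noise drawn from rng.
    """
    alpha_bar = alpha_bars[t]
    x_res = [g - s for g, s in zip(gt, sr)]
    eps = [rng.gauss(0.0, 1.0) for _ in x_res]
    x_t = [
        math.sqrt(alpha_bar) * r + math.sqrt(1.0 - alpha_bar) * e
        for r, e in zip(x_res, eps)
    ]
    return x_t, eps
```

Given x_t and the noise eps, the residual is recovered as (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t), which is why a network that predicts eps (or x_res directly) can be trained from pairs produced this way.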

Request for Open Sourcing Weights and Code

Dear Lenubolim,
I hope this email finds you well. I am reaching out to you from BUAA to express my keen interest in your recent publication titled "TextDiff". I believe that your work has significant potential to assist me in my current research, and I am eager to explore it further.
In light of this, I kindly request your consideration in open-sourcing the weights and code associated with your research. I assure you that any insights gained or progress made as a result of accessing your resources will be duly acknowledged in any subsequent publications or presentations.
I understand the importance of intellectual property rights and assure you that the usage of your code and weights will be strictly for academic and research purposes. Any terms or conditions you may have regarding the sharing of these resources will be fully respected and adhered to.
Thank you very much for considering my request. I look forward to your favorable response. Please do not hesitate to reach out if you require any further information or clarification.

Warm regards,
[email protected]

Question

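Hello,
While trying to reproduce the work myself, I ran into the following questions:

The paper reports that DocDiff reaches an average recognition accuracy of only 56% with ASTER on the TextZoom dataset, which is close to what I get when I run it. What do you think causes DocDiff to perform poorly on this dataset?
Have you tried applying an STN to the images before feeding them into the network? Does this have a large effect on recognition accuracy?
Have you tried training the DDPM directly on TextZoom without the first-stage CNN? Does performance drop noticeably?

Looking forward to your reply.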

Any chance to open-source the code and weights?

Dear authors, this work looks great on restoring missing text. Any chance to add your code and weight files here? It would be great even if you only add inference code here.

Code

Hi,

This is nice work! Is there a plan to release the code?

Thanks
