textdiff's Introduction

Simplified Chinese | English | Paper

TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution

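This is the official implementation repository for the paper TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution. TextDiff is a super-resolution model for scene text images (see the paper for details).

Network Architecture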

News

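  • Pinned: We would like to introduce a multi-functional, cross-platform OCR application developed by our lab. It covers common OCR features such as PDF-to-Word conversion, PDF-to-Excel conversion, formula recognition, table recognition, and automatic watermark removal. You are welcome to try it!
  • See the To-do lists for the latest updates.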

Usage Guide

Environment Setup

Deep Learning Environment

  • python >= 3.7
  • pytorch >= 1.7.0
  • torchvision >= 0.8.0
  • lmdb >= 0.98
  • pillow >= 7.1.2
  • numpy
  • six
  • tqdm
  • opencv-python
  • easydict
  • pyyaml
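
The version requirements above can be checked with a short script. This is a minimal sketch, not part of the repository: the function names `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical, and the distribution names (`torch`, `Pillow`, `opencv-python`, `PyYAML`) are assumed to be the PyPI names of the listed packages. It relies on `importlib.metadata`, which is only in the standard library from Python 3.8 onward.

```python
import sys
from importlib import metadata

# Minimum versions copied from the list above; None means no minimum is stated.
# The distribution names are assumed PyPI names, not taken from the repository.
REQUIREMENTS = {
    "torch": "1.7.0",
    "torchvision": "0.8.0",
    "lmdb": "0.98",
    "Pillow": "7.1.2",
    "numpy": None,
    "six": None,
    "tqdm": None,
    "opencv-python": None,
    "easydict": None,
    "PyYAML": None,
}


def version_tuple(version):
    """Turn a string such as '1.7.0+cu110' into (1, 7, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first component with no leading digits
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(requirements=REQUIREMENTS):
    """Return a {distribution name: status string} report."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum is not None and not meets_minimum(installed, minimum):
            report[name] = f"too old ({installed} < {minimum})"
        else:
            report[name] = f"ok ({installed})"
    return report


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 7)
    print("python:", "ok" if python_ok else "too old", sys.version.split()[0])
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

Running the script prints one status line per package, which makes it easy to spot a missing or outdated dependency before training.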

Datasets

Pretrained Weights

Training

  1. Installation

git clone https://github.com/Lenubolim/TextDiff.git

  2. Configuration
    See the config.yaml file.

  3. Training

python train.py

Inference

python test.py

To-do lists

  • Add training code (to be released soon).
  • Add inference code (to be released soon).
  • Use DPM-Solver to reduce the number of inference steps.

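Results

Thanks

  • If you find TextDiff helpful, please give it a star. Thank you!
  • If you have any questions, feel free to open an issue (issue notifications go to my email, so I will reply as soon as I see them).
  • If you would like to use TextDiff as a baseline for your project, please cite our paper.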

References

  • [1] Scene text telescope: Text-focused scene image super-resolution
  • [2] Activating more pixels in image super-resolution transformer
  • [3] Srdiff: Single image super-resolution with diffusion probabilistic models
  • [4] DocDiff: Document Enhancement via Residual Diffusion Models
  • [5] Improving Scene Text Image Super-Resolution via Dual Prior Modulation Network

📖 Citation

If you use (part of) my code or find my work helpful, please consider citing

@article{liu2023textdiff,
  title={TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution},
  author={Liu, Baolin and Yang, Zongyuan and Wang, Pengfei and Zhou, Junjie and Liu, Ziqi and Song, Ziyi and Liu, Yan and Xiong, Yongping},
  journal={arXiv preprint arXiv:2308.06743},
  year={2023}
}

Acknowledgement

This code builds on DocDiff and TATT. Thanks to the authors of these great projects. DocDiff is my classmate's main research project, and I took part in some of that research.

textdiff's Issues

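Inputs to the diffusion model during training and inference

Hello! I have read your paper, but I am still confused about a few parts, so I would like to ask you some questions. Thank you very much for your time and help!
In the training procedure of the diffusion model, x_t in the given formula is the image obtained by adding noise to x_res. Why can information from the GT (GT minus SR) be used as a condition for training the model? During prediction, is this part still x_res, or something else?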
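
For readers following this question, the noising step it refers to can be sketched as below. This is a generic DDPM-style forward process applied to a residual, as used in residual diffusion models such as SRDiff [3]. It is not the authors' code: the linear beta schedule, its endpoints, and all function names (`linear_beta_schedule`, `cumulative_alphas`, `residual`, `q_sample`) are assumptions made for illustration, and images are flattened to plain Python lists to keep the sketch dependency-free.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    product = 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def residual(gt, sr):
    """x_res = GT - SR, computed pixel by pixel."""
    return [g - s for g, s in zip(gt, sr)]


def q_sample(x_res, t, alpha_bars, rng):
    """Forward noising: x_t = sqrt(a_bar_t) * x_res + sqrt(1 - a_bar_t) * eps."""
    a_bar = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x_res]
    x_t = [
        math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
        for x, e in zip(x_res, noise)
    ]
    return x_t, noise


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    gt = [0.9, 0.5, 0.1]  # toy ground-truth pixels
    sr = [0.7, 0.6, 0.1]  # toy coarse super-resolved pixels
    x_t, noise = q_sample(residual(gt, sr), 500, alpha_bars, rng)
    print(x_t)
```

In this kind of setup the ground truth is only needed at training time, to build x_t and the noise target. At inference, models of this family usually start sampling from pure Gaussian noise and add the denoised residual back to the SR image, so x_res is not required. Whether TextDiff follows exactly this scheme is for the authors to confirm.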

Request for Open Sourcing Weights and Code

Dear Lenubolim,
I hope this email finds you well. I am reaching out to you from BUAA to express my keen interest in your recent publication titled "TextDiff". I believe that your work could greatly assist my current research, and I am eager to explore it further.
In light of this, I kindly request your consideration in open-sourcing the weights and code associated with your research. I assure you that any insights gained or progress made as a result of accessing your resources will be duly acknowledged in any subsequent publications or presentations.
I understand the importance of intellectual property rights and assure you that the usage of your code and weights will be strictly for academic and research purposes. Any terms or conditions you may have regarding the sharing of these resources will be fully respected and adhered to.
Thank you very much for considering my request. I look forward to your favorable response. Please do not hesitate to reach out if you require any further information or clarification.

Warm regards,
[email protected]

Question

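Hello,
While trying to reproduce the work on my own, I ran into the following questions:

  1. The paper reports that DocDiff reaches an average recognition accuracy of only 56% with ASTER on the TextZoom dataset, which is similar to the result I get. What do you think causes DocDiff to perform poorly on this dataset?
  2. Have you tried applying an STN to the images before feeding them into the network? Does this have a large effect on recognition accuracy?
  3. Have you tried training the DDPM directly on TextZoom without the first-stage CNN? Does performance drop noticeably?

Looking forward to your reply.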

Code

Hi,

This is a nice work! Is there a plan to release the code?

Thanks
