mjq11302010044 / tpgsr

Code for Text Prior Guided Scene Text Image Super-Resolution (TIP 2023)

License: MIT License

Python 99.80% Shell 0.20%

tpgsr's Introduction

Text Prior Guided Scene Text Image Super-Resolution (TIP 2023)

https://arxiv.org/abs/2106.15368

Jianqi Ma, Shi Guo, Lei Zhang
Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China

Recovering TextZoom samples

Environment:

Other Python packages that may be required: pyyaml, cv2 (opencv-python), Pillow, and imgaug.

Main idea

Single stage with loss

Multi-stage version

Configure your training

Download the pretrained recognizer from:

Aster: https://github.com/ayumiymk/aster.pytorch  
MORAN:  https://github.com/Canjie-Luo/MORAN_v2  
CRNN: https://github.com/meijieru/crnn.pytorch

Unzip the code, change into 'TPGSR_ROOT/', and place the pretrained recognizer weights in 'TPGSR_ROOT/'.

Download the TextZoom dataset:

https://github.com/JasonBoy1/TextZoom

Train the corresponding model (e.g. TPGSR-TSRN):

chmod a+x train_TPGSR-TSRN.sh
./train_TPGSR-TSRN.sh
or
# --arch: the architecture
# --batch_size: the batch size
# --STN: use the STN net for alignment
# --mask: use the contour mask
# --use_distill: use the TP loss
# --gradient: use the Gradient Prior Loss
# --sr_share: share weights for the SR module
# --stu_iter: the number of iterations in the multi-stage version
# --vis_dir: the checkpoint directory
python3 main.py --arch="tsrn_tl_cascade" \
                --batch_size=48 \
                --STN \
                --mask \
                --use_distill \
                --gradient \
                --sr_share \
                --stu_iter=1 \
                --vis_dir='vis_TPGSR-TSRN'
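
The same flag set can also be assembled programmatically, which helps when sweeping over settings such as --stu_iter. The following is a minimal sketch, not part of the repository: the helper name build_train_command and the dict layout are hypothetical, and only the flag names and values are taken from the command above.

```python
import shlex


def build_train_command(options, script="main.py"):
    """Turn an options dict into an argv list.

    True -> bare switch (e.g. --STN); False/None -> omitted;
    any other value -> --name=value.
    """
    argv = ["python3", script]
    for name, value in options.items():
        if value is True:
            argv.append(f"--{name}")
        elif value is False or value is None:
            continue
        else:
            argv.append(f"--{name}={value}")
    return argv


# Flag names and values copied from the TPGSR-TSRN training command.
tpgsr_tsrn = {
    "arch": "tsrn_tl_cascade",
    "batch_size": 48,
    "STN": True,
    "mask": True,
    "use_distill": True,
    "gradient": True,
    "sr_share": True,
    "stu_iter": 1,
    "vis_dir": "vis_TPGSR-TSRN",
}

command = build_train_command(tpgsr_tsrn)
print(shlex.join(command))  # shell-quoted form, for logging or copy-paste
```

The resulting list can be passed to subprocess.run(command, check=True) from inside 'TPGSR_ROOT/'. Keeping the options in a dict makes it easy to derive variants, for example by copying the dict and changing "stu_iter".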

Run the test-prefixed shell script to test the corresponding model.

Add '--go_test' to the command in the shell file.

Cite this paper:

@article{ma2021text,
title={Text Prior Guided Scene Text Image Super-resolution},
author={Ma, Jianqi and Guo, Shi and Zhang, Lei},
journal={IEEE Transactions on Image Processing},
year={2023}
}

tpgsr's Issues

trained model

Hi,
Could you please publish your trained model (.pth file)?
Due to GPU constraints, I cannot train the model.
Thanks a lot.

Error when running the demo?

I use a GPU for inference, but I get the error below and don't know why. Which model is running on the CPU?

loading pretrained crnn model from crnn.pth
0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/main.py", line 76, in <module>
    main(config, args, opt_TPG=opt)
  File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/main.py", line 16, in main
    Mission.demo()
  File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/interfaces/super_resolution.py", line 1480, in demo
    images_sr = model(images_lr)
  File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/model/tsrn.py", line 195, in forward
    spatial_t_emb = self.infoGen(text_emb)
  File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/thorpham/Documents/challenge/super-resolution/TPGSR/model/tsrn.py", line 103, in forward
    x = F.relu(self.bn1(self.tconv1(t_embedding)))
  File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/thorpham/anaconda3/envs/torch/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 916, in forward
    return F.conv_transpose2d(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking arugment for argument weight in method wrapper_slow_conv_transpose2d)
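
The traceback says that the weight of a transposed convolution and its input live on different devices. A generic way to diagnose this in PyTorch is to list the devices used by each module and to create inputs on that same device. The sketch below is only an illustration under stated assumptions: devices_in_use is a hypothetical helper, and the small toy network is a placeholder, not the repository's infoGen module. Whether this is the actual cause in TPGSR is not confirmed here.

```python
import torch
import torch.nn as nn


def devices_in_use(module: nn.Module) -> set:
    """Return the device types of all parameters and buffers in a module."""
    tensors = list(module.parameters()) + list(module.buffers())
    return {t.device.type for t in tensors}


# Placeholder network: a transposed conv followed by batch norm.
# This is NOT the repository's code, only a small stand-in.
toy = nn.Sequential(
    nn.ConvTranspose2d(4, 8, kernel_size=3),
    nn.BatchNorm2d(8),
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy = toy.to(device)                        # moves weights and BatchNorm buffers
x = torch.randn(2, 4, 5, 5, device=device)  # input created on the same device

# If this check fails, some tensor was left behind on another device.
assert devices_in_use(toy) == {x.device.type}

out = toy(x)
print(out.shape, out.device)
```

Applying the same check to each sub-module involved in the failing call (for example the SR network and whatever produces text_emb) would show which one reports a different device type than the input. Moving that module or tensor with .to(device) is the usual remedy.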

TPGSR-3

When I trained TPGSR-3, the results were very poor. I did not modify the code and only changed --stu_iter to 3. What configuration did you use to train TPGSR-3?

Where is the final model?

Hi there,

I'd like to reproduce your amazing work but I can only find the pretrained models and not the final fine-tuned model. Am I correct?

Could you please upload the final model?

Thank you.

Visualization code errors during training?

Thanks for your work. The paper is great. I read the paper and am training the model to understand it. I don't know at which epoch the model is best, so I want to visualize some images during training. However, the code raises an error. Can you show me how to fix it?

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!

Hi @mjq11302010044,

I was able to train the model successfully using the code in the repository. But when I run the Test.sh script, I get the following error:
"RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper_slow_conv_transpose2d)".

I spent more than 2 days debugging it but cannot get past this error. Can you please help me resolve the issue if you have a solution?

Regards,
Nakul

Issues about TSRN derived structures!

Hi Ma, thanks for your nice work! I ran into some issues and would appreciate an early reply.

There are several TSRN-derived structures mentioned in the code, such as 'sem_tsrn', 'tsrn_c2f', 'tsrn_tl', 'tsrn_tl_cascade', and 'tsrn_tl_wmask'. However, I have only reproduced the 'tsrn_tl_cascade' arch successfully. The 'sem_tsrn' arch should be the core arch, shouldn't it? Why is there no 'sem_tsrn' in the 'args.arch' choices? I still failed to reproduce it after adding 'sem_tsrn' to the choices of args.arch and setting args.arch='sem_tsrn'. I suspect something is wrong in the released code.
Can you explain the differences between these derived structures, such as 'tsrn_c2f', 'tsrn_tl_cascade', and 'tsrn_tl_wmask', apart from the 'data difference' between architectures? Or could you please give some detailed instructions in the README.md? It's a bit hard to understand the purpose of these structures when reading the code.

Thx again!

Why doesn't the loss converge when I train?

I trained for more than 400 epochs and the loss is still 1.x

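Is there a pretrained model available?

I would like to use your model as a comparison baseline, but I could not find a pretrained model. Could you share one?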

Clarification (Bengali Text).

I was reading your paper, interesting work. However, are you sure this is Bengali font? Can you check again?

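Hello, could you release the pretrained model?

Hello, I am very interested in your research. Would it be possible to release the pretrained model?
It would be very useful to me. Thank you for open-sourcing your work.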

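Request for a copy of BTS: the bilingual text segmentation dataset

Hello. Sorry to bother you. Could I request a copy of BTS, the bilingual text segmentation dataset? Best wishes.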

Request to add a license

Hi Ma,
Great work on the paper and the implementation! I noticed that the repo does not have a license. I was wondering if you could add one so that I can understand the scope of use for the code.

Best,
Jeswin James

about arch

Amazing work! Hello, what is the difference between 'sem_tsrn', 'tsrn_c2f', 'tsrn_tl', 'tsrn_tl_cascade', and 'tsrn_tl_wmask'?
I want to reproduce your work. Which one should I select? Thanks!

Training time

Hello, I read your paper and code. The paper states: The batch size is set to 48 and the model is trained for 500 epochs with one NVIDIA RTX 2080Ti GPU. Roughly how long does it take to run 500 epochs?

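Shared SR and non-shared TP

For multi-stage training, the paper proposes a shared SR module and non-shared TP, but the code implements non-shared SR and shared TP.

According to the training command you provided:
python3 -u main.py --arch="tsrn_tl_cascade" --batch_size=48 --STN --mask --use_distill --gradient --sr_share --stu_iter=3 --vis_dir='vis_TPGSR-TSRN'
--sr_share defaults to False, but it is True during training.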
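
The observation about --sr_share (False by default, True in the training command) matches how a boolean switch behaves in Python's argparse when it is declared with action="store_true". The sketch below assumes the flag is declared that way. The parser is a hypothetical stand-in, not copied from main.py, and the default of 1 for --stu_iter is also an assumption.

```python
import argparse


def make_parser():
    """Hypothetical stand-in for the option parser; not copied from main.py."""
    parser = argparse.ArgumentParser()
    # A store_true switch is False unless the flag appears on the command line.
    parser.add_argument("--sr_share", action="store_true",
                        help="share SR module weights across stages")
    parser.add_argument("--stu_iter", type=int, default=1,
                        help="number of stages in the multi-stage version")
    return parser


parser = make_parser()
without_flag = parser.parse_args([])
with_flag = parser.parse_args(["--sr_share", "--stu_iter=3"])

print(without_flag.sr_share, with_flag.sr_share, with_flag.stu_iter)
```

Under this assumption, a default of False only describes runs that omit the flag. The quoted training command passes --sr_share, so the SR module would be shared in that run.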

RuntimeError: Given groups=1, weight of size [64, 4, 9, 9], expected input[1, 5, 32, 256] to have 4 channels

Hello, thank you for your excellent work. I have trained a model and want to process several images with blurred text. The command I use is as follows:
python main.py --arch="tsrn_tl_cascade" --test_model="CRNN" --batch_size=4 --STN --mask --sr_share --gradient --demo --stu_iter=1 --vis_dir='default' --resume=ckpt/vis_TPGSR-TSRN/model_best_0.pth --demo_dir demo

The blurred images are in the demo folder (four jpg images). Running the command produces this error:
RuntimeError: Given groups=1, weight of size [64, 4, 9, 9], expected input[1, 5, 32, 256] to have 4 channels, but got 5 channels instead
Why?

loading pre-trained model from ckpt/vis_TPGSR-TSRN/model_best_0.pth
  File "D:\anaconda-install\envs\envpython38\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\anaconda-install\envs\envpython38\lib\site-packages\torch\nn\modules\container.py", line 204, in forward
    input = module(input)
  File "D:\anaconda-install\envs\envpython38\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\anaconda-install\envs\envpython38\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\anaconda-install\envs\envpython38\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [64, 4, 9, 9], expected input[1, 5, 32, 256] to have 4 channels, but got 5 channels instead
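
The error message can be read directly from the two shapes it prints: a 2D convolution requires the input's channel dimension to equal the weight's second dimension multiplied by groups. The sketch below reproduces that check in plain Python. The helper name check_conv2d_channels is hypothetical, and the shapes are copied from the error above.

```python
def check_conv2d_channels(weight_shape, input_shape, groups=1):
    """Mirror the channel check a 2D convolution performs.

    weight_shape: (out_channels, in_channels // groups, kH, kW)
    input_shape:  (batch, channels, height, width)
    Returns (expected_channels, actual_channels, ok).
    """
    expected = weight_shape[1] * groups
    actual = input_shape[1]
    return expected, actual, expected == actual


# Shapes copied from the error message.
expected, actual, ok = check_conv2d_channels((64, 4, 9, 9), (1, 5, 32, 256))
print(f"expected {expected} channels, got {actual}, ok={ok}")
```

One possible explanation, stated here as an assumption and not confirmed by the thread, is that the demo images are loaded with four channels (for example CMYK or RGBA data) and that --mask appends one more channel, giving five. If so, converting each image to three channels before it reaches the model, for instance with Pillow's Image.open(path).convert("RGB"), would bring the input back to the four channels the first layer expects.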

Add new TP Generator Model

Hi, how can I add a new TP generator and train with it? Which lines do I have to change to swap the TP generator model, and what kind of changes are needed? Can I obtain label_vecs_final from other text recognition models and pass it to the model?

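Hello, I read your paper carefully and think your approach is excellent; many parts were eye-opening for me. I would like to study your source code properly, so I wanted to ask which .py files make up the core of the project. Getting into the code right away has been quite a headache.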

How can I run inference on a low-resolution image?

I have finished the training process. How can I use the trained model to get a high-resolution text image?

RuntimeError: Tensor for argument #1 'input' is on CPU, Tensor for argument #2 'output' is on CPU, but expected them to be on GPU (while checking arguments for slow_conv_transpose2d_out_cuda)

Hi, when I try to test your model with this command:
python main.py --arch="tsrn_tl_cascade" --test_model="CRNN" --test_data_dir=../TPGSR-main/dataset/TextZoom/test/hard --batch_size=48 --STN --mask --sr_share --gradient --test --stu_iter=1 --vis_dir='hard'
at this line (TPGSR-main\interfaces\super_resolution.py, line 1382, in test):
images_sr = model(images_lr)
I receive this error:
RuntimeError: Tensor for argument #1 'input' is on CPU, Tensor for argument #2 'output' is on CPU, but expected them to be on GPU (while checking arguments for slow_conv_transpose2d_out_cuda)
What should I do?