zhiqwang / sightseq

Computer vision tools for fairseq, containing PyTorch implementations of text recognition and object detection

License: MIT License

Python 100.00%

crnn ctc scene-texts pytorch ocr attention transformer image-captioning text-recognition mobilenet

sightseq's Introduction

🔭sightseq

Let's go sightseeing through the deep learning world with multimodal vision and sequence-language models.

What's New:

July 30, 2019: Added Faster R-CNN models. The repo has been renamed from image-captioning to sightseq; this is the last time I rename it, I promise.
June 11, 2019: The text recognition part has been rewritten on top of fairseq. For a stable version, see the crnn branch, which provides pre-trained model checkpoints. The current branch is a work in progress. Suggestions and collaboration on the fairseq text recognition project are very welcome.

Features:

sightseq provides reference implementations of various deep learning tasks, including:

Text Recognition
- Shi et al. (2015), CRNN: An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Object Detection
- New: Ren et al. (2015), Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Additionally:

All features of fairseq
Convolutional and recurrent layers in CRNN can be enabled flexibly
Positional Encoding of images

General Requirements and Installation

PyTorch (there is a bug in nn.CTCLoss that is fixed in the nightly version)
Python version >= 3.5
Fairseq version >= 0.7.1
torchvision version >= 0.3.0
For training new models, you'll also need an NVIDIA GPU and NCCL

Pre-trained models and examples

License

sightseq is MIT-licensed. The license applies to the pre-trained models as well.

sightseq's Issues

Getting accuracy as 0.00

I am trying to train a model, but the accuracy is always 0.00 :)
My data folder: data.zip
Command used: python ./main.py --dataset-root G:\11\crnn.pytorch-master\crnn.pytorch-master\data --arch densenet121 --alphabet G:\11\crnn.pytorch-master\crnn.pytorch-master\data\alphabet_decode_5990.txt --lr 5e-5 --optimizer rmsprop
Training log screenshot: [image not available]

I need help here. Thanks in advance.

Error(s) in loading state_dict for CRNN:

I downloaded and tested the pretrained densenet model, but it failed with the following error message.

Error(s) in loading state_dict for CRNN:
Unexpected key(s) in state_dict: "features.1.denselayer1.norm1.num_batches_tracked", ...
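
A possible workaround, sketched under assumptions: as far as I know, the `num_batches_tracked` buffers were added to BatchNorm layers in PyTorch 0.4.1, so a checkpoint saved with a newer PyTorch carries keys that an older installation does not expect. Assuming those are the only mismatched keys, they can be filtered out before loading. The helper name `drop_num_batches_tracked` is hypothetical, and plain values stand in for tensors so the snippet runs without PyTorch.

```python
from collections import OrderedDict


def drop_num_batches_tracked(state_dict):
    """Return a copy of state_dict without BatchNorm 'num_batches_tracked' entries."""
    return OrderedDict(
        (key, value)
        for key, value in state_dict.items()
        if not key.endswith("num_batches_tracked")
    )


# Plain values stand in for tensors; a real checkpoint maps names to tensors.
checkpoint = OrderedDict([
    ("features.1.denselayer1.norm1.weight", "weight-tensor"),
    ("features.1.denselayer1.norm1.num_batches_tracked", 0),
])

filtered = drop_num_batches_tracked(checkpoint)
print(list(filtered))
```

With a real model, the filtered dictionary would be passed to `model.load_state_dict(filtered)`. Upgrading PyTorch, or calling `load_state_dict` with `strict=False`, are alternatives that avoid editing the checkpoint.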

On the same batch of test data, why does the test-only accuracy differ so much from the validation accuracy during training?

Has multi-GPU mode not been added?

The vanilla CNN downsampling architecture cannot recover the spatial information of an image

The convolutional part of the architecture acts as an encoder that captures the image's contextual information. The architecture should also include a decoder part (a deconvolution layer or an RNN layer) to recover the image's spatial information.

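Low recognition rate for Chinese

My digit recognition accuracy is quite good, so why is the Chinese recognition rate so low? My dictionary contains only 19 specific Chinese characters, I have adjusted the image resolution, and I generated more than 10 million training samples. Do I need to tune some model parameters, or should I try a BLSTM in the CRNN?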

Questions about dataset object

Hi, @zhiqwang
Can you, please, explain what happens here?
https://github.com/zhiqwang/crnn.pytorch/blob/master/datasets/dataset.py#L53
After this, the batch target will always be a 1D array, but according to the documentation you need to return a batch-wise 2D array.
For example, if you input ([2], [3, 4]), you will return [2, 3, 4]
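
For context, the PyTorch documentation for nn.CTCLoss describes two accepted target layouts: a padded 2D tensor of shape (N, S), or a 1D tensor of size sum(target_lengths) in which the labels are concatenated. The sketch below illustrates the concatenated form with plain lists and shows that the per-sample labels stay recoverable as long as the lengths are returned too. The function names are hypothetical, not taken from the repository.

```python
def concat_targets(labels):
    """Flatten per-sample label lists into one 1D list plus per-sample lengths."""
    flat = [index for label in labels for index in label]
    lengths = [len(label) for label in labels]
    return flat, lengths


def split_targets(flat, lengths):
    """Inverse of concat_targets: recover the per-sample label lists."""
    labels, start = [], 0
    for length in lengths:
        labels.append(flat[start:start + length])
        start += length
    return labels


flat, lengths = concat_targets([[2], [3, 4]])
print(flat, lengths)
print(split_targets(flat, lengths))
```

The flat list only makes sense together with `lengths`, which is why a collate function in this style has to return both.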

Input size

Is it possible to train the network with images of different size?

How to use it just for testing one normal image?

I don't know how to use it.

dimensions in forward pass

@zhiqwang, could you please correct the dimensions I've added in the comments below, in the forward() pass of the CRNN class? I cannot figure out what happens after the permute line.

out = self.features(x) # out: (B, H, W, C)
# features -> pool -> flatten -> decoder -> softmax
out = self.avgpool(out) # out: (B, 1, W, C)
out = out.permute(3, 0, 1, 2).view(out.size(3), out.size(0), -1) # out: (C, B, 1*W)
out = self.classifier(out) # expected in: (B, W, C) != (C, B, 1*W) ?
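
One way to reason about the shapes, offered as a sketch rather than the author's answer: PyTorch convolutions produce tensors in (B, C, H, W) order, not (B, H, W, C). Assuming `self.features` is a standard convolutional stack and `self.avgpool` collapses the height to 1 (both assumptions), the shapes can be traced without PyTorch. The helpers `permute_shape` and `view_shape` and the sizes used are hypothetical.

```python
def permute_shape(shape, dims):
    """Shape after tensor.permute(*dims)."""
    return tuple(shape[d] for d in dims)


def view_shape(shape, new_shape):
    """Shape after tensor.view(*new_shape), resolving a single -1."""
    total = 1
    for size in shape:
        total *= size
    known = 1
    for size in new_shape:
        if size != -1:
            known *= size
    return tuple(total // known if size == -1 else size for size in new_shape)


B, C, W = 4, 512, 32                  # hypothetical batch, channel, width sizes
after_pool = (B, C, 1, W)             # (B, C, H=1, W) after average pooling
permuted = permute_shape(after_pool, (3, 0, 1, 2))   # (W, B, C, 1)
# out.size(3) and out.size(0) are read from the tensor before the permute,
# so they are W and B respectively.
flattened = view_shape(permuted, (after_pool[3], after_pool[0], -1))  # (W, B, C)
print(permuted, flattened)
```

Under these assumptions the classifier receives (W, B, C), with the width acting as the time axis. That is the time-major (T, N, C) layout nn.CTCLoss expects once the last dimension is mapped to the number of classes.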

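Question about loading the pre-trained model

When I load your pre-trained model, I run into this problem:
[screenshot not available]
The model I am using is densenet121, and I put the model in
[screenshot not available]
this location.
Why does this happen? I have not changed the model, and I am running on a single GPU.
Then I changed this part:
[screenshot not available]
The result is the same: the pre-trained model cannot be loaded. What could be the reason?
In my code I replaced the built-in CTC with warpctc. Could that be the cause?
Thanks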

Must the training data be of equal length?

Must the training data be of equal length? How can I train on labels of unequal length?
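
CTC-based training does not require equal-length labels as long as each sample's true length is passed along. As a minimal sketch of the padded 2D layout that nn.CTCLoss accepts together with target_lengths, the helper below pads plain label lists to a common length. The name `pad_targets` and the pad value are assumptions for illustration; the padded positions are ignored because the loss only reads the first `length` entries of each row.

```python
def pad_targets(labels, pad_value=0):
    """Pad variable-length label lists to a rectangle and keep the true lengths."""
    lengths = [len(label) for label in labels]
    max_len = max(lengths, default=0)
    padded = [list(label) + [pad_value] * (max_len - len(label)) for label in labels]
    return padded, lengths


padded, lengths = pad_targets([[5, 1, 7], [2]])
print(padded, lengths)
```

In a PyTorch pipeline, `padded` and `lengths` would typically be converted to integer tensors inside the collate function.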

No recurrent layer found in model files

I looked through the network briefly, and it seems to have no recurrent layers such as Bi-LSTM.
Is this repo a different implementation of CRNN? I only see several CNN backbones and fully connected layers, but no RNN layers.

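Is the low Chinese recognition rate caused by the receptive field?

The model I am using is mobilenetv2. In this network, the number of block repetitions increases the receptive field. I calculated that the receptive field of your small-size model is 139, but the image size is 32×280. Generally speaking, a receptive field of around 64 is appropriate. Could this overly large receptive field be one reason for the low Chinese recognition rate? Thanks.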
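
For readers who want to reproduce this kind of estimate, the receptive field along one axis of a plain stack of convolution or pooling layers can be computed with the usual recurrence: each layer adds (kernel - 1) times the current stride product, and the stride product is then multiplied by the layer's stride. The sketch below ignores dilation, and the layer list is hypothetical; it is not the repository's MobileNetV2 configuration.

```python
def receptive_field(layers):
    """Receptive field along one axis for a stack of (kernel_size, stride) layers."""
    field, jump = 1, 1          # jump = distance in input pixels between adjacent outputs
    for kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field


# Hypothetical stack: two stride-2 3x3 convolutions, each followed by a stride-1 3x3.
layers = [(3, 2), (3, 1), (3, 2), (3, 1)]
print(receptive_field(layers))
```

Each additional stride-1 block after heavy downsampling adds a large increment, which is why repeating blocks late in the network grows the receptive field quickly.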

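About the loss becoming NaN: I have read the earlier answers, but I would still like to ask

I want to train mobilenetv2 + CTC on my own data. The data size is 32258 (presumably 32×258), and the dataset images look like this:
[image not available]
They are all 32258. There are 9k training images and 1k validation images, and the labels look like this:
00000000.jpg 144 80 91 9 213 24 16 217 91 682 129 100 5
00000001.jpg 140 481 9 102 2612 31 330 71 65 15 4
00000002.jpg 1688 195 91 49 678 4 24 1166 2700 58 135
The number of characters in each image varies; it is around 10, for example 11 or 13.

First, I changed this in the code:
parser.add_argument('--width', type=int, default=256,)
Then I ran:
python main.py --gpu-id 0 --not-pretrained --optimizer adam
However, after four epochs, the loss became NaN.

CTC is set up as:
criterion = nn.CTCLoss(zero_infinity=True)
I set the image mean and standard deviation to those of my own images:
model_params['mean'] = (0.57680161,0.57680161,0.57680161)
model_params['std'] = (0.1311234,0.1311234,0.1311234)

What could be the cause? I set everything up as you explained before, but NaN still appears. I need to train the mobilenetv2 network for my research project. Thank you.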

How is the picture processed in sequence_generate?

The shape of the picture is (Batch, Channel, H, W)
The data shape that the sequence_generate can receive is (batch, seq_len,...)
I could not find how this is handled in your code. How did you deal with this problem?
Thank you

Loss becomes inf, then NaN

mtwi_2018_train/images/001807_00031.jpg

Train: [1][108/90000] Time 0.348 (0.361) Data 0.003 (0.006) Loss 30.0584 (31.5477)
mtwi_2018_train/images/007947_00013.jpg
Train: [1][109/90000] Time 0.422 (0.361) Data 0.003 (0.006) Loss inf (inf)
mtwi_2018_train/images/002394_00012.jpg
Train: [1][110/90000] Time 0.332 (0.361) Data 0.003 (0.006) Loss nan (nan)

command:
python ./main.py --dataset-root mtwi_2018_train --arch densenet121 --alphabet ./data/alphabet_decode_5990.txt --lr 1e-6 --optimizer rmsprop --gpu-id -1 --workers 1 --not-pretrained --batch-size 1 --keep-ratio --print-freq 1

Attached: 007947_00013.jpg
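
One common cause of an infinite CTC loss, offered as a possibility rather than a diagnosis of this log: with --batch-size 1 and --keep-ratio, a narrow image may yield fewer output time steps than its label needs. CTC requires at least one time step per label plus one blank between every pair of identical adjacent labels; otherwise no valid alignment exists. A minimal check with plain lists, using hypothetical function names:

```python
def min_ctc_input_length(target):
    """Smallest number of time steps for which CTC has a valid alignment."""
    repeats = sum(1 for a, b in zip(target, target[1:]) if a == b)
    return len(target) + repeats


def ctc_feasible(input_length, target):
    """True if a sequence of input_length time steps can emit this target."""
    return input_length >= min_ctc_input_length(target)


target = [7, 7, 3]                      # hypothetical label indices with one repeat
print(min_ctc_input_length(target))     # 3 labels + 1 blank between the two 7s
print(ctc_feasible(3, target), ctc_feasible(4, target))
```

Running such a check over the dataset before training can flag samples like the one logged just before the loss became inf. nn.CTCLoss also offers `zero_infinity=True` to zero out infinite losses and their gradients.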

TypeError: 'DigitsBatchTrain' object is not iterable

I extracted Chinese_dataset.rar into the data folder, moved all pictures into images, and renamed data_test.txt to data_dev.txt.

Running main.py then shows:

Creating directory if it does not exist:
'./checkpoint/densenet121_rmsprop_lr5.0e-05_wd5.0e-04_bsize64_imsize32'
Using model from scratch (random weights) 'densenet121'
Traceback (most recent call last):
  File "/home/luban/repository/crnn.pytorch/main.py", line 352, in <module>
    main()
  File "/home/luban/repository/crnn.pytorch/main.py", line 197, in main
    loss = train(train_loader, model, criterion, optimizer, epoch)
  File "/home/luban/repository/crnn.pytorch/main.py", line 231, in train
    for i, (images, targets, target_lengths) in enumerate(train_loader):
TypeError: 'DigitsBatchTrain' object is not iterable

Process finished with exit code 1
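
The traceback shows the training loop unpacking each batch into three names, which only works if the batch object is iterable. The definition of `DigitsBatchTrain` is not shown here, so the class below is a hypothetical stand-in, not the repository's code. It sketches how a custom batch class can support tuple unpacking by defining `__iter__`.

```python
class BatchTriple:
    """Hypothetical custom batch object holding the three fields the loop unpacks."""

    def __init__(self, images, targets, target_lengths):
        self.images = images
        self.targets = targets
        self.target_lengths = target_lengths

    def __iter__(self):
        # Yield the fields in the order the training loop expects.
        yield self.images
        yield self.targets
        yield self.target_lengths


batch = BatchTriple("image-batch", [2, 3, 4], [1, 2])
images, targets, target_lengths = batch   # works because __iter__ is defined
print(targets, target_lengths)
```

Without `__iter__`, the unpacking line raises the same kind of TypeError as in the log. Reading the attributes directly inside the loop would be an equivalent fix.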

Help Needed

I am training a CRNN model in pytorch
max_seq_length=99
number_of_alphabets=96
batch_size=16
output=CRNN(image)
what should be the expected shape of output?
Secondly, should we apply softmax in CRNN after fully connected layer?
Any help would be appreciated. Thanks
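
As a hedged note on both questions: nn.CTCLoss takes log-probabilities of shape (T, N, C), where T is the number of output time steps, N the batch size, and C the number of classes including the blank. Its documentation suggests producing them with log_softmax rather than softmax. The sketch below assumes the 96 alphabet symbols do not yet include the blank, and T = 128 is a hypothetical value. It uses plain Python so it runs without PyTorch.

```python
import math


def ctc_log_probs_shape(time_steps, batch_size, num_symbols):
    """Expected (T, N, C) shape, reserving one extra class for the CTC blank."""
    return (time_steps, batch_size, num_symbols + 1)


def log_softmax(scores):
    """Numerically stable log-softmax over one time step's class scores."""
    peak = max(scores)
    log_total = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return [s - log_total for s in scores]


shape = ctc_log_probs_shape(time_steps=128, batch_size=16, num_symbols=96)
log_probs = log_softmax([2.0, 0.5, -1.0])
print(shape)
print(sum(math.exp(value) for value in log_probs))   # probabilities sum to about 1
```

With max_seq_length = 99, the model would need at least 99 output time steps, and more when labels contain repeated adjacent characters.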

annotation file format for English data

Could you please share an example for training on English text?
The CHARMAP used for the data includes all characters in [A-Z a-z0-9 :,>/-].

target = torch.IntTensor([get_key(char_convert,i) for i in target])
TypeError: an integer is required (got type NoneType)

What should be the format of encoded data in annotation file after conversion?
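
The TypeError above is consistent with a lookup that returns None for a character missing from the map. As a sketch under assumptions, the encoder below maps the characters listed above to integer indices, reserves index 0 for the CTC blank (the nn.CTCLoss default), and fails loudly on unknown characters. The annotation line format (file name followed by space-separated indices) is inferred from a label sample in another issue on this page. All names and the file name are hypothetical.

```python
import string

# Characters from the issue: A-Z, a-z, 0-9, space, and ":,>/-".
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + " :,>/-"


def build_charmap(alphabet):
    """Map each character to an index starting at 1; index 0 is left for the blank."""
    return {char: index for index, char in enumerate(alphabet, start=1)}


def encode_label(text, charmap):
    """Convert text to indices, raising a clear error for unmapped characters."""
    missing = sorted(set(text) - set(charmap))
    if missing:
        raise ValueError(f"characters not in charmap: {missing}")
    return [charmap[char] for char in text]


charmap = build_charmap(ALPHABET)
indices = encode_label("Ab 1", charmap)
line = "00000000.jpg " + " ".join(str(index) for index in indices)
print(line)
```

Checking every annotation with `encode_label` before training surfaces characters outside the map, instead of producing None inside a tensor constructor.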

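Why are the results on other images so poor after training?

I tested other images with your model, and it cannot recognize the Chinese characters in them.

Could you provide updated dependency versions?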

RuntimeError: CUDA error: an illegal memory access was encountered

python ./main.py --dataset-root datasets --arch densenet121 --alphabet datasets/alphabet_decode_5990.txt --lr 5e-5 --optimizer rmsprop --gpu-id 6 --not-pretrained

Traceback (most recent call last):
  File "./main.py", line 387, in <module>
    main()
  File "./main.py", line 209, in main
    _ = train(train_loader, model, criterion, optimizer, epoch)
  File "./main.py", line 274, in train
    loss = criterion(log_probs, targets, input_lengths, target_lengths)
  File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1248, in forward
    return F.ctc_loss(log_probs, targets, input_lengths, target_lengths, self.blank, self.reduction)
  File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1732, in ctc_loss
    return torch.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank, _Reduction.get_enum(reduction))
RuntimeError: CUDA error: an illegal memory access was encountered
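
An illegal memory access inside ctc_loss on the GPU can have several causes, including bugs in the PyTorch build itself (the requirements above mention an nn.CTCLoss bug fixed in the nightly version). One cause that can be ruled out cheaply on the CPU is malformed targets: label indices outside the range of output classes, or lengths that do not match the concatenated targets. The checker below is a sketch with hypothetical names, working on plain lists.

```python
def find_ctc_target_problems(targets, target_lengths, num_classes, blank=0):
    """Return human-readable problems for 1D concatenated CTC targets."""
    problems = []
    if sum(target_lengths) != len(targets):
        problems.append(
            f"sum(target_lengths)={sum(target_lengths)} but len(targets)={len(targets)}"
        )
    for position, label in enumerate(targets):
        if not 0 <= label < num_classes:
            problems.append(f"label {label} at position {position} is out of range")
        elif label == blank:
            problems.append(f"label at position {position} equals the blank index")
    return problems


# Hypothetical batch: five output classes, with index 0 reserved for the blank.
print(find_ctc_target_problems([2, 3, 4], [1, 2], num_classes=5))
print(find_ctc_target_problems([2, 7], [2], num_classes=5))
```

If the check passes, running the same batch on the CPU (the command-line log elsewhere on this page uses --gpu-id -1 for that) usually gives a more readable error than the asynchronous CUDA failure.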