zhiqwang / sightseq
Computer vision tools for fairseq, containing PyTorch implementations of text recognition and object detection.
License: MIT License
I am trying to train a model, but the accuracy is always 0.00.
My data folder ->
data.zip
Command used : python ./main.py --dataset-root G:\11\crnn.pytorch-master\crnn.pytorch-master\data --arch densenet121 --alphabet G:\11\crnn.pytorch-master\crnn.pytorch-master\data\alphabet_decode_5990.txt --lr 5e-5 --optimizer rmsprop
Training log screenshot:
Need help here, thanks in advance.
Must the training data all have labels of equal length? How can I train on data with labels of unequal length?
I want to train a batch of my own data with mobilenetv2 + CTC. The image size is 32×258, and the dataset images look like this; they are all 32×258, with 9k images for training and 1k for validation. The labels look like this:
00000000.jpg 144 80 91 9 213 24 16 217 91 682 129 100 5
00000001.jpg 140 481 9 102 2612 31 330 71 65 15 4
00000002.jpg 1688 195 91 49 678 4 24 1166 2700 58 135
The number of characters in each image is not fixed; it is around 10, for example 11 or 13.
First, I changed this in the code:
parser.add_argument('--width', type=int, default=256,)
Then the command I ran was:
python main.py --gpu-id 0 --not-pretrained --optimizer adam
But after running four epochs, the loss became NaN.
The CTC loss is set as:
criterion = nn.CTCLoss(zero_infinity=True)
I set the mean and standard deviation to those of my own images:
model_params['mean'] = (0.57680161, 0.57680161, 0.57680161)
model_params['std'] = (0.1311234, 0.1311234, 0.1311234)
What could be the reason? I set everything up as you explained before, but the loss still becomes NaN. I need to train a mobilenetv2 network because my project requires it. Thank you.
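For what it's worth, a common cause of a NaN CTC loss is a target sequence longer than the model's downsampled output width, or label indices falling outside the alphabet. A minimal sanity check, assuming the usual `(T, B, C)` log-probability layout (all numbers below are made up for illustration):

```python
import torch
import torch.nn as nn

# Illustrative numbers only: T output steps, batch B, alphabet C (blank = 0)
T, B, C = 64, 4, 5990
log_probs = torch.randn(T, B, C).log_softmax(2)

targets = torch.randint(1, C, (B, 12))           # labels must stay inside [1, C-1]
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 12, dtype=torch.long)

# CTC has no valid alignment when target_lengths > input_lengths -> inf/nan loss
assert (target_lengths <= input_lengths).all()

criterion = nn.CTCLoss(blank=0, zero_infinity=True)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
```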
python ./main.py --dataset-root datasets --arch densenet121 --alphabet datasets/alphabet_decode_5990.txt --lr 5e-5 --optimizer rmsprop --gpu-id 6 --not-pretrained
Traceback (most recent call last):
File "./main.py", line 387, in <module>
main()
File "./main.py", line 209, in main
_ = train(train_loader, model, criterion, optimizer, epoch)
File "./main.py", line 274, in train
loss = criterion(log_probs, targets, input_lengths, target_lengths)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1248, in forward
return F.ctc_loss(log_probs, targets, input_lengths, target_lengths, self.blank, self.reduction)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1732, in ctc_loss
return torch.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank, _Reduction.get_enum(reduction))
RuntimeError: CUDA error: an illegal memory access was encountered
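An illegal memory access inside `torch.ctc_loss` on the GPU is often caused by label indices outside the valid range or by targets longer than the model output; running one batch on the CPU usually produces a clearer error message. A hedged pre-flight check (the function and the tensors here are assumptions, not the repo's code):

```python
import torch

def check_ctc_inputs(targets, target_lengths, input_lengths, num_classes, blank=0):
    """Hypothetical pre-flight check before calling CTC loss on the GPU."""
    assert 0 <= targets.min().item() and targets.max().item() < num_classes, \
        "label index out of range for the alphabet"
    assert (targets != blank).all(), "the blank index must not appear in targets"
    assert (target_lengths <= input_lengths).all(), \
        "target sequence longer than the model output"

targets = torch.tensor([3, 7, 2, 9])             # made-up concatenated labels
check_ctc_inputs(targets, torch.tensor([2, 2]), torch.tensor([10, 10]),
                 num_classes=5990)
```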
I downloaded and tested the pretrained densenet model, but it showed this error message:
Error(s) in loading state_dict for CRNN:
Unexpected key(s) in state_dict: "features.1.denselayer1.norm1.num_batches_tracked", ...
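The `num_batches_tracked` buffers were added to BatchNorm in PyTorch 0.4.1, so a checkpoint saved with a newer PyTorch will not load into a model built with an older one under strict matching. Loading with `strict=False`, or filtering those keys out, is a common workaround (a sketch, not the repo's own loader):

```python
import torch.nn as nn

# Stand-in for the CRNN; the downloaded checkpoint plays the role of state_dict
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
state_dict = model.state_dict()

# Option 1: tolerate key mismatches
model.load_state_dict(state_dict, strict=False)

# Option 2: drop the offending buffers explicitly before loading
filtered = {k: v for k, v in state_dict.items() if 'num_batches_tracked' not in k}
model.load_state_dict(filtered, strict=False)
```

With `strict=False`, `load_state_dict` reports the mismatched keys instead of raising, so they can be logged and inspected.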
Is it possible to train the network with images of different size?
Hi, @zhiqwang
Can you, please, explain what happens here?
https://github.com/zhiqwang/crnn.pytorch/blob/master/datasets/dataset.py#L53
After this, the batch target will always be a 1-D array, but according to the documentation you need to return a batch-wise 2-D array.
For example, if the input is ([2], [3, 4]), this returns [2, 3, 4].
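For reference, `nn.CTCLoss` accepts targets either as a padded 2-D `(N, S)` tensor or as a single 1-D tensor of concatenated labels, provided `target_lengths` records the length of each sequence, so a collate function that flattens `([2], [3, 4])` into `[2, 3, 4]` can still be valid input:

```python
import torch
import torch.nn as nn

# Two samples, [2] and [3, 4], concatenated into one 1-D target tensor
targets = torch.tensor([2, 3, 4])
target_lengths = torch.tensor([1, 2])            # marks where each sample ends

T, B, C = 10, 2, 6                               # made-up model output shape
log_probs = torch.randn(T, B, C).log_softmax(2)
input_lengths = torch.full((B,), T, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
```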
I am training a CRNN model in PyTorch:
max_seq_length=99
number_of_alphabets=96
batch_size=16
output=CRNN(image)
What should the expected shape of the output be?
Secondly, should we apply softmax in the CRNN after the fully connected layer?
Any help would be appreciated. Thanks.
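For CTC-style training, the decoder output is conventionally shaped `(T, batch, num_classes)` with one extra class for the blank, and `nn.CTCLoss` expects log-probabilities, so `log_softmax` (not a plain softmax layer) is applied over the class dimension. A sketch using the numbers above (whether this matches the repo's exact layout is an assumption):

```python
import torch

T, batch_size, num_classes = 99, 16, 96 + 1       # +1 class for the CTC blank
output = torch.randn(T, batch_size, num_classes)  # stand-in for CRNN(image)

# nn.CTCLoss consumes log-probabilities, so apply log_softmax over the class
# dimension; a separate softmax layer after the FC layer is not needed
log_probs = output.log_softmax(2)
```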
Extract Chinese_dataset.rar into the data folder, move all pictures to images, and rename data_test.txt to data_dev.txt.
Running main.py shows:
Creating directory if it does not exist:
'./checkpoint/densenet121_rmsprop_lr5.0e-05_wd5.0e-04_bsize64_imsize32'
Using model from scratch (random weights) 'densenet121'
Traceback (most recent call last):
File "/home/luban/repository/crnn.pytorch/main.py", line 352, in <module>
main()
File "/home/luban/repository/crnn.pytorch/main.py", line 197, in main
loss = train(train_loader, model, criterion, optimizer, epoch)
File "/home/luban/repository/crnn.pytorch/main.py", line 231, in train
for i, (images, targets, target_lengths) in enumerate(train_loader):
TypeError: 'DigitsBatchTrain' object is not iterable
Process finished with exit code 1
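The unpacking in the training loop (`for i, (images, targets, target_lengths) in enumerate(train_loader)`) fails because the collate function returns a `DigitsBatchTrain` object rather than a tuple. One minimal fix is to make the batch object iterable; the class name comes from the traceback, but its fields are assumptions:

```python
class DigitsBatchTrain:
    """Batch wrapper; defining __iter__ lets the loop unpack it like a tuple."""
    def __init__(self, images, targets, target_lengths):
        self.images = images
        self.targets = targets
        self.target_lengths = target_lengths

    def __iter__(self):
        # Yield fields in the order the training loop expects to unpack them
        return iter((self.images, self.targets, self.target_lengths))

batch = DigitsBatchTrain('imgs', 'tgts', 'lens')
images, targets, lengths = batch
```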
The model I use is mobilenetv2. In this network, the repeated blocks increase the receptive field. I calculated that your small model's receptive field is 139, but the image size is 32×280. Generally a receptive field around 64 is considered appropriate; could an overly large receptive field be one reason the Chinese recognition rate is low? Thanks.
The shape of the picture is (Batch, Channel, H, W), but the data shape that sequence_generate can receive is (batch, seq_len, ...).
I did not find a solution in your code; how did you deal with this problem?
Thank you
@zhiqwang, could you please correct the dimensions I've added below as comments in the forward() pass of the CRNN class? I cannot figure out what happens after the permute line.
out = self.features(x) # out: (B, H, W, C)
# features -> pool -> flatten -> decoder -> softmax
out = self.avgpool(out) # out: (B, 1, W, C)
out = out.permute(3, 0, 1, 2).view(out.size(3), out.size(0), -1) # out: (C, B, 1*W)
out = self.classifier(out) # expected in: (B, W, C) != (C, B, 1*W) ?
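For comparison, the common CRNN reshaping pattern (a sketch, not necessarily this repo's exact code) pools the height axis away and turns the width axis into the time dimension, so the per-step classifier sees one feature vector per image column:

```python
import torch

# CNN features in the standard PyTorch layout (B, C, H, W); average pooling
# has already collapsed the height axis to 1
B, C, H, W = 2, 512, 1, 25
out = torch.randn(B, C, H, W)

out = out.squeeze(2)        # (B, C, W)
out = out.permute(2, 0, 1)  # (W, B, C): sequence-first, one step per image column
```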
When I test other images with your model, it cannot recognize the Chinese characters in them.
Please, can you share an example of training on English text?
The CHARMAP used for the data includes all the characters [A-Z a-z0-9 :,>/-].
target = torch.IntTensor([get_key(char_convert,i) for i in target])
TypeError: an integer is required (got type NoneType)
What should the format of the encoded data in the annotation file be after conversion?
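The `TypeError` means the lookup returned `None` for a character missing from the map, which `torch.IntTensor` cannot convert. A defensive encoder that fails loudly instead (the alphabet and index layout here are assumptions):

```python
import torch

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 :,>/-"
char_to_idx = {ch: i + 1 for i, ch in enumerate(alphabet)}  # 0 reserved for blank

def encode(text):
    missing = [ch for ch in text if ch not in char_to_idx]
    if missing:
        # Fail loudly instead of the opaque "an integer is required (got NoneType)"
        raise KeyError(f"characters not in alphabet: {missing!r}")
    return torch.IntTensor([char_to_idx[ch] for ch in text])

target = encode("AB:12")
```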
My digit recognition accuracy is quite good, but why is the Chinese recognition rate so low? My alphabet contains only 19 specific Chinese characters, I have adjusted the image resolution, and I have generated more than 10 million training samples. Do I need to tune some model parameters, or should I try a BiLSTM in the CRNN?
mtwi_2018_train/images/001807_00031.jpg
Train: [1][108/90000] Time 0.348 (0.361) Data 0.003 (0.006) Loss 30.0584 (31.5477)
mtwi_2018_train/images/007947_00013.jpg
Train: [1][109/90000] Time 0.422 (0.361) Data 0.003 (0.006) Loss inf (inf)
mtwi_2018_train/images/002394_00012.jpg
Train: [1][110/90000] Time 0.332 (0.361) Data 0.003 (0.006) Loss nan (nan)
command:
python ./main.py --dataset-root mtwi_2018_train --arch densenet121 --alphabet ./data/alphabet_decode_5990.txt --lr 1e-6 --optimizer rmsprop --gpu-id -1 --workers 1 --not-pretrained --batch-size 1 --keep-ratio --print-freq 1
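Once a single batch produces an infinite CTC loss (for example a target longer than the downsampled width, which `--keep-ratio` makes more likely on narrow crops), the resulting gradients poison the weights and every later loss is NaN, which matches the inf-then-nan progression in the log. Constructing the loss with `zero_infinity=True` and skipping non-finite batches is a common guard (a sketch; variable names are assumptions):

```python
import torch
import torch.nn as nn

criterion = nn.CTCLoss(zero_infinity=True)   # zeroes infinite losses and grads

# Deliberately infeasible batch: target length 8 > output length 5
T, B, C = 5, 1, 10
log_probs = torch.randn(T, B, C, requires_grad=True).log_softmax(2)
targets = torch.randint(1, C, (B, 8))
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 8, dtype=torch.long)

loss = criterion(log_probs, targets, input_lengths, target_lengths)
# Without zero_infinity=True this loss would be inf; here it is zeroed out
if torch.isfinite(loss):                     # skip batches the clamp cannot save
    loss.backward()
```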
I checked the network roughly, and it seems there are no recurrent layers such as a Bi-LSTM.
Is this repo another implementation of CRNN? I only see several CNN backbones and fully connected layers, but no RNN layers.
Don't know how to use it.
The convolutional part of the architecture acts as an encoder: it captures the image's contextual information. The architecture should also include a decoder part (a deconvolution layer or an RNN layer) to recover the image's spatial information.
Has multi-GPU mode not been added?
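For reference, a minimal multi-GPU path in PyTorch is `nn.DataParallel`, which splits each input batch across the visible GPUs; this is a generic sketch, not the repo's code (the model below is a stand-in):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())   # stand-in for the CRNN

if torch.cuda.device_count() > 1:
    # Replicates the model and splits each input batch across the visible GPUs
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

device = next(model.parameters()).device
out = model(torch.randn(2, 3, 32, 100).to(device))
```

For serious multi-GPU training, `nn.parallel.DistributedDataParallel` is the approach PyTorch recommends over `DataParallel`.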