zhiqwang / sightseq
Computer vision tools for fairseq, containing PyTorch implementations of text recognition and object detection.
License: MIT License
I am trying to train a model, but the accuracy is always 0.00.
My data folder ->
data.zip
Command used : python ./main.py --dataset-root G:\11\crnn.pytorch-master\crnn.pytorch-master\data --arch densenet121 --alphabet G:\11\crnn.pytorch-master\crnn.pytorch-master\data\alphabet_decode_5990.txt --lr 5e-5 --optimizer rmsprop
Training log screenshot:
Need help here, thanks in advance.
Must the training data all have labels of equal length? How can I train on data with labels of unequal length?
I want to train a batch of my own data with mobilenetv2 + CTC. The image size is 32×258, and the dataset images look like this; they are all 32×258, with 9k images for training and 1k for validation. The labels look like this:
00000000.jpg 144 80 91 9 213 24 16 217 91 682 129 100 5
00000001.jpg 140 481 9 102 2612 31 330 71 65 15 4
00000002.jpg 1688 195 91 49 678 4 24 1166 2700 58 135
The number of characters in each image is not fixed; it is around 10, for example 11 or 13.
First, I changed this in the code:
parser.add_argument('--width', type=int, default=256,)
Then the command I ran was:
python main.py --gpu-id 0 --not-pretrained --optimizer adam
But after running four epochs, the loss became NaN.
The CTC loss is set as:
criterion = nn.CTCLoss(zero_infinity=True)
I set the mean and standard deviation to those of my own images:
model_params['mean'] = (0.57680161, 0.57680161, 0.57680161)
model_params['std'] = (0.1311234, 0.1311234, 0.1311234)
What could be the reason? I set everything up as you explained before, but the loss still becomes NaN. I need to train a mobilenetv2 network because my project requires it. Thank you.
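For what it's worth, a common cause of a NaN CTC loss is a target sequence longer than the model's downsampled output width, or label indices falling outside the alphabet. A minimal sanity check, assuming the usual `(T, B, C)` log-probability layout (all numbers below are made up for illustration):

```python
import torch
import torch.nn as nn

# Illustrative numbers only: T output steps, batch B, alphabet C (blank = 0)
T, B, C = 64, 4, 5990
log_probs = torch.randn(T, B, C).log_softmax(2)

targets = torch.randint(1, C, (B, 12))           # labels must stay inside [1, C-1]
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 12, dtype=torch.long)

# CTC has no valid alignment when target_lengths > input_lengths -> inf/nan loss
assert (target_lengths <= input_lengths).all()

criterion = nn.CTCLoss(blank=0, zero_infinity=True)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
```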
python ./main.py --dataset-root datasets --arch densenet121 --alphabet datasets/alphabet_decode_5990.txt --lr 5e-5 --optimizer rmsprop --gpu-id 6 --not-pretrained
Traceback (most recent call last):
File "./main.py", line 387, in <module>
main()
File "./main.py", line 209, in main
_ = train(train_loader, model, criterion, optimizer, epoch)
File "./main.py", line 274, in train
loss = criterion(log_probs, targets, input_lengths, target_lengths)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1248, in forward
return F.ctc_loss(log_probs, targets, input_lengths, target_lengths, self.blank, self.reduction)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1732, in ctc_loss
return torch.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank, _Reduction.get_enum(reduction))
RuntimeError: CUDA error: an illegal memory access was encountered
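An illegal memory access inside `torch.ctc_loss` on the GPU is often caused by label indices outside the valid range or by targets longer than the model output; running one batch on the CPU usually produces a clearer error message. A hedged pre-flight check (the function and the tensors here are assumptions, not the repo's code):

```python
import torch

def check_ctc_inputs(targets, target_lengths, input_lengths, num_classes, blank=0):
    """Hypothetical pre-flight check before calling CTC loss on the GPU."""
    assert 0 <= targets.min().item() and targets.max().item() < num_classes, \
        "label index out of range for the alphabet"
    assert (targets != blank).all(), "the blank index must not appear in targets"
    assert (target_lengths <= input_lengths).all(), \
        "target sequence longer than the model output"

targets = torch.tensor([3, 7, 2, 9])             # made-up concatenated labels
check_ctc_inputs(targets, torch.tensor([2, 2]), torch.tensor([10, 10]),
                 num_classes=5990)
```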
I downloaded and tested the pretrained densenet model, but it showed this error message:
Error(s) in loading state_dict for CRNN:
Unexpected key(s) in state_dict: "features.1.denselayer1.norm1.num_batches_tracked", ...
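The `num_batches_tracked` buffers were added to BatchNorm in PyTorch 0.4.1, so a checkpoint saved with a newer PyTorch will not load into a model built with an older one under strict matching. Loading with `strict=False`, or filtering those keys out, is a common workaround (a sketch, not the repo's own loader):

```python
import torch.nn as nn

# Stand-in for the CRNN; the downloaded checkpoint plays the role of state_dict
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
state_dict = model.state_dict()

# Option 1: tolerate key mismatches
model.load_state_dict(state_dict, strict=False)

# Option 2: drop the offending buffers explicitly before loading
filtered = {k: v for k, v in state_dict.items() if 'num_batches_tracked' not in k}
model.load_state_dict(filtered, strict=False)
```

With `strict=False`, `load_state_dict` reports the mismatched keys instead of raising, so they can be logged and inspected.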
Is it possible to train the network with images of different size?
Hi, @zhiqwang
Can you, please, explain what happens here?
https://github.com/zhiqwang/crnn.pytorch/blob/master/datasets/dataset.py#L53
After this, the batch target will always be a 1-D array, but according to the documentation you need to return a batch-wise 2-D array.
For example, if the input is ([2], [3, 4]), this returns [2, 3, 4].
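For reference, `nn.CTCLoss` accepts targets either as a padded 2-D `(N, S)` tensor or as a single 1-D tensor of concatenated labels, provided `target_lengths` records the length of each sequence, so a collate function that flattens `([2], [3, 4])` into `[2, 3, 4]` can still be valid input:

```python
import torch
import torch.nn as nn

# Two samples, [2] and [3, 4], concatenated into one 1-D target tensor
targets = torch.tensor([2, 3, 4])
target_lengths = torch.tensor([1, 2])            # marks where each sample ends

T, B, C = 10, 2, 6                               # made-up model output shape
log_probs = torch.randn(T, B, C).log_softmax(2)
input_lengths = torch.full((B,), T, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
```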
I am training a CRNN model in PyTorch:
max_seq_length=99
number_of_alphabets=96
batch_size=16
output=CRNN(image)
What should the expected shape of the output be?
Secondly, should we apply softmax in the CRNN after the fully connected layer?
Any help would be appreciated. Thanks.
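For CTC-style training, the decoder output is conventionally shaped `(T, batch, num_classes)` with one extra class for the blank, and `nn.CTCLoss` expects log-probabilities, so `log_softmax` (not a plain softmax layer) is applied over the class dimension. A sketch using the numbers above (whether this matches the repo's exact layout is an assumption):

```python
import torch

T, batch_size, num_classes = 99, 16, 96 + 1       # +1 class for the CTC blank
output = torch.randn(T, batch_size, num_classes)  # stand-in for CRNN(image)

# nn.CTCLoss consumes log-probabilities, so apply log_softmax over the class
# dimension; a separate softmax layer after the FC layer is not needed
log_probs = output.log_softmax(2)
```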
Extract Chinese_dataset.rar into the data folder, move all pictures to images, and rename data_test.txt to data_dev.txt.
Running main.py shows:
Creating directory if it does not exist:
'./checkpoint/densenet121_rmsprop_lr5.0e-05_wd5.0e-04_bsize64_imsize32'
Using model from scratch (random weights) 'densenet121'
Traceback (most recent call last):
File "/home/luban/repository/crnn.pytorch/main.py", line 352, in <module>
main()
File "/home/luban/repository/crnn.pytorch/main.py", line 197, in main
loss = train(train_loader, model, criterion, optimizer, epoch)
File "/home/luban/repository/crnn.pytorch/main.py", line 231, in train
for i, (images, targets, target_lengths) in enumerate(train_loader):
TypeError: 'DigitsBatchTrain' object is not iterable
Process finished with exit code 1
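The unpacking in the training loop (`for i, (images, targets, target_lengths) in enumerate(train_loader)`) fails because the collate function returns a `DigitsBatchTrain` object rather than a tuple. One minimal fix is to make the batch object iterable; the class name comes from the traceback, but its fields are assumptions:

```python
class DigitsBatchTrain:
    """Batch wrapper; defining __iter__ lets the loop unpack it like a tuple."""
    def __init__(self, images, targets, target_lengths):
        self.images = images
        self.targets = targets
        self.target_lengths = target_lengths

    def __iter__(self):
        # Yield fields in the order the training loop expects to unpack them
        return iter((self.images, self.targets, self.target_lengths))

batch = DigitsBatchTrain('imgs', 'tgts', 'lens')
images, targets, lengths = batch
```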
The model I use is mobilenetv2. In this network, the repeated blocks increase the receptive field. I calculated that your small model's receptive field is 139, but the image size is 32×280. Generally a receptive field around 64 is considered appropriate; could an overly large receptive field be one reason the Chinese recognition rate is low? Thanks.
The shape of the picture is (Batch, Channel, H, W), but the data shape that sequence_generate can receive is (batch, seq_len, ...).
I did not find a solution in your code; how did you deal with this problem?
Thank you
@zhiqwang, could you please correct the dimensions I've added below as comments in the forward() pass of the CRNN class? I cannot figure out what happens after the permute line.
out = self.features(x) # out: (B, H, W, C)
# features -> pool -> flatten -> decoder -> softmax
out = self.avgpool(out) # out: (B, 1, W, C)
out = out.permute(3, 0, 1, 2).view(out.size(3), out.size(0), -1) # out: (C, B, 1*W)
out = self.classifier(out) # expected in: (B, W, C) != (C, B, 1*W) ?
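For comparison, the common CRNN reshaping pattern (a sketch, not necessarily this repo's exact code) pools the height axis away and turns the width axis into the time dimension, so the per-step classifier sees one feature vector per image column:

```python
import torch

# CNN features in the standard PyTorch layout (B, C, H, W); average pooling
# has already collapsed the height axis to 1
B, C, H, W = 2, 512, 1, 25
out = torch.randn(B, C, H, W)

out = out.squeeze(2)        # (B, C, W)
out = out.permute(2, 0, 1)  # (W, B, C): sequence-first, one step per image column
```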
When I test other images with your model, it cannot recognize the Chinese characters in them.
Please, can you share an example of training on English text?
The CHARMAP used for the data includes all the characters [A-Z a-z0-9 :,>/-].
target = torch.IntTensor([get_key(char_convert,i) for i in target])
TypeError: an integer is required (got type NoneType)
What should the format of the encoded data in the annotation file be after conversion?
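The `TypeError` means the lookup returned `None` for a character missing from the map, which `torch.IntTensor` cannot convert. A defensive encoder that fails loudly instead (the alphabet and index layout here are assumptions):

```python
import torch

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 :,>/-"
char_to_idx = {ch: i + 1 for i, ch in enumerate(alphabet)}  # 0 reserved for blank

def encode(text):
    missing = [ch for ch in text if ch not in char_to_idx]
    if missing:
        # Fail loudly instead of the opaque "an integer is required (got NoneType)"
        raise KeyError(f"characters not in alphabet: {missing!r}")
    return torch.IntTensor([char_to_idx[ch] for ch in text])

target = encode("AB:12")
```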
My digit recognition accuracy is quite good, but why is the Chinese recognition rate so low? My alphabet contains only 19 specific Chinese characters, I have adjusted the image resolution, and I have generated more than 10 million training samples. Do I need to tune some model parameters, or should I try a BiLSTM in the CRNN?
mtwi_2018_train/images/001807_00031.jpg
Train: [1][108/90000] Time 0.348 (0.361) Data 0.003 (0.006) Loss 30.0584 (31.5477)
mtwi_2018_train/images/007947_00013.jpg
Train: [1][109/90000] Time 0.422 (0.361) Data 0.003 (0.006) Loss inf (inf)
mtwi_2018_train/images/002394_00012.jpg
Train: [1][110/90000] Time 0.332 (0.361) Data 0.003 (0.006) Loss nan (nan)
command:
python ./main.py --dataset-root mtwi_2018_train --arch densenet121 --alphabet ./data/alphabet_decode_5990.txt --lr 1e-6 --optimizer rmsprop --gpu-id -1 --workers 1 --not-pretrained --batch-size 1 --keep-ratio --print-freq 1
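Once a single batch produces an infinite CTC loss (for example a target longer than the downsampled width, which `--keep-ratio` makes more likely on narrow crops), the resulting gradients poison the weights and every later loss is NaN, which matches the inf-then-nan progression in the log. Constructing the loss with `zero_infinity=True` and skipping non-finite batches is a common guard (a sketch; variable names are assumptions):

```python
import torch
import torch.nn as nn

criterion = nn.CTCLoss(zero_infinity=True)   # zeroes infinite losses and grads

# Deliberately infeasible batch: target length 8 > output length 5
T, B, C = 5, 1, 10
log_probs = torch.randn(T, B, C, requires_grad=True).log_softmax(2)
targets = torch.randint(1, C, (B, 8))
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 8, dtype=torch.long)

loss = criterion(log_probs, targets, input_lengths, target_lengths)
# Without zero_infinity=True this loss would be inf; here it is zeroed out
if torch.isfinite(loss):                     # skip batches the clamp cannot save
    loss.backward()
```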
I checked the network roughly, and it seems there are no recurrent layers such as a Bi-LSTM.
Is this repo another implementation of CRNN? I only see several CNN backbones and fully connected layers, but no RNN layers.
Don't know how to use it.
The convolutional part of the architecture acts as an encoder: it captures the image's contextual information. The architecture should also include a decoder part (a deconvolution layer or an RNN layer) to recover the image's spatial information.
Has multi-GPU mode not been added?
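For reference, a minimal multi-GPU path in PyTorch is `nn.DataParallel`, which splits each input batch across the visible GPUs; this is a generic sketch, not the repo's code (the model below is a stand-in):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())   # stand-in for the CRNN

if torch.cuda.device_count() > 1:
    # Replicates the model and splits each input batch across the visible GPUs
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

device = next(model.parameters()).device
out = model(torch.randn(2, 3, 32, 100).to(device))
```

For serious multi-GPU training, `nn.parallel.DistributedDataParallel` is the approach PyTorch recommends over `DataParallel`.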