sightseq's Issues

Getting accuracy as 0.00

I am trying to train a model, but the accuracy is always 0.00.
My data folder: data.zip
Command used: python ./main.py --dataset-root G:\11\crnn.pytorch-master\crnn.pytorch-master\data --arch densenet121 --alphabet G:\11\crnn.pytorch-master\crnn.pytorch-master\data\alphabet_decode_5990.txt --lr 5e-5 --optimizer rmsprop
Training log screenshot:
[image]

I need help with this. Thanks in advance.

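Problem loading the pretrained model

When I load your pretrained model, I get this problem:

[image]

The model I am using is densenet121, and I put the model file in
[image]
this directory.
Why does this happen? I have not modified the model, and I am running on a single GPU.
I then changed this part:
[image]
The result is the same: the pretrained model still cannot be loaded. What is the cause?
In my code I replaced the built-in CTC with warpctc. Could that be the reason?
Thanks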

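Loss becomes nan: I read the earlier answers but still have a question

I want to train mobilenetv2 + CTC on my own data. The images are 32*258 in size and look like this:
[image]
They are all 32*258, with 9k images for training and 1k for validation. The labels look like this:
00000000.jpg 144 80 91 9 213 24 16 217 91 682 129 100 5
00000001.jpg 140 481 9 102 2612 31 330 71 65 15 4
00000002.jpg 1688 195 91 49 678 4 24 1166 2700 58 135
The number of characters per image varies; it is around 10, for example 11 or 13.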
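
A minimal sketch of how label lines in this format could be read, assuming each line is a file name followed by whitespace-separated integer character indices. The function name `parse_label_line` is hypothetical and is not part of the repository.

```python
def parse_label_line(line):
    """Split one annotation line into (file_name, [character indices])."""
    parts = line.split()
    if len(parts) < 2:
        raise ValueError(f"expected a file name and at least one index: {line!r}")
    file_name, indices = parts[0], [int(p) for p in parts[1:]]
    return file_name, indices


sample = "00000001.jpg 140 481 9 102 2612 31 330 71 65 15 4"
name, target = parse_label_line(sample)
print(name, len(target), max(target))  # 00000001.jpg 11 2612
```

Checking `len(target)` and `max(target)` over the whole file is a quick way to confirm that no label is longer than the model's output sequence and that no index exceeds the alphabet size.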

First, I changed this in the code:
parser.add_argument('--width', type=int, default=256,)
Then I ran:
python main.py --gpu-id 0 --not-pretrained --optimizer adam
However, after four epochs the loss became nan.

The CTC loss is set to:
criterion = nn.CTCLoss(zero_infinity=True)
I set the image mean and standard deviation to those of my own images:
model_params['mean'] = (0.57680161,0.57680161,0.57680161)
model_params['std'] = (0.1311234,0.1311234,0.1311234)

[image]
What could be the cause? I set everything up as you explained earlier, but the loss still becomes nan. I need to train a mobilenetv2 network for my research project. Thank you.

RuntimeError: CUDA error: an illegal memory access was encountered

python ./main.py --dataset-root datasets --arch densenet121 --alphabet datasets/alphabet_decode_5990.txt --lr 5e-5 --optimizer rmsprop --gpu-id 6 --not-pretrained

Traceback (most recent call last):
File "./main.py", line 387, in <module>
main()
File "./main.py", line 209, in main
_ = train(train_loader, model, criterion, optimizer, epoch)
File "./main.py", line 274, in train
loss = criterion(log_probs, targets, input_lengths, target_lengths)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 1248, in forward
return F.ctc_loss(log_probs, targets, input_lengths, target_lengths, self.blank, self.reduction)
File "/home/ronghui/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 1732, in ctc_loss
return torch.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank, _Reduction.get_enum(reduction))
RuntimeError: CUDA error: an illegal memory access was encountered

Error(s) in loading state_dict for CRNN:

I downloaded and tested the pretrained densenet model, but loading it failed with the following error:

Error(s) in loading state_dict for CRNN:
Unexpected key(s) in state_dict: "features.1.denselayer1.norm1.num_batches_tracked", ...
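
Keys ending in `num_batches_tracked` are BatchNorm buffers introduced around PyTorch 0.4.1, so this error typically appears when a checkpoint saved with a newer PyTorch is loaded in an older one. A minimal sketch of one workaround, assuming upgrading PyTorch is not an option, is to drop those keys before calling `load_state_dict`. The helper name `strip_num_batches_tracked` is hypothetical, and a plain dict stands in for the real checkpoint.

```python
from collections import OrderedDict


def strip_num_batches_tracked(state_dict):
    """Return a copy of state_dict without BatchNorm 'num_batches_tracked' entries."""
    return OrderedDict(
        (key, value)
        for key, value in state_dict.items()
        if not key.endswith("num_batches_tracked")
    )


# Stand-in for the loaded checkpoint; real values would be tensors.
checkpoint = OrderedDict([
    ("features.1.denselayer1.norm1.weight", "w"),
    ("features.1.denselayer1.norm1.num_batches_tracked", "n"),
])
cleaned = strip_num_batches_tracked(checkpoint)
print(list(cleaned))  # ['features.1.denselayer1.norm1.weight']
# With a real model, the next step would be: model.load_state_dict(cleaned)
```

Passing `strict=False` to `load_state_dict` is another option, but it also hides keys that are genuinely missing, so filtering the known extra keys is the more conservative choice.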

Input size

Is it possible to train the network with images of different sizes?

Help Needed

I am training a CRNN model in PyTorch with:
max_seq_length=99
number_of_alphabets=96
batch_size=16
output=CRNN(image)
What should the expected shape of output be?
Secondly, should we apply softmax in the CRNN after the fully connected layer?
Any help would be appreciated. Thanks

TypeError: 'DigitsBatchTrain' object is not iterable

I extracted Chinese_dataset.rar into the data folder, moved all pictures to images, and renamed data_test.txt to data_dev.txt.

Running main.py then shows:

Creating directory if it does not exist:
'./checkpoint/densenet121_rmsprop_lr5.0e-05_wd5.0e-04_bsize64_imsize32'
Using model from scratch (random weights) 'densenet121'
Traceback (most recent call last):
File "/home/luban/repository/crnn.pytorch/main.py", line 352, in <module>
main()
File "/home/luban/repository/crnn.pytorch/main.py", line 197, in main
loss = train(train_loader, model, criterion, optimizer, epoch)
File "/home/luban/repository/crnn.pytorch/main.py", line 231, in train
for i, (images, targets, target_lengths) in enumerate(train_loader):
TypeError: 'DigitsBatchTrain' object is not iterable

Process finished with exit code 1

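Is the low Chinese recognition accuracy caused by the receptive field?

I am using mobilenetv2. In this network, repeating blocks enlarges the receptive field. I calculated that the receptive field of your small model is 139, while the image size is 32×280. Generally, a receptive field of around 64 is considered suitable. Could this overly large receptive field be one reason for the low Chinese recognition accuracy? Thanks.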
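
The receptive-field arithmetic mentioned above can be sketched as follows. For a stack of convolution or pooling layers with kernel size k and stride s, the receptive field grows by (k - 1) times the accumulated stride at each layer. The layer list below is purely illustrative and is not the actual mobilenetv2 configuration from this repository; dilation is ignored.

```python
def receptive_field(layers):
    """Receptive field of a stack of (kernel_size, stride) layers along one axis."""
    r, jump = 1, 1  # jump is the product of the strides seen so far
    for kernel_size, stride in layers:
        r += (kernel_size - 1) * jump
        jump *= stride
    return r


# Hypothetical stack: one 3x3 stride-2 conv followed by two 3x3 stride-1 convs.
example_layers = [(3, 2), (3, 1), (3, 1)]
print(receptive_field(example_layers))  # 11
```

Feeding in the real per-layer kernel sizes and strides of the backbone would reproduce a figure such as the 139 quoted above, and shows how much each repeated block contributes.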

How is the picture processed in sequence_generate?

The picture has shape (Batch, Channel, H, W), but sequence_generate accepts data of shape (batch, seq_len, ...).
I could not find where your code handles this. How did you deal with it?
Thank you

dimensions in forward pass

@zhiqwang, could you please correct the dimensions I've added in the comments below, in the forward() pass of the CRNN class? I cannot figure out what happens after the permute line.

out = self.features(x) # out: (B, H, W, C)
# features -> pool -> flatten -> decoder -> softmax
out = self.avgpool(out) # out: (B, 1, W, C)
out = out.permute(3, 0, 1, 2).view(out.size(3), out.size(0), -1) # out: (C, B, 1*W)
out = self.classifier(out) # expected in: (B, W, C) != (C, B, 1*W) ?
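
One way to check these shapes is to trace them by hand. The sketch below assumes the backbone follows PyTorch's usual convolutional layout (B, C, H, W) rather than (B, H, W, C), and that the pooling layer collapses the height to 1; both are assumptions about this model, not something the issue confirms. The helper `permute_shape` is hypothetical and only mimics how `Tensor.permute` reorders dimensions, and the sizes are made up.

```python
def permute_shape(shape, dims):
    """Shape that Tensor.permute(*dims) would produce for a tensor of `shape`."""
    return tuple(shape[d] for d in dims)


B, C, W = 16, 512, 70            # made-up example sizes
after_pool = (B, C, 1, W)        # assumed layout after avgpool: (B, C, H=1, W)
permuted = permute_shape(after_pool, (3, 0, 1, 2))
# view(out.size(3), out.size(0), -1) keeps W and B and merges the remaining dims.
viewed = (after_pool[3], after_pool[0], after_pool[1] * after_pool[2])
print(permuted)  # (70, 16, 512, 1)
print(viewed)    # (70, 16, 512)
```

Under these assumptions the classifier receives (W, B, C), which matches the time-major (T, N, C) layout that `nn.CTCLoss` expects for `log_probs`, with W acting as the sequence length.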

annotation file format for English data

Could you please share an example for training on English text?
The CHARMAP used for my data includes all of [A-Z a-z0-9 :,>/-].

target = torch.IntTensor([get_key(char_convert,i) for i in target])
TypeError: an integer is required (got type NoneType)

What should be the format of encoded data in annotation file after conversion?
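
The `NoneType` error suggests that `get_key` returned `None` for a character that is missing from the character map, for example a space or a punctuation mark. A minimal sketch of an encoder that fails loudly instead, assuming index 0 is reserved for the CTC blank; the names `build_char_map` and `encode_label` are hypothetical and not taken from the repository.

```python
import string


def build_char_map(alphabet):
    """Map each character to an index starting at 1 (0 is assumed to be the CTC blank)."""
    return {char: index for index, char in enumerate(alphabet, start=1)}


def encode_label(text, char_map):
    """Convert text to a list of indices, reporting any unmapped characters."""
    missing = sorted({char for char in text if char not in char_map})
    if missing:
        raise KeyError(f"characters not in the character map: {missing}")
    return [char_map[char] for char in text]


alphabet = string.ascii_uppercase + string.ascii_lowercase + string.digits + " :,>/-"
char_map = build_char_map(alphabet)
print(encode_label("Ab 1", char_map))  # [1, 28, 63, 54]
```

Running every annotation through such a check before training shows exactly which characters the map lacks, instead of failing later inside `torch.IntTensor`.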

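Low Chinese recognition accuracy

My digit recognition accuracy is quite good, so why is the Chinese recognition accuracy so low? My dictionary contains only 19 specific Chinese characters, I have adjusted the image resolution, and I generated more than 10 million training samples. Do I need to tune some model parameters, or should I try a BLSTM in the CRNN?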

Loss becomes inf, then nan

mtwi_2018_train/images/001807_00031.jpg

Train: [1][108/90000] Time 0.348 (0.361) Data 0.003 (0.006) Loss 30.0584 (31.5477)
mtwi_2018_train/images/007947_00013.jpg
Train: [1][109/90000] Time 0.422 (0.361) Data 0.003 (0.006) Loss inf (inf)
mtwi_2018_train/images/002394_00012.jpg
Train: [1][110/90000] Time 0.332 (0.361) Data 0.003 (0.006) Loss nan (nan)


command:
python ./main.py --dataset-root mtwi_2018_train --arch densenet121 --alphabet ./data/alphabet_decode_5990.txt --lr 1e-6 --optimizer rmsprop --gpu-id -1 --workers 1 --not-pretrained --batch-size 1 --keep-ratio --print-freq 1


Attached: 007947_00013.jpg
[image]
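
One common cause of an infinite CTC loss is a sample whose label cannot fit into the model's output sequence: CTC needs at least one time step per target symbol, plus one blank between each pair of identical adjacent symbols. With `--keep-ratio` and `--batch-size 1`, a narrow image may yield very few time steps. The sketch below is a hedged pre-check rather than the repository's code; `min_ctc_input_length` and `is_ctc_feasible` are hypothetical names, and whether this explains 007947_00013.jpg is an assumption.

```python
def min_ctc_input_length(target):
    """Fewest time steps CTC needs to emit `target` (a sequence of label indices)."""
    repeats = sum(1 for prev, cur in zip(target, target[1:]) if prev == cur)
    return len(target) + repeats


def is_ctc_feasible(input_length, target):
    """True if a sequence of `input_length` time steps can align to `target`."""
    return input_length >= min_ctc_input_length(target)


print(min_ctc_input_length([5, 5, 7]))  # 4: a blank is needed between the two 5s
print(is_ctc_feasible(3, [5, 5, 7]))    # False
print(is_ctc_feasible(8, [5, 5, 7]))    # True
```

Samples that fail such a check could be filtered out or resized to be wider before training. In PyTorch versions that support it, `nn.CTCLoss(zero_infinity=True)` is another option: it zeroes infinite losses and their gradients.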

No recurrent layer found in model files

I checked the network roughly, and it seems there are no recurrent layers such as Bi-LSTM.
Is this repo a different implementation of CRNN? I only see several CNN backbones and fully connected layers, but no RNN layers.
