OCR using MXNet Gluon. The pipeline is composed of a CNN + biLSTM + CTC. The dataset is the IAM Handwriting Database: http://www.fki.inf.unibe.ch/databases/iam-handwriting-database. You need to register on their website to get a username and password.

handwrittentextrecognition_mxnet's Issues

Kernel restarts automatically

When I execute this line of code, the kernel restarts automatically. Any help, please?

    net.collect_params().initialize(mx.init.Xavier(), ctx=ctx)

Link to alicewonder.txt

Hi,
Could you please provide a link to alicewonder.txt, which is needed for beam search with the language model?

Thank you

Question on the shape of feature map of OCR_LSTM_CTC

In handwriting_recognition.ipynb:

    def forward(self, x):
        x = x.transpose((0, 3, 1, 2))
        x = x.flatten()
        x = x.split(num_outputs=max_seq_len, axis=1) # (SEQ_LEN, N, CHANNELS)
        x = nd.concat(*[elem.expand_dims(axis=0) for elem in x], dim=0)
        x = self.lstm(x)
        x = x.transpose((1, 0, 2)) #(N, SEQ_LEN, HIDDEN_UNITS)
        return x

I notice that the input feature map for EncoderLayer is first transposed with x = x.transpose((0, 3, 1, 2)), but I think this line may be unnecessary: this kind of transpose is usually applied to image arrays that have channels in the last dimension, not to feature maps. Is there a special reason for it?

In addition, for the reshape before the LSTM, I first replaced this code:

    x = x.split(num_outputs=max_seq_len, axis=1) # (SEQ_LEN, N, CHANNELS)
    x = nd.concat(*[elem.expand_dims(axis=0) for elem in x], dim=0)

with x = x.reshape(SEQ_LEN, BATCH_SIZE, -1), and found that the elements are ordered differently from the original version, even though the final shapes are the same. Is there a reason to reshape it the way you did?
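A minimal NumPy sketch of what the two versions do, using NumPy as a stand-in for mx.nd (transpose, flatten-style reshape, split and concatenation follow the same row-major semantics). It assumes the encoder receives a feature map in NCHW layout and that max_seq_len equals the feature-map width; the shapes are hypothetical. Under those assumptions, transpose((0, 3, 1, 2)) moves the width axis next to the batch axis, so each time step handed to the LSTM is one vertical slice of the image, while a plain reshape to (SEQ_LEN, N, -1) mixes data from different samples.

```python
import numpy as np

# Hypothetical feature-map shape in NCHW layout; max_seq_len is assumed equal to W.
N, C, H, W = 2, 3, 4, 5
max_seq_len = W

x = np.arange(N * C * H * W).reshape(N, C, H, W)

# Notebook logic, with NumPy equivalents of the NDArray calls
t = x.transpose(0, 3, 1, 2)                   # (N, W, C, H): width becomes the time axis
flat = t.reshape(N, -1)                       # like NDArray.flatten(): (N, W*C*H)
chunks = np.split(flat, max_seq_len, axis=1)  # max_seq_len arrays of shape (N, C*H)
seq = np.concatenate([c[np.newaxis] for c in chunks], axis=0)  # (SEQ_LEN, N, C*H)

# Same ordering in one step: split each sample into time steps, then swap N and SEQ_LEN
seq_alt = flat.reshape(N, max_seq_len, -1).transpose(1, 0, 2)

# Reshaping straight to (SEQ_LEN, N, -1) reads memory in a different order
seq_naive = flat.reshape(max_seq_len, N, -1)

print(seq.shape)                                         # (5, 2, 12)
print(np.array_equal(seq, seq_alt))                      # True
print(np.array_equal(seq, seq_naive))                    # False
print(np.array_equal(seq[2, 1], x[1, :, :, 2].ravel()))  # True: time step 2 of sample 1 is column 2
```

In MXNet, the one-step form could be written after the transpose as x.reshape((0, max_seq_len, -1)).transpose((1, 0, 2)), where 0 in the reshape target copies the batch dimension. Because it avoids the Python list returned by split, this form may also be easier to hybridize.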

What is the global variable NUM_CLASSES in lstm_ocr_ctc for?

In lstm_ocr_ctc.ipynb there is a global variable NUM_CLASSES, but it is never used. Is something wrong here?
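For context, and not taken from the notebook itself: in a CTC model, a constant like NUM_CLASSES usually sizes the output layer, with one score per character in the alphabet plus one for the CTC blank. The minimal NumPy sketch below shows that relationship together with greedy CTC decoding, which collapses repeated labels and then drops blanks. The alphabet is hypothetical, and placing the blank at the last index is an assumption; check which convention your CTC loss uses.

```python
import numpy as np

# Hypothetical character set; the notebook's actual alphabet may differ.
ALPHABET = "abcdefghijklmnopqrstuvwxyz '"
# Assumption: the CTC blank is the last class index.
BLANK = len(ALPHABET)
NUM_CLASSES = len(ALPHABET) + 1  # one output per character, plus the blank


def greedy_ctc_decode(scores):
    """Decode one sample's (SEQ_LEN, NUM_CLASSES) score matrix."""
    best_path = scores.argmax(axis=1)  # most likely class at each time step
    chars = []
    prev = None
    for idx in best_path:
        # Keep a label only if it differs from the previous step and is not blank
        if idx != prev and idx != BLANK:
            chars.append(ALPHABET[idx])
        prev = idx
    return "".join(chars)


# Time steps predicting h, h, blank, i, i
frames = np.eye(NUM_CLASSES)[[7, 7, BLANK, 8, 8]]
print(greedy_ctc_decode(frames))  # hi
```

Because the blank resets the "previous label", a doubled letter such as "ll" survives decoding only when a blank separates the two runs.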

Training CNN LSTM with images of different size without padding images

Hi,

I trained the CNN-LSTM-CTC model on IAM line images of size (128, 1600), and I get good results with that. But when I test the model on images of a different size, (100, 500), I get shape errors around the LSTM. When I resize those images to (128, 1600), the results are poor because the image resolution changes.

Is there a way to train the CNN-LSTM model on images of different sizes without zero-padding them, and to test it on images of different sizes as well?

Any suggestions would be helpful.

Thanks,
Harathi
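One common workaround, not something the repository implements, is bucketing. Each line image is resized to the training height (128 here) while preserving its aspect ratio, and images of similar resulting width are grouped so that every batch shares one width. Images in a bucket are stretched slightly to the bucket width rather than zero-padded. This only helps if the encoder derives its sequence length from the feature-map width instead of using a fixed max_seq_len. The sketch below covers just the bucketing step, and the helper names and the bucket_step value are hypothetical.

```python
from collections import defaultdict


def scaled_width(height, width, target_height=128):
    # Width after resizing to target_height while keeping the aspect ratio
    return max(1, round(width * target_height / height))


def bucket_by_width(sizes, target_height=128, bucket_step=100):
    """Group (height, width) image sizes into buckets of similar scaled width.

    Each bucket key is the scaled width rounded up to a multiple of bucket_step.
    Every image in a bucket would then be resized to that width, so a batch
    drawn from one bucket has a single shape and a single sequence length.
    """
    buckets = defaultdict(list)
    for i, (h, w) in enumerate(sizes):
        sw = scaled_width(h, w, target_height)
        key = -(-sw // bucket_step) * bucket_step  # ceiling to a multiple of bucket_step
        buckets[key].append(i)
    return dict(buckets)


print(bucket_by_width([(100, 500), (128, 1600), (64, 300), (128, 650)]))
# {700: [0, 3], 1600: [1], 600: [2]}
```

A smaller bucket_step reduces the stretching inside each bucket at the cost of more buckets and therefore smaller or less evenly filled batches.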

Pickling error on Windows

When I run 1_b_paragraph_segmentation_dcnn, I get this error:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Mateen\AppData\Local\conda\conda\envs\pentoscan\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "C:\Users\Mateen\AppData\Local\conda\conda\envs\pentoscan\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    AttributeError: Can't get attribute 'augment_transform' on <module '__main__' (built-in)>
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\Mateen\AppData\Local\conda\conda\envs\pentoscan\lib\multiprocessing\spawn.py", line 105, in spawn_main
        exitcode = _main(fd)
      File "C:\Users\Mateen\AppData\Local\conda\conda\envs\pentoscan\lib\multiprocessing\spawn.py", line 115, in _main
        self = reduction.pickle.load(from_parent)
    AttributeError: Can't get attribute 'augment_transform' on <module '__main__' (built-in)>
    Traceback (most recent call last):
      File "<string>", line 1, in <module>

Please help me. This is for my final-year project, and it would help me a lot.

Thanks!
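The traceback comes from Python's multiprocessing "spawn" start method, which is the default on Windows. Each worker process starts a fresh interpreter and unpickles augment_transform by looking it up by name in `__main__`. A function defined inside a Jupyter notebook cannot be found there, which produces the AttributeError. Two usual workarounds are moving the function into a separate .py module that the notebook imports, or loading data in the main process (if the notebook uses a Gluon DataLoader, by setting num_workers=0). The stdlib sketch below shows the script layout that spawn expects. The function body is a hypothetical stand-in, not the notebook's real transform.

```python
import multiprocessing as mp


# Stand-in for the notebook's augment_transform (hypothetical body).
# Under the "spawn" start method, workers unpickle this function by its
# module and name, so it must live at the top level of an importable module.
def augment_transform(x):
    return x * 2


def transform_all(data, workers=2):
    # Each worker receives a pickled reference to augment_transform.
    with mp.Pool(workers) as pool:
        return pool.map(augment_transform, data)


if __name__ == "__main__":
    # The guard stops spawned workers from re-running this block when they
    # re-import the main script.
    print(transform_all([1, 2, 3]))  # prints [2, 4, 6]
```

In a notebook, the same idea means putting augment_transform in, for example, a transforms.py file next to the notebook and importing it from there, so that worker processes can import it too.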

Pretrained Model word_segmentation2.params

The pretrained model word_segmentation2.params, which is used in https://github.com/ThomasDelteil/HandwrittenTextRecognition_MXNet/blob/master/handwriting_ocr.ipynb, is not available for download.

Adding license

I wonder under which license this repository can be used. Would it be possible to add one?

An issue with your OCR data iteration

Hi, your project is cool, but the data iteration in your OCR_LSTM_CTC is very slow.
Could you update it?
Thank you very much.

Question: is it possible to hybridize lstm_ocr_ctc?

Hi, first of all, amazing work on HandwrittenTextRecognition_MXNet.
It's also my first time working with MXNet, so please be patient.

I've successfully trained my own word OCR net, but when I try to hybridize it, I haven't been able to get it to converge to a usable solution.

I think the problem was raised in issue #9 (Question on the shape of feature map of OCR_LSTM_CTC, https://github.com//issues/9): this transformation forces us to use split, which returns a list of NDArrays.

Is there a way to hybridize the net and train it? Or could this be done for an already-trained net?

thomasdelteil / handwrittentextrecognition_mxnet