Error while executing charclassifier.ipynb about handwriting-ocr HOT 13 OPEN

ManojNirale commented on May 26, 2024

Error while executing charclassifier.ipynb

from handwriting-ocr.

Comments (13)

Breta01 commented on May 26, 2024

Hi,

This is interesting. It looks like something didn't load. Can you wrap the loop in a try-except block, catch the exception, and print the index, image, and gaps? This should point us to what is missing. Also check the number of files in the words2 folder; there should be exactly 10138 files.
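A minimal sketch of that try-except debugging, modeled on the slicing loop from the traceback further down this thread. The helper name `slice_chars_debug` is hypothetical, and it only collects the crops instead of filling the preallocated `imgs` array and `newLabels` list used in the notebook. When an image failed to load, `images[i]` is `None` and slicing it raises `TypeError`, so the handler prints the index, image, and gaps instead of crashing.

```python
def slice_chars_debug(images, gaplines, height):
    """Hypothetical debugging helper: cut each word image into character
    crops at its gap positions and report the entries that fail.

    `images` holds 2-D arrays (e.g. NumPy arrays), or None for files that
    could not be read; `gaplines[i]` lists the x positions separating the
    characters of word i.
    """
    crops, failures = [], []
    for i, gaps in enumerate(gaplines):
        for pos in range(len(gaps) - 1):
            try:
                crops.append(images[i][0:height, gaps[pos]:gaps[pos + 1]])
            except (TypeError, IndexError) as err:
                # TypeError: images[i] is None ("'NoneType' object is not
                # subscriptable"); IndexError: fewer images than gap lists.
                image = images[i] if i < len(images) else "<missing>"
                print(f"index={i} image={image!r} gaps={gaps} error={err}")
                failures.append(i)
                break  # skip the remaining gaps of this broken entry
    return crops, failures
```

Calling it with the notebook's `images`, `gaplines`, and image height lists every failing index at once rather than stopping at the first one.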

ManojNirale commented on May 26, 2024

Thanks for the quick reply. I'll check it and let you know if I still face any issues. I also have the same file count in the words2 folder.

haripranavk commented on May 26, 2024

Hi @Breta01
I guess this is caused by an issue in the characters dataset.
The 'en' data in char-class doesn't have 53 folders (I downloaded it from the link twice, just to be sure); there are only 27. In the loadCharsData function, however, you assign chars = CHARS[:53] and then compare it with each d in the directory listing (in the assertion).
P.S. I'm attaching a screenshot of the downloaded data.
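A quick way to check the downloaded character data is to list its subfolders and compare their number with the 53 that loadCharsData expects. A minimal sketch with hypothetical function names; `data/charclas/en` is the location mentioned in this thread.

```python
import os


def list_subfolders(root):
    """Return the sorted names of the immediate subdirectories of `root`."""
    with os.scandir(root) as entries:
        return sorted(entry.name for entry in entries if entry.is_dir())


def check_char_folders(root, expected=53):
    """Return True if `root` has exactly `expected` subfolders; warn otherwise."""
    names = list_subfolders(root)
    if len(names) != expected:
        print(f"{root}: found {len(names)} folders, expected {expected}")
        return False
    return True
```

For example, `check_char_folders("data/charclas/en")` prints a warning and returns False when the count differs.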

Breta01 commented on May 26, 2024

That is weird. When I download the data, I see all 53 folders.

Also, the error is in the line:
images, labels = loadCharsData(charloc='', wordloc='data/words2/', lang=LANG)
This means that he is using the words2 folder rather than data/charclas/en.

haripranavk commented on May 26, 2024

Hi Breta.
I just realised that Windows is case insensitive. When I downloaded the data on Linux, it showed all the folders. I'll try it as soon as possible and let you know whether this is the cause of the bug.
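To see why folders disappear, one can group the folder names the way a case-insensitive file system compares them: every pair such as "a"/"A" collapses into a single entry. A minimal sketch; the folder list below is hypothetical and is not the dataset's actual list of 53 names.

```python
import string


def case_collisions(names):
    """Group names that differ only in letter case.

    str.lower() is used as an approximation of how a case-insensitive
    file system (the Windows default) compares names.
    """
    groups = {}
    for name in names:
        groups.setdefault(name.lower(), []).append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


def surviving_count(names):
    """Number of distinct entries left when case is ignored."""
    return len({name.lower() for name in names})


# Hypothetical folder list: one folder per lowercase and uppercase letter.
folders = list(string.ascii_lowercase) + list(string.ascii_uppercase)
print(len(folders), "folders,", surviving_count(folders), "survive")  # 52 folders, 26 survive
```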
Meanwhile, I have another question: why is the error in that line?

Since wordloc represents the word location, it points to the words2 folder. Similarly, charloc represents the character location, which is data/charclas/en.
But in the code in the repo (CharClassifier), it is:
images, labels = loadCharsData(
    charloc='',
    wordloc='data/words2/',
    lang=LANG)
Would you please explain that? Thanks!

haripranavk commented on May 26, 2024

Okay, I tried running CharClassifier after making my folder case sensitive, but the error persists:

    108     for i, gaps in enumerate(gaplines):
    109         for pos in range(len(gaps) - 1):
--> 110             imgs[idx] = images[i][0:height, gaps[pos]:gaps[pos+1]]
    111             newLabels.append(char2idx(labels[i][pos]))
    112             idx += 1

TypeError: 'NoneType' object is not subscriptable

Breta01 commented on May 26, 2024

First, I forgot that Windows is case insensitive. I will definitely have to save the data in a different way.

To explain: images, labels = loadCharsData(charloc='', wordloc='data/words2/', lang=LANG)
It means that you can specify the characters location (a folder for each letter) and/or the words location, which contains word images along with the positions where each word is split into letters. If you don't specify an option, it isn't used.
This also means that the issue is somewhere in the data/words2 folder.
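The empty-string convention described above can be illustrated with a tiny hypothetical helper (not the repository's loadCharsData) that decides which sources to read. With the call used in CharClassifier, only the words location is selected, which is why the failure has to come from data/words2.

```python
def sources_to_load(charloc="", wordloc=""):
    """Hypothetical illustration: an empty location means 'skip this source'.

    Returns (kind, path) pairs for the locations that were actually given.
    """
    sources = []
    if charloc:
        sources.append(("chars", charloc))  # one folder per letter
    if wordloc:
        sources.append(("words", wordloc))  # word images plus gap positions
    return sources


print(sources_to_load(charloc='', wordloc='data/words2/'))
# [('words', 'data/words2/')]
```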

haripranavk commented on May 26, 2024

Hi, Breta.

You could include "how to make a folder case sensitive on Windows" in your README. This might help users.
This link helped:
https://www.howtogeek.com/354220/how-to-enable-case-sensitive-folders-on-windows-10/
Also, the number of files in words2 is exactly 10138, as you mentioned. The CharClassificationDM and OCR notebooks work perfectly; the issue only appears when running CharClassifier. I just checked the closed issues, and someone else also had a similar error ('NoneType' object is not subscriptable) while loading images and labels from CSV. I'm unable to understand what the issue is.
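After following the linked guide, one way to confirm that a folder really distinguishes "a" from "A" is to create a temporary file with a lowercase name and test whether the uppercase spelling resolves to it. A minimal standard-library sketch; the function name is hypothetical.

```python
import os
import tempfile


def is_case_sensitive(folder):
    """Return True if `folder` treats names differing only in case as distinct.

    Creates a temporary file with a lowercase name, checks whether the
    uppercase spelling of that name resolves to it, then deletes the file.
    """
    fd, path = tempfile.mkstemp(prefix="casecheck_", dir=folder)
    os.close(fd)
    try:
        head, name = os.path.split(path)
        return not os.path.exists(os.path.join(head, name.upper()))
    finally:
        os.remove(path)
```

Running `is_case_sensitive("data")` before unzipping the dataset shows whether the "a"/"A" folders can coexist there.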

haripranavk commented on May 26, 2024

Hi, Breta.
I finally found the solution to this issue. There was an encoding issue in loadWordsData in datahelpers.py. I changed the image-loading code to this:
My code:
for i, img in enumerate(imglist):
    # Read the raw bytes with Python's open(), then let OpenCV decode them
    with open(img, 'rb') as stream:
        data = bytearray(stream.read())
    numpyarray = np.asarray(data, dtype=np.uint8)
    # IMREAD_UNCHANGED keeps the file's channels; the original used 0 (grayscale)
    images[i] = cv2.imdecode(numpyarray, cv2.IMREAD_UNCHANGED)

Original code:
for i, img in enumerate(imglist):
    images[i] = cv2.imread(img, 0)

Finally, I'm able to run CharClassifier. Thank you for being so prompt!

Breta01 commented on May 26, 2024

Well, this is interesting. It looks like the issue is in the behaviour of cv2.imread() on Windows; on Ubuntu it works fine. I will try to run it on Windows.
Some images have special characters (accents) in their paths; maybe that causes the problem.
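To test the accent hypothesis, one can list the image paths that contain non-ASCII characters and compare them with the paths whose image came back as `None` (cv2.imread returns `None` instead of raising when it cannot read a file). A minimal sketch with hypothetical helper names and example paths.

```python
def non_ascii_paths(paths):
    """Return the paths containing at least one non-ASCII character."""
    return [path for path in paths if not path.isascii()]


def failed_loads(paths, images):
    """Return the paths whose loaded image is None.

    A None here later surfaces as "'NoneType' object is not subscriptable"
    when the image is sliced.
    """
    return [path for path, image in zip(paths, images) if image is None]
```

If both lists contain the same paths, the accented file names are the likely cause.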

longwall commented on May 26, 2024

I faced this issue in my Windows environment. data.zip cannot be unpacked normally on Windows: the capital-letter folders merge into the lowercase ones, because Windows cannot create two folders named "a" and "A" in the same directory. I uploaded data.zip to a Unix (Solaris) system and unzipped it there in an SSH session. All folders unpacked fine, but some Czech letters in the names turned into two-symbol pairs because the UTF-8 locale was not configured.
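The two-symbol pairs are consistent with how UTF-8 works: Czech letters such as "č" are encoded as two bytes, so a tool that reads the names with a single-byte encoding shows each letter as two symbols. A minimal sketch, assuming Latin-1 as that single-byte encoding purely for illustration; the function name is hypothetical.

```python
def show_as_latin1(text):
    """Decode the UTF-8 bytes of `text` as Latin-1.

    This mimics a tool that is not configured for UTF-8 (an assumed setup),
    so each two-byte Czech letter appears as two separate symbols.
    """
    return text.encode("utf-8").decode("latin-1")


for letter in "čřž":
    print(letter, len(letter.encode("utf-8")), "bytes ->", repr(show_as_latin1(letter)))
```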
I think a good way is to rename all capital-letter folders by adding an underscore prefix ("a" and "_A") or something similar. Of course, the reading loop in the code would need changes. Right now I can't run any Python script of the project because I don't have a Linux environment, not even a virtual one.
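A minimal sketch of the proposed naming scheme, assuming an underscore prefix marks uppercase letters. The function names are hypothetical; the reading loop would call `char_for_folder` on each folder name to recover the label.

```python
def folder_for_char(char):
    """Folder name for a character label: prefix uppercase letters with "_"
    so that "a" and "_A" can coexist on a case-insensitive file system."""
    return "_" + char if char.isupper() else char


def char_for_folder(name):
    """Inverse of folder_for_char, used when reading the folders back."""
    if name.startswith("_") and len(name) > 1:
        return name[1:]
    return name


labels = ["a", "A", "č", "Č"]
folders = [folder_for_char(c) for c in labels]
print(folders)  # ['a', '_A', 'č', '_Č']
```

Because every uppercase name gains a prefix, no two folders differ only in case, so the scheme also survives unzipping on Windows.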

Breta01 commented on May 26, 2024

OK, that sounds reasonable. @longwall, can you make the change? I am quite busy for the next few weeks.

Breta01 commented on May 26, 2024

In the next big update, currently in the rework branch, this will be replaced by loading images from CSV files. That is probably the easiest solution.
