
Comments (13)

Breta01 commented on May 26, 2024

Hi,

this is interesting. It looks like something didn't load. Can you wrap the failing code in a try-except block and print the index, image and gaps when the exception is caught? This should point us to what is missing. Also check the number of files in the words2 folder; there should be exactly 10138 files.
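
A minimal sketch of the check suggested above, assuming `images` is the list filled by the loader (with `None` wherever an image failed to load) and `gaplines` holds the gap positions for each word. `find_bad_samples` is a hypothetical helper, not a function from the repository.

```python
def find_bad_samples(images, gaplines):
    """Return (index, reason) pairs for word samples that cannot be sliced."""
    bad = []
    for i, gaps in enumerate(gaplines):
        # Guard against images being shorter than gaplines.
        img = images[i] if i < len(images) else None
        try:
            if img is None:
                raise ValueError("image is None (failed to load)")
            if gaps is None or len(gaps) < 2:
                raise ValueError("need at least two gap positions")
        except ValueError as err:
            # Print the index, image type and gaps, as requested above.
            print(f"index={i} image={type(img).__name__} gaps={gaps} -> {err}")
            bad.append((i, str(err)))
    return bad
```

Running this before the slicing loop should show which word samples are missing rather than stopping at the first `TypeError`.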

from handwriting-ocr.

ManojNirale commented on May 26, 2024

Thanks for the quick reply. I'll check it and let you know if I still face any issues. Also, I have the same file count in the words2 folder.

haripranavk commented on May 26, 2024

Hi @Breta01,
I guess this is caused by an issue in the characters dataset.
The char-class data in 'en' doesn't have 53 folders (I downloaded it from the link twice, just to be sure), whereas in the loadCharsData function you assign chars = CHARS[:53] and then compare it to d in the directory list (in the assertion).
There are only 27 folders.
P.S. I'm attaching a screenshot of the downloaded data.

[Screenshot: char data]

Breta01 commented on May 26, 2024

That is weird. When I download the data, I see all 53 folders.

Also, the error is in the line:
    images, labels = loadCharsData(charloc='', wordloc='data/words2/', lang=LANG)
which means that he is using the words2 folder rather than data/charclas/en.

haripranavk commented on May 26, 2024

Hi Breta.
I just realised that Windows is case insensitive. When I downloaded the data on Linux, it showed all the folders. I'll try it ASAP and let you know whether this is the cause of the bug.
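
A small sketch of how the missing folders could be confirmed, assuming the folder names are available as a list (for example from `zipfile.ZipFile.namelist()` or `os.listdir` on a case-sensitive system). `find_case_collisions` is a hypothetical helper name.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group names that differ only by letter case, e.g. 'a' and 'A'."""
    groups = defaultdict(set)
    for name in names:
        # casefold() maps 'A' and 'a' to the same key.
        groups[name.casefold()].add(name)
    # Keep only the groups that would collapse on a case-insensitive file system.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)
```

Every group it returns collapses to a single folder on a case-insensitive file system, which would be consistent with seeing far fewer than 53 folders.
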
Meanwhile, I have another question: how is the error in that line?

  • wordloc represents the word location, which is the words2 folder. Similarly, charloc represents the character location, which is data/charclas/en.
  • In the code in the repo (CharClassifier), it is:
    images, labels = loadCharsData(
        charloc='',
        wordloc='data/words2/',
        lang=LANG)

    Would you please explain that? Thanks!

haripranavk commented on May 26, 2024

Okay, I tried running CharClassifier after making my folder case sensitive, but the error persists:

    108     for i, gaps in enumerate(gaplines):
    109         for pos in range(len(gaps) - 1):
--> 110             imgs[idx] = images[i][0:height, gaps[pos]:gaps[pos+1]]
    111             newLabels.append(char2idx(labels[i][pos]))
    112             idx += 1

TypeError: 'NoneType' object is not subscriptable

Breta01 commented on May 26, 2024

First, I forgot that Windows is case insensitive. I will definitely have to store the data in a different way.

To explain: images, labels = loadCharsData(charloc='', wordloc='data/words2/', lang=LANG)
It means that you can specify the characters location (folders for each letter), the words location (word images along with the positions where each word is separated into letters), or both. If you don't specify an option, it isn't used.
This also means that the issue is somewhere in the data/words2 folder.

haripranavk commented on May 26, 2024

Hi, Breta.

  • You can include "how to make a folder case sensitive on Windows" in your README. This might help users.
    This link helped:
    https://www.howtogeek.com/354220/how-to-enable-case-sensitive-folders-on-windows-10/
  • Also, the number of files in words2 is exactly 10138, as you mentioned. The CharlassificationDM and OCR notebooks work perfectly. The issue occurs while running CharClassifier. I just checked the closed issues: someone else had a similar error (NoneType object is not subscriptable) while loading images and labels from CSV. I'm unable to understand what the issue is.

haripranavk commented on May 26, 2024

Hi, Breta.
I finally found the solution to this issue. There was an encoding issue in loadWordsData in datahelpers.py. I changed the image-loading code to this:
My code:
    for i, img in enumerate(imglist):
        # Read the raw bytes in Python and decode them, so OpenCV never has to open the path itself
        with open(img, 'rb') as stream:
            data = bytearray(stream.read())
        numpyarray = np.asarray(data, dtype=np.uint8)
        images[i] = cv2.imdecode(numpyarray, cv2.IMREAD_UNCHANGED)

Original code:
    for i, img in enumerate(imglist):
        images[i] = cv2.imread(img, 0)

Finally, I'm able to run CharClassifier. Nevertheless, thank you for being so prompt!

Breta01 commented on May 26, 2024

Well, this is interesting. It looks like the issue is in the behaviour of cv2.imread() on Windows. On Ubuntu it works fine. I will try to run it on Windows then.
There are some images with special characters (accents) in their paths; maybe that causes the problem.
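
If accented paths are indeed the cause, the affected files could be listed with a short sketch like the one below. It assumes the images live under a single root folder such as data/words2/; `non_ascii_paths` is a hypothetical helper, not part of the repository.

```python
import os


def non_ascii_paths(root):
    """Return all file paths under root that contain non-ASCII characters."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # str.isascii() needs Python 3.7 or newer.
            if not path.isascii():
                hits.append(path)
    return sorted(hits)
```

Comparing this list with the indices where the loaded image is `None` would show whether the two sets match.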

longwall commented on May 26, 2024

I faced this issue in my Windows environment. data.zip cannot be unpacked normally on Windows, because the capital-letter folders are merged into the small-letter ones: Windows cannot create two folders named "a" and "A" in one directory. I uploaded data.zip to a Unix Solaris system and unzipped it there in an SSH session. All folders unpacked well there, but on the Unix file system some Czech letters turned into two-symbol pairs due to an unconfigured UTF locale...
I think a good way is to rename all capital-letter folders by adding an underscore: "a" and "_A", or something similar. Of course, the reading loop in the code needs changes. Right now I am not able to run any Python script of the project, as I don't have a Linux environment, not even a virtual one.
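
One way the renaming idea could be sketched is to apply the underscore prefix while extracting, so that "a" and "A" never meet on disk. This assumes a trusted archive whose class folders are single letters. `safe_member_path` and `extract_case_safe` are hypothetical names, and the reading loop would still need to map "_A" back to "A".

```python
import os
import zipfile


def safe_member_path(member):
    """Prefix single uppercase-letter folder names with '_' ('A' -> '_A')."""
    parts = member.split("/")
    # The last part is the file name (or '' for a directory entry); keep it unchanged.
    dirs = ["_" + p if len(p) == 1 and p.isupper() else p for p in parts[:-1]]
    return "/".join(dirs + parts[-1:])


def extract_case_safe(zip_path, dest):
    """Extract zip_path into dest, renaming capital-letter folders on the way."""
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            target = os.path.join(dest, *safe_member_path(info.filename).split("/"))
            if info.is_dir():
                os.makedirs(target, exist_ok=True)
                continue
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with zf.open(info) as src, open(target, "wb") as dst:
                dst.write(src.read())
```

Because the names are rewritten before anything touches the file system, the same archive should unpack identically on Windows and Linux.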

Breta01 commented on May 26, 2024

Ok, that sounds reasonable. @longwall, can you make the change? I am quite busy during the next few weeks.

Breta01 commented on May 26, 2024

In the new big update (currently the rework branch), this will be replaced by loading images from CSV files. That is probably the easiest solution.
