Comments (13)
Hi,
this is interesting. It looks like something didn't load. Can you wrap the code in a try-except block to catch the exception and print the index, the image and the gaps? This should point us to what is missing. Also check the number of files in the words2 folder; there should be exactly 10138 files.
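Following that suggestion, the cropping loop could be wrapped so that the failing item is reported before the exception propagates, and the file count checked in the same session. This is only a sketch: `crop_letters_debug` and `count_files` are hypothetical helpers, and the loop body mirrors the traceback posted later in this thread rather than the exact repository code.

```python
import os


def crop_letters_debug(images, gaplines, height):
    """Crop letter images between consecutive gap positions.

    Images are assumed to be NumPy arrays. If slicing fails with a
    TypeError (e.g. an image is None because it never loaded), print the
    index, the image and its gaps, then re-raise the exception.
    """
    crops = []
    for i, gaps in enumerate(gaplines):
        try:
            for pos in range(len(gaps) - 1):
                crops.append(images[i][0:height, gaps[pos]:gaps[pos + 1]])
        except TypeError:
            print(f"index={i}, image={images[i]!r}, gaps={gaps}")
            raise
    return crops


def count_files(folder):
    """Return the number of regular files directly inside ``folder``."""
    return sum(
        1 for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )
```

Per this thread, `count_files('data/words2/')` should return 10138 when the folder is complete.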
from handwriting-ocr.
Thanks for the quick reply. I'll check it and let you know if I face any issues. I also have the same file count in the words2 folder.
Hi @Breta01
I think this is caused by an issue in the characters dataset.
The char-class data for 'en' doesn't have 53 folders (I downloaded it from the link twice, just to be sure), whereas the loadCharsData function assigns chars = CHARS[:53] and then compares it to d in the directory listing (in the assertion).
There are only 27 folders.
P.S. I'm attaching a screenshot of the downloaded data.
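To see exactly which character folders are absent, rather than only that the count is wrong, a check along these lines could be run before the assertion. It is a sketch: `missing_char_folders` is a hypothetical helper, and the expected characters would be whatever `CHARS[:53]` holds in the repository.

```python
import os


def missing_char_folders(expected_chars, folder):
    """Return the expected characters that have no sub-folder in ``folder``.

    ``expected_chars`` stands in for the repository's character list; it is
    an assumption here. Plain files with a matching name do not count.
    """
    present = {
        name for name in os.listdir(folder)
        if os.path.isdir(os.path.join(folder, name))
    }
    return [ch for ch in expected_chars if ch not in present]
```

On a dataset extracted on Windows, this would be expected to list the capital letters whose folders were merged into the lowercase ones.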
That is weird. When I download the data, I see all 53 folders.
Also, the error is raised in this line:
images, labels = loadCharsData(charloc='', wordloc='data/words2/', lang=LANG)
which means that the words2 folder is being used rather than data/charclas/en.
Hi Breta.
I just realised that Windows is case-insensitive. When I downloaded the data on Linux, all the folders were there. I'll try again as soon as possible and let you know whether this is the cause of the bug.
Meanwhile, I have another question: why is the error in that line?
- wordloc represents the word location, which is the words2 folder. Similarly, charloc represents the character location, which is data/charclas/en.
- In the repo code (CharClassifier), the call is:
images, labels = loadCharsData(
charloc='',
wordloc='data/words2/',
lang=LANG)
Would you please explain that? Thanks!
Okay, I tried running CharClassifier after making my folder case-sensitive, but the error persists:
    108     for i, gaps in enumerate(gaplines):
    109         for pos in range(len(gaps) - 1):
--> 110             imgs[idx] = images[i][0:height, gaps[pos]:gaps[pos+1]]
    111             newLabels.append(char2idx(labels[i][pos]))
    112             idx += 1

TypeError: 'NoneType' object is not subscriptable
First, I forgot that Windows is case-insensitive. I will definitely have to store the data in a different way.
To explain: images, labels = loadCharsData(charloc='', wordloc='data/words2/', lang=LANG)
This means that you can specify the characters location (one folder per letter), the words location (word images along with the positions where each word is split into letters), or both. If you leave an option empty, it isn't used.
This also means that the issue is somewhere in the data/words2 folder.
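The rule described above, where an empty location means that source is skipped, can be illustrated with a hypothetical stand-in. `load_chars_data_sketch` and its `loaders` argument are invented for illustration; this is not the repository's `loadCharsData`.

```python
def load_chars_data_sketch(charloc="", wordloc="", loaders=None):
    """Hypothetical stand-in for loadCharsData's option handling.

    Each location is loaded only if it is a non-empty string; an empty
    string means "this option is not used". ``loaders`` maps "chars" and
    "words" to functions that take a location and return (images, labels).
    """
    images, labels = [], []
    for kind, location in (("chars", charloc), ("words", wordloc)):
        if not location:  # option not specified, so skip this source
            continue
        imgs, labs = loaders[kind](location)
        images.extend(imgs)
        labels.extend(labs)
    return images, labels
```

With `charloc=''` and `wordloc='data/words2/'`, only the words loader ever runs, which is why a failure in that call has to come from the words data.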
Hi, Breta.
- You could add "how to make a folder case-sensitive on Windows" to your README. This might help other users.
This link helped:
https://www.howtogeek.com/354220/how-to-enable-case-sensitive-folders-on-windows-10/
- Also, the number of files in words2 is exactly 10138, as you mentioned. The CharlassificationDM and OCR notebooks work perfectly; the issue only occurs when running CharClassifier. I just checked one of the closed issues, and someone else had a similar error (NoneType object is not subscriptable) while loading images and labels from CSV. I'm unable to understand what the issue is.
Hi, Breta.
I finally found the solution to this issue. There was an encoding issue in loadWordsData in datahelpers.py. For loading images, I changed the code to this:
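My code:
import cv2
import numpy as np

for i, img in enumerate(imglist):
    # Read the raw bytes with Python's own open(), which handles
    # non-ASCII characters in the path, then let OpenCV decode them
    with open(img, 'rb') as stream:
        data = np.frombuffer(stream.read(), dtype=np.uint8)
    # cv2.IMREAD_GRAYSCALE would match the flag 0 of the original imread call
    images[i] = cv2.imdecode(data, cv2.IMREAD_UNCHANGED)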
Original code:
for i, img in enumerate(imglist):
    images[i] = cv2.imread(img, 0)
Now I'm finally able to run CharClassifier. Thank you for being so prompt!
Well, this is interesting. It looks like the issue is in the behaviour of cv2.imread() on Windows; on Ubuntu it works fine. I will try to run it on Windows then.
Some images have special characters (accents) in their paths; maybe that causes the problem.
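To test the accent hypothesis, one could list which image paths contain non-ASCII characters and check whether those are the ones `cv2.imread()` fails on. A minimal sketch with hypothetical helper names (it needs Python 3.7+ for `str.isascii()`); it only finds candidates and does not prove the cause.

```python
import os


def non_ascii_paths(paths):
    """Return the paths that contain any non-ASCII character (e.g. Czech accents)."""
    return [p for p in paths if not p.isascii()]


def non_ascii_files(folder):
    """Walk ``folder`` and list files whose relative path is not pure ASCII."""
    found = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            found.append(os.path.relpath(os.path.join(root, name), folder))
    return non_ascii_paths(found)
```

Running `non_ascii_files('data/words2/')` and comparing the result with the indices reported by the failing loop would show whether the two sets overlap.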
I faced this issue in my Windows environment. data.zip cannot be unpacked normally on Windows, because the capital-letter folders end up merged into the lowercase ones: Windows cannot create two folders named "a" and "A" in the same directory. I uploaded data.zip to a Unix (Solaris) system and unzipped it there in an SSH session. All folders unpacked correctly there, but some Czech letters turned into two-character sequences because the UTF-8 locale was not configured.
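Before extracting on Windows, one could list which entries in the archive would clash on a case-insensitive filesystem. A sketch using the standard `zipfile` module; `case_collisions` and `zip_case_collisions` are hypothetical helpers, and `str.casefold()` only approximates the case comparison Windows performs.

```python
import zipfile
from collections import defaultdict


def case_collisions(names):
    """Group names that become identical when case is ignored.

    Returns {casefolded name: sorted list of the colliding originals}
    for every group with more than one distinct spelling.
    """
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


def zip_case_collisions(zip_path):
    """Check an archive (e.g. data.zip) for entries that would clash."""
    with zipfile.ZipFile(zip_path) as archive:
        return case_collisions(archive.namelist())
```

Calling `zip_case_collisions("data.zip")` before unpacking would list every "a"/"A" style pair that Windows would merge.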
I think a good approach is to rename every capital-letter folder by adding an underscore prefix, e.g. "a" and "_A", or something similar. Of course, the reading loop in the code would need to change accordingly. Right now I can't run any Python script from the project, because I don't have a Linux environment, not even a virtual one.
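The proposed underscore scheme could be sketched as a pair of mapping functions, one for writing the folders and one for the reading loop. The names `char_to_folder` and `folder_to_char` are hypothetical; the repository does not define them.

```python
def char_to_folder(ch: str) -> str:
    """Folder name for one character class: capitals get a '_' prefix."""
    return "_" + ch if ch.isupper() else ch


def folder_to_char(name: str) -> str:
    """Inverse of char_to_folder, for use when reading the folders back."""
    if len(name) == 2 and name[0] == "_" and name[1].isupper():
        return name[1]
    return name
```

Because "_A" and "a" still differ when case is ignored, both folders can coexist on Windows, and `str.isupper()` also covers accented capitals such as "Č".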
OK, that sounds reasonable. @longwall, can you make the change? I am quite busy for the next few weeks.
In the next big update, currently in the rework branch, this will be replaced by loading images from CSV files. That is probably the easiest solution.