Comments (11)
Thank you so much for reporting this issue. Indeed, the problem was that you had labels containing runs of several consecutive spaces (for example, "foo" and "bar" separated by five spaces), which made it very difficult for the model to train, since it is all but impossible to distinguish five spaces from four. I was able to train successfully on your dataset using the following snippet.
The key line is ' '.join(f.read().split()), which splits the string on whitespace and then recombines the pieces with a single space between them.
import glob
import string

import keras_ocr


def load(label_filepath):
    # Collapse every run of whitespace in the label into a single space.
    with open(label_filepath) as f:
        label = ' '.join(f.read().split())
    image_filepath = label_filepath.replace('.txt', '.jpg')
    return image_filepath, None, label


labels = list(map(load, glob.glob('sample_dataset/*.txt')))
alphabet = string.ascii_letters + string.digits + '* /.:,+-¥='
# Fail early if any label contains a character outside the alphabet.
assert all(not any(t not in alphabet for t in text) for _, _, text in labels), 'An illegal character was found.'

recognizer = keras_ocr.recognition.Recognizer(alphabet=alphabet)
recognizer.compile()
image_generator = keras_ocr.datasets.get_recognizer_image_generator(alphabet=alphabet, labels=labels, height=31, width=200)
batch_generator = recognizer.get_batch_generator(image_generator=image_generator)
recognizer.training_model.fit(
    x=batch_generator,
    steps_per_epoch=10
)
I've just pushed 900f873, which adds an assertion to check for this problem. Without this issue, we probably would not have found it. Again, thanks!
from keras-ocr.
Just chiming in as someone who has been able to train the recognizer using a custom dataset and a custom alphabet that includes more than just lowercase letters and digits.
Here's my alphabet:
alphabet = ' #()-./0123456789:ABCDEGHIKLMNRSTUVWabcdeghiklmnoprstuvwyz'
I then instantiate the recognizer like this:
recognizer = keras_ocr.recognition.Recognizer(alphabet=alphabet, weights=None)
Then a training script identical to the one here in the docs works, with the only change being that the kwarg for keras_ocr.datasets.get_recognizer_image_generator is changed from alphabet=recognizer.alphabet to alphabet=alphabet.
from keras-ocr.
Thanks @csmcallister! What you proposed is what I was planning to say.
I think the main problem here is that it seems you are expecting the recognizer to be able to pick up on leading / trailing spaces and newlines. The recognizer architecture, being a convolutional recurrent neural network, makes its predictions using full-height vertical slices that are passed sequentially to the RNN portion of the network. This architecture makes it all but impossible for the network to distinguish between plain whitespace (i.e., margin) and semantically meaningful whitespace (i.e., a trailing space) when it occurs at the start or end of a sentence. Spaces embedded within a sentence can be picked up, but spaces at the start or end cannot. This is why the .strip() is important and should not be removed.
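To make the distinction concrete, here is a minimal sketch of label normalization in plain Python. The helper name normalize_label is hypothetical and is not part of keras-ocr; it only illustrates how .strip() differs from the split-and-join approach shown earlier in the thread.

```python
def normalize_label(text: str) -> str:
    """Remove leading/trailing whitespace and collapse inner runs to one space.

    Hypothetical helper for illustration; not a keras-ocr function.
    """
    # str.split() with no arguments splits on any whitespace run and
    # discards empty pieces, so margins and repeated spaces both vanish.
    return ' '.join(text.split())


raw = '  the   fox\n'

# .strip() only removes the margins; the inner run of spaces survives.
stripped_only = raw.strip()

# The split-and-join form removes the margins and collapses the inner run.
normalized = normalize_label(raw)

print(repr(stripped_only))
print(repr(normalized))
```

Note that .strip() alone does not collapse repeated spaces inside the label, so both steps may be needed depending on how the annotations were produced.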
from keras-ocr.
@csmcallister and @faustomorales Thank you both for your quick replies. That is essentially what I'm doing, except for the fact that I'm using actual crops from real images instead of artificially generated images (the dataset was annotated manually and has been used effectively to train other models).
The strip function is back in place. But the spaces embedded within sentences are still not being picked up by the model; instead, the loss goes to inf. I can share some code snippets below:
DEFAULT_ALPHABET = '\t\n!"#$\'/()*+.,-0123456789:;=?<>@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxyz{}~¢¥µñÑΩ€'
recognizer = keras_ocr.recognition.Recognizer(alphabet=DEFAULT_ALPHABET, weights='kurapan')
To clean up the sentences and introduce the start/end characters (which are useful for my application and for other RNNs later in the chain), I do:
def clean_text(samples):
    new_samples = []
    for _, text, img in samples:
        text = remove_accents(text)
        text = '\t' + text + '\n'
        text = ''.join([c for c in text if c in DEFAULT_ALPHABET])
        new_samples.append((_, text, img))
    return new_samples
Then, I call your batch generator as below:
batch_size = 8
training_gen, validation_gen = [
    # get_batch_generator is a bound method, so the recognizer is not passed again.
    recognizer.get_batch_generator(
        image_generator=image_generator,
        batch_size=batch_size
    )
    for image_generator in [
        sample_generator(images_train, texts_train),
        sample_generator(images_val, texts_val),
    ]
]
And start the training process.
However, if the alphabet includes the space character, this is what I'm seeing in the training log: 100/100 [==============================] - 5s 47ms/step - loss: inf - val_loss: inf
and in the output console:
./tensorflow/core/util/ctc/ctc_loss_calculator.h:499] No valid path found
If the alphabet doesn't include the space character, it is removed by my cleanup function...
from keras-ocr.
The example script that @csmcallister linked to does not use artificially generated images. They are crops from real images.
The recognizer architecture will not be able to detect whitespace characters like \t or \n. These must be removed. Spaces are okay, as long as they are between words (like the space between "the" and "fox" in the phrase "the fox").
To help diagnose the exploding gradient, I would suggest using a single image as a test case and see if the gradient continues to explode. If you share a sample image, I can try to take a look.
from keras-ocr.
If you need the whitespace characters to wrap the predictions, that can happen as a post-processing step after the network output, rather than including it as part of the network output.
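A minimal sketch of that post-processing idea, assuming the downstream model expects '\t' as a start marker and '\n' as an end marker (as in the alphabet posted above). The function name wrap_prediction is hypothetical; the input is just the plain string returned by the recognizer.

```python
START_MARKER = '\t'  # assumed start-of-sequence marker for the downstream model
END_MARKER = '\n'    # assumed end-of-sequence marker for the downstream model


def wrap_prediction(text: str, start: str = START_MARKER, end: str = END_MARKER) -> str:
    """Add start/end markers to a recognizer prediction after the fact."""
    # Strip first so that stray margins never end up inside the markers.
    return f'{start}{text.strip()}{end}'


predictions = ['the fox', ' jumps ']
wrapped = [wrap_prediction(p) for p in predictions]
print(wrapped)
```

This keeps the recognizer's alphabet free of characters it cannot see, while the later stages still receive the delimiters they rely on.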
from keras-ocr.
Thank you, that would be great. I can send you a few sample images privately (please let me know an email address, I can't post them here...).
About the extra characters: they are useful for a seq2seq model that comes later in the chain, but for the purposes of this discussion they have been removed (with the same result).
I don't think the problem is exploding gradients per se... the message in the console leads me to believe that the CTC loss calculation is aborted for some reason, because the loss changes to inf immediately after that message appears.
JM
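One possible way to investigate a "No valid path found" message is to check whether each label can fit into the number of time steps the network emits. CTC in general needs at least one step per character plus one extra blank step between every pair of identical adjacent characters (two consecutive spaces count as such a pair). The sketch below is an assumption-laden diagnostic, not something confirmed in this thread: the helper names are hypothetical, and time_steps must be read from your own model, since the value for this recognizer is not stated here.

```python
def min_ctc_steps(label: str) -> int:
    """Smallest number of time steps for which CTC has a valid alignment."""
    # Each adjacent pair of identical characters needs a blank between them.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def infeasible_labels(labels, time_steps: int):
    """Return the labels that cannot be aligned within time_steps steps."""
    return [label for label in labels if min_ctc_steps(label) > time_steps]


# time_steps=4 is an arbitrary illustrative value, not the recognizer's real one.
too_long = infeasible_labels(['ab', 'aabb', 'a  b'], time_steps=4)
print(too_long)
```

If this check flags labels, shortening them or removing repeated characters such as duplicate spaces is worth trying before looking at the gradients.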
from keras-ocr.
Sure, you can reach me at [email protected].
from keras-ocr.
Done. Thank you :)
from keras-ocr.
Hi! I just solved my issue. Turns out that some of my labels had duplicate spaces... I fixed that and now it's training flawlessly :D thanks again for your comments, they definitely pointed me in the right direction (our own data).
Cheers!
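For anyone hitting the same problem, a small check along these lines can flag suspicious labels before training. It is only a sketch in plain Python; the function name whitespace_problems and the wording of the messages are hypothetical.

```python
import re


def whitespace_problems(label: str) -> list:
    """List the whitespace issues discussed in this thread for one label."""
    problems = []
    if label != label.strip():
        problems.append('leading/trailing whitespace')
    if re.search(r' {2,}', label):
        problems.append('repeated spaces')
    if re.search(r'[\t\n\r]', label):
        problems.append('tab or newline')
    return problems


labels = ['the fox', 'the  fox', ' the fox\n']
# Keep only the labels that have at least one problem.
report = {label: whitespace_problems(label) for label in labels if whitespace_problems(label)}
for label, problems in report.items():
    print(repr(label), problems)
```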
from keras-ocr.
Awesome! Thank you very much for this!
from keras-ocr.