
Comments (11)

faustomorales commented on July 20, 2024

Thank you so much for reporting this issue. Indeed, the problem was that you had labels containing runs of consecutive spaces (for example, "foo" and "bar" separated by several spaces), which made it very difficult for the model to train, since it is all but impossible to distinguish five spaces from four spaces. I was able to use your dataset to train successfully using the following snippet.

The key line is ' '.join(f.read().split()), which splits the string on whitespace and then rejoins the pieces with a single space.

import glob
import string

import keras_ocr

def load(label_filepath):
    with open(label_filepath) as f:
        label = ' '.join(f.read().split())
    image_filepath = label_filepath.replace('.txt', '.jpg')
    return image_filepath, None, label

labels = list(map(load, glob.glob('sample_dataset/*.txt')))
alphabet = string.ascii_letters + string.digits + '* /.:,+-¥='
assert all(not any(t not in alphabet for t in text) for _, _, text in labels), 'An illegal character was found.'

recognizer = keras_ocr.recognition.Recognizer(alphabet=alphabet)
recognizer.compile()

image_generator = keras_ocr.datasets.get_recognizer_image_generator(alphabet=alphabet, labels=labels, height=31, width=200)
batch_generator = recognizer.get_batch_generator(image_generator=image_generator)
recognizer.training_model.fit(
    x=batch_generator,
    steps_per_epoch=10
)

I've just pushed 900f873, which adds an assertion to check for this problem. Without this issue, we probably would not have found it. Again, thanks!
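The whitespace normalization described above can be tried in isolation. This is a minimal sketch that does not depend on keras-ocr; the helper name `normalize_label` and the sample string are hypothetical and only illustrate the `' '.join(... .split())` idiom.

```python
def normalize_label(raw: str) -> str:
    """Collapse every run of whitespace into one space and trim both ends."""
    # str.split() with no arguments splits on any run of whitespace and
    # drops empty strings, so joining with ' ' leaves exactly one space
    # between words and none at the start or end.
    return ' '.join(raw.split())


# Hypothetical label with padding, a run of spaces, and a trailing newline.
raw_label = '  foo     bar\n'
clean_label = normalize_label(raw_label)
print(repr(clean_label))
```

Because the same idiom also removes leading and trailing whitespace, it covers the case where a label file ends with a newline.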

from keras-ocr.

csmcallister commented on July 20, 2024

Just chiming in as someone who has been able to train the recognizer using a custom dataset and a custom alphabet that includes more than just lowercase letters and digits.

Here's my alphabet:
alphabet = ' #()-./0123456789:ABCDEGHIKLMNRSTUVWabcdeghiklmnoprstuvwyz'

I then instantiate the recognizer like this:

recognizer = keras_ocr.recognition.Recognizer(alphabet=alphabet, weights=None)

Then a training script identical to the one here in the docs works, with the only change being the kwarg for keras_ocr.datasets.get_recognizer_image_generator being changed from alphabet=recognizer.alphabet to alphabet=alphabet.
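If it helps, a custom alphabet can also be derived from the labels themselves so that no character is missed. This is only a sketch: `build_alphabet` is a hypothetical helper, not part of keras-ocr, and the sample texts are made up.

```python
def build_alphabet(texts):
    """Return a string holding each distinct character in texts, sorted."""
    # Joining all labels and taking the set yields the distinct characters;
    # sorting makes the result reproducible between runs.
    return ''.join(sorted(set(''.join(texts))))


# Made-up label texts standing in for a real dataset.
sample_texts = ['Unit 4-B', 'Tel: (555) 0100']
alphabet = build_alphabet(sample_texts)
print(repr(alphabet))
```

The resulting string would then be passed as `alphabet=alphabet` both to the recognizer and to the image generator, as described above.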


faustomorales commented on July 20, 2024

Thanks @csmcallister! What you proposed is what I was planning to say.

I think the main problem here is that it seems you are expecting the recognizer to be able to pick up on leading / trailing spaces and newlines. The recognizer architecture, being a convolutional recurrent neural network, makes its predictions using full height vertical slices being passed sequentially to the RNN portion of the network. This architecture makes it all but impossible for the network to discern between what is actual whitespace (i.e., margin) and semantically meaningful whitespace (i.e., trailing space) when it occurs at the start or end of a sentence. Spaces embedded within a sentence can be picked up but not spaces at the start or end. This is why the .strip() is important and should not be removed.


cheperuiz commented on July 20, 2024

@csmcallister and @faustomorales Thank you both for your quick replies. That is essentially what I'm doing, except that I'm using actual crops from real images instead of artificially generated images (the dataset was annotated manually and has been used effectively to train other models).
The strip function is back in place. But the spaces embedded within sentences are not being picked up by the model; instead, the loss goes to inf... I can share some code snippets below:

DEFAULT_ALPHABET = '\t\n!"#$\'/()*+.,-0123456789:;=?<>@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]_abcdefghijklmnopqrstuvwxyz{}~¢¥µñÑΩ€'

recognizer = keras_ocr.recognition.Recognizer(alphabet=DEFAULT_ALPHABET, weights='kurapan')

To clean up the sentences and introduce the start/end characters (which are useful for my application and other RNNs later in the chain), I do:


def clean_text(samples):
    new_samples = []
    for _,text,img in samples:
        text = remove_accents(text)
        text = '\t'+text+'\n'
        text = ''.join([c for c in text if c in DEFAULT_ALPHABET])
        new_samples.append((_,text,img))
    return new_samples

Then, I call your batch generator as below:

batch_size = 8
training_gen, validation_gen = [
    recognizer.get_batch_generator(
        recognizer,
        image_generator=image_generator,
        batch_size=batch_size
    )
    for image_generator in [sample_generator(images_train, texts_train),sample_generator(images_val, texts_val) ]
]

And start the training process.
However, if the dictionary includes the space character, this is what I'm seeing in the training log: 100/100 [==============================] - 5s 47ms/step - loss: inf - val_loss: inf

and in the output console:
./tensorflow/core/util/ctc/ctc_loss_calculator.h:499] No valid path found

If the alphabet doesn't include the space character, it is removed by my cleanup function...


faustomorales commented on July 20, 2024

The example script that @csmcallister linked to does not use artificially generated images. They are crops from real images.

The recognizer architecture will not be able to detect whitespace characters like \t or \n. These must be removed. Spaces are okay, as long as they are between words (like the space between "the" and "fox" in the phrase "the fox").
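One way to catch such labels before training is a small check over the label text. This is a rough sketch, not part of keras-ocr; the function name `find_label_problems` and the wording of the messages are hypothetical.

```python
def find_label_problems(text: str) -> list:
    """Return a list of reasons why a label may be hard for the recognizer."""
    problems = []
    # Whitespace at either end cannot be told apart from image margin.
    if text != text.strip():
        problems.append('leading or trailing whitespace')
    # Tabs and newlines have no visual counterpart in a cropped line.
    if any(c in text for c in '\t\n\r'):
        problems.append('tab or newline character')
    # Runs of spaces are nearly impossible to count from pixels.
    if '  ' in text:
        problems.append('consecutive spaces')
    return problems


for label in ['the fox', '\tthe fox\n', 'the  fox']:
    print(repr(label), find_label_problems(label))
```

A label that returns an empty list contains only single spaces between words, which is the case described above as okay.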

To help diagnose the exploding gradient, I would suggest using a single image as a test case and see if the gradient continues to explode. If you share a sample image, I can try to take a look.


faustomorales commented on July 20, 2024

If you need the whitespace characters to wrap the predictions, that can happen as a post-processing step after the network output, rather than including it as part of the network output.
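As a sketch of that post-processing step, assuming the start and end markers are the '\t' and '\n' used earlier in this thread: the helper `wrap_prediction` is hypothetical, and a plain string stands in for the recognizer output.

```python
START, END = '\t', '\n'  # markers taken from the cleanup function above


def wrap_prediction(text: str, start: str = START, end: str = END) -> str:
    """Add start/end markers to a predicted string after recognition."""
    # Strip first so the markers are the only whitespace at either end.
    return start + text.strip() + end


# A plain string stands in for what the recognizer would return.
predicted = 'the fox'
print(repr(wrap_prediction(predicted)))
```

This keeps the markers out of the recognizer alphabet while still giving a downstream model the delimiters it expects.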


cheperuiz commented on July 20, 2024

Thank you, that would be great. I can send you a few sample images privately (please let me know an email address, I can't post them here...).

About the extra characters, they are useful for a seq2seq model that comes later in the chain, but for the purposes of this discussion they have been removed (with the same result).

I don't think the problem is exploding gradients per se... the message in the console leads me to believe that the CTC loss calculation is aborted for some reason, because the loss changes to inf immediately after that message appears.
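For context, a common cause of the "No valid path found" message in CTC losses generally is a label that cannot be aligned to the number of time steps the network emits: CTC needs at least one step per character plus one extra blank step between each pair of identical adjacent characters. The sketch below checks that condition. The helper names are hypothetical, and `time_steps=4` is a placeholder, since the number of steps the recognizer produces is not stated in this thread.

```python
def min_ctc_steps(label: str) -> int:
    """Smallest number of time steps for which CTC can align this label."""
    # Identical neighbours (e.g. 'oo' or two spaces) need a blank between
    # them, so each such pair costs one extra step.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def labels_too_long(labels, time_steps):
    """Return the labels that cannot fit into the given number of steps."""
    return [text for text in labels if min_ctc_steps(text) > time_steps]


# time_steps=4 is a placeholder, not a value taken from keras-ocr.
print(labels_too_long(['ab', 'aaaa'], time_steps=4))
```

Whether this is what happened here is not confirmed; it is only one check that can be run on the labels before training.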

JM


faustomorales commented on July 20, 2024

Sure, you can reach me at [email protected].


cheperuiz commented on July 20, 2024

Done. Thank you :)


cheperuiz commented on July 20, 2024

Hi! I just solved my issue. It turns out that some of my labels had duplicate spaces... I fixed that and now it's training flawlessly :D Thanks again for your comments; they definitely pointed me in the right direction (our own data).
Cheers!


cheperuiz commented on July 20, 2024

Awesome! Thank you very much for this!

