keras-ocr's Introduction

keras-ocr

This is a slightly polished and packaged version of the Keras CRNN implementation and the published CRAFT text detection model. It provides a high-level API for training a text detection and OCR pipeline.

Please see the documentation for more examples, including for training a custom model.

Getting Started

Installation

keras-ocr supports Python >= 3.6 and TensorFlow >= 2.0.0.

# To install from master
pip install git+https://github.com/faustomorales/keras-ocr.git#egg=keras-ocr

# To install from PyPI
pip install keras-ocr

Using

The package ships with an easy-to-use implementation of the CRAFT text detection model from this repository and the CRNN recognition model from this repository.

import matplotlib.pyplot as plt

import keras_ocr

# keras-ocr will automatically download pretrained
# weights for the detector and recognizer.
pipeline = keras_ocr.pipeline.Pipeline()

# Get a set of three example images
images = [
    keras_ocr.tools.read(url) for url in [
        'https://upload.wikimedia.org/wikipedia/commons/b/bd/Army_Reserves_Recruitment_Banner_MOD_45156284.jpg',
        'https://upload.wikimedia.org/wikipedia/commons/e/e8/FseeG2QeLXo.jpg',
        'https://upload.wikimedia.org/wikipedia/commons/b/b4/EUBanana-500x112.jpg'
    ]
]

# Each list of predictions in prediction_groups is a list of
# (word, box) tuples.
prediction_groups = pipeline.recognize(images)

# Plot the predictions
fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))
for ax, image, predictions in zip(axs, images, prediction_groups):
    keras_ocr.tools.drawAnnotations(image=image, predictions=predictions, ax=ax)

example of labeled image

Comparing keras-ocr and other OCR approaches

You may be wondering how the models in this package compare to existing cloud OCR APIs. We provide some metrics below and the notebook used to compute them using the first 1,000 images in the COCO-Text validation set. We limited it to 1,000 because the Google Cloud free tier is for 1,000 calls a month at the time of this writing. As always, caveats apply:

  • No guarantees apply to these numbers -- please beware and compute your own metrics independently to verify them. As of this writing, they should be considered a very rough first draft. Please open an issue if you find a mistake. In particular, the cloud APIs have a variety of options that one can use to improve their performance and the responses can be parsed in different ways. It is possible that I made some error in configuration or parsing. Again, please open an issue if you find a mistake!
  • We ignore punctuation and letter case because the out-of-the-box recognizer in keras-ocr (provided by this independent repository) does not support either. Note that both AWS Rekognition and Google Cloud Vision support punctuation as well as upper and lowercase characters.
  • We ignore non-English text.
  • We ignore illegible text.

model                 latency   precision   recall
AWS                   719ms     0.45        0.48
GCP                   388ms     0.53        0.58
keras-ocr (scale=2)   417ms     0.53        0.54
keras-ocr (scale=3)   699ms     0.5         0.59

  • Precision and recall were computed based on an intersection over union of 50% or higher and a text similarity to ground truth of 50% or higher.
  • keras-ocr latency values were computed using a Tesla P4 GPU on Google Colab. scale refers to the argument provided to keras_ocr.pipeline.Pipeline() which determines the upscaling applied to the image prior to inference.
  • Latency for the cloud providers was measured with sequential requests, so you can obtain significant speed improvements by making multiple simultaneous API requests.
  • Each of the entries provides a link to the JSON file containing the annotations made on each pass. You can use this with the notebook to compute metrics without having to make the API calls yourself (though you are encouraged to replicate it independently)!
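
The matching rule described above (intersection over union of 50% or higher and text similarity of 50% or higher) can be illustrated with a small sketch. This is not the notebook's code. It assumes axis-aligned boxes given as (x1, y1, x2, y2), uses difflib's ratio as a stand-in for the text similarity measure, and matches predictions to ground truth greedily. All function names here are hypothetical.

```python
from difflib import SequenceMatcher


def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def text_similarity(a, b):
    """Case-insensitive similarity in [0, 1] (assumption: difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def precision_recall(predictions, ground_truth, min_iou=0.5, min_similarity=0.5):
    """Greedy one-to-one matching of (text, box) predictions to ground truth."""
    unmatched = list(ground_truth)
    true_positives = 0
    for text, box in predictions:
        for candidate in unmatched:
            if (iou(box, candidate[1]) >= min_iou
                    and text_similarity(text, candidate[0]) >= min_similarity):
                # Each ground-truth entry may be matched at most once.
                unmatched.remove(candidate)
                true_positives += 1
                break
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```

A different matching strategy or similarity measure would shift the numbers, which is one more reason to compute your own metrics.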

Why not compare to Tesseract? In every configuration I tried, Tesseract did very poorly on this test. Tesseract performs best on scans of books, not on incidental scene text like that in this dataset.

Advanced Configuration

By default, if a GPU is available, TensorFlow tries to grab almost all of the available video memory, which is a problem if you're running multiple models with TensorFlow and PyTorch. Setting any value for the environment variable MEMORY_GROWTH will force TensorFlow to dynamically allocate only as much GPU memory as is needed.

You can also specify a limit per TensorFlow process by setting the environment variable MEMORY_ALLOCATED to a float giving the fraction of the total VRAM that the process may use.

To apply these changes, call keras_ocr.config.configure() at the top of your file where you import keras_ocr.
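
A minimal sketch of the steps above, assuming you prefer to set the variables from Python rather than from the shell. The helper names set_gpu_memory_env and configure_keras_ocr are hypothetical; only MEMORY_GROWTH, MEMORY_ALLOCATED and keras_ocr.config.configure() come from this README.

```python
import os


def set_gpu_memory_env(memory_growth=False, memory_fraction=None):
    """Set the environment variables described above (hypothetical helper)."""
    if memory_growth:
        # Any value enables dynamic allocation; "1" is an arbitrary choice.
        os.environ["MEMORY_GROWTH"] = "1"
    if memory_fraction is not None:
        if not 0.0 < memory_fraction <= 1.0:
            raise ValueError("memory_fraction must be in (0, 1]")
        os.environ["MEMORY_ALLOCATED"] = str(memory_fraction)


def configure_keras_ocr():
    """Import keras_ocr only after the variables are set, then apply them."""
    import keras_ocr  # imported lazily so this file loads without TensorFlow
    keras_ocr.config.configure()
    return keras_ocr
```

For example, calling set_gpu_memory_env(memory_fraction=0.5) followed by configure_keras_ocr() would ask for at most half of the VRAM for this process.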

Contributing

To work on the project, start by doing the following. These instructions probably do not yet work for Windows but if a Windows user has some ideas for how to fix that it would be greatly appreciated (I don't have a Windows machine to test on at the moment).

# Install local dependencies for
# code completion, etc.
make init

# Build the Docker container to run
# tests and such.
make build
  • You can get a JupyterLab server running to experiment with using make lab.
  • To run checks before committing code, you can use make format-check type-check lint-check test.
  • To view the documentation, use make docs.

To implement new features, please first file an issue proposing your change for discussion.

To report problems, please file an issue with sample code, expected results, actual results, and a complete traceback.

Troubleshooting

  • This package is installing opencv-python-headless but I would prefer a different opencv flavor. This is due to aleju/imgaug#473. You can uninstall the unwanted OpenCV flavor after installing keras-ocr. We apologize for the inconvenience.

keras-ocr's People

Contributors

alexqwesa, algocompretto, alwinator, bayethiernodiop, dependabot[bot], dsp05, ezzaimsoufiane, faustomorales, foqc, jobu9395, kymillev, lambdaofgod, muayyad-alsadi, neverabsolute, semaraugusto, trellixvulnteam, yusukem99

keras-ocr's Issues

Image from cv2.VideoCapture not working

I don't know if this is a bug or not but when trying to run the pipeline.recognize on an image coming from a cv2.VideoCapture I get the following error:

Traceback (most recent call last):
File "testing.py", line 106, in <module>
predictions = pipeline.recognize([newimg])[0]
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\keras_ocr\pipeline.py", line 58, in recognize
**recognition_kwargs)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\keras_ocr\recognition.py", line 437, in recognize_from_boxes
for row in self.prediction_model.predict(X, **kwargs)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 909, in predict
use_multiprocessing=use_multiprocessing)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 715, in predict
x, check_steps=True, steps_name='steps', steps=steps)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 2472, in _standardize_user_data
exception_prefix='input')
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\keras\engine\training_utils.py", line 565, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_2 to have 4 dimensions, but got array with shape (0, 1)

The image I get from cap.read is a numpy.ndarray and the shape is the same as when using keras_ocr.tools.read, so I think it should be OK to use. Any idea what's wrong?
Here is the relevant snippet of my code:

cap = cv2.VideoCapture(args.video)
while cap.isOpened():
    # read frame
    ret, frame = cap.read()
    print(frame.shape)
    print(type(frame))
    
    # predictions is a list of (text, box)
    predictions = pipeline.recognize([frame])[0]
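
Separately from the ValueError above, the loop does not check the ret flag, so it would fail on frame.shape once cap.read() stops returning frames. A minimal sketch of a guarded loop follows. The name iter_frames is hypothetical, and this is not claimed to fix the reported error.

```python
def iter_frames(capture):
    """Yield frames from a cv2.VideoCapture-like object until reading fails."""
    while capture.isOpened():
        ok, frame = capture.read()
        # read() returns (False, None) when no frame could be grabbed.
        if not ok or frame is None:
            break
        yield frame
```

Each frame could then be passed to the pipeline as pipeline.recognize([frame])[0], as in the snippet above.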

Minor type errors in code for section "Use the model for inference"

Perhaps these two lines from the section "Use the model for inference"
pipeline = keras_ocr.pipelines.Pipeline(detector=detector, recognizer=recognizer)
image, text, lines = next(image_generators[0])

should be
pipeline = keras_ocr.pipeline.Pipeline(detector=detector, recognizer=recognizer)
image, lines = next(image_generators[0])

use recognizer on multiple images

Hello, it would be very useful to be able to use just the recognizer on multiple images, as is possible with the pipeline.
The use case: I use YOLO to extract and crop some text areas I am interested in, and I want to predict the text using just the recognizer.
Any suggestions or guides?
Thanks.

unable to install keras-ocr

$ python3 -V
Python 3.6.10

$ lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.6 LTS
Release:	16.04
Codename:	xenial

$ pip3 install keras-ocr

Defaulting to user installation because normal site-packages is not writeable
Collecting keras-ocr
  Using cached keras-ocr-0.6.3.tar.gz (165 kB)
  WARNING: Generating metadata for package keras-ocr produced metadata for project name unknown. Fix your #egg=keras-ocr fragments.
Requirement already satisfied (use --upgrade to upgrade): unknown from https://files.pythonhosted.org/packages/c9/26/97b09f82ee62d3958bc8cd2745e4ea1120b3a231d4b14ef2ee1cfff23d5f/keras-ocr-0.6.3.tar.gz#sha256=594311d7edd7e261bbc8884aec9b5aa19dfb40d451076f55680ecf2a13d2d044 in /home/stark/.local/lib/python3.6/site-packages
Building wheels for collected packages: unknown, unknown
  Building wheel for unknown (setup.py) ... done
  Created wheel for unknown: filename=UNKNOWN-0.6.3-py3-none-any.whl size=1561 sha256=3a266adc697c74c26294d415166fbf5b78281940a53ce8a39bab868703ad7ec2
  Stored in directory: /home/stark/.cache/pip/wheels/0e/3a/51/59648d8e35c96ef61a1ca90c7024bbc80d3bf533c899ed6762
  Building wheel for unknown (setup.py) ... done
  Created wheel for unknown: filename=UNKNOWN-0.6.3-py3-none-any.whl size=1561 sha256=3a266adc697c74c26294d415166fbf5b78281940a53ce8a39bab868703ad7ec2
  Stored in directory: /tmp/pip-ephem-wheel-cache-iqq8m6ge/wheels/b7/9e/31/a6d40c047ea2a4d8f43c101412ba0c98453486f7591b6900dd
Successfully built unknown unknown
$ python3
Python 3.6.10 (default, Dec 19 2019, 23:04:32) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import keras_ocr
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'keras_ocr'

Use trained model

Hello!

Is there any way to pass a trained model to the recognizer? I haven't found anything in the documentation.

Text threshold vs detection threshold

I think you might have mixed up the detection_threshold and text_threshold variables in detection.py, since they differ from the original CRAFT implementation and since there is an incongruity between a comment and the variable name in one instance.

Here you have:

_, text_score = cv2.threshold(textmap,
    thresh=text_threshold,
    maxval=1,
    type=cv2.THRESH_BINARY)

The original implementation uses the low_text parameter here (what you're calling detection_threshold). So I believe text_threshold should be replaced with detection_threshold.

And then a few lines down you have:

# If the maximum value within this connected component is less than
# text threshold, we skip it.
if np.max(textmap[labels == component_id]) < detection_threshold:
     continue

Should detection_threshold be text_threshold as the comment says and as it is in the original implementation?
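
The roles of the two thresholds, as this issue describes them for the original implementation, can be sketched in pure Python. This is a simplified illustration and not the code in detection.py: it works on a nested list instead of a NumPy array, ignores the link map, and assumes 4-connectivity. The function name filter_components is hypothetical.

```python
from collections import deque


def filter_components(textmap, low_text, text_threshold):
    """Return components of pixels > low_text whose peak is >= text_threshold."""
    rows, cols = len(textmap), len(textmap[0])
    seen = [[False] * cols for _ in range(rows)]
    kept = []
    for r in range(rows):
        for c in range(cols):
            # Step 1: binarise with the lower threshold (strictly greater,
            # mirroring cv2.THRESH_BINARY).
            if seen[r][c] or textmap[r][c] <= low_text:
                continue
            # Breadth-first search over 4-connected neighbours.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and textmap[ny][nx] > low_text):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Step 2: skip components whose maximum score is below the
            # higher text threshold.
            if max(textmap[y][x] for y, x in component) < text_threshold:
                continue
            kept.append(component)
    return kept
```

With low_text=0.4 and text_threshold=0.7, a faint blob peaking at 0.5 is segmented in step 1 but dropped in step 2, while a blob peaking at 0.9 is kept together with its fainter border pixels.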

If so, the fix seems easy. I could open the PR if you'd like. Or you as the maintainer can take it.

Again, thanks for your awesome work on this 👏

text detector - weights and test with efficientnet

Thanks for the amazing work.

Did you try the CRAFT text detector with EfficientNet?
I saw you have implemented the detection part with two backbones (VGG and EfficientNet),

but the code for calling EfficientNet and its weights are not provided.
Is there any plan to include them in the future?

use augmenter in `convert_image_generator_to_recognizer_input`

Hello @faustomorales, it would be cool to be able to pass an augmenter to convert_image_generator_to_recognizer_input to add some noise to the generated crop. I know it's possible to pass an augmenter to get_image_generator, but that augmenter is applied before the text is added to the background. However, most of the time we want to add the background after adding the text, since real-life images are in that form. If you know an existing way of getting this behavior, please let me know. Otherwise I would be glad to implement it (I have the use case).

Detector training not working

The example of fine-tuning the detector in the docs isn't working with the 0.8.0 release, although other examples, like this one, are working.

Downgrading to 0.6.3 got the example working again (intermediate versions, e.g. 0.7.x, were also failing with the same error, which is detailed below).

To reproduce, create an empty python 3.7.4 conda environment with the following installs on Windows 10:

conda install -c anaconda tensorflow-gpu
pip install keras-ocr
pip install scikit-learn
conda install -c conda-forge shapely

I then copy-pasted that fine-tuning example into train.py and got the following when running it:

python train.py
2020-04-03 13:30:07.403362: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_101.dll
Looking for .\icdar2013\Challenge2_Training_Task12_Images.zip
Downloading .\icdar2013\Challenge2_Training_Task12_Images.zip
Looking for .\icdar2013\Challenge2_Training_Task2_GT.zip
Downloading .\icdar2013\Challenge2_Training_Task2_GT.zip
Looking for C:\Users\scottmcallister\.keras-ocr\craft_mlt_25k.h5

...
...<LOTS OF TENSORFLOW GPU MESSAGES>
...

WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to
  ['...']
WARNING:tensorflow:sample_weight modes were coerced from
  ...
    to
  ['...']
Train for 183 steps, validate for 46 steps
Epoch 1/1000
  1/183 [..............................] - ETA: 1:35WARNING:tensorflow:Early stopping conditioned on metric `val_loss` which is not available. Available metrics are:
  1/183 [..............................] - ETA: 2:20Traceback (most recent call last):
  File "train.py", line 67, in <module>
    validation_steps=math.ceil(len(validation) / batch_size)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\util\deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 1306, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 615, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 497, in _initialize
    *args, **kwds))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\function.py", line 2389, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\function.py", line 2703, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\function.py", line 2593, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\framework\func_graph.py", line 978, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 439, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 85, in distributed_function
    per_replica_function, args=args)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 763, in experimental_run_v2
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 1819, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\distribute\distribute_lib.py", line 2164, in _call_for_each_replica
    return fn(*args, **kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\autograph\impl\api.py", line 292, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 433, in train_on_batch
    output_loss_metrics=model._output_loss_metrics)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 312, in train_on_batch
    output_loss_metrics=output_loss_metrics))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 253, in _process_single_batch
    training=training))
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\engine\training_eager.py", line 171, in _model_loss
    reduction=losses_utils.ReductionV2.NONE)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\keras\utils\losses_utils.py", line 107, in compute_weighted_loss
    losses, sample_weight)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\ops\losses\util.py", line 148, in scale_losses_by_sample_weight
    sample_weight = weights_broadcast_ops.broadcast_weights(sample_weight, losses)
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\ops\weights_broadcast_ops.py", line 167, in broadcast_weights
    with ops.control_dependencies((assert_broadcastable(weights, values),)):
  File "C:\Users\scottmcallister\anaconda3\envs\keras-ocr-test\lib\site-packages\tensorflow_core\python\ops\weights_broadcast_ops.py", line 103, in assert_broadcastable
    weights_rank_static, values.shape, weights.shape))
ValueError: weights can not be broadcast to values. values.rank=3. weights.rank=1. values.shape=(None, None, None). weights.shape=(None,).

The source of the error can probably be uncovered here, likely within detection.py. I'd try to uncover it myself, but with your familiarity with the source you would probably find it faster.

Combining word boxes into lines or paragraphs

Is there any way (maybe a built-in function) to detect EOL characters in a large text? Or must it be done by the client by comparing the words' position vectors?
Thanks in advance.
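
Whether or not a built-in helper exists, the client-side approach mentioned above can be sketched as follows. It assumes each prediction is a (word, box) tuple where box is a sequence of four (x, y) corner points, and it groups words whose vertical centers are close. The function name group_into_lines and the tolerance parameter are hypothetical.

```python
def group_into_lines(predictions, tolerance=0.5):
    """Group (word, box) predictions into lines, top to bottom, left to right."""
    items = []
    for word, box in predictions:
        xs = [point[0] for point in box]
        ys = [point[1] for point in box]
        center_y = (min(ys) + max(ys)) / 2
        items.append((center_y, max(ys) - min(ys), min(xs), word))
    items.sort(key=lambda item: item[0])  # sort by vertical center

    lines = []
    for center_y, height, left, word in items:
        last = lines[-1] if lines else None
        # A word joins the current line if its center is within a fraction
        # of the taller box height from the line's first word.
        if last and abs(center_y - last["center_y"]) <= tolerance * max(height, last["height"]):
            last["words"].append((left, word))
        else:
            lines.append({"center_y": center_y, "height": height, "words": [(left, word)]})
    # Within each line, order words by their left edge.
    return [" ".join(word for _, word in sorted(line["words"])) for line in lines]
```

This heuristic assumes roughly horizontal text. Rotated or curved text would need a different grouping rule.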

Batch Processing

Hi,

Thanks for this work.

Is there any way we can perform batch processing during inference instead of passing single image every time?
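
The README example passes a list of images to pipeline.recognize and gets back one list of predictions per image. For a large collection, a minimal sketch of chunking that call is shown below. The helper name recognize_in_batches and the batch_size parameter are hypothetical, and the only pipeline method used is recognize.

```python
def recognize_in_batches(pipeline, images, batch_size=8):
    """Call pipeline.recognize on chunks of images; one result list per image."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    prediction_groups = []
    for start in range(0, len(images), batch_size):
        chunk = images[start:start + batch_size]
        # recognize() takes a list of images and returns a list of
        # prediction lists in the same order.
        prediction_groups.extend(pipeline.recognize(chunk))
    return prediction_groups
```

Chunking keeps the number of images held in one call bounded, which may help with memory on large inputs.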

ValueError: operands could not be broadcast together with shapes

Hi, I was just running the old example code from some versions before with the current release (this time under windows):

import matplotlib.pyplot as plt

import keras_ocr

# keras-ocr will automatically download pretrained
# weights for the detector and recognizer.
pipeline = keras_ocr.pipeline.Pipeline()

image = keras_ocr.tools.read('test7.png')

# Predictions is a list of (text, box) tuples.
predictions = pipeline.recognize(image)

# Plot the results.
fig, ax = plt.subplots()
ax.imshow(keras_ocr.tools.drawBoxes(image, predictions, boxes_format='predictions'))
for text, box in predictions:
    ax.annotate(s=text, xy=box[0], xytext=box[0] - 50, arrowprops={'arrowstyle': '->'})
    print(text)
plt.show()

It seems to not work anymore. It errors out on line 12 (predictions = ...) with the error:

File "bla.py", line 12, in <module>
predictions = pipeline.recognize(image)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\keras_ocr\pipeline.py", line 55, in recognize
box_groups = self.detector.detect(images=images, **detection_kwargs)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\keras_ocr\detection.py", line 647, in detect
images = [compute_input(tools.read(image)) for image in images]
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\keras_ocr\detection.py", line 647, in <listcomp>
images = [compute_input(tools.read(image)) for image in images]
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\keras_ocr\detection.py", line 40, in compute_input
image -= mean * 255
ValueError: operands could not be broadcast together with shapes (1280,6) (3,) (1280,6)

I guess something changed and I have to alter the code a bit. Any idea what's wrong?

PS. It would be nice to have super easy examples. Something like "Detect Text Using Pretrained Model", "Recognize Text Using Pretrained Model" and both together. Just with one image.

Undesired behavior when detecting and recognizing numbers with commas

Hello. First of all, thanks for this library, it is really useful and even with the problem I'm going to describe, I find the results very good overall

I am working with scanned documents.

I have found that in most scenarios where there's a number with a comma (decimal number), the text detection separates the number at the left of the comma and the one at the right of the comma. I have also found that when this happens, the comma either:

  • Falls in the left text box, and it is usually recognized as a "1", or
  • Is not detected at all

In this example you can find both behaviors:

imagen

Maybe this is because the training data for text detection does not cover these cases?

advice on building dataset for generic OCR

Hi everyone, does anyone have guidelines or links on how to generate a dataset for a generic OCR? I am mostly concerned with the background part. Also, what would be the "maximum" mean edit distance for an OCR model that is used in production?

End to end training tensorflow.python.framework.errors_impl.ResourceExhaustedError

I encountered following error when I ran end to end training code.
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1,64,160,160] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model_1/upconv3.conv.3/Conv2D (defined at train2.py:140) ]]

Train2.py contains a slightly modified version of your end to end training software. I had to modify your code to suit my Windows environment e.g. Windows file names cannot contain characters like colon and to support re-run.

Error occurs at following line:

detector.model.fit_generator(
    generator=detection_train_generator,
    steps_per_epoch=math.ceil(len(background_splits[0]) / detector_batch_size),
    epochs=1000,
    workers=0,
    callbacks=[
        tf.keras.callbacks.EarlyStopping(restore_best_weights=True, patience=5),
        tf.keras.callbacks.CSVLogger(f'{detector_basepath}.csv'),
        tf.keras.callbacks.ModelCheckpoint(filepath=f'{detector_basepath}.h5')
    ],
    validation_data=detection_val_generator,
    validation_steps=math.ceil(len(background_splits[1]) / detector_batch_size)
)

I have attached the file containing the source code below.
train2py.pdf

I have also attached all of the messages displayed at the Windows command line when I ran
python train2.py.

log1.log

I have also attached the output of my pip list so that you can see which Python modules and versions I have installed.

pip-list.txt

My gpu is just a Nvidia GeForce MX150 with 4GB RAM. My PC has 16GB RAM.

opencv-contrib-python-headless VS opencv-contrib-python

It would be nice if keras-ocr used opencv-contrib-python instead of the headless version, because the headless version does not have any GUI functionality. This way a user doesn't have to uninstall and reinstall the desired OpenCV packages to get OpenCV GUI functionality.
What do you think?

Custom fine-tune model

Is there currently any way of fine-tuning a different model than the default one? (like passing a .ht file to the recognizer)

Also is there any way of training the recognizer with multiple gpus?

Thank you :)

Performance on CPU

Thanks for this wonderful work! I've run the sample code (using the pretrained model) on a Mac with an Intel i7 and 16 GB of memory. The results are pretty good, but the pipeline is really slow (over 10 minutes on a 3968x2976 phone image).
Also, every CPU core is fully used. Compared with the models' performance in the two original repos, I guess there is a bottleneck in the pipeline.
I wonder if you have tested it on a similar system, @faustomorales.

CUDNN_STATUS_EXECUTION_FAILED

I'm running the text detection and the recognition on every frame of a video to extract hardcoded subtitles (on Windows). This works quite well, although it's a bit slow. But after letting my program run for some minutes (the time differs), I always get this error: CUDNN_STATUS_EXECUTION_FAILED
I don't think it's a bug in keras-ocr, but I don't have a clue how to resolve this error or where to ask. From what I found by searching the internet, it could be a driver issue... Any idea?

Here is the full log:

2020-04-09 17:08:52.593789: E tensorflow/stream_executor/dnn.cc:588] CUDNN_STATUS_EXECUTION_FAILED
in tensorflow/stream_executor/cuda/cuda_dnn.cc(1796): 'cudnnRNNForwardTraining( cudnn.handle(), rnn_desc.handle(), model_dims.max_seq_length, input_desc.handles(), input_data.opaque(), input_h_desc.handle(), input_h_data.opaque(), input_c_desc.handle(), input_c_data.opaque(), rnn_desc.params_handle(), params.opaque(), output_desc.handles(), output_data->opaque(), output_h_desc.handle(), output_h_data->opaque(), output_c_desc.handle(), output_c_data->opaque(), workspace.opaque(), workspace.size(), reserve_space.opaque(), reserve_space.size())'
2020-04-09 17:08:52.605393: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cudnn_rnn_ops.cc:1498 : Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 128, 128, 1, 50, 4, 128]
2020-04-09 17:08:52.612678: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Internal: Failed to call ThenRnnForward with model config: [rnn_mode, rnn_input_mode, rnn_direction_mode]: 2, 0, 0 , [num_layers, input_size, num_units, dir_count, max_seq_length, batch_size, cell_num_units]: [1, 128, 128, 1, 50, 4, 128]
[[{{node CudnnRNN}}]]
2020-04-09 17:08:52.621696: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Cancelled: [Derived]RecvAsync is cancelled.
[[{{node decode/PadV2/paddings/_78}}]]
[[decode/Shape_1/_76]]
2020-04-09 17:08:52.624991: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Cancelled: [Derived]RecvAsync is cancelled.
[[{{node decode/PadV2/paddings/_78}}]]
Traceback (most recent call last):
File "VideoSubDetect.py", line 199, in
recognizedtext = recognizer.recognize_from_boxes([frame], [sorted_box_group])
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\keras_ocr\recognition.py", line 439, in recognize_from_boxes
for row in self.prediction_model.predict(X, **kwargs)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 909, in predict
use_multiprocessing=use_multiprocessing)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 722, in predict
callbacks=callbacks)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\keras\engine\training_arrays.py", line 393, in model_iteration
batch_outs = f(ins_batch)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\keras\backend.py", line 3740, in call
outputs = self._graph_fn(*converted_inputs)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\eager\function.py", line 1081, in call
return self._call_impl(args, kwargs)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\eager\function.py", line 1121, in _call_impl
return self._call_flat(args, self.captured_inputs, cancellation_manager)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\eager\function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\eager\function.py", line 511, in call
ctx=ctx)
File "C:\Users\RetroHelix\Envs\test\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.CancelledError: [Derived]RecvAsync is cancelled.
[[{{node decode/PadV2/paddings/_78}}]] [Op:__inference_keras_scratch_graph_15223]

error terminate called after throwing an instance of 'std::bad_alloc'

I used your demo code and my app crashed. It shows:

2019-11-19 16:05:38.112665: W tensorflow/core/framework/allocator.cc:107] Allocation of 3121348608 exceeds 10% of system memory.
2019-11-19 16:05:40.014946: W tensorflow/core/framework/allocator.cc:107] Allocation of 3121348608 exceeds 10% of system memory.
2019-11-19 16:05:40.014951: W tensorflow/core/framework/allocator.cc:107] Allocation of 3121348608 exceeds 10% of system memory.
2019-11-19 16:05:45.262924: W tensorflow/core/framework/allocator.cc:107] Allocation of 3121348608 exceeds 10% of system memory.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

I do not have a GPU; my machine has 32 GB of memory.
The code I use:

import matplotlib.pyplot as plt

import keras_ocr

detector = keras_ocr.detection.Detector(pretrained=True)
image = keras_ocr.tools.read('tests/test_image.jpg')

boxes = detector.detect(images=[image])[0]
canvas = keras_ocr.detection.drawBoxes(image, boxes)
plt.imshow(canvas)

How can I run this?
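The allocator warnings report sizes in bytes, which makes them hard to judge at a glance. As a rough sketch, the helpers below convert the logged size to MiB and compute a downscaled image size that keeps the aspect ratio. Shrinking large inputs before detection is one common way to reduce memory pressure, though whether it resolves this particular crash is an assumption. The names bytes_to_mib and scaled_size are illustrative and not part of keras-ocr.

```python
def bytes_to_mib(num_bytes):
    """Convert a byte count from a TensorFlow allocator warning to MiB."""
    return num_bytes / (1024 * 1024)


def scaled_size(height, width, max_side):
    """Return (height, width) shrunk so the longer side is at most max_side.

    The aspect ratio is preserved and images that already fit are unchanged.
    """
    scale = min(1.0, max_side / max(height, width))
    return max(1, round(height * scale)), max(1, round(width * scale))


if __name__ == "__main__":
    # Size reported in the log above: about 2.9 GiB for a single allocation.
    print(bytes_to_mib(3121348608))
    # Hypothetical 4000x3000 photo limited to 2000 pixels on its longer side.
    print(scaled_size(4000, 3000, 2000))
```

The resulting size could then be used to resize the image before calling detector.detect; note that cv2.resize expects its target size as (width, height).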

Hi, I got an Error when I import keras_ocr, could you help me solve the problem?

I have TensorFlow 2.0.0 and Python 3.7.
When I import keras_ocr, I get this error:
2020-03-11 08:32:13.540050: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
Traceback (most recent call last):
  File "", line 1, in <module>
  File "C:\Users\jim\Anaconda3\Lib\site-packages\keras_ocr\__init__.py", line 1, in <module>
    from . import (detection, recognition, tools, data_generation, pipeline, evaluation, datasets,
  File "C:\Users\jim\Anaconda3\Lib\site-packages\keras_ocr\detection.py", line 31, in <module>
    from . import tools
  File "C:\Users\jim\Anaconda3\Lib\site-packages\keras_ocr\tools.py", line 14, in <module>
    from shapely import geometry
  File "C:\Users\jim\Anaconda3\Lib\site-packages\shapely\geometry\__init__.py", line 4, in <module>
    from .base import CAP_STYLE, JOIN_STYLE
  File "C:\Users\jim\Anaconda3\Lib\site-packages\shapely\geometry\base.py", line 18, in <module>
    from shapely.coords import CoordinateSequence
  File "C:\Users\jim\Anaconda3\Lib\site-packages\shapely\coords.py", line 8, in <module>
    from shapely.geos import lgeos
  File "C:\Users\jim\Anaconda3\Lib\site-packages\shapely\geos.py", line 145, in <module>
    _lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
  File "C:\Users\jim\Anaconda3\lib\ctypes\__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found.

recognition improvement

Hi, how can I improve the recognition accuracy on noisy images and on text that is hard to distinguish from its background? I have attached some examples here:

tyre
tyre1

TensorFlow warnings about unnecessary retracing

The following minimal example (main.py)

import keras_ocr

pipeline = keras_ocr.pipeline.Pipeline()

image_urls = [
    "https://i.imgur.com/euIw5Dt.png",
    "https://i.imgur.com/fAT6keX.png",
    "https://i.imgur.com/RlxBrvX.png",
    "https://i.imgur.com/pWBX9z5.png",
    "https://i.imgur.com/tzfitxz.png",
    "https://i.imgur.com/VPPpRJg.png"
]

for image_url in image_urls:
    image = keras_ocr.tools.read(image_url)
    predictions = pipeline.recognize([image])

produces these TensorFlow warnings:

WARNING:tensorflow:5 out of the last 5 calls to <function _make_execution_function.<locals>.distributed_function at 0x7f0284309cb0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
WARNING:tensorflow:6 out of the last 6 calls to <function _make_execution_function.<locals>.distributed_function at 0x7f0284309cb0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.

It seems this can negatively impact performance.

The situation can be reproduced by running

docker build -t deleteme .

with the following Dockerfile:

FROM python:3.7

ENV CUDA_VISIBLE_DEVICES="-1"
RUN pip install tensorflow==2.1.0 keras-ocr==0.8.3

# Disable the Docker cache from this stage on, see https://stackoverflow.com/a/58801213/1866775
ADD "https://www.random.org/cgi-bin/randbyte?nbytes=10&format=h" skipcache

ADD ./main.py /
RUN python /main.py
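One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that each image has a different shape, so TensorFlow traces a new graph for every call. A sketch of one mitigation is to pad images up to a small set of bucketed sizes so that shapes repeat. The helper name padded_shape and the multiple of 256 are illustrative choices, not keras-ocr API.

```python
import math


def padded_shape(height, width, multiple=256):
    """Round height and width up to the next multiple so shapes repeat."""
    return (
        math.ceil(height / multiple) * multiple,
        math.ceil(width / multiple) * multiple,
    )


if __name__ == "__main__":
    # Hypothetical image sizes; the real sizes of the linked images are not known here.
    sizes = [(480, 640), (500, 700), (300, 610), (512, 768)]
    buckets = {padded_shape(h, w) for h, w in sizes}
    # Four distinct input shapes collapse into a single padded shape.
    print(buckets)
```

Padding only at the bottom and right edges keeps the image origin in place, so box coordinates returned for the padded image would still refer to the original pixels.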

Jupyter notebook crashing

Whenever a Pipeline is instantiated in two different kernels, one of them crashes.

Reproducible example:

import keras_ocr

pipeline = keras_ocr.pipeline.Pipeline()

Try adding this cell to two different kernels and running both.

Using trained model for predictions

Once training has finished and the .h5 file has been generated, how do we create a new detector (or recognizer) object that uses that model and the newly trained weights?

build_params is ignored in Recognizer init method

Hi,
I think I found a simple bug, build_params is ignored when calling

Recognizer(weights=None, alphabet=alphabet, build_params=build_params)

Changing the default of the build_params kwarg from None to {} and applying the change below to line 315 would fix it, allowing callers to pass only some keys and keep the defaults for the rest.

build_params = {
    k: build_params.get(k, DEFAULT_BUILD_PARAMS[k]) for k in DEFAULT_BUILD_PARAMS
}

TY
...or a cleaner version of that ;)
Cheers!
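One possible cleaner version, as a sketch: merge the caller's overrides on top of the defaults and keep None as the keyword default, which avoids sharing a mutable {} default between calls. The keys in DEFAULT_BUILD_PARAMS below are placeholders for illustration, not the library's actual defaults, and merge_build_params is a hypothetical helper name.

```python
# Placeholder defaults for illustration; the real DEFAULT_BUILD_PARAMS has other keys.
DEFAULT_BUILD_PARAMS = {"height": 31, "width": 200, "color": False}


def merge_build_params(build_params=None):
    """Return the defaults overridden by any keys supplied by the caller."""
    overrides = build_params or {}
    unknown = set(overrides) - set(DEFAULT_BUILD_PARAMS)
    if unknown:
        raise ValueError(f"Unknown build params: {sorted(unknown)}")
    return {**DEFAULT_BUILD_PARAMS, **overrides}


if __name__ == "__main__":
    # Only "width" is overridden; the other keys keep their defaults.
    print(merge_build_params({"width": 300}))
```

Unlike the dictionary comprehension above, this sketch rejects unknown keys instead of silently dropping them. That is a design choice, and either behavior could be reasonable.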

Unsupported depth of input image

I just ran the example from:

https://keras-ocr.readthedocs.io/en/latest/examples/using_pretrained_models.html

with several png and jpg files and always got the following error:

(keraspypi) retrohelix@retrohelix-P64-HJ-HK1:~/.virtualenvs/test$ python bla.py
Looking for /home/retrohelix/.keras-ocr/craft_mlt_25k.h5
2020-01-04 16:40:15.855346: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-01-04 16:40:15.877424: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
2020-01-04 16:40:15.878135: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3dfe2c0 executing computations on platform Host. Devices:
2020-01-04 16:40:15.878149: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From /home/retrohelix/.virtualenvs/keraspypi/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py:5783: sparse_to_dense (from tensorflow.python.ops.sparse_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Create a tf.sparse.SparseTensor and use tf.sparse.to_dense instead.
Looking for /home/retrohelix/.keras-ocr/crnn_kurapan.h5
Traceback (most recent call last):
File "bla.py", line 12, in
predictions = pipeline.recognize(image=image)
File "/home/retrohelix/.virtualenvs/keraspypi/lib/python3.6/site-packages/keras_ocr/pipeline.py", line 49, in recognize
**recognition_kwargs)
File "/home/retrohelix/.virtualenvs/keraspypi/lib/python3.6/site-packages/keras_ocr/recognition.py", line 398, in recognize_from_boxes
[cv2.cvtColor(crop, cv2.COLOR_RGB2GRAY)[..., np.newaxis] for crop in crops])
File "/home/retrohelix/.virtualenvs/keraspypi/lib/python3.6/site-packages/keras_ocr/recognition.py", line 398, in
[cv2.cvtColor(crop, cv2.COLOR_RGB2GRAY)[..., np.newaxis] for crop in crops])
cv2.error: OpenCV(4.1.2) /io/opencv/modules/imgproc/src/color.simd_helpers.hpp:94: error: (-2:Unspecified error) in function 'cv::impl::{anonymous}::CvtHelper<VScn, VDcn, VDepth, sizePolicy>::CvtHelper(cv::InputArray, cv::OutputArray, int) [with VScn = cv::impl::{anonymous}::Set<3, 4>; VDcn = cv::impl::{anonymous}::Set<1>; VDepth = cv::impl::{anonymous}::Set<0, 2, 5>; cv::impl::{anonymous}::SizePolicy sizePolicy = (cv::impl::::SizePolicy)2u; cv::InputArray = const cv::_InputArray&; cv::OutputArray = const cv::_OutputArray&]'

Unsupported depth of input image:
'VDepth::contains(depth)'
where
'depth' is 4 (CV_32S)

Any idea what I can do about this?

Multi gpu Training

Hello, how would someone use multiple GPUs to train? With TF2 the recommended way is tf.distribute.Strategy, but this assumes the model definition and compilation happen inside the strategy context. Should we still use multi_gpu_model from Keras on recognizer.training_model, for example: recognizer.training_model = multi_gpu_model(recognizer.training_model, gpus=4)?

Non-English Characters Problem

I am trying to train a recognizer model with my custom alphabet:

alphabet = ''.join(
    [ 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
     'Y', 'Z',  '[', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k', 'l', 'm', 'n', 'o', 'p', 'r', 's', 't', 'u', 'v', 'w',
     'x', 'y', 'z', '|', 'Ç', 'Ö', 'Ü', 'ç', 'ö', 'ü', 'Ğ', 'ğ', 'İ', 'ı', 'Ş', 'ş'])
recognizer_alphabet = ''.join(sorted(set(alphabet)))

And I got the error below:
for c in ''.join(sentences)), 'Found illegal characters in sentence.'
AssertionError: Found illegal characters in sentence.

These are the sentences:
sentences: ['gıda', 'inş. iç v', 'e dış. ti', 'c. ltd.', 'a', 'zie baina', 'a', 'dr']
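A quick way to see which characters trigger the assertion is to compare the sentences against the alphabet directly. The sketch below does that with the alphabet and sentences quoted above; find_illegal_characters is a hypothetical helper, not part of keras-ocr.

```python
def find_illegal_characters(sentences, alphabet):
    """Return the sorted characters that occur in sentences but not in alphabet."""
    return sorted(set("".join(sentences)) - set(alphabet))


# Alphabet and sentences copied from the report above.
alphabet = "ABCDEFGHIJKLMNOPRSTUVWXYZ[_abcdefghiklmnoprstuvwxyz|ÇÖÜçöüĞğİıŞş"
sentences = ['gıda', 'inş. iç v', 'e dış. ti', 'c. ltd.', 'a', 'zie baina', 'a', 'dr']

if __name__ == "__main__":
    print(find_illegal_characters(sentences, alphabet))  # [' ', '.']
```

Here the space and the period are missing from the alphabet, so adding them to the alphabet, or removing them from the training text, would be one way to satisfy the check.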

Very slow detector inference speed

It seems the getBoxes method in the detector is very slow and the main bottleneck for good inference speeds. I was able to modify the detection part to work on batches using the tf.data.datasets API, which made the detection part quite fast. I used a batch size of 8 on a V100 GPU.

Time for detection (8660 images 1000x1000): 1032 s
Time for getBoxes of those images: 11 729 s

Any ideas on how to improve the performance of the getBoxes step? I assume it is slow because it has to process each result one at a time, get the connected components, and then process those one at a time.

I'm going to try some things with Numba to speed up the loop over each connected component.

cairo vs opencv

Hello, thanks for this great tool. The original Keras example used Cairo to generate datasets; I just want to know why you changed to OpenCV.

ValueError : Too many values to unpack ( OpenCV issue )

The error is related to the OpenCV findContours method call.

My Env:

Python: 3.6.4
OpenCV: 3.4.2.16

Code to reproduce:

image = keras_ocr.tools.read('X00016469612.jpg')

# Predictions is a list of (text, box) tuples.
predictions = pipeline.recognize(image=image)

Error:

ValueError: too many values to unpack (expected 2)

Points to getBoxes method:

contours, _ = cv2.findContours(segmap.astype('uint8'),mode=cv2.RETR_TREE, method=cv2.CHAIN_APPROX_SIMPLE)

Solution:

In OpenCV 3.x, findContours returns 3 values (image, contours, hierarchy). Changing the above line to the one below should fix it on that version.

__ , contours , __ = cv2.findContours(segmap.astype('uint8'),mode=cv2.RETR_TREE, method=cv2.CHAIN_APPROX_SIMPLE)
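Hard-coding three return values breaks again on OpenCV 4.x, which returns two values like OpenCV 2.x. A version-agnostic sketch is to take the second-to-last element of whatever tuple findContours returns, since the contours sit in that position in both layouts. The helper name contours_from_result is hypothetical, and the example uses stand-in tuples so that it runs without OpenCV installed.

```python
def contours_from_result(result):
    """Pick the contours out of a cv2.findContours return value.

    OpenCV 3.x returns (image, contours, hierarchy); OpenCV 2.x and 4.x
    return (contours, hierarchy). In both cases contours is result[-2].
    """
    return result[-2]


if __name__ == "__main__":
    # Stand-in values in place of real cv2.findContours output.
    opencv3_style = ("image", ["contour_a"], "hierarchy")
    opencv4_style = (["contour_a"], "hierarchy")
    print(contours_from_result(opencv3_style) == contours_from_result(opencv4_style))
```

In getBoxes this would amount to wrapping the existing call, for example contours = contours_from_result(cv2.findContours(...)), with the same arguments as in the line above.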

Struggling to train with custom alphabet including spaces...

Hi, Fausto
First of all, thanks for sharing this implementation, I have actually started using your codebase recently because it's much nicer and cleaner than what I did a few months ago! :)

I'm trying to train the model using a custom dataset and a custom alphabet including all letters, some special symbols and the space ' ' character. Everything works great if I omit the space character from the sequences. When the ' ' (or whatever replacement I'm using) is included, I can't get the model to train. I see 2 issues arise:

  1. loss is inf.
  2. a message in the console showing an error in the CTC function (./tensorflow/core/util/ctc/ctc_loss_calculator.h:499] No valid path found.)
    [2] happens sometimes during normal training, but it's not very common and the model keeps training correctly.

I've tried freezing the backbone layers with no success... (It actually achieves much better results when training the whole network).

I tried adding a start-of-sequence character '\t' and an end-of-sequence character '\n', and that can be learned on a sentence level, but not on a word level...

I also changed the get_batch_generator function to remove the .strip() calls to each sentence.

Do you have any ideas that I could try or changes to the code that might prevent this issue?

Thank you!
Chepe
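One thing worth checking, offered as a hypothesis rather than a diagnosis: CTC has no valid alignment when a label needs more time steps than the network outputs, and TensorFlow reports that case as "No valid path found" with an infinite loss. A label needs one step per character plus one blank between each pair of identical adjacent characters. Sentences with spaces are longer than single words, so they are more likely to exceed the limit. The helper names and the 48-step figure below are illustrative assumptions.

```python
def min_ctc_steps(label):
    """Minimum number of time steps CTC needs to emit this label."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def too_long_for_ctc(labels, num_steps):
    """Return the labels that cannot be aligned within num_steps time steps."""
    return [label for label in labels if min_ctc_steps(label) > num_steps]


if __name__ == "__main__":
    # 48 is a made-up number of RNN output steps used only for this example.
    labels = ["hello", "a much longer sentence with spaces that keeps on going"]
    print(too_long_for_ctc(labels, num_steps=48))
```

Running a check like this over the training labels, with num_steps set to the actual number of recognizer output steps, would show whether the failing batches contain labels that are simply too long for the model's output width.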

Is keras-ocr suited for scanned documents?

Hi,
I'm new to the world of OCR and document analysis. To get started, I'm working on large documents such as scanned invoices with a lot of noise. Do you think keras-ocr could be used instead of Tesseract on this type of data? Or are there other tools built for OCR on such documents?

Script fails with Warning: Allocation of xxx exceeds 10% of system memory.

Issue with memory usage, when running demo script.

Windows 10
RAM: 16GB
CPU: i7-9750H
GPU: GeForce GTX 1660 Ti
GPU computeCapability: 7.5

The following lines of demo code run successfully:

import matplotlib.pyplot as plt
import keras_ocr

# keras-ocr will automatically download pretrained
# weights for the detector and recognizer.
pipeline = keras_ocr.pipeline.Pipeline()

# Get a set of three example images
images = [
    keras_ocr.tools.read(url) for url in [
        'https://upload.wikimedia.org/wikipedia/commons/b/bd/Army_Reserves_Recruitment_Banner_MOD_45156284.jpg',
        'https://upload.wikimedia.org/wikipedia/commons/e/e8/FseeG2QeLXo.jpg',
        'https://upload.wikimedia.org/wikipedia/commons/b/b4/EUBanana-500x112.jpg'
    ]
]

The following line of code causes an error:

prediction_groups = pipeline.recognize(images)

Error:

Looking for C:\Users\...\.keras-ocr\crnn_kurapan.h5
2020-02-25 16:33:00.010608: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 1006632960 exceeds 10% of system memory.
2020-02-25 16:33:00.356063: W tensorflow/core/framework/cpu_allocator_impl.cc:81] Allocation of 1006632960 exceeds 10% of system memory.

The script then fails.

I have read (https://www.raspberrypi.org/forums/viewtopic.php?t=242471) that this should be a warning, and shouldn't cause the script to fail.

I got an error, who can help me?

E:\Anaconda\python.exe E:/keras-ocr/keras_ocr_texst.py
Traceback (most recent call last):
  File "E:/keras-ocr/keras_ocr_texst.py", line 1, in <module>
    import keras_ocr
  File "E:\Anaconda\lib\site-packages\keras_ocr\__init__.py", line 1, in <module>
    from . import (detection, recognition, tools, data_generation, pipeline, evaluation, datasets,
  File "E:\Anaconda\lib\site-packages\keras_ocr\detection.py", line 31, in <module>
    from . import tools
  File "E:\Anaconda\lib\site-packages\keras_ocr\tools.py", line 14, in <module>
    from shapely import geometry
  File "E:\Anaconda\lib\site-packages\shapely\geometry\__init__.py", line 4, in <module>
    from .base import CAP_STYLE, JOIN_STYLE
  File "E:\Anaconda\lib\site-packages\shapely\geometry\base.py", line 18, in <module>
    from shapely.coords import CoordinateSequence
  File "E:\Anaconda\lib\site-packages\shapely\coords.py", line 8, in <module>
    from shapely.geos import lgeos
  File "E:\Anaconda\lib\site-packages\shapely\geos.py", line 145, in <module>
    _lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
  File "E:\Anaconda\lib\ctypes\__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found.

Process finished with exit code 1

Windows 10
tensorflow-cpu = 2.0
Python = 3.7

Version

Hi, can you please specify the versions of Python, TensorFlow and Keras that are compatible with this API?

Weakly-Supervised Training for Detection

The CRAFT authors used a weakly-supervised training method to handle the fact that most datasets don't annotate at the character level. I saw in your docs that a future release will support weakly-supervised training of the detector model, presumably following section 3.2.2 of the original paper. Have you made a start on this and, if so, do you have an idea of when this would be released? I might have time to try this myself, but figured I'd ask first.

Also, kudos and thanks for this cool project!

Error when running

Hi,

When I run the program with my business card test image, I get this error:

TypeError: resize_bilinear() got an unexpected keyword argument 'half_pixel_centers'

What does it mean? What am I doing wrong? Are there any example programs I can try?

Thanks,
Suyash

bus-card

Question: Purpose of `rnn_steps_to_discard` in recognition

Hi!

Thanks for the good library.

I'm looking at the code for the recognizer, namely the last Lambda layer of the recognition model.
https://github.com/faustomorales/keras-ocr/blob/master/keras_ocr/recognition.py#L285
x = keras.layers.Lambda(lambda x: x[:, rnn_steps_to_discard:])(x)

What is the purpose of the Lambda layer that discards the first few steps of the RNN? I would have thought that all the RNN steps are needed. What are the advantages of ignoring them?

Kind regards indeed, Franco

Issue with Clean Installation

I have been having an issue with a clean installation.

Supported versions (https://pypi.org/project/keras-ocr/)

keras-ocr supports Python >= 3.6 and TensorFlow >= 2.0.0.

My environment:

  • Windows 10
  • Python 3.7.4
  • Anaconda

Installation steps:

  1. Create a new virtual environment
     mkdir venv
     cd venv
     mkdir project
     python -m venv project
     project\Scripts\activate.bat
  2. Install keras-ocr and tensorflow
     python -m pip install --upgrade pip
     pip install keras-ocr
     pip install tensorflow

Pip freeze:

absl-py==0.9.0
astor==0.8.1
cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
cycler==0.10.0
decorator==4.4.1
editdistance==0.5.3
efficientnet==1.0.0
essential-generators==0.9.2
fonttools==4.4.0
gast==0.2.2
google-auth==1.11.2
google-auth-oauthlib==0.4.1
google-pasta==0.1.8
grpcio==1.27.2
h5py==2.10.0
idna==2.9
imageio==2.8.0
imgaug==0.4.0
Keras-Applications==1.0.8
keras-ocr==0.6.2
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
Markdown==3.2.1
matplotlib==3.1.3
networkx==2.4
numpy==1.18.1
oauthlib==3.1.0
opencv-python==4.2.0.32
opt-einsum==3.1.0
Pillow==7.0.0
protobuf==3.11.3
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyclipper==1.1.0.post3
pyparsing==2.4.6
python-dateutil==2.8.1
PyWavelets==1.1.1
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.0
scikit-image==0.16.2
scipy==1.4.1
Shapely==1.7.0
six==1.14.0
tensorboard==2.1.0
tensorflow==2.1.0
tensorflow-estimator==2.1.0
termcolor==1.1.0
tqdm==4.43.0
urllib3==1.25.8
validators==0.14.2
Werkzeug==1.0.0
wrapt==1.12.0

Run example script (https://pypi.org/project/keras-ocr/):

import matplotlib.pyplot as plt

import keras_ocr

# keras-ocr will automatically download pretrained
# weights for the detector and recognizer.
pipeline = keras_ocr.pipeline.Pipeline()

# Get a set of three example images
images = [
    keras_ocr.tools.read(url) for url in [
        'https://upload.wikimedia.org/wikipedia/commons/b/bd/Army_Reserves_Recruitment_Banner_MOD_45156284.jpg',
        'https://upload.wikimedia.org/wikipedia/commons/e/e8/FseeG2QeLXo.jpg',
        'https://upload.wikimedia.org/wikipedia/commons/b/b4/EUBanana-500x112.jpg'
    ]
]

# Each list of predictions in prediction_groups is a list of
# (word, box) tuples.
prediction_groups = pipeline.recognize(images)

# Plot the predictions
fig, axs = plt.subplots(nrows=len(images), figsize=(20, 20))
for ax, image, predictions in zip(axs, images, prediction_groups):
    keras_ocr.tools.drawAnnotations(image=image, predictions=predictions, ax=ax)

Error:

(project) (base) C:\Users\...\Documents>python main.py

2020-02-23 15:03:06.027923: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-02-23 15:03:06.031828: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.

Traceback (most recent call last):
  File "main.py", line 3, in <module>
    import keras_ocr
  File "C:\Users\...\Documents\venv\project\lib\site-packages\keras_ocr\__init__.py", line 1, in <module>
    from . import (detection, recognition, tools, data_generation, pipeline, evaluation, datasets,
  File "C:\Users\...\Documents\venv\project\lib\site-packages\keras_ocr\detection.py", line 31, in <module>
    from . import tools
  File "C:\Users\...\Documents\venv\project\lib\site-packages\keras_ocr\tools.py", line 14, in <module>
    from shapely import geometry
  File "C:\Users\...\Documents\venv\project\lib\site-packages\shapely\geometry\__init__.py", line 4, in <module>
    from .base import CAP_STYLE, JOIN_STYLE
  File "C:\Users\...\Documents\venv\project\lib\site-packages\shapely\geometry\base.py", line 18, in <module>
    from shapely.coords import CoordinateSequence
  File "C:\Users\...\Documents\venv\project\lib\site-packages\shapely\coords.py", line 8, in <module>
    from shapely.geos import lgeos
  File "C:\Users\...\Documents\venv\project\lib\site-packages\shapely\geos.py", line 145, in <module>
    _lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
  File "C:\Users\...\Anaconda3\lib\ctypes\__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found

What am I doing wrong?
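The traceback shows Shapely loading geos_c.dll from sys.prefix\Library\bin, which is where a conda package would normally place it. Inside an environment created with python -m venv, sys.prefix points at the venv instead, so the file may simply not be there. The diagnostic sketch below only reports whether the DLL exists at the path from the traceback; the function name geos_dll_status is hypothetical.

```python
import os
import sys


def geos_dll_status(prefix=None):
    """Return the DLL path from the traceback above and whether it exists."""
    prefix = sys.prefix if prefix is None else prefix
    path = os.path.join(prefix, "Library", "bin", "geos_c.dll")
    return path, os.path.isfile(path)


if __name__ == "__main__":
    path, exists = geos_dll_status()
    print(f"{path} exists: {exists}")
```

If this reports False inside the activated venv, the problem is likely how Shapely locates GEOS in a mixed Anaconda and venv setup rather than anything in keras-ocr itself, although the report above does not confirm that.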
