pythonlessons / mltu



Machine Learning Training Utilities (for TensorFlow and PyTorch)

License: MIT License

Python 99.96% Shell 0.04%

machine-learning object-detection ocr pytorch speech-recognition speech-to-text tensorflow yolov8

mltu's People

vkrtsind iamlory xerocopy souravcodes1080 alialemimatinpour samansj1377 tamanna18 ankur2606 hoahoa1808 zeus-salazar seimon-ohh yas1e2r shylxsh mohit-potato zaladevdeep wok1chz icedragneel sellouk juviz138 ranoobi sh1d0w olenkan rifatullah102 rafaelfn1230 mhhamdan tinyx3k pustaibogdan johancuda zwyeo zahamed mrviper111 jedrzejewski-andrzej freshcabbage123 tedigom52 legion911 rand-h looker2zip sleipnir029 heetvekariya 2219sha siddhant1309 spidartist suryateja0311 iamvaibhavrathore imdavidsantiago arifafandi trisha2601 vuongvmu agilepass varshini-2007 rishabh26shah dora-ken namlp198 yass-99 jhaabhiiishek theperplexedguy rajpututpal zivby86 newmke nastgc cthulhuswing akilsadik surya203 felathirr akshat01112001 shivesh96 skyontop duccloud s-t-a-1 huytruong99 mhhabdelwahab amit66944 lenusic jantzla markgir sachintha443 trananh1992 linscomt wicky2001 linuxfjb danielharven fchang-smith tuyendam00 ai-training-projects ankmrao1225 angrau seidnerj kimtech-hub badanimator raaghavbhyana

mltu's Issues

Impossible to resolve conflicting dependencies on Mac M1.

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tf2onnx 1.14.0 requires flatbuffers<3.0,>=1.12, but you have flatbuffers 23.5.26 which is incompatible.

I tried different versions of mltu and of TensorFlow for M1 (https://developer.apple.com/metal/tensorflow-plugin/).

Other TensorFlow versions have other issues.
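pip's message can be checked mechanically: tf2onnx 1.14.0 declares flatbuffers>=1.12,<3.0, and the installed 23.5.26 falls outside that range. The sketch below uses only the standard library to read the installed version and test it against that range. The range is copied from the error message; the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '23.5.26' into (23, 5, 26); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def flatbuffers_ok_for_tf2onnx(version: str) -> bool:
    """Constraint quoted in the pip error: flatbuffers>=1.12,<3.0."""
    return (1, 12) <= parse_version(version) < (3, 0)


def installed_version(package: str) -> Optional[str]:
    """Version of an installed distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    fb = installed_version("flatbuffers")
    ok = fb is not None and flatbuffers_ok_for_tf2onnx(fb)
    print(f"flatbuffers {fb}: satisfies tf2onnx 1.14.0 constraint: {ok}")
```

Whether TensorFlow for M1 still works with a flatbuffers release inside that range is not verified here.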

Missing 1 required positional argument: "padding_token" (in CWERMetric)

In "train.py", when I execute these lines of code:

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=configs.learning_rate),
    loss=CTCloss(),
    metrics=[CWERMetric()],
    run_eagerly=False
)

I get this error:
"TypeError: __init__() missing 1 required positional argument: 'padding_token'"

I can see that "CWERMetric()" comes from "mltu.tensorflow.metrics".

In "metrics.py", I can see these "__init__" arguments: "(self, padding_token, name='CWER', **kwargs)"

I also read:

    # Store the padding token as an attribute
    self.padding_token = padding_token

After that, in the "update_state" method, I can see that "self.padding_token" is used in the following line of code:

    # Retain only the non-padding elements in the true labels tensor
    true_labels_sparse = tf.sparse.retain(true_labels_sparse, tf.not_equal(true_labels_sparse.values, self.padding_token))

How can I solve this problem? Thank you!
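The code quoted above shows that the metric drops every label entry equal to padding_token before computing CER/WER, so it needs the same value the labels were padded with. The tutorials pad labels with LabelPadding(max_word_length=configs.max_text_length, padding_value=len(configs.vocab)), so passing CWERMetric(padding_token=len(configs.vocab)) is probably what is expected; treat that as an inference from the code, not a confirmed fix. The NumPy sketch below (the strip_padding name is hypothetical) mimics what tf.sparse.retain does with that token on a padded label batch.

```python
import numpy as np


def strip_padding(padded_labels: np.ndarray, padding_token: int) -> list:
    """Drop padding entries from each row of a padded label batch.

    Plain-NumPy analogue of keeping only values != padding_token,
    as CWERMetric.update_state does with tf.sparse.retain.
    """
    return [row[row != padding_token].tolist() for row in padded_labels]


vocab = "abc"
padding_token = len(vocab)  # same value LabelPadding uses as padding_value
batch = np.array([[0, 1, 3, 3], [2, 2, 1, 3]])
print(strip_padding(batch, padding_token))  # [[0, 1], [2, 2, 1]]
```

If a different padding_token were passed, the padding indices would survive and be counted as characters, inflating the error rates.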

Not able to find words.txt on the IAM website (urgent)

I am unable to find the words.txt annotation file on the IAM website. As mentioned in earlier issues, the link was not working, so I went directly to the IAM website. I can find the image folders but not the annotation text file, and without it I don't know how to run the code. I am in a bit of a hurry, as this project would allow me to finish another similar project. It would be great if you could upload the txt file. Kindly respond as soon as possible.

No module named mltu

Traceback (most recent call last):
  File "c:\Users\91986\Desktop\DEVELOPMENT\Mini Project sem 5\text-recog\train.py", line 7, in <module>
    from mltu.preprocessors import ImageReader
ModuleNotFoundError: No module named 'mltu'
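A ModuleNotFoundError like this often means that the interpreter running train.py is not the one mltu was installed into. A quick way to check, sketched with the standard library (the module_location helper is hypothetical), is to print the running interpreter and where it would import mltu from:

```python
import importlib.util
import sys
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Where this interpreter would import `name` from, or None if it cannot."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("mltu found at:", module_location("mltu"))
```

If it prints None, running "python -m pip install mltu" with that same interpreter installs the package where train.py can see it.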

Error while testing the model using inferenceModel

import cv2
import typing
import numpy as np

from mltu.inferenceModel import OnnxInferenceModel
from mltu.utils.text_utils import ctc_decoder, get_cer, get_wer
from mltu.transformers import ImageResizer

class ImageToWordModel(OnnxInferenceModel):
    def __init__(self, char_list: typing.Union[str, list], *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.char_list = char_list

    def predict(self, image: np.ndarray):
        image = ImageResizer.resize_maintaining_aspect_ratio(image, *self.input_shape[:2][::-1])

        image_pred = np.expand_dims(image, axis=0).astype(np.float32)

        preds = self.model.run(None, {self.input_name: image_pred})[0]

        text = ctc_decoder(preds, self.char_list)[0]

        return text

if __name__ == "__main__":
    import pandas as pd
    from tqdm import tqdm
    from mltu.configs import BaseModelConfigs

    configs = BaseModelConfigs.load("Models/04_sentence_recognition/202301131202/configs.yaml")

    model = ImageToWordModel(model_path=configs.model_path, char_list=configs.vocab)

    df = pd.read_csv("Models/04_sentence_recognition/202301131202/val.csv").values.tolist()

    accum_cer, accum_wer = [], []
    for image_path, label in tqdm(df):
        image = cv2.imread(image_path)

        prediction_text = model.predict(image)

        cer = get_cer(prediction_text, label)
        wer = get_wer(prediction_text, label)
        print("Image: ", image_path)
        print("Label:", label)
        print("Prediction: ", prediction_text)
        print(f"CER: {cer}; WER: {wer}")

        accum_cer.append(cer)
        accum_wer.append(wer)

        cv2.imshow(prediction_text, image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    print(f"Average CER: {np.average(accum_cer)}, Average WER: {np.average(accum_wer)}")

When I run this code, I get this error. The code is the same as in your sentence recognition tutorial (inferenceModel.py).
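The script reports CER and WER per image. As a reference for reading those numbers, both are usually defined as an edit distance divided by the length of the ground truth, computed over characters for CER and over whitespace-separated words for WER. The sketch below implements that common definition; mltu's own get_cer and get_wer may differ in details such as normalization, so treat this as an approximation rather than the library's code.

```python
from typing import Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance: insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute x -> y
        prev = curr
    return prev[-1]


def cer(prediction: str, label: str) -> float:
    """Character error rate relative to the label length."""
    return edit_distance(prediction, label) / max(len(label), 1)


def wer(prediction: str, label: str) -> float:
    """Word error rate relative to the number of words in the label."""
    reference = label.split()
    return edit_distance(prediction.split(), reference) / max(len(reference), 1)


print(cer("kitten", "sitting"))  # 3 edits / 7 characters
```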

The module 'Models' could not be loaded.

PS C:\Users\şule meşe\Downloads\mltu-main\mltu-main> Models\03_handwriting_recognition\202402160440\logs
Models\03_handwriting_recognition\202402160440\logs : The module 'Models' could not be loaded. For more information, run 'Import-Module Models'.
At line:1 char:1
Models\03_handwriting_recognition\202402160440\logs
  + CategoryInfo          : ObjectNotFound: (Models\03_handw...2402160440\logs:String) [], CommandNotFoundException
  + FullyQualifiedErrorId : CouldNotAutoLoadModule

Can you help me please :)
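PowerShell reads a bare "Models\...\logs" typed at the prompt as a module-qualified command name (Module\Command), which is why it tries to auto-load a module called 'Models'. A folder is not a command; in PowerShell, Get-ChildItem (or dir) lists it. Assuming the goal was simply to look inside the logs folder, a Python equivalent might look like the sketch below (list_folder is a hypothetical helper; the path is taken from the issue and is relative to the current directory).

```python
from pathlib import Path


def list_folder(path: str) -> list:
    """Sorted names of the entries in `path`; raises if it is not a folder."""
    folder = Path(path)
    if not folder.is_dir():
        raise FileNotFoundError(f"Not a folder: {folder.resolve()}")
    return sorted(p.name for p in folder.iterdir())


if __name__ == "__main__":
    print(list_folder(r"Models\03_handwriting_recognition\202402160440\logs"))
```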

Custom Captcha To Text Model for Node.js

I was able to save the model in HDF5, ONNX, and SavedModel format (using tf.saved_model.save).

However, I still can't find out how to run the model on a Node.js backend.

tensorflowjs_converter doesn't seem to work with a model that has a Lambda layer. Can you help me, please?

ValueError: Failed to find data adapter that can handle input: <class 'mltu.dataProvider.DataProvider'>, <class 'NoneType'>

Hey,
I get stuck on this issue when I run the training.

My dataset:

Path:

Folders:

Maybe this helps you understand:

Please help me!

captcha images name issue for training

Hi!
First of all, thanks a lot. My question: I have 800 images named captcha1, captcha2, captcha3 and so on. For training the model, do I need to name the images after the solved captcha characters, as you did (x223g, ss23d, 7d7df, etc.)?
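As far as the tutorial code quoted on this page shows, the label is taken from the file name (label = os.path.splitext(file)[0]), so either the files must be named after their solutions or the labels must come from somewhere else. The sketch below shows the second option without renaming anything: it assumes a hypothetical answers dictionary (file name to solved text, for example loaded from a CSV you prepare) and builds the same [image_path, label] list plus vocabulary and maximum length.

```python
import os
from typing import Dict, List, Tuple


def build_dataset(image_dir: str, answers: Dict[str, str]) -> Tuple[List[List[str]], str, int]:
    """Pair each image with its solved text from `answers` (file name -> label).

    Hypothetical helper: returns (dataset, vocab, max_len) in the
    [image_path, label] layout the captcha tutorial builds from file names.
    Images without an answer are skipped.
    """
    dataset, vocab, max_len = [], set(), 0
    for file in sorted(os.listdir(image_dir)):
        label = answers.get(file)
        if not label:
            continue
        dataset.append([os.path.join(image_dir, file), label])
        vocab.update(label)
        max_len = max(max_len, len(label))
    return dataset, "".join(sorted(vocab)), max_len
```

Either way, every one of the 800 images needs its solved text somewhere; the model cannot learn from captcha1, captcha2, ... alone.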

Failed to find data adapter that can handle input

Hi @pythonlessons,
I'm trying to use the Image to Word tutorial.

I changed train.py a bit so that it reads my images and labels correctly. The only change was in def read_annotation_file:

Old Code:

def read_annotation_file(annotation_path):
    dataset, vocab, max_len = [], set(), 0
    with open(annotation_path, "r") as f:
        for line in tqdm(f.readlines()):
            line = line.split()
            image_path = data_path + line[0][1:]
            label = line[0].split("_")[1]
            dataset.append([image_path, label])
            vocab.update(list(label))
            max_len = max(max_len, len(label))
    return dataset, sorted(vocab), max_len

New Code:

def read_annotation_file(annotation_path):
    dataset, vocab, max_len = [], set(), 0
    with open(annotation_path, "r") as f:
        for line in tqdm(f.readlines()):
            line = line.split(' ')
            image_path = data_path + line[0]
            label = line[1]
            dataset.append([image_path, label])
            vocab.update(list(label))
            max_len = max(max_len, len(label))
    return dataset, sorted(vocab), max_len
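One detail in the New Code is worth checking: f.readlines() keeps each line's trailing newline and line.split(' ') does not remove it, so label = line[1] ends in "\n" and "\n" is added to the vocabulary. The sketch below is a hypothetical variant that strips the newline, skips blank or malformed lines, and, unlike the original, keeps everything after the first space as the label so labels containing spaces are not cut off. The "<image path> <label>" line format is assumed from the modified function.

```python
from typing import List, Tuple


def parse_annotation_lines(lines: List[str], data_path: str) -> Tuple[list, list, int]:
    """Parse '<relative image path> <label>' lines (format assumed from the issue)."""
    dataset, vocab, max_len = [], set(), 0
    for raw in lines:
        parts = raw.strip().split(" ", 1)  # strip() removes the trailing newline
        if len(parts) != 2 or not parts[1]:
            continue  # blank or malformed line
        image_path, label = data_path + parts[0], parts[1]
        dataset.append([image_path, label])
        vocab.update(label)
        max_len = max(max_len, len(label))
    return dataset, sorted(vocab), max_len
```

Inside read_annotation_file this would be called as parse_annotation_lines(f.readlines(), data_path).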

I also changed line 91 because of an error:

Old Code:
metrics=[CWERMetric()],

New Code:
metrics=[CWERMetric('accuracy')],

This is my error:

Traceback (most recent call last):
  File "/path/to/mltu/Tutorials/01_image_to_word/train.py", line 111, in <module>
    model.fit(
  File "/path/to/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/path/to/.local/lib/python3.10/site-packages/keras/engine/data_adapter.py", line 1083, in select_data_adapter
    raise ValueError(
ValueError: Failed to find data adapter that can handle input: <class 'mltu.dataProvider.DataProvider'>, <class 'NoneType'>

How do I need to change train_data_provider or train_dataset, or is it a version problem?

train.py giving error on custom dataset

Hi! I am trying to fine-tune the wav2vec2 model from your "10_wav2vec2_torch" tutorial. As far as I know, my dataset has a similar format to the LJ Speech Dataset that you use as an example: a 'wavs' folder containing the audio files, and a 'metadata.csv' file with rows of pipe-separated transcriptions. I have successfully run the train.py script on the default dataset (LJ Speech Dataset), but when I use my own dataset, I get the output below on the terminal. Am I missing something?

Some weights of the model checkpoint at facebook/wav2vec2-base-960h were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.weight_v', 'wav2vec2.encoder.pos_conv_embed.conv.weight_g']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized because the shapes did not match:
- lm_head.bias: found shape torch.Size([32]) in the checkpoint and torch.Size([29]) in the model instantiated
- lm_head.weight: found shape torch.Size([32, 768]) in the checkpoint and torch.Size([29, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Cuda Device Available.
INFO:WarmupCosineDecay:Epoch 1 - Learning Rate: 1e-08
  0%|                                                                                                                  | 0/18 [00:00<?, ?it/s]/home/ee/anaconda3/lib/python3.9/site-packages/mltu/transformers.py:234: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  return padded_audios, np.array(label)
Epoch 1 - loss: 25.1576 - CER: 4.2681 - WER: 1.0000: 100%|████████████████████████████████████████████████████| 18/18 [00:08<00:00,  2.06it/s]
  0%|          | 0/2 [00:00<?, ?it/s]
Exception in thread Thread-14:
Traceback (most recent call last):
  File "/home/ee/anaconda3/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/ee/anaconda3/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ee/anaconda3/lib/python3.9/site-packages/mltu/torch/dataProvider.py", line 245, in worker_function
    result = self.function(data_index)
  File "/home/ee/anaconda3/lib/python3.9/site-packages/mltu/dataProvider.py", line 287, in __getitem__
    batch_data, batch_annotations = batch_postprocessor(batch_data, batch_annotations)
  File "/home/ee/anaconda3/lib/python3.9/site-packages/mltu/transformers.py", line 222, in __call__
    max_len = max([len(a) for a in audio])
ValueError: max() arg is an empty sequence
(The same traceback is raised in Thread-15, Thread-16, Thread-18, Thread-19, Thread-21, Thread-22 and Thread-23.)
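The traceback ends in max([len(a) for a in audio]) inside mltu/transformers.py, which raises exactly when a batch reaches the audio padding step with no audio in it. The first training epoch completes and the failure appears in the following 0/2 loop, so it is plausible, though not confirmed, that some rows of the custom dataset (likely in the validation split) yield no usable audio or text. The sketch below checks an LJ Speech-style metadata.csv for such rows. It assumes rows of the form "<file id>|<transcription>[|<normalized transcription>]", audio at "<wavs_dir>/<file id>.wav", and that the last column is the text used; all three are assumptions to adjust to the tutorial's actual loader.

```python
import os
from typing import List, Tuple


def find_bad_rows(metadata_path: str, wavs_dir: str) -> List[Tuple[int, str]]:
    """Return (line number, problem) for rows that would produce no training data.

    Assumed row layout: '<file id>|<transcription>[|<normalized transcription>]';
    the last column is treated as the text, and audio as '<wavs_dir>/<id>.wav'.
    """
    problems = []
    with open(metadata_path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            fields = line.rstrip("\n").split("|")
            if len(fields) < 2:
                problems.append((line_no, "fewer than 2 '|'-separated fields"))
                continue
            text = fields[-1].strip()
            wav_path = os.path.join(wavs_dir, fields[0].strip() + ".wav")
            if not text:
                problems.append((line_no, "empty transcription"))
            if not os.path.isfile(wav_path):
                problems.append((line_no, f"missing audio {wav_path}"))
            elif os.path.getsize(wav_path) == 0:
                problems.append((line_no, f"empty audio file {wav_path}"))
    return problems
```

Running it as find_bad_rows("metadata.csv", "wavs") on the custom dataset before training should list any offending rows by line number.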

Changing the architecture

How do I modify the architecture if my image shape is (160, 60)?
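In the tutorial models the CNN output is flattened with layers.Reshape((x.shape[-3] * x.shape[-2], x.shape[-1])) before the BiLSTM, so the sequence length fed to CTC is (output height) x (output width). With "same" padding each layer of stride s maps a size n to ceil(n / s). The sketch below computes that for a hypothetical stride list [1, 2, 2, 1, 2] and an image assumed to be 60 pixels high and 160 wide; read the real strides off your model.py. CTC also needs at least len(label) time steps, plus one extra step for every pair of identical adjacent characters.

```python
import math
from typing import Sequence


def output_size(size: int, strides: Sequence[int]) -> int:
    """Spatial size after 'same'-padded conv/pool layers: ceil(size / stride) per layer."""
    for s in strides:
        size = math.ceil(size / s)
    return size


def ctc_time_steps(height: int, width: int, strides: Sequence[int]) -> int:
    """Sequence length after Reshape((H' * W', C)) on the final feature map."""
    return output_size(height, strides) * output_size(width, strides)


def ctc_min_steps(label: str) -> int:
    """Minimum time steps CTC needs: one per character plus a blank between repeats."""
    return len(label) + sum(1 for a, b in zip(label, label[1:]) if a == b)


strides = [1, 2, 2, 1, 2]  # hypothetical; take these from your model
print(ctc_time_steps(60, 160, strides))  # 8 * 20 = 160 time steps
```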

Model configuration for new captcha type

Hello brother, thanks for the code and the model you have made. I'm new to neural networks and AI training.

I'm trying to train a captcha-solver model on these captcha types, but I haven't succeeded yet. The first problem is that when training reaches epoch 200-350 of 1000, it prints "Epoch: early stopping". At first I tried changing the batch size, the number of workers, and the training speed.
I tried batch sizes of 32, 64 (default), 128, 256, 270, and 512, and 10, 20, 30, and 40 training workers.
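"Epoch N: early stopping" is not an error: Keras's EarlyStopping callback prints it when the monitored metric has not improved for `patience` consecutive epochs, and stops training on purpose. The tutorial's exact monitor and patience values are not shown here. The sketch below reproduces the same rule in plain Python (early_stop_epoch is a hypothetical name) to show that the stopping point depends on the validation curve, not on batch size or worker count directly.

```python
from typing import Optional, Sequence


def early_stop_epoch(val_losses: Sequence[float], patience: int,
                     min_delta: float = 0.0) -> Optional[int]:
    """0-based epoch at which training would stop, or None if it never stops.

    Same idea as EarlyStopping(monitor="val_loss", mode="min"): stop once
    `patience` consecutive epochs fail to beat the best loss by more than min_delta.
    """
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


print(early_stop_epoch([5.0, 4.0, 3.0, 3.5, 3.2, 3.1], patience=3))  # 5
```

In other words, stopping around epoch 313 means the validation metric stopped improving well before that point; more epochs would not have helped unless something else changes.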

I used 1000 of those captchas as a dataset and 101 different captchas for testing. The "best" result I got was with batch_size=256 and train_workers=30, on a machine with 24 GB RAM, using a 12th-gen Intel i5 CPU for training. The trained model is below:
202402212032.zip

I also thought my machine might not be powerful enough, so I tried a different machine with 64 GB RAM, a 13th-gen i9 CPU, and an Nvidia GeForce RTX 2080. I still got early stopping at around epoch 313/1000. The "trained" model folder is below:
202402232009 i9.zip

Now, I know that I can use one of those models and continue training it.

I was hoping you could help me configure the model architecture or configs for these captcha images. I think the image size may need to be adjusted, because during training I was getting "libpng warning: pHYs: CRC error" a lot. I'd be very grateful if you could help me with that. Thanks for your valuable time.
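The libpng warning is about the files themselves rather than their size: every PNG chunk carries a CRC-32 over its type and data, and "pHYs: CRC error" means the stored checksum of the pHYs chunk (physical pixel density metadata) does not match its contents. libpng treats that ancillary chunk as a warning, so the images most likely still decode; re-saving them, for example with cv2.imwrite, would probably silence it. The standard-library sketch below (bad_png_chunks is a hypothetical helper) lists the chunks whose CRC fails, following the chunk layout from the PNG specification.

```python
import struct
import zlib
from typing import List

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def bad_png_chunks(data: bytes) -> List[str]:
    """Types of chunks whose stored CRC-32 does not match their type + data.

    Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    bad, pos = [], len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        body = data[pos + 8:pos + 8 + length]
        crc_bytes = data[pos + 8 + length:pos + 12 + length]
        if len(body) < length or len(crc_bytes) < 4:
            bad.append(chunk_type.decode("latin-1") + " (truncated)")
            break
        (stored,) = struct.unpack(">I", crc_bytes)
        if zlib.crc32(chunk_type + body) & 0xFFFFFFFF != stored:
            bad.append(chunk_type.decode("latin-1"))
        pos += 12 + length
    return bad
```

To check one file: with open(path, "rb") as f: print(bad_png_chunks(f.read())).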

Wheel for v1.0.10 is missing

It is missing on PyPI (https://pypi.org/project/mltu/#history). Can you please add it?

Thanks in advance!

mltu/Tutorials/01_image_to_word

Hey,

I use TensorFlow 2.4.1 and had to change "import keras" to "import tensorflow.keras".

Now I get the following error at "model.fit" in train.py, which leads into model.py:

Epoch 1/100
Traceback (most recent call last):
  File "c:/Tutorials/TensorFlow/image_to_word/train.py", line 109, in <module>
    model.fit(
  File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1100, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\def_function.py", line 725, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\function.py", line 3196, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\func_graph.py", line 977, in wrapper
    raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:
TypeError: in user code:

C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py:805 train_function  *
    return step_function(self, iterator)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py:795 step_function  **
    outputs = model.distribute_strategy.run(run_step, args=(data,))
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1259 run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2730 call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:3417 _call_for_each_replica
    return fn(*args, **kwargs)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py:788 run_step  **
    outputs = model.train_step(data)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py:759 train_step
    return {m.name: m.result() for m in self.metrics}
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py:759 <dictcomp>
    return {m.name: m.result() for m in self.metrics}
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\utils\metrics_utils.py:122 decorated
    result_t = array_ops.identity(result_fn(*args))
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\util\dispatch.py:201 wrapper
    return target(*args, **kwargs)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\ops\array_ops.py:287 identity
    ret = gen_array_ops.identity(input, name=name)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\ops\gen_array_ops.py:3941 identity
    _, _, _op, _outputs = _op_def_library._apply_op_helper(
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\op_def_library.py:525 _apply_op_helper
    raise err
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\op_def_library.py:517 _apply_op_helper
    values = ops.convert_to_tensor(
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\profiler\trace.py:163 wrapped
    return func(*args, **kwargs)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\ops.py:1540 convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\constant_op.py:339 _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\constant_op.py:264 constant
    return _constant_impl(value, dtype, shape, name, verify_shape=False,
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\constant_op.py:281 _constant_impl
    tensor_util.make_tensor_proto(
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\tensor_util.py:457 make_tensor_proto
    _AssertCompatible(values, dtype)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\tensor_util.py:334 _AssertCompatible
    raise TypeError("Expected any non-tensor type, got a tensor instead.")

TypeError: Expected any non-tensor type, got a tensor instead.

I can't train on my private dataset

I'm trying to use the captcha-to-text tutorial, but I can't train on my dataset the way you did.
With the dataset you provided, it worked without any problems, but when I replaced your images with my own, I ran into problems. A few examples from my dataset of 10129 images:

I made this change in the train.py file:
label = os.path.splitext(file)[0] -> label = os.path.splitext(file)[0].split('-')[1].

This is because my images are not named captcha_answer.png like yours, but md5hash-captcha_answer.png, so with this change the label is extracted as the captcha_answer part, just as in your version.
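One caveat with split('-')[1]: if a captcha answer itself contained a hyphen, everything after the second hyphen would be dropped. The sketch below is a hypothetical helper that splits only at the first hyphen and fails loudly on unexpected names, so a mislabeled file does not silently enter the dataset.

```python
import os


def label_from_filename(file: str) -> str:
    """Extract 'captcha_answer' from 'md5hash-captcha_answer.png'.

    partition('-') cuts at the first hyphen only, so an answer that itself
    contains '-' is kept whole.
    """
    stem = os.path.splitext(os.path.basename(file))[0]
    _md5hash, sep, answer = stem.partition("-")
    if not sep or not answer:
        raise ValueError(f"unexpected file name: {file!r}")
    return answer


print(label_from_filename("d41d8cd98f00b204e9800998ecf8427e-ab3dE.png"))  # ab3dE
```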

Since all my images are 350x100, I set self.height = 100 and self.width = 350 in the config.py file. Then I got the following error. Can you help me solve it?

Transcription has no breaks between sentences.

Hi! I trained the wav2vec2 model with perfect accuracy on my dataset. When I run prediction on a full audio file, the transcription has no gaps between separate sentences. For example, in "transfer you to our new sales line please hold for a moment i will transfer you overthank you youre welcome stay in the linethank you for calling this call may be recorded", there should be gaps between 'over' and 'thank', and between 'line' and 'thank'. I could run this script on diarized segments of the original wav file, but I want to be able to transcribe the complete audio file in one go.

I am using 'mltu==1.1.7'. Here is the code for making predictions.

import numpy as np

from mltu.inferenceModel import OnnxInferenceModel
from mltu.utils.text_utils import ctc_decoder, get_cer, get_wer

import librosa
import pandas as pd
from tqdm import tqdm

class Wav2vec2(OnnxInferenceModel):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def predict(self, audio: np.ndarray):

        audio = np.expand_dims(audio, axis=0).astype(np.float32)

        preds = self.model.run(None, {self.input_name: audio})[0]

        text = ctc_decoder(preds, self.metadata["vocab"])[0]

        return text

model = Wav2vec2(model_path="Models/10_wav2vec2_torch/202310311600/model.onnx")

audio_file_path = '/media/ee/New Volume/mltu/Tutorials/10_wav2vec2_torch/Datasets/comcast_xfinity_full_audios/1.wav'
audio, sr = librosa.load(audio_file_path, sr=16000)
prediction_text = model.predict(audio)

print('predicted transcript: ', prediction_text)
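A possible explanation, not confirmed here: if each training clip held a single utterance, the model never learned to emit a word separator for the pause between two sentences, so decoding one long recording can run sentences together. A workaround that still takes the whole file as input is to split it at silences internally, transcribe each piece, and join the pieces with spaces. The NumPy sketch below does that with a simple RMS-energy rule; the frame size, threshold and minimum pause length are assumptions to tune, and librosa.effects.split is an existing library alternative for finding the non-silent intervals.

```python
from typing import Callable, List, Tuple

import numpy as np


def split_on_silence(audio: np.ndarray, sr: int, frame_s: float = 0.02,
                     threshold: float = 0.01, min_silence_s: float = 0.3) -> List[Tuple[int, int]]:
    """(start, end) sample ranges of speech separated by pauses >= min_silence_s.

    A frame counts as silent when its RMS energy is below `threshold`.
    """
    frame = max(1, int(round(sr * frame_s)))
    n_frames = int(np.ceil(len(audio) / frame))
    voiced = [float(np.sqrt(np.mean(audio[i * frame:(i + 1) * frame] ** 2))) >= threshold
              for i in range(n_frames)]
    min_gap = max(1, int(round(min_silence_s / frame_s)))
    regions, start, silent_run = [], None, 0
    for i, is_voiced in enumerate(voiced):
        if is_voiced:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_gap:
                # the region ends where this long pause began
                regions.append((start * frame, min((i - silent_run + 1) * frame, len(audio))))
                start, silent_run = None, 0
    if start is not None:  # speech runs to the end, minus any short trailing silence
        regions.append((start * frame, min((n_frames - silent_run) * frame, len(audio))))
    return regions


def transcribe_in_chunks(audio: np.ndarray, regions: List[Tuple[int, int]],
                         predict: Callable[[np.ndarray], str]) -> str:
    """Transcribe each region separately and join the texts with single spaces."""
    texts = [predict(audio[s:e]).strip() for s, e in regions]
    return " ".join(t for t in texts if t)
```

With the script above, this would be called as transcribe_in_chunks(audio, split_on_silence(audio, 16000), model.predict).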

config.yaml

I am facing this issue while running the code: configs.yaml is missing.
configs = BaseModelConfigs.load("Models/02_captcha_to_text/202212211205/configs.yaml")
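The path above hard-codes the author's run folder (202212211205). Judging by the other paths on this page, such as Models/02_captcha_to_text/202403291006, training appears to create a new timestamped folder for each run, so after training locally the configs.yaml is most likely in a folder with your own timestamp. The sketch below (latest_configs is a hypothetical helper) picks the newest such folder, assuming folder names in YYYYMMDDHHMM form so that string order equals chronological order.

```python
from pathlib import Path
from typing import Optional


def latest_configs(model_root: str) -> Optional[Path]:
    """configs.yaml from the newest timestamped run folder under model_root, or None."""
    root = Path(model_root)
    if not root.is_dir():
        return None
    runs = sorted(
        p for p in root.iterdir()
        if p.is_dir() and p.name.isdigit() and (p / "configs.yaml").is_file()
    )
    return runs[-1] / "configs.yaml" if runs else None
```

Usage would then be configs = BaseModelConfigs.load(str(latest_configs("Models/02_captcha_to_text"))), after checking that the result is not None.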

Tutorial 2 - ModuleNotFoundError: No module named 'mltu'

Heyya! Thanks for a cool repo and the intro video explaining it. I'm particularly interested in the 2nd tutorial, captcha text recognition.

I'm new to the Python ecosystem, so sorry in advance for silly questions. After cloning the repo and installing the dependencies via:

pip install -r requirements.txt

I get the following error:

maxdonchenko@maxdonchenko mltu % python3 ./Tutorials/02_captcha_to_text/train.py
Traceback (most recent call last):
  File "/Users/maxdonchenko/mltu/./Tutorials/02_captcha_to_text/train.py", line 7, in <module>
    from mltu.tensorflow.dataProvider import DataProvider
ModuleNotFoundError: No module named 'mltu'

An interesting nuance here is that I think I have mltu installed:

maxdonchenko@maxdonchenko mltu % pip show mltu
Name: mltu
Version: 1.0.15
Summary: Machine Learning Training Utilities (MLTU) for TensorFlow and PyTorch
Home-page: https://pylessons.com/
Author: PyLessons
Author-email: [email protected]
License: 
Location: /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages
Requires: librosa, matplotlib, numpy, onnxruntime, opencv-python, pandas, Pillow, PyYAML, tqdm
Required-by:

I tried to explicitly specify the mltu version in the root requirements.txt:

PyYAML>=6.0
tqdm
pandas
numpy
opencv-python
Pillow>=9.4.0
onnxruntime>=1.15.0  # onnxruntime-gpu for GPU support
librosa>=0.9.2
matplotlib
# 👆 already existing in the repo
# 👇 added by me
mltu==0.1.4
tensorflow==2.10 # took versions from Tutorials/02_captcha_to_text/README.md

But installing them threw another error, basically saying TensorFlow 2.10 can't be installed for some reason:

ERROR: Could not find a version that satisfies the requirement tensorflow==2.10 (from versions: 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0)
ERROR: No matching distribution found for tensorflow==2.10

Do you have any suggestions on how to get the same environment that you had during NN model training?
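Two environment details may explain both errors. pip show reports mltu in a Python 3.11 framework install, and if python3 resolves to a different interpreter, that interpreter will not see it. Separately, pip only offering 2.13.x suggests that no tensorflow==2.10 wheel matches this interpreter and platform. To the best of my knowledge, TensorFlow 2.10 shipped wheels for Python 3.7 to 3.10 only, and the PyPI tensorflow package had no macOS arm64 (Apple Silicon) wheels before 2.13, where tensorflow-macos was used instead. Both points are assumptions to verify against the TensorFlow release notes. The sketch below encodes them and prints what applies to the running interpreter.

```python
import platform
import sys
from typing import Tuple


def tf_2_10_wheel_expected(python: Tuple[int, int], system: str, machine: str) -> bool:
    """Rough, assumption-based check of whether pip can find tensorflow==2.10.

    Assumptions: TF 2.10 wheels exist for Python 3.7-3.10 only, and not for
    macOS arm64 (Apple Silicon used the separate tensorflow-macos package).
    """
    if not (3, 7) <= tuple(python[:2]) <= (3, 10):
        return False
    if system == "Darwin" and machine == "arm64":
        return False
    return True


if __name__ == "__main__":
    info = (tuple(sys.version_info[:2]), platform.system(), platform.machine())
    print(sys.executable, info, "tensorflow==2.10 installable:", tf_2_10_wheel_expected(*info))
```

If it prints False, a virtual environment created with a Python 3.10 interpreter (on an Intel machine) is one way to get closer to the tutorial's environment.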

ValueError: The filepath provided must end in `.keras` (Keras model format). Received: filepath=Models/02_captcha_to_text/202403291006/model.h5

Hello,
Can you help me fix this?

ValueError: The filepath provided must end in .keras (Keras model format). Received: filepath=Models/02_captcha_to_text/202403291006/model.h5

I would also like to increase learning_rate and train_workers. Is that possible?

@pythonlessons please help me
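This message comes from Keras 3, which TensorFlow uses by default from version 2.16 onward; the tutorials were written against Keras 2, where a model.h5 checkpoint path was accepted. Two likely ways out are pinning an older TensorFlow or changing the checkpoint file name. The sketch below picks a name by Keras major version; the rule it encodes (".keras" for full models and ".weights.h5" for weights-only checkpoints in Keras 3) is my understanding of Keras 3's ModelCheckpoint and should be double-checked, and checkpoint_path is a hypothetical helper.

```python
import os


def checkpoint_path(model_dir: str, keras_major: int, weights_only: bool = False) -> str:
    """ModelCheckpoint file name the installed Keras should accept (assumed rule).

    Keras 3: full models end in '.keras', weights-only files in '.weights.h5'.
    Keras 2: the tutorials' 'model.h5' works.
    """
    if keras_major >= 3:
        name = "model.weights.h5" if weights_only else "model.keras"
    else:
        name = "model.h5"
    return os.path.join(model_dir, name)
```

The major version can be read with int(keras.__version__.split(".")[0]). Other code that later loads model.h5 (for example an ONNX export step) would need the same change.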

How to change the decoder to a transformer architecture?

If you wanted to use a pretrained Transformer for the same task, how would you use it instead of the LSTM here? For example, I want to use a lightweight BERT model; what would be the changes to the lines at the end? I'm trying to understand the architecture.

    squeezed = layers.Reshape((x7.shape[-3] * x7.shape[-2], x7.shape[-1]))(x7)

    blstm = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(squeezed)

    output = layers.Dense(output_dim + 1, activation='softmax', name="output")(blstm)

    model = Model(inputs=inputs, outputs=output)
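Here the BiLSTM is the only sequence model: squeezed is a (batch, time, features) tensor and the Dense(output_dim + 1) CTC head works per time step. A transformer layer would replace the BiLSTM with self-attention over those time steps, so the head could likely stay as it is. In Keras the building block is layers.MultiHeadAttention, usually wrapped with a residual Add, LayerNormalization and a small feed-forward Dense block, plus a positional encoding because attention by itself ignores order. A pretrained BERT is trained on word-piece text tokens, so my judgment, not a tested claim, is that its weights would not transfer directly to CNN image features. The NumPy sketch below shows the core operation, single-head scaled dot-product self-attention, with hypothetical projection matrices.

```python
import numpy as np


def self_attention(x: np.ndarray, wq: np.ndarray, wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention.

    x: (batch, time, features), e.g. the output of the Reshape layer.
    wq, wk, wv: (features, d) projection matrices.
    Returns (batch, time, d); each time step is a softmax-weighted mix of all steps.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over time steps
    return weights @ v


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 5, 8))
print(self_attention(x, *(rng.normal(size=(8, 4)) for _ in range(3))).shape)  # (2, 5, 4)
```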

Empty model folder

The Models folder is empty on the main branch. Please help me out; I want to run the captcha to image code.

Outdated TensorFlow version

TensorFlow version 2.10 is not available on pip.

No module named 'mltu.tensorflow'

I am using Python 3.8.17, mltu==0.1.4 and TensorFlow 2.10.0. When I try to run train.py, it gives this:

Traceback (most recent call last):
  File "train.py", line 7, in <module>
    from mltu.tensorflow.dataProvider import DataProvider
ModuleNotFoundError: No module named 'mltu.tensorflow'

Some things don't line up with the README

from configs import ModelConfigs

dataset, vocab, max_len = [], set(), 0
for file in stow.ls(stow.join('Datasets', 'captcha_images_v2')):
    dataset.append([stow.relpath(file), file.name])
    vocab.update(list(file.name))
    max_len = max(max_len, len(file.name))

configs = ModelConfigs()

# Save vocab and maximum text length to configs
configs.vocab = "".join(vocab)
configs.max_text_length = max_len
configs.save()

The README snippet is missing import stow.
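For comparison, the same scan can be written with only the standard library, which also removes the stow dependency. This is a sketch, not the tutorial's code: scan_captcha_folder is a hypothetical helper, and it assumes each captcha label is the file name without its extension (what file.name appears to return in the snippet above). Sorting the vocabulary is a deliberate change: iterating a Python set of strings can give a different order from run to run, and the character order defines the label indices.

```python
import tempfile
from pathlib import Path


def scan_captcha_folder(folder):
    """Collect [path, label] pairs, the character vocabulary and the longest label length.

    Assumes each image is named after its captcha text, e.g. "2b827.png" -> "2b827".
    """
    dataset, vocab, max_len = [], set(), 0
    for file in sorted(Path(folder).iterdir()):
        if not file.is_file():
            continue
        label = file.stem  # file name without its extension
        dataset.append([str(file), label])
        vocab.update(label)
        max_len = max(max_len, len(label))
    return dataset, "".join(sorted(vocab)), max_len


# Demo on a throwaway folder with two empty "images".
with tempfile.TemporaryDirectory() as tmp:
    for name in ("abc12.png", "xy9.png"):
        (Path(tmp) / name).touch()
    demo_dataset, demo_vocab, demo_max_len = scan_captcha_folder(tmp)

print(demo_vocab, demo_max_len)  # 129abcxy 5
```

For the tutorial layout, the call would be scan_captcha_folder(Path("Datasets") / "captcha_images_v2").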

from mltu.dataProvider import DataProvider
from mltu.preprocessors import ImageReader
from mltu.transformers import ImageResizer, LabelIndexer, LabelPadding
from mltu.augmentors import RandomBrightness, RandomRotate, RandomErodeDilate

data_provider = DataProvider(
    dataset=dataset,
    skip_validation=True,
    batch_size=configs.batch_size,
    data_preprocessors=[ImageReader()],
    transformers=[
        ImageResizer(configs.width, configs.height),
        LabelIndexer(configs.vocab),
        LabelPadding(max_word_length=configs.max_text_length, padding_value=len(configs.vocab))
        ],
)

configs isn't imported in the basic example either, and dataset isn't defined anywhere in this file. The parts of the project where the datasets are created don't fit together.

Also, this import isn't used: from mltu.augmentors import RandomBrightness, RandomRotate, RandomErodeDilate

Hoping for a rework.
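As a rough guide to what the two label transformers in the DataProvider snippet are configured to do, here is a NumPy sketch. LabelIndexer(configs.vocab) maps each character to its position in the vocabulary, and LabelPadding right-pads the result to max_text_length with padding_value=len(configs.vocab), an index one past the last character. The helper names index_label and pad_label are hypothetical; mltu/transformers.py is the place to check the library's exact behaviour (for example, how it treats characters outside the vocabulary).

```python
import numpy as np


def index_label(label: str, vocab: str) -> np.ndarray:
    """Map each character of `label` to its position in `vocab` (unknown characters are dropped)."""
    return np.array([vocab.index(ch) for ch in label if ch in vocab], dtype=np.int64)


def pad_label(indices: np.ndarray, max_length: int, padding_value: int) -> np.ndarray:
    """Right-pad `indices` with `padding_value` up to `max_length` entries."""
    missing = max_length - len(indices)
    if missing < 0:
        raise ValueError(f"label has {len(indices)} characters, more than max_length={max_length}")
    return np.pad(indices, (0, missing), mode="constant", constant_values=padding_value)


vocab = "abc"
padded = pad_label(index_label("cab", vocab), max_length=5, padding_value=len(vocab))
print(padded)  # [2 0 1 3 3]
```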

mltu/transformers.py SpectrogramPadding bug

Hi @pythonlessons,
I think the SpectrogramPadding class has a bug.

Original code:
A short spectrogram is shifted to the back (tail) of padded_spectrogram, because the padding is inserted before the data.
But I want a short spectrogram to stay at the front (head) of padded_spectrogram, with the padding added after it.

class SpectrogramPadding(Transformer):
    """Pad spectrogram to max_spectrogram_length
    
    Attributes:
        max_spectrogram_length (int): Maximum length of spectrogram
        padding_value (int): Value to pad
    """
    def __init__(
        self, 
        max_spectrogram_length: int, 
        padding_value: int
        ) -> None:
        self.max_spectrogram_length = max_spectrogram_length
        self.padding_value = padding_value

    def __call__(self, spectrogram: np.ndarray, label: np.ndarray):
        padded_spectrogram = np.pad(spectrogram, ((self.max_spectrogram_length - spectrogram.shape[0], 0),(0,0)), mode="constant", constant_values=self.padding_value)

        return padded_spectrogram, label

New code:

class SpectrogramPadding(Transformer):
    """Pad spectrogram to max_spectrogram_length
    
    Attributes:
        max_spectrogram_length (int): Maximum length of spectrogram
        padding_value (int): Value to pad
    """
    def __init__(
        self, 
        max_spectrogram_length: int, 
        padding_value: int,
        append: bool = True
        ) -> None:
        self.max_spectrogram_length = max_spectrogram_length
        self.padding_value = padding_value
        self.append = append

    def __call__(self, spectrogram: np.ndarray, label: np.ndarray):
        # spectrogram.shape: (time_steps, frequency_bins), e.g. (1032, 193)
        if not self.append:
            # Original behaviour: padding goes before the data, which ends up at the tail
            padded_spectrogram = np.pad(
                spectrogram,
                ((self.max_spectrogram_length - spectrogram.shape[0], 0), (0, 0)),
                mode="constant",
                constant_values=self.padding_value,
            )
        else:
            # New behaviour: padding is appended after the data, which stays at the head
            length, height = spectrogram.shape
            missing = self.max_spectrogram_length - length
            if missing > 0:
                padding = np.full((missing, height), self.padding_value)
                padded_spectrogram = np.append(spectrogram, padding, axis=0)
            else:
                padded_spectrogram = spectrogram
        return padded_spectrogram, label
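The only difference between the two placements is which side of the time-axis pad tuple receives the missing length, so both can be handled by a single np.pad call. Below is a sketch with a hypothetical pad_spectrogram function. Unlike both versions above, it also truncates spectrograms longer than max_length; that is an added assumption, because np.pad rejects negative pad widths and the append branch would otherwise return an array longer than the others in the batch.

```python
import numpy as np


def pad_spectrogram(spectrogram: np.ndarray, max_length: int, padding_value: float,
                    pad_at_end: bool = True) -> np.ndarray:
    """Pad a (time, frequency) spectrogram along the time axis to exactly `max_length` frames.

    pad_at_end=True keeps the real frames at the head (padding goes after them);
    pad_at_end=False reproduces the original behaviour (padding goes before them).
    Longer inputs are truncated so the output shape is always (max_length, frequency).
    """
    spectrogram = spectrogram[:max_length]
    missing = max_length - spectrogram.shape[0]
    time_pad = (0, missing) if pad_at_end else (missing, 0)
    return np.pad(spectrogram, (time_pad, (0, 0)), mode="constant",
                  constant_values=padding_value)


spec = np.ones((3, 2))
print(pad_spectrogram(spec, 5, 0, pad_at_end=True)[:, 0])   # [1. 1. 1. 0. 0.]
print(pad_spectrogram(spec, 5, 0, pad_at_end=False)[:, 0])  # [0. 0. 1. 1. 1.]
```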

Config file issue

configs = BaseModelConfigs.load("Models/02_captcha_to_text/202212211205/configs.yaml")

Epoch 51: early stopping

Hi!
Training ended with "Epoch 51: early stopping". I am training on my own 1040 labeled images, but when I try to predict text, it returns an empty string.
Code:
image = cv2.imread('./6wf4ef.jpg')
prediction_text = model.predict(image)  # returns an empty string
print(f"Predicted Text: {prediction_text}")

The prediction is empty.
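An empty string is exactly what greedy CTC decoding produces when the model predicts the blank class at every time step, so one possible cause is a model that has not yet learned to emit characters. Below is a NumPy sketch of greedy decoding (take the most likely class per step, merge consecutive repeats, drop blanks). It assumes the blank is the last class index, len(vocab), as with a softmax output of width len(vocab) + 1; greedy_ctc_decode is a hypothetical helper, not mltu's decoder.

```python
from itertools import groupby

import numpy as np


def greedy_ctc_decode(probs: np.ndarray, vocab: str) -> str:
    """Greedy CTC decoding of a (time_steps, len(vocab) + 1) probability matrix.

    Takes the most likely class per time step, merges consecutive repeats, then
    drops the blank class, assumed here to be the last index (len(vocab)).
    """
    blank = len(vocab)
    best_path = np.argmax(probs, axis=-1)
    collapsed = [int(k) for k, _ in groupby(best_path)]
    return "".join(vocab[k] for k in collapsed if k != blank)


vocab = "abc"
path = [0, 0, 3, 1, 1, 3, 1]          # most likely class index at each time step
probs = np.eye(len(vocab) + 1)[path]  # one-hot rows stand in for softmax output
print(repr(greedy_ctc_decode(probs, vocab)))                 # 'abb'
print(repr(greedy_ctc_decode(np.eye(4)[[3, 3, 3]], vocab)))  # ''
```

The blank between the two 1s is what lets "bb" survive the repeat-merging step; an all-blank path decodes to the empty string.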

Augmentors are replacing original examples instead of adding more examples?

I am looking at the process_data(self, batch_data) function in the DataProvider class. There, for each "batch data" item, i.e. a labeled example, all augmentors are applied in order and then all transformers are applied in order:

    # Then augment, transform and postprocess the batch data
    for objects in [self._augmentors, self._transformers]:
        for object in objects:
            data, annotation = object(data, annotation)

Isn't the purpose of augmentors to add more examples and so enlarge the training set? That is, for each example, shouldn't the training set contain both the original example and an "augmented" variation, preferably several augmented versions per example?

Am I misunderstanding?

It seems like process_data(self, batch_data) should look something like this:

def process_data(self, batch_data):
    """ Process data batch of data """
    if self._use_cache and batch_data[0] in self._cache:
        data, annotation = copy.deepcopy(self._cache[batch_data[0]])
    else:
        data, annotation = batch_data
        for preprocessor in self._data_preprocessors:
            data, annotation = preprocessor(data, annotation)

        if data is None or annotation is None:
            self.logger.warning("Data or annotation is None, marking for removal on epoch end.")
            self._on_epoch_end_remove.append(batch_data)
            return []  # empty list, so the caller's "for data, annotation in ..." loop simply skips this item

        if self._use_cache and batch_data[0] not in self._cache:
            self._cache[batch_data[0]] = (copy.deepcopy(data), copy.deepcopy(annotation))

    # Then transform, augment and postprocess the batch data
    for transformer in self._transformers:
        data, annotation = transformer(data, annotation)

    augmented_data_list = []
    if len(self._augmentors) > 0:
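        for _ in range(self._variation_count):  # generate multiple variations using specified augmentors
            # Start every variation from the transformed original and keep its annotation separate
            augmented_data, augmented_annotation = data, annotation
            for augmentor in self._augmentors:
                augmented_data, augmented_annotation = augmentor(augmented_data, augmented_annotation)

            augmented_data_list.append((augmented_data, augmented_annotation))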

    all_data_list = []
    for data, annotation in [(data, annotation)] + augmented_data_list:

        # Convert to numpy array if not already
        if not isinstance(data, np.ndarray):
            data = data.numpy()

        # Convert to numpy array if not already
        # TODO: This is a hack, need to fix this
        if not isinstance(annotation, (np.ndarray, int, float, str, np.uint8, float)):
            annotation = annotation.numpy()

        all_data_list.append((data, annotation))

    return all_data_list

With __getitem__(self, index: int) looking something like this:

def __getitem__(self, index: int):
    """ Returns a batch of data by batch index"""
    dataset_batch = self.get_batch_annotations(index)

    # First read and preprocess the batch data
    batch_data, batch_annotations = [], []
    for batch in dataset_batch:
        for data, annotation in self.process_data(batch):
            if data is None or annotation is None:
                self.logger.warning("Data or annotation is None, skipping.")
                continue

            batch_data.append(data)
            batch_annotations.append(annotation)

    return np.array(batch_data), np.array(batch_annotations)
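A self-contained sketch of the proposed expansion, independent of mltu: each example yields the original pair plus variation_count augmented copies, and every copy starts again from the original. expand_example and random_brightness are hypothetical stand-ins. One trade-off worth weighing: with the existing per-call design, randomized augmentors are re-applied every time an example is loaded, so each epoch already sees a fresh variation without storing it, whereas expansion multiplies every batch by 1 + variation_count (more memory per step).

```python
import random

_rng = random.Random(0)  # seeded only so the demo is reproducible


def expand_example(data, annotation, augmentors, variation_count):
    """Return the original example followed by `variation_count` augmented copies.

    Each augmentor is a callable (data, annotation) -> (data, annotation). The
    original pair is kept untouched; every variation starts again from it.
    """
    examples = [(data, annotation)]
    for _ in range(variation_count):
        aug_data, aug_annotation = data, annotation
        for augmentor in augmentors:
            aug_data, aug_annotation = augmentor(aug_data, aug_annotation)
        examples.append((aug_data, aug_annotation))
    return examples


def random_brightness(value, annotation):
    """Hypothetical stand-in augmentor: scales a number by a random factor in [0.5, 1.5]."""
    return value * _rng.uniform(0.5, 1.5), annotation


batch = expand_example(100.0, "2b827", [random_brightness], variation_count=3)
print(len(batch))  # 4: the original plus three variations
```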

Epoch 11: val_CER did not improve from 1.00000

All I get is this output:
Epoch 11: val_CER did not improve from 1.00000

Am I doing something wrong?
I'm testing on this type of captcha.
Any advice regarding this?

Unable to execute the train.py file in the Captcha to text project

Getting this error:

    model.fit(
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mltu/tensorflow/metrics.py", line 58, in update_state
    self.batch_counter.assign_add(len(y_true))
                                  ^^^^^^^^^^^
TypeError: len is not well defined for a symbolic Tensor (data_1:0). Please call x.shape rather than len(x) for shape information.

Is this code compatible with current versions of TensorFlow and mltu?
@pythonlessons I hope you reply to this. Thank you.

Problem with fit function

Hi, thanks a lot for helping me, I'm really struggling with this homework.

I'm a CS student with fairly mediocre coding skills and new to deep learning, so for this captcha recognition project I asked for help from some classmates who watched your tutorial and succeeded with your method, but none of them have encountered this issue.

I'm running my code on Google Colab, and here are some details of my implementation:

training code: https://colab.research.google.com/drive/1scQlm4hHoxGjS74537kAcELKGrGdLKxA?usp=sharing
mltu folder: https://drive.google.com/drive/folders/1V1ozlK1CmoYH8vSHuZaIeaHkII3NeioQ?usp=sharing
dataset: https://drive.google.com/drive/folders/1o49WI0O4x1HIU54eFuhovvS1g0aK5UTo?usp=sharing

I uploaded the dataset provided by my professor and the mltu-1.0.8 folder to my Google Drive.
I used these two lines of code in my Colab notebook to access my Google Drive:
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)
My classmates were using a Linux environment with Python 3.9.16, and after some trial and error they found that some additional libraries had to be installed, hence these two cells in my training.ipynb file:

!pip install "PyYAML>=6.0"
!pip install tqdm
!pip install pandas
!pip install numpy
!pip install opencv-python
!pip install onnxruntime
!pip install librosa==0.9.2
!pip install matplotlib
!pip install onnx==1.12.0
!pip install tensorflow==2.10
!pip install tf2onnx

!apt-get install python3.9
!ln -sf /usr/bin/python3.9 /usr/local/bin/python
!python --version

The rest are minor changes to the file paths (pointing to my Google Drive folder) and config parameters (self.vocab, self.height, self.width, etc.).
All cells run without errors until the last cell:

model.fit(
    train_data_provider,
    validation_data=val_data_provider,
    epochs=configs.train_epochs,
    callbacks=[earlystopper, checkpoint, trainLogger, reduceLROnPlat, tb_callback, model2onnx],
    workers=configs.train_workers
)

where the error popped up:
ValueError: Failed to find data adapter that can handle input: <class 'drive.MyDrive.mltu.dataProvider.DataProvider'>, <class 'NoneType'>

I'm guessing the problem comes from the environment (e.g. I didn't properly change the Python version, or the versions of Python, TensorFlow and Keras are not compatible), but I'm really not sure (sorry for my lack of skills).

If you need any other details of my implementation to find out what caused the error, please let me know. I really appreciate the help since the deadline of this homework is near.

Dropout with Batch Normalization Disharmony

I have read somewhere that using Dropout and Batch Normalization together leads to worse performance. I noticed that your code does this. What is your opinion on, and experience with, this?

I have an issue with model.fit: class NoneType

Hi, thanks a lot for helping me, I'm really struggling with this homework.

I'm running my code on my laptop. I am using your dataset from Tutorial 5, Speech to text.
Here are some details of my implementation:

training code in google colab: https://drive.google.com/file/d/1UGk49m0qeAb8XMEFCeJq6n8hOEvfg_da/view?usp=sharing
training code in laptop: https://drive.google.com/file/d/1OxuDo-rvNJM2j5kjUMdJpqKk1YpCT5CB/view?usp=drive_link
I uploaded the dataset provided by my professor and the mltu version I updated today.

All cells can run without encountering any errors until the last cell:

where the error popped up:
ValueError: Failed to find data adapter that can handle input: <class 'mltu.torch.dataProvider.DataProvider'>, <class 'NoneType'>

If you need any other details of my implementation to find out what caused the error, please let me know. I really appreciate the help.

model.onnx file is not being created

Hi! First of all, when I create the model, the ONNX file is not generated. Also, when training the model, do we need to name each image with the characters in its captcha?

Saving and Loading model errors

Hi,
I am trying to train my model on my own database following the tutorial, and sometimes training takes quite a long time, so I wanted to load the model saved by the callback using this code:

        if os.path.exists("Model/model.h5"):
            HTR_Model = load_model("Model/model.h5")
            new_model = False
        else:
            img_shape = (self.height, self.width, 3)
            HTR_Model = self.HTR_Model(img_shape, characters_num, vocab)
            HTR_Model.compile_model()
            HTR_Model.summary(line_length=110)
            new_model = True

And then continue training with this code:

        earlystopper = EarlyStopping(monitor='val_CER', patience=20, verbose=1, mode='min')
        checkpoint = ModelCheckpoint("Model/model.h5", monitor='val_CER', verbose=1, save_best_only=True, mode='min')
        trainLogger = TrainLogger("Model")
        tb_callback = TensorBoard('Model/logs', update_freq=1)
        reduceLROnPlat = ReduceLROnPlateau(monitor='val_CER', factor=0.9, min_delta=1e-10, patience=10, verbose=1,
                                           mode='auto')
        model2onnx = Model2onnx("Model/model.h5")

        if new_model is True:
            HTR_Model.train(training_data,
                            val_data,
                            epochs=1000,
                            workers=20,
                            callbacks=[earlystopper, checkpoint, trainLogger, reduceLROnPlat, tb_callback, model2onnx])
        else:
            HTR_Model.fit(training_data,
                          validation_data=val_data,
                          epochs=1000,
                          workers=20,
                          callbacks=[earlystopper, checkpoint, trainLogger, reduceLROnPlat, tb_callback, model2onnx],
                          )

Unfortunately I encountered the following error:
ValueError: Unknown loss function: CTCloss. Please ensure this object is passed to the custom_objects argument.

So I tried to add this argument like this:

HTR_Model = load_model("Model/model.h5", custom_objects={'CTCloss': CTCloss})

But It didn't work and I got this error:
TypeError: CTCloss.__init__() got an unexpected keyword argument reduction
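The reduction error is consistent with how Keras rebuilds custom objects: as far as I understand it, the saved config (which for a Loss includes name and reduction) is passed back to the class constructor, so a custom loss whose __init__ does not accept those keys cannot be restored. The framework-free sketch below mimics that round trip with hypothetical classes and does not use Keras itself; the usual remedy it illustrates is to accept **kwargs in __init__ and forward them to the base class.

```python
class BaseLoss:
    """Stand-in for a Keras-style base class that saves and restores its config."""

    def __init__(self, name=None, reduction="auto"):
        self.name = name
        self.reduction = reduction

    def get_config(self):
        return {"name": self.name, "reduction": self.reduction}

    @classmethod
    def from_config(cls, config):
        # Every saved key is passed straight back to the constructor.
        return cls(**config)


class RigidCTCLoss(BaseLoss):
    def __init__(self, name="CTCloss"):
        # No way to receive "reduction", so from_config() fails.
        super().__init__(name=name)


class TolerantCTCLoss(BaseLoss):
    def __init__(self, name="CTCloss", **kwargs):
        # Extra saved keys such as "reduction" are forwarded to the base class.
        super().__init__(name=name, **kwargs)


try:
    RigidCTCLoss.from_config(RigidCTCLoss().get_config())
    rigid_error = None
except TypeError as err:
    rigid_error = err
print(rigid_error)  # ...got an unexpected keyword argument 'reduction'

restored = TolerantCTCLoss.from_config(TolerantCTCLoss().get_config())
print(restored.name, restored.reduction)  # CTCloss auto
```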

I couldn't solve it, so I started looking for other ways to load the model. This time I saved the model in .tf format and loaded it without the custom_objects argument, which caused this error:
Unable to restore custom object of type _tf_keras_metric. Please make sure that any custom layers are included in the custom_objects arg when calling load_model() and make sure that all layers implement get_config and from_config.

After that I added argument like this:

HTR_Model = load_model("Model/model.tf", custom_objects={'CERMetric': CERMetric(vocabulary=vocab), 'WERMetric': WERMetric(vocabulary=vocab)})

And the error was
TypeError: CERMetric.__init__() missing 1 required positional argument: 'vocabulary'
Even though I used this argument. The only thing that works is this code:

HTR_Model = load_model("Model/model.h5", compile=False)
HTR_Model.compile(loss=CTCloss(), metrics=[CERMetric(vocabulary=vocab), WERMetric(vocabulary=vocab)], run_eagerly=False)

But it doesn't seem to load all the weights. I also tried using BackupAndRestore to pick up where I left off, but I still couldn't tell whether it saves those weights and continues using them. Is it possible to load a saved model after training is interrupted and continue training it in a way that stays consistent with the tutorial? (For example, I'm at epoch 53/1000 and I see that the best value so far was saved to model.h5 at epoch 52, so I stop training; I then want to load the model saved at epoch 52 and continue from there.)
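The CERMetric error follows the same pattern as the loss: when a custom metric is restored from a saved model, it is rebuilt from its saved config, so a required constructor argument such as vocabulary has to appear in what get_config() returns. Passing an already-built instance in custom_objects may not help, since it is the class that gets reconstructed from the config. A framework-free sketch with a hypothetical metric class:

```python
class CERMetricSketch:
    """Hypothetical metric whose constructor needs a vocabulary."""

    def __init__(self, vocabulary, name="CER"):
        self.vocabulary = "".join(vocabulary)
        self.name = name

    def get_config(self):
        # Without "vocabulary" here, from_config() below would fail with
        # "missing 1 required positional argument: 'vocabulary'".
        return {"name": self.name, "vocabulary": self.vocabulary}

    @classmethod
    def from_config(cls, config):
        return cls(**config)


original = CERMetricSketch(vocabulary="0123456789abc")
restored_metric = CERMetricSketch.from_config(original.get_config())
print(restored_metric.vocabulary)  # 0123456789abc
```

On the resume question: in tf.keras, load_model(path, compile=False) still restores the architecture and the saved weights; what it skips is the compiled state, including the optimizer's internal state. Training therefore continues from the epoch-52 weights with a freshly initialized optimizer, and learning-rate reductions made by ReduceLROnPlateau are not carried over. Passing initial_epoch=52 to fit() keeps the epoch numbering consistent with the interrupted run.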

Not Able to Download the Dataset

If I try to download the dataset using the URL, it shows an Internal Server Error:

dataset_path = stow.join('Datasets', 'IAM_Words')
if not stow.exists(dataset_path):
    download_and_unzip('https://git.io/J0fjL', extract_to='Datasets')

file = tarfile.open(stow.join(dataset_path, "words.tgz"))
file.extractall(stow.join(dataset_path, "words"))

This segment of code also shows an error.
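If the short link keeps failing, one workaround is to place words.tgz in the dataset folder by hand and run only the extraction step. Below is a standard-library sketch of that step; extract_words_archive is a hypothetical helper, and pathlib stands in for stow. The with block closes the archive, which the original snippet leaves open, and tarfile's "data" extraction filter is used where the running Python supports it.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_words_archive(dataset_path):
    """Extract <dataset_path>/words.tgz into <dataset_path>/words and return that folder.

    Assumes words.tgz was already placed in dataset_path (e.g. downloaded by hand).
    """
    dataset_path = Path(dataset_path)
    archive = dataset_path / "words.tgz"
    if not archive.exists():
        raise FileNotFoundError(f"{archive} not found; download it manually first")
    target = dataset_path / "words"
    with tarfile.open(archive) as tar:  # compression is auto-detected when reading
        if hasattr(tarfile, "data_filter"):
            tar.extractall(target, filter="data")  # safer extraction where supported
        else:
            tar.extractall(target)
    return target


# Demo: build a tiny words.tgz, then extract it.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "a01.png"
    sample.write_bytes(b"fake image")
    with tarfile.open(Path(tmp) / "words.tgz", "w:gz") as tar:
        tar.add(sample, arcname="a01/a01.png")
    extracted = extract_words_archive(tmp)
    demo_files = sorted(p.relative_to(extracted).as_posix()
                        for p in extracted.rglob("*") if p.is_file())

print(demo_files)  # ['a01/a01.png']
```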

About your augmentors utility

I noticed that your augmentors utility has no distortion technique. Could you add one?
Thank you so much!
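As an illustration of what a simple distortion augmentation could look like, the NumPy sketch below shifts each pixel row horizontally by a sine-shaped offset (a "wave" distortion). It is not an mltu API: wave_distort and its defaults are hypothetical, and it would still need wrapping in the (data, annotation) -> (data, annotation) call pattern and in whatever image type the installed mltu version passes to its augmentors. Pixels pushed past an edge wrap around to the other side (np.roll), which is harmless for small amplitudes but worth knowing.

```python
from typing import Optional

import numpy as np


def wave_distort(image: np.ndarray, amplitude: float = 3.0, period: float = 40.0,
                 rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Shift each pixel row horizontally by a sine-shaped offset with a random phase.

    Works on (height, width) or (height, width, channels) arrays.
    """
    if rng is None:
        rng = np.random.default_rng()
    phase = rng.uniform(0, 2 * np.pi)
    distorted = np.empty_like(image)
    for row in range(image.shape[0]):
        shift = int(round(amplitude * np.sin(2 * np.pi * row / period + phase)))
        distorted[row] = np.roll(image[row], shift, axis=0)  # axis 0 of a row is the width
    return distorted


image = np.arange(6 * 10, dtype=np.uint8).reshape(6, 10)
out = wave_distort(image, amplitude=2, period=6, rng=np.random.default_rng(0))
print(out.shape)  # (6, 10)
```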

no module named tf2onnx

It seems that 'pip install mltu' doesn't install tf2onnx.

from mltu.tensorflow.callbacks import Model2onnx
#callbacks
model2onnx = Model2onnx(f"{configs.model_path}/model.h5")

Because of that, after training finishes, it raises the error: No module named tf2onnx

So now I only have model.h5, but since I use CTCloss, CWERMetric and similar custom objects, I'm having a hard time exporting it to an ONNX file.
Is there a function in this module that can do this for me? I can only find the callback, not a direct TF-to-ONNX conversion.

Compatibility Issue with Protobuf Versions in TensorFlow ONNX Conversion

I'm new to Machine Learning and I'm currently encountering an issue during my machine learning training related to TensorFlow and ONNX.

When installing TensorFlow, I encounter an error related to onnxconverter-common and protobuf version compatibility. The specific error message is:
onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 4.25.1 which is incompatible.

Attempting to resolve this by installing protobuf 3.20.2 leads to another issue where TensorFlow requires protobuf 4.25.1. Furthermore, I'm experiencing an error that states: 'FuncGraph' object has no attribute '_captures'. This occurs during the training process, specifically at Epoch 122 with the message 'early stopping'.

Current Environment:
mltu-1.1.7
tensorflow-2.12.0
Python 3.9.18
Win 11

I've tried adjusting the versions of protobuf to meet the requirements of both tensorflow and onnxconverter-common, but this leads to a conflict where either of the two doesn't function properly.

Can someone guide me on how to resolve these compatibility issues? Any suggestions on how to correctly configure my environment or alternative approaches to avoid these conflicts would be greatly appreciated.