pythonlessons / mltu
Machine Learning Training Utilities (for TensorFlow and PyTorch)
License: MIT License
TensorFlow version 2.10 is not available on pip
import cv2
import typing
import numpy as np
from mltu.inferenceModel import OnnxInferenceModel
from mltu.utils.text_utils import ctc_decoder, get_cer, get_wer
from mltu.transformers import ImageResizer
class ImageToWordModel(OnnxInferenceModel):
    def __init__(self, char_list: typing.Union[str, list], *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.char_list = char_list

    def predict(self, image: np.ndarray):
        # Resize to the model's input size (width, height) while keeping the aspect ratio
        image = ImageResizer.resize_maintaining_aspect_ratio(image, *self.input_shape[:2][::-1])
        # Add the batch dimension and cast to the dtype the ONNX model expects
        image_pred = np.expand_dims(image, axis=0).astype(np.float32)
        preds = self.model.run(None, {self.input_name: image_pred})[0]
        # Decode the CTC output into text
        text = ctc_decoder(preds, self.char_list)[0]
        return text

if __name__ == "__main__":
    import pandas as pd
    from tqdm import tqdm
    from mltu.configs import BaseModelConfigs

    configs = BaseModelConfigs.load("Models/04_sentence_recognition/202301131202/configs.yaml")
    model = ImageToWordModel(model_path=configs.model_path, char_list=configs.vocab)
    df = pd.read_csv("Models/04_sentence_recognition/202301131202/val.csv").values.tolist()

    accum_cer, accum_wer = [], []
    for image_path, label in tqdm(df):
        image = cv2.imread(image_path)
        prediction_text = model.predict(image)
        cer = get_cer(prediction_text, label)
        wer = get_wer(prediction_text, label)
        print("Image: ", image_path)
        print("Label:", label)
        print("Prediction: ", prediction_text)
        print(f"CER: {cer}; WER: {wer}")
        accum_cer.append(cer)
        accum_wer.append(wer)
        cv2.imshow(prediction_text, image)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

    print(f"Average CER: {np.average(accum_cer)}, Average WER: {np.average(accum_wer)}")
When I run this code I get this error. The code is the same as your tutorial on sentence recognition (inferenceModel.py).
Hi! First of all, when I create the model, the ONNX file is not generated. Also, while training the model, do we need to name each captcha image with the characters it contains?
It seems that using 'pip install mltu' doesn't install tf2onnx.
from mltu.tensorflow.callbacks import Model2onnx
#callbacks
model2onnx = Model2onnx(f"{configs.model_path}/model.h5")
because after training finishes, it raises an error: No module named tf2onnx.
So now I only have model.h5, but since I use CTCloss, CWERMetric and similar custom objects, I'm having a hard time exporting it to an ONNX file.
Is there a function in this module that can do this for me? I can only find the callback, not a direct TensorFlow-to-ONNX conversion.
Traceback (most recent call last):
File "c:\Users\91986\Desktop\DEVELOPMENT\Mini Project sem 5\text-recog\train.py", line 7, in
from mltu.preprocessors import ImageReader
ModuleNotFoundError: No module named 'mltu'
Hi @pythonlessons,
I'm trying to use the image to word Tutorial.
I changed the train.py a bit in order to read my images and labels better. The only change was in def read_annotation_file:
Old Code:
def read_annotation_file(annotation_path):
    dataset, vocab, max_len = [], set(), 0
    with open(annotation_path, "r") as f:
        for line in tqdm(f.readlines()):
            line = line.split()
            image_path = data_path + line[0][1:]
            label = line[0].split("_")[1]
            dataset.append([image_path, label])
            vocab.update(list(label))
            max_len = max(max_len, len(label))
    return dataset, sorted(vocab), max_len
New Code:
def read_annotation_file(annotation_path):
    dataset, vocab, max_len = [], set(), 0
    with open(annotation_path, "r") as f:
        for line in tqdm(f.readlines()):
            line = line.split(' ')
            image_path = data_path + line[0]
            label = line[1]
            dataset.append([image_path, label])
            vocab.update(list(label))
            max_len = max(max_len, len(label))
    return dataset, sorted(vocab), max_len
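One thing worth checking in the new version: `line.split(' ')` leaves the trailing newline attached to the last field, so `label` would end in a newline character and that character would enter the vocabulary. A minimal sketch that strips each line first, assuming the `<path> <label>` layout described above; `parse_annotation_line` and `read_annotations` are hypothetical names, not part of mltu:

```python
def parse_annotation_line(line: str):
    """Split '<relative_path> <label>' into (path, label).

    Returns None for blank or malformed lines.
    """
    # strip() removes the trailing newline before splitting
    parts = line.strip().split(maxsplit=1)
    if len(parts) != 2:
        return None
    return parts[0], parts[1]


def read_annotations(lines, data_path=""):
    """Build (dataset, sorted vocab, max label length) from annotation lines."""
    dataset, vocab, max_len = [], set(), 0
    for line in lines:
        parsed = parse_annotation_line(line)
        if parsed is None:
            continue  # skip lines that carry no usable sample
        rel_path, label = parsed
        dataset.append([data_path + rel_path, label])
        vocab.update(label)
        max_len = max(max_len, len(label))
    return dataset, sorted(vocab), max_len
```

This does not by itself explain the data adapter error below; it only removes one possible source of bad labels.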
I also changed something in line 91 due to an error:
Old Code:
metrics=[CWERMetric()],
New Code:
metrics=[CWERMetric('accuracy')],
This is my error:
Traceback (most recent call last):
File "/path/to/mltu/Tutorials/01_image_to_word/train.py", line 111, in
model.fit(
File "/path/to/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/path/to/.local/lib/python3.10/site-packages/keras/engine/data_adapter.py", line 1083, in select_data_adapter
raise ValueError(
ValueError: Failed to find data adapter that can handle input: <class 'mltu.dataProvider.DataProvider'>, <class 'NoneType'>
How do I need to change train_data_provider or train_dataset, or is it a version problem?
If you wanted to use a pre-trained Transformer for the same task, how would you use it instead of the LSTM here? For example, I want to use a lightweight BERT model; what would the changes to the lines at the end be? I'm trying to understand the architecture.
squeezed = layers.Reshape((x7.shape[-3] * x7.shape[-2], x7.shape[-1]))(x7)
blstm = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(squeezed)
output = layers.Dense(output_dim + 1, activation='softmax', name="output")(blstm)
model = Model(inputs=inputs, outputs=output)
Hi, thanks a lot for helping me, I'm really struggling with this homework.
I'm a CS student with pretty mediocre coding abilities and also new to deep learning, so I asked for help from some classmates who watched your tutorial and succeeded with your method, to work on this Captcha recognition project, but this is an issue that none of them have encountered.
I'm running my code on Google Colab, and here are some details of my implementation:
training code: https://colab.research.google.com/drive/1scQlm4hHoxGjS74537kAcELKGrGdLKxA?usp=sharing
mltu folder: https://drive.google.com/drive/folders/1V1ozlK1CmoYH8vSHuZaIeaHkII3NeioQ?usp=sharing
dataset: https://drive.google.com/drive/folders/1o49WI0O4x1HIU54eFuhovvS1g0aK5UTo?usp=sharing
I uploaded the dataset provided by my professor and the mltu-1.0.8 folder to my google drive
I used these two lines of code for my colab notebook to gain access to my google drive:
from google.colab import drive
drive.mount('/content/drive/', force_remount=True)
My classmates were using a linux environment with python version=3.9.16, and after some trial and error, they found that some additional libraries have to be installed, therefore the two cells in my training.ipynb file:
!pip install "PyYAML>=6.0"
!pip install tqdm
!pip install pandas
!pip install numpy
!pip install opencv-python
!pip install onnxruntime
!pip install librosa==0.9.2
!pip install matplotlib
!pip install onnx==1.12.0
!pip install tensorflow==2.10
!pip install tf2onnx
!apt-get install python3.9
!ln -sf /usr/bin/python3.9 /usr/local/bin/python
!python --version
The rest are some minor changes to the file paths (pointing to my Google Drive folder) and config parameters (self.vocab, self.height, self.width, etc.).
All cells can run without encountering any errors until the last cell:
model.fit(
    train_data_provider,
    validation_data=val_data_provider,
    epochs=configs.train_epochs,
    callbacks=[earlystopper, checkpoint, trainLogger, reduceLROnPlat, tb_callback, model2onnx],
    workers=configs.train_workers
)
where the error popped up:
ValueError: Failed to find data adapter that can handle input: <class 'drive.MyDrive.mltu.dataProvider.DataProvider'>, <class 'NoneType'>
I'm guessing the problem comes from environment issues (e.g. I didn't properly change the Python version, or the versions of Python, TensorFlow and Keras are not compatible), but I'm really not sure (sorry for my lack of skills).
If you need any other details of my implementation to find out what caused the error, please let me know. I really appreciate the help since the deadline of this homework is near.
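Since the suspicion here is a version mismatch, it may help to record what is actually installed in the environment that runs `model.fit`. The sketch below uses only the standard library; `installed_versions` is a hypothetical helper and the package list is just the set named in this issue. Note that `!python --version` runs in a subshell, so it may not reflect the interpreter the notebook kernel uses, whereas `sys.version` printed from a cell does.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Interpreter actually executing this code (the kernel, in a notebook)
    print("python", sys.version.split()[0])
    wanted = ["tensorflow", "keras", "mltu", "tf2onnx", "onnx"]
    for name, version in installed_versions(wanted).items():
        print(name, version or "not installed")
```

Pasting this output into the issue gives maintainers the exact combination to compare against the versions the tutorial was written for.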
on PyPi (https://pypi.org/project/mltu/#history), can you please add it?
Thanks in advance!
Hey
I use TensorFlow 2.4.1 and had to change "import keras" to "import tensorflow.keras".
Now I have the following issue at "model.fit" in train.py, and so in model.py:
Epoch 1/100
Traceback (most recent call last):
File "c:/Tutorials/TensorFlow/image_to_word/train.py", line 109, in
model.fit(
File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1100, in fit
tmp_logs = self.train_function(iterator)
File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\def_function.py", line 828, in call
result = self._call(*args, **kwds)
File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\def_function.py", line 871, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\def_function.py", line 725, in _initialize
self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\function.py", line 2969, in _get_concrete_function_internal_garbage_collected
graph_function, _ = self._maybe_define_function(args, kwargs)
File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\function.py", line 3361, in _maybe_define_function graph_function = self._create_graph_function(args, kwargs)
File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\function.py", line 3196, in _create_graph_function func_graph_module.func_graph_from_py_func(
File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\func_graph.py", line 990, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\eager\def_function.py", line 634, in wrapped_fn
out = weak_wrapped_fn().wrapped(*args, **kwds)
File "C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\func_graph.py", line 977, in wrapper
raise e.ag_error_metadata.to_exception(e)
TypeError: in user code:
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py:805 train_function *
return step_function(self, iterator)
C:e\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py:795 step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:1259 run
return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:2730 call_for_each_replica
return self._call_for_each_replica(fn, args, kwargs)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\distribute\distribute_lib.py:3417 _call_for_each_replica
return fn(*args, **kwargs)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py:788 run_step **
outputs = model.train_step(data)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py:759 train_step
return {m.name: m.result() for m in self.metrics}
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py:759 <dictcomp>
return {m.name: m.result() for m in self.metrics}
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\utils\metrics_utils.py:122 decorated
result_t = array_ops.identity(result_fn(*args))
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\util\dispatch.py:201 wrapper
return target(*args, **kwargs)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\ops\array_ops.py:287 identity
ret = gen_array_ops.identity(input, name=name)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\ops\gen_array_ops.py:3941 identity
_, _, _op, _outputs = _op_def_library._apply_op_helper(
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\op_def_library.py:525 _apply_op_helper
raise err
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\op_def_library.py:517 _apply_op_helper
values = ops.convert_to_tensor(
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\profiler\trace.py:163 wrapped
return func(*args, **kwargs)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\ops.py:1540 convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\constant_op.py:339 _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\constant_op.py:264 constant
return _constant_impl(value, dtype, shape, name, verify_shape=False,
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\constant_op.py:281 _constant_impl
tensor_util.make_tensor_proto(
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\tensor_util.py:457 make_tensor_proto
_AssertCompatible(values, dtype)
C:\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\framework\tensor_util.py:334 _AssertCompatible
raise TypeError("Expected any non-tensor type, got a tensor instead.")
TypeError: Expected any non-tensor type, got a tensor instead.
Model folder is empty in main branch. Please help me out, I want to run captcha to image code.
Hi!
First of all, thanks a lot. My question: I have 800 images named captcha1, captcha2, captcha3 and so on. While training the model, do I need to name the images with the solved captcha characters, as you did (e.g. x223g, ss23d, 7d7df)?
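In the tutorial's captcha loader quoted further down this page, the label is taken from the file name (`file.name`), so generic names such as captcha1 would presumably give the model nothing to learn from. A minimal sketch of that convention using only `pathlib`; `dataset_from_filenames` is a hypothetical helper, not an mltu function, and it assumes each file stem is the solved captcha text:

```python
from pathlib import Path


def dataset_from_filenames(image_dir, extensions=(".png", ".jpg", ".jpeg")):
    """Build [path, label] pairs where the label is the file name without extension."""
    dataset, vocab, max_len = [], set(), 0
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in extensions:
            continue  # ignore non-image files
        label = path.stem  # e.g. "x223g" for "x223g.png"
        dataset.append([str(path), label])
        vocab.update(label)
        max_len = max(max_len, len(label))
    return dataset, "".join(sorted(vocab)), max_len
```

If renaming 800 files is impractical, the same three values could instead be built from a separate file that maps each image name to its text; the training code only needs the resulting `[path, label]` pairs.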
I have read somewhere that using Dropout and Batch Normalization together leads to a worse performance. I have noticed in your code that you do that. What is your opinion and your experience on this?
I am looking at the process_data(self, batch_data) function in the DataProvider class. There, for each "batch data" item (i.e. a labeled example), all augmentors are applied in order, then all transformers are applied in order:
# Then augment, transform and postprocess the batch data
for objects in [self._augmentors, self._transformers]:
    for object in objects:
        data, annotation = object(data, annotation)
Isn't the purpose of augmentors to add more examples to increase the training set? That is, for each example, both the original and an "augmented" variation (preferably multiple augmented versions per example) should be added to the training set?
Am I misunderstanding?
Seems like process_data(self, batch_data) should look something like this:
def process_data(self, batch_data):
    """ Process data batch of data """
    if self._use_cache and batch_data[0] in self._cache:
        data, annotation = copy.deepcopy(self._cache[batch_data[0]])
    else:
        data, annotation = batch_data
        for preprocessor in self._data_preprocessors:
            data, annotation = preprocessor(data, annotation)
        if data is None or annotation is None:
            self.logger.warning("Data or annotation is None, marking for removal on epoch end.")
            self._on_epoch_end_remove.append(batch_data)
            return None, None
        if self._use_cache and batch_data[0] not in self._cache:
            self._cache[batch_data[0]] = (copy.deepcopy(data), copy.deepcopy(annotation))
    # Then transform, augment and postprocess the batch data
    for transformer in self._transformers:
        data, annotation = transformer(data, annotation)
    augmented_data_list = []
    if len(self._augmentors) > 0:
        for i in range(self._variation_count):  # generate multiple variations using specified augmentors
            augmented_data = data
            for augmentor in self._augmentors:
                augmented_data, annotation = augmentor(augmented_data, annotation)
            augmented_data_list.append((augmented_data, annotation))
    all_data_list = []
    for data, annotation in [(data, annotation)] + augmented_data_list:
        # Convert to numpy array if not already
        if not isinstance(data, np.ndarray):
            data = data.numpy()
        # Convert to numpy array if not already
        # TODO: This is a hack, need to fix this
        if not isinstance(annotation, (np.ndarray, int, float, str, np.uint8, float)):
            annotation = annotation.numpy()
        all_data_list.append((data, annotation))
    return all_data_list
With __getitem__(self, index: int) looking something like this:
def __getitem__(self, index: int):
    """ Returns a batch of data by batch index"""
    dataset_batch = self.get_batch_annotations(index)
    # First read and preprocess the batch data
    batch_data, batch_annotations = [], []
    for index, batch in enumerate(dataset_batch):
        for data, annotation in self.process_data(batch):
            if data is None or annotation is None:
                self.logger.warning("Data or annotation is None, skipping.")
                continue
            batch_data.append(data)
            batch_annotations.append(annotation)
    return np.array(batch_data), np.array(batch_annotations)
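For comparison with the proposal above: a common alternative reading is that augmentors are applied randomly each time a sample is drawn, so the dataset size per epoch stays the same while the model sees different variants across epochs. Whether mltu's augmentors behave this way is an assumption here (it depends on each augmentor applying its change with some probability). The toy sketch below uses only the standard library and hypothetical names (`random_flip`, `epoch_views`):

```python
import random


def random_flip(sample, rng, probability=0.5):
    """Toy augmentor: reverse the sequence with the given probability."""
    if rng.random() < probability:
        return sample[::-1]
    return sample


def epoch_views(dataset, augmentors, epochs, seed=0):
    """Return, for each epoch, the samples as the model would see them."""
    rng = random.Random(seed)
    views = []
    for _ in range(epochs):
        epoch = []
        for sample in dataset:
            # Augmentation happens on the fly; nothing is added to the dataset
            for augment in augmentors:
                sample = augment(sample, rng)
            epoch.append(sample)
        views.append(epoch)
    return views


dataset = [(1, 2, 3), (4, 5, 6)]
views = epoch_views(dataset, [random_flip], epochs=20)
# Number of distinct variants of each sample seen over all epochs
distinct_per_sample = [len({epoch[i] for epoch in views}) for i in range(len(dataset))]
print(distinct_per_sample)
```

Each epoch still contains two samples, yet over the run every sample appears in more than one form. Storing several variants per sample, as proposed above, trades memory and epoch length for seeing those variants sooner.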
@pythonlessons please help me
I am using Python 3.8.17, mltu==0.1.4 and TensorFlow 2.10.0. When I try to run train.py it gives this:
Traceback (most recent call last):
File "train.py", line 7, in <module>
from mltu.tensorflow.dataProvider import DataProvider
ModuleNotFoundError: No module named 'mltu.tensorflow'
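The import fails on the `mltu.tensorflow` subpackage rather than on `mltu` itself, which suggests (as an assumption; the release history is not checked here) that the installed 0.1.4 release predates that package layout and a newer `mltu` would be needed. A small standard-library check for whether a dotted module path can be resolved in the current environment; `has_module` is a hypothetical helper:

```python
import importlib.util


def has_module(dotted_name):
    """True if `dotted_name` can be located (parent packages are imported to do so)."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. "mltu") is missing or fails to import
        return False


if __name__ == "__main__":
    for name in ("mltu", "mltu.tensorflow", "mltu.tensorflow.dataProvider"):
        print(name, has_module(name))
```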
Hi @pythonlessons,
I think the SpectrogramPadding class has a bug.
Original code:
A short spectrogram is shifted to the back (tail) of padded_spectrogram.
But I want a short spectrogram to be placed at the front (head) of padded_spectrogram.
class SpectrogramPadding(Transformer):
    """Pad spectrogram to max_spectrogram_length
    Attributes:
        max_spectrogram_length (int): Maximum length of spectrogram
        padding_value (int): Value to pad
    """
    def __init__(
        self,
        max_spectrogram_length: int,
        padding_value: int
    ) -> None:
        self.max_spectrogram_length = max_spectrogram_length
        self.padding_value = padding_value

    def __call__(self, spectrogram: np.ndarray, label: np.ndarray):
        padded_spectrogram = np.pad(spectrogram, ((self.max_spectrogram_length - spectrogram.shape[0], 0),(0,0)), mode="constant", constant_values=self.padding_value)
        return padded_spectrogram, label
New code:
class SpectrogramPadding(Transformer):
    """Pad spectrogram to max_spectrogram_length
    Attributes:
        max_spectrogram_length (int): Maximum length of spectrogram
        padding_value (int): Value to pad
    """
    def __init__(
        self,
        max_spectrogram_length: int,
        padding_value: int,
        append: bool = True
    ) -> None:
        self.max_spectrogram_length = max_spectrogram_length
        self.padding_value = padding_value
        self.append = append

    def __call__(self, spectrogram: np.ndarray, label: np.ndarray):
        # print('spectrogram.shape:', spectrogram.shape)
        # spectrogram.shape: (1032, 193)
        if self.append == False:
            padded_spectrogram = np.pad(
                spectrogram,
                ((self.max_spectrogram_length - spectrogram.shape[0], 0), (0, 0)),
                mode="constant",
                constant_values=self.padding_value)
        else:
            l, h = spectrogram.shape
            lng = self.max_spectrogram_length - l
            if lng > 0:
                a = np.full((lng, h), self.padding_value)
                padded_spectrogram = np.append(spectrogram, a, axis=0)
            else:
                padded_spectrogram = spectrogram
        return padded_spectrogram, label
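The same head/tail choice can be expressed with `np.pad` alone, since its `pad_width` takes a `(before, after)` pair per axis: `(n, 0)` puts the padding before the data, `(0, n)` after it. A minimal sketch, assuming a 2-D spectrogram of shape `(time, features)`; `pad_spectrogram` and `pad_at_end` are hypothetical names, not the library's API:

```python
import numpy as np


def pad_spectrogram(spectrogram, max_length, padding_value=0, pad_at_end=True):
    """Pad along the time axis so the result has at least `max_length` rows."""
    # Never pass a negative width to np.pad; longer inputs are returned unpadded
    missing = max(max_length - spectrogram.shape[0], 0)
    if pad_at_end:
        pad_width = ((0, missing), (0, 0))  # data at the head, padding at the tail
    else:
        pad_width = ((missing, 0), (0, 0))  # padding at the head, data at the tail
    return np.pad(spectrogram, pad_width, mode="constant", constant_values=padding_value)
```

Which side is correct for training depends on how the model and the CTC input lengths are set up, which this sketch does not address.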
Hi! I am trying to fine-tune the wav2vec2 model from your "10_wav2vec2_torch" tutorial. As far as I know, my dataset is in a similar format to the LJ Speech Dataset that you are using as an example. There is a 'wavs' folder which contains the audio files, and a 'metadata.csv' file that has rows of pipe-separated transcriptions. I have been able to successfully run the train.py script on the default dataset (LJ Speech Dataset), but when I use my own dataset, I get this output on the terminal. Am I missing something?
Some weights of the model checkpoint at facebook/wav2vec2-base-960h were not used when initializing Wav2Vec2ForCTC: ['wav2vec2.encoder.pos_conv_embed.conv.weight_v', 'wav2vec2.encoder.pos_conv_embed.conv.weight_g']
- This IS expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForCTC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1', 'wav2vec2.masked_spec_embed', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/wav2vec2-base-960h and are newly initialized because the shapes did not match:
- lm_head.bias: found shape torch.Size([32]) in the checkpoint and torch.Size([29]) in the model instantiated
- lm_head.weight: found shape torch.Size([32, 768]) in the checkpoint and torch.Size([29, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Cuda Device Available.
INFO:WarmupCosineDecay:Epoch 1 - Learning Rate: 1e-08
0%| | 0/18 [00:00<?, ?it/s]/home/ee/anaconda3/lib/python3.9/site-packages/mltu/transformers.py:234: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
return padded_audios, np.array(label)
Epoch 1 - loss: 25.1576 - CER: 4.2681 - WER: 1.0000: 100%|██████████████████████████████████████████████████| 18/18 [00:08<00:00, 2.06it/s]
0%| | 0/2 [00:00<?, ?it/s]Exception in thread Thread-19:
Traceback (most recent call last):
File "/home/ee/anaconda3/lib/python3.9/threading.py", line 980, in _bootstrap_inner
self.run()
File "/home/ee/anaconda3/lib/python3.9/threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "/home/ee/anaconda3/lib/python3.9/site-packages/mltu/torch/dataProvider.py", line 245, in worker_function
result = self.function(data_index)
File "/home/ee/anaconda3/lib/python3.9/site-packages/mltu/dataProvider.py", line 287, in __getitem__
batch_data, batch_annotations = batch_postprocessor(batch_data, batch_annotations)
File "/home/ee/anaconda3/lib/python3.9/site-packages/mltu/transformers.py", line 222, in __call__
max_len = max([len(a) for a in audio])
ValueError: max() arg is an empty sequence
(The same traceback is printed, interleaved, for Thread-14, Thread-15, Thread-16, Thread-18, Thread-21, Thread-22 and Thread-23.)
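The traceback shows the batch postprocessor receiving an empty `audio` list during validation, which would happen if every sample in a batch was dropped before that step. Why they were dropped is not visible in the log; unreadable or missing audio files and malformed metadata rows are only one possibility. A standard-library sketch for ruling that out, assuming the LJ Speech layout described above (pipe-separated rows whose first field is the file ID, audio in `wavs/<ID>.wav`); `check_metadata` is a hypothetical helper:

```python
import csv
from pathlib import Path


def check_metadata(metadata_path, wavs_dir):
    """Return a list of (row_number, problem) for rows that cannot yield a sample."""
    problems = []
    wavs_dir = Path(wavs_dir)
    with open(metadata_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE: quotes inside transcriptions are ordinary characters
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row_number, row in enumerate(reader, start=1):
            if len(row) < 2:
                problems.append((row_number, "expected at least 2 pipe-separated fields"))
                continue
            file_id, text = row[0].strip(), row[-1].strip()
            if not (wavs_dir / f"{file_id}.wav").is_file():
                problems.append((row_number, f"missing audio file {file_id}.wav"))
            elif not text:
                problems.append((row_number, "empty transcription"))
    return problems
```

An empty result means the files line up with the metadata; the next thing to compare would then be properties of the audio itself (sample rate, duration) against the default dataset.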
Hi, thanks a lot for helping me, I'm really struggling with this homework.
I'm running my code on my laptop. I am using your dataset from tutorial 5, Speech to text.
Here are some details of my implementation:
training code in google colab: https://drive.google.com/file/d/1UGk49m0qeAb8XMEFCeJq6n8hOEvfg_da/view?usp=sharing
training code in laptop: https://drive.google.com/file/d/1OxuDo-rvNJM2j5kjUMdJpqKk1YpCT5CB/view?usp=drive_link
I uploaded the dataset provided by my professor and the mltu package, updated just today.
All cells can run without encountering any errors until the last cell:
model.fit(
    train_data_provider,
    validation_data=val_data_provider,
    epochs=configs.train_epochs,
    callbacks=[earlystopper, checkpoint, trainLogger, reduceLROnPlat, tb_callback, model2onnx],
    workers=configs.train_workers
)
where the error popped up:
ValueError: Failed to find data adapter that can handle input: <class 'mltu.torch.dataProvider.DataProvider'>, <class 'NoneType'>
If you need any other details of my implementation to find out what caused the error, please let me know. I really appreciate the help.
Getting this error:
model.fit(
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/mltu/tensorflow/metrics.py", line 58, in update_state
self.batch_counter.assign_add(len(y_true))
^^^^^^^^^^^
TypeError: len is not well defined for a symbolic Tensor (data_1:0). Please call x.shape rather than len(x) for shape information.
Is this code compatible with current versions of TensorFlow and mltu?
@pythonlessons I hope you reply to this. Thank you.
configs = BaseModelConfigs.load("Models/02_captcha_to_text/202212211205/configs.yaml")
How do I modify the architecture if my image shape is (160, 60)?
Hello,
Can you help me to fix this?
ValueError: The filepath provided must end in .keras (Keras model format). Received: filepath=Models/02_captcha_to_text/202403291006/model.h5
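This message appears to come from newer Keras releases (Keras 3), where the `ModelCheckpoint` callback rejects an `.h5` target for full-model saves; that reading is an assumption based on the error text alone. If so, one option is to point the checkpoint at a `.keras` file instead (pinning the older TensorFlow version the tutorial was written for is another). A tiny sketch of the path change, with a hypothetical helper name:

```python
from pathlib import Path


def checkpoint_path_for_keras3(h5_path):
    """Return the same path with a .keras suffix, using forward slashes."""
    return Path(h5_path).with_suffix(".keras").as_posix()


print(checkpoint_path_for_keras3("Models/02_captcha_to_text/202403291006/model.h5"))
```

Any later step that expects `model.h5` (for example an ONNX export callback given that path) would need the same change; whether mltu's callbacks accept a `.keras` path is not confirmed here.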
PS C:\Users\Εule meΕe\Downloads\mltu-main\mltu-main> Models\03_handwriting_recognition\202402160440\logs
Models\03_handwriting_recognition\202402160440\logs : The module 'Models' could not be loaded. For more information, run 'Import-Module Models'.
At line:1 char:1
+ CategoryInfo : ObjectNotFound: (Models\03_handw...2402160440\logs:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CouldNotAutoLoadModule
Can you help me please :)
Facing this issue while running the code: config.yaml is missing.
configs = BaseModelConfigs.load("Models/02_captcha_to_text/202212211205/configs.yaml")
from configs import ModelConfigs
dataset, vocab, max_len = [], set(), 0
for file in stow.ls(stow.join('Datasets', 'captcha_images_v2')):
    dataset.append([stow.relpath(file), file.name])
    vocab.update(list(file.name))
    max_len = max(max_len, len(file.name))
configs = ModelConfigs()
# Save vocab and maximum text length to configs
configs.vocab = "".join(vocab)
configs.max_text_length = max_len
configs.save()
This is missing there: import stow
from mltu.dataProvider import DataProvider
from mltu.preprocessors import ImageReader
from mltu.transformers import ImageResizer, LabelIndexer, LabelPadding
from mltu.augmentors import RandomBrightness, RandomRotate, RandomErodeDilate
data_provider = DataProvider(
    dataset=dataset,
    skip_validation=True,
    batch_size=configs.batch_size,
    data_preprocessors=[ImageReader()],
    transformers=[
        ImageResizer(configs.width, configs.height),
        LabelIndexer(configs.vocab),
        LabelPadding(max_word_length=configs.max_text_length, padding_value=len(configs.vocab))
    ],
)
configs isn't imported in the basic example, and dataset isn't defined anywhere in the file; the project doesn't show where the dataset gets created, so the pieces don't fit together.
Also, this import isn't used: from mltu.augmentors import RandomBrightness, RandomRotate, RandomErodeDilate
Hoping for a rework.
If I try downloading the dataset using the URL it shows Internal Server Error
dataset_path = stow.join('Datasets', 'IAM_Words')
if not stow.exists(dataset_path):
    download_and_unzip('https://git.io/J0fjL', extract_to='Datasets')
    file = tarfile.open(stow.join(dataset_path, "words.tgz"))
    file.extractall(stow.join(dataset_path, "words"))
This segment of code also raises an error.
In "train.py", when I want to execute these lines of code:
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=configs.learning_rate),
    loss=CTCloss(),
    metrics=[CWERMetric()],
    run_eagerly=False
)
I'm getting this error:
"TypeError: __init__() missing 1 required positional argument: 'padding_token'"
I can see that "CWERMetric()" comes from "mltu.tensorflow.metrics".
In "metrics.py", I can see this in the "__init__" arguments: "(self, padding_token, name='CWER', **kwargs)"
I also see:
self.padding_token = padding_token
After that, in the "update_state" method, I can see that "self.padding_token" is used in the following line of code:
true_labels_sparse = tf.sparse.retain(true_labels_sparse, tf.not_equal(true_labels_sparse.values, self.padding_token))
How can I solve this problem? Thank you!
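A hedged note on the question above: the DataProvider snippet quoted elsewhere on this page pads labels with padding_value=len(configs.vocab), so the metric presumably needs that same value as its padding_token. The sketch below is plain Python, not mltu code; strip_padding and the sample vocabulary are hypothetical and only illustrate what the padding token is used for, namely dropping padded positions before labels are compared.

```python
def strip_padding(label_indices, padding_token):
    """Drop padding entries so only real character indices are compared."""
    return [index for index in label_indices if index != padding_token]


vocab = "abc123"            # hypothetical vocabulary
padding_token = len(vocab)  # same value the tutorial passes to LabelPadding
padded_label = [0, 3, 5, padding_token, padding_token]

print(strip_padding(padded_label, padding_token))  # [0, 3, 5]

# In train.py the metric would then presumably be built as:
#   metrics=[CWERMetric(padding_token=len(configs.vocab))]
```

If the metric is created without this argument, the installed mltu version is likely newer than the one the tutorial script was written against.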
I am unable to find the words.txt annotation file in the IAM website. As mentioned in the earlier issues, the link was not working so I went directly to the IAM website. I am able to find the image folders but not the annotation text file. Without it, I don't know how to run the code. I am in a little hurry as this project would allow me to finish another similar project. It would be great if you could upload the txt file. Kindly respond as soon as possible.
Hello brother, thanks for the code and the model that you have made. I'm new to neural networks and AI training.
I'm trying to train a captcha solver model on these captcha types. I haven't succeeded yet. First of all, when the training reaches epoch 200-350 of 1000, it prints "Epoch: early stopping". At first I tried changing the batch size, number of workers and training speed.
I tried batch sizes of 32, 64 (default), 128, 256, 270 and 512, and 10, 20, 30 and 40 training workers.
I used 1000 of those captchas as a dataset and 101 different captchas for testing. The "best" result I got was at batch_size=256, train_workers=30 on a machine with 24GB RAM, using an Intel i5 12th-gen CPU for training. The trained model is below:
202402212032.zip
I also thought maybe my machine was not powerful enough, so I tried a different machine with 64GB RAM, an i9 13th-gen CPU and an Nvidia GeForce RTX 2080. I still got the early stopping message around epoch 313/1000. The "trained" model folder is below:
202402232009 i9.zip
Now, I know that I can use one of those models and continue training it.
I was hoping you could help configure the model architecture or configs for those captcha images. I think the image size may need to be adjusted, because while training I was getting "libpng warning: pHYs: CRC error" a lot. I'd be very grateful if you could help me with that. Thanks for your valuable time.
I'm new to Machine Learning and I'm currently encountering an issue during my machine learning training related to TensorFlow and ONNX.
When installing TensorFlow, I encounter an error related to onnxconverter-common and protobuf version compatibility. The specific error message is:
onnxconverter-common 1.14.0 requires protobuf==3.20.2, but you have protobuf 4.25.1 which is incompatible.
Attempting to resolve this by installing protobuf 3.20.2 leads to another issue where TensorFlow requires protobuf 4.25.1. Furthermore, I'm experiencing an error that states: 'FuncGraph' object has no attribute '_captures'. This occurs during the training process, specifically at Epoch 122 with the message 'early stopping'.
Current Environment:
mltu-1.1.7
tensorflow-2.12.0
Python 3.9.18
Win 11
I've tried adjusting the versions of protobuf to meet the requirements of both tensorflow and onnxconverter-common, but this leads to a conflict where either of the two doesn't function properly.
Can someone guide me on how to resolve these compatibility issues? Any suggestions on how to correctly configure my environment or alternative approaches to avoid these conflicts would be greatly appreciated.
I saw that your augmentors utility has no distortion technique. Could you add one?
Thank you so much!
Hi,
I am trying to train my model on my database according to the tutorial and sometimes the training takes quite a long time so I wanted to load the model saved by callback using this code:
if os.path.exists("Model/model.h5"):
    HTR_Model = load_model("Model/model.h5")
    new_model = False
else:
    img_shape = (self.height, self.width, 3)
    HTR_Model = self.HTR_Model(img_shape, characters_num, vocab)
    HTR_Model.compile_model()
    HTR_Model.summary(line_length=110)
    new_model = True
And then continue training with this code:
earlystopper = EarlyStopping(monitor='val_CER', patience=20, verbose=1, mode='min')
checkpoint = ModelCheckpoint("Model/model.h5", monitor='val_CER', verbose=1, save_best_only=True, mode='min')
trainLogger = TrainLogger("Model")
tb_callback = TensorBoard('Model/logs', update_freq=1)
reduceLROnPlat = ReduceLROnPlateau(monitor='val_CER', factor=0.9, min_delta=1e-10, patience=10, verbose=1,
                                   mode='auto')
model2onnx = Model2onnx("Model/model.h5")
if new_model is True:
    HTR_Model.train(training_data,
                    val_data,
                    epochs=1000,
                    workers=20,
                    callbacks=[earlystopper, checkpoint, trainLogger, reduceLROnPlat, tb_callback, model2onnx])
else:
    HTR_Model.fit(training_data,
                  validation_data=val_data,
                  epochs=1000,
                  workers=20,
                  callbacks=[earlystopper, checkpoint, trainLogger, reduceLROnPlat, tb_callback, model2onnx],
                  )
Unfortunately I encountered the following error:
ValueError: Unknown loss function: CTCloss. Please ensure this object is passed to the custom_objects argument.
So I tried to add this argument like this:
HTR_Model = load_model("Model/model.h5", custom_objects={'CTCloss': CTCloss})
But It didn't work and I got this error:
TypeError: CTCloss.__init__() got an unexpected keyword argument reduction
I couldn't solve it so I started looking for other ways to load the model. This time I tried to do it by saving the file in .tf format and load it without custom_objects argument and it caused an error:
Unable to restore custom object of type _tf_keras_metric. Please make sure that any custom layers are included in the custom_objects arg when calling load_model() and make sure that all layers implement get_config and from_config.
After that I added argument like this:
HTR_Model = load_model("Model/model.tf", custom_objects={'CERMetric': CERMetric(vocabulary=vocab), 'WERMetric': WERMetric(vocabulary=vocab)})
And the error was
TypeError: CERMetric.__init__() missing 1 required positional argument: 'vocabulary'
Even though I used this argument. The only thing that works is this code:
HTR_Model = load_model("Model/model.h5", compile=False)
HTR_Model.compile(loss=CTCloss(), metrics=[CERMetric(vocabulary=vocab), WERMetric(vocabulary=vocab)], run_eagerly=False)
But it doesn't seem to be loading all the weights. I also tried using BackupAndRestore to pick up where I left off, but I still couldn't tell whether it saves those weights and continues using them. So is it possible to load a saved model after training is interrupted and continue training it, while staying in line with the tutorial? (For example, I am at epoch 53/1000 and I see that the best value so far was saved to model.h5 at epoch 52, so I stop training and then want to load the model saved at epoch 52 and continue from there.)
Hi! I trained the wav2vec2 model with perfect accuracy on my dataset. When I perform prediction on a full audio file, the transcriptions have no gaps between separate sentences. For example, in "transfer you to our new sales line please hold for a moment i will transfer you overthank you youre welcome stay in the linethank you for calling this call may be recorded", there should be gaps between 'over', 'thank', 'line' and 'thank'. I could just run this script for diarized segments of the original wav file, but I want to be able to transcribe the complete audio file in one go.
I am using 'mltu==1.1.7'. Here is the code for making predictions.
import numpy as np
from mltu.inferenceModel import OnnxInferenceModel
from mltu.utils.text_utils import ctc_decoder, get_cer, get_wer
import librosa
import pandas as pd
from tqdm import tqdm
class Wav2vec2(OnnxInferenceModel):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def predict(self, audio: np.ndarray):
        audio = np.expand_dims(audio, axis=0).astype(np.float32)
        preds = self.model.run(None, {self.input_name: audio})[0]
        text = ctc_decoder(preds, self.metadata["vocab"])[0]
        return text
model = Wav2vec2(model_path="Models/10_wav2vec2_torch/202310311600/model.onnx")
audio_file_path = '/media/ee/New Volume/mltu/Tutorials/10_wav2vec2_torch/Datasets/comcast_xfinity_full_audios/1.wav'
audio, sr = librosa.load(audio_file_path, sr=16000)
prediction_text = model.predict(audio)
print('predicted transcript: ', prediction_text)
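One way to get separators between sentences without running diarization by hand is to cut the waveform into windows, transcribe each window, and join the results with spaces. The sketch below is an assumption-laden illustration, not part of mltu: transcribe_in_chunks and its parameters are hypothetical, and fixed-length windows can cut a word in half, so splitting on silence (for example with librosa.effects.split) would likely give cleaner boundaries.

```python
from typing import Callable, List, Sequence


def transcribe_in_chunks(
    audio: Sequence[float],
    predict: Callable[[Sequence[float]], str],
    sample_rate: int = 16000,
    chunk_seconds: float = 10.0,
) -> str:
    """Transcribe fixed-length windows of audio and join the pieces with spaces."""
    chunk_size = int(sample_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("chunk_seconds and sample_rate must give a positive chunk size")

    pieces: List[str] = []
    for start in range(0, len(audio), chunk_size):
        text = predict(audio[start:start + chunk_size]).strip()
        if text:  # skip windows that decode to nothing
            pieces.append(text)
    return " ".join(pieces)


# Stand-in for model.predict so the sketch runs without a trained model.
def fake_predict(chunk: Sequence[float]) -> str:
    return f"chunk{len(chunk)}"


print(transcribe_in_chunks([0.0] * 25, fake_predict, sample_rate=10, chunk_seconds=1.0))
# chunk10 chunk10 chunk5
```

With the script above, the call would look like transcribe_in_chunks(audio, model.predict, sample_rate=sr), since slicing works the same way on a NumPy array.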
I'm trying to use captcha to text but I can't train my dataset like you.
When I tried with the dataset you gave, it worked without any problems, but when I replaced your images with my own, I had problems. A few examples from my dataset with 10129 images:
I made a change in the train.py file like this: label = os.path.splitext(file)[0] -> label = os.path.splitext(file)[0].split('-')[1].
I did this because my images are not named captcha_answer.png like yours, but md5hash-captcha_answer.png, so the change extracts the captcha_answer part in the same way.
In the configs.py file, since all my images are 350x100, I changed self.height = 100 and self.width = 350. Then I got the following error. Can you help me solve this?
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tf2onnx 1.14.0 requires flatbuffers<3.0,>=1.12, but you have flatbuffers 23.5.26 which is incompatible.
Tried different version of mltu and TensorFlow for M1 (https://developer.apple.com/metal/tensorflow-plugin/)
With others TensorFlow versions have others issues.
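For the md5hash-captcha_answer.png naming described above, a slightly more defensive way to pull out the label is sketched below. The function name label_from_filename is hypothetical; it assumes the hash itself never contains a hyphen, and it falls back to the whole file stem when no hyphen is present, so files named like the original dataset still work.

```python
import os


def label_from_filename(file_name: str) -> str:
    """Return the captcha answer from 'hash-answer.png' or plain 'answer.png'."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    # Split on the first hyphen only, so an answer containing '-' stays intact.
    _prefix, separator, answer = stem.partition("-")
    return answer if separator else stem


print(label_from_filename("0cc175b9c0f1b6a831c399e269772661-x7k2p.png"))  # x7k2p
print(label_from_filename("x7k2p.png"))                                   # x7k2p
```

Since the vocabulary and maximum text length are built from the labels, they should be computed from this extracted answer rather than from the full file name.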
Hi!
The model ended training with "Epoch 51: early stopping". I am training with my dataset of 1040 labeled images, given below, but when I try to predict text it returns an empty string.
Code:
image = cv2.imread('./6wf4ef.jpg')
prediction_text = model.predict(image)  # returns an empty string
print(f"Predicted Text: {prediction_text}")
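One possible explanation for the empty string, offered as a guess: with CTC decoding, a model that predicts the blank class at every time step decodes to nothing, which can happen when training stops before the network has learned anything useful. The sketch below is a generic greedy CTC decoder, not mltu's ctc_decoder; it assumes the blank index is len(vocab), and the vocabulary and frame indices are made up for illustration.

```python
from itertools import groupby


def greedy_ctc_decode(frame_indices, vocab):
    """Collapse repeated indices, then drop blanks (assumed index: len(vocab))."""
    blank = len(vocab)
    collapsed = [index for index, _group in groupby(frame_indices)]
    return "".join(vocab[index] for index in collapsed if index != blank)


vocab = "6wf4e"      # hypothetical vocabulary
blank = len(vocab)   # 5

# A model that has learned something: repeats collapse, blanks disappear.
print(repr(greedy_ctc_decode([0, 0, blank, 1, 1, blank, 2], vocab)))  # '6wf'

# A model that only ever predicts blank decodes to an empty string.
print(repr(greedy_ctc_decode([blank, blank, blank, blank], vocab)))   # ''
```

If that is what is happening here, inspecting the raw model output before decoding would show the blank class winning at every step.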
Heyya! Thanks for a cool repo and intro video explaining it. I'm particularly interested in the 2nd tutorial - captcha text recognition.
I'm new to the python ecosystem so sorry in advance for silly questions. After cloning the repo and installing the dependencies via:
pip install -r requirements.txt
I get the following error:
maxdonchenko@maxdonchenko mltu % python3 ./Tutorials/02_captcha_to_text/train.py
Traceback (most recent call last):
File "/Users/maxdonchenko/mltu/./Tutorials/02_captcha_to_text/train.py", line 7, in <module>
from mltu.tensorflow.dataProvider import DataProvider
ModuleNotFoundError: No module named 'mltu'
An interesting nuance here is that I think I have mltu installed:
maxdonchenko@maxdonchenko mltu % pip show mltu
Name: mltu
Version: 1.0.15
Summary: Machine Learning Training Utilities (MLTU) for TensorFlow and PyTorch
Home-page: https://pylessons.com/
Author: PyLessons
Author-email: [email protected]
License:
Location: /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages
Requires: librosa, matplotlib, numpy, onnxruntime, opencv-python, pandas, Pillow, PyYAML, tqdm
Required-by:
I tried to explicitly specify the mltu version in the root requirements.txt:
PyYAML>=6.0
tqdm
pandas
numpy
opencv-python
Pillow>=9.4.0
onnxruntime>=1.15.0 # onnxruntime-gpu for GPU support
librosa>=0.9.2
matplotlib
# 👆 already existing in the repo
# 👇 added by me
mltu==0.1.4
tensorflow==2.10 # took versions from Tutorials/02_captcha_to_text/README.md
but, installing them threw another error, basically saying tf version 2.10 can't be installed for some reason:
ERROR: Could not find a version that satisfies the requirement tensorflow==2.10 (from versions: 2.13.0rc0, 2.13.0rc1, 2.13.0rc2, 2.13.0)
ERROR: No matching distribution found for tensorflow==2.10
Do you have any suggestions on how to get the same environment that you had during NN model training?
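A common cause of "No module named 'mltu'" when pip show still lists the package is that pip and python3 belong to different Python installations. The sketch below uses only the standard library to report which interpreter is running and whether it can see a given module; module_report is a hypothetical helper name, and this is a diagnostic aid rather than a fix.

```python
import importlib.util
import sys


def module_report(module_name: str) -> dict:
    """Describe the running interpreter and whether it can import module_name."""
    spec = importlib.util.find_spec(module_name)
    return {
        "interpreter": sys.executable,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "module": module_name,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


print(module_report("mltu"))
```

Running this with the same python3 command used for train.py shows whether that interpreter matches the Location printed by pip show. Installing with python3 -m pip install ... ties pip to that interpreter. The tensorflow==2.10 failure is a separate matter: that release was not published for Python 3.11, so matching the tutorial environment would likely require an older Python version.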