cyprienruffino / ctcmodel

Easy-to-use Connectionist Temporal Classification in Keras

License: MIT License

Python 100.00%

ctcmodel's Issues

_predict_loop error

Hello,
I am trying to run ctcmodel with its dataset, but I got this error while running the _predict_loop function:
TypeError: Cannot convert 0.0 to EagerTensor of dtype int32
The error is raised from the line batch_outs = f(ins_batch)
because of the [0.] added to x in the predict function.
I replaced [ins[-1]] in the line 'ins_batch = _slice_arrays(ins[:-1], batch_ids) + [ins[-1]]' with [0], but then I got an empty array as the prediction result.

I use TensorFlow 2, which may be the cause, because I made no changes to CTCModel.py or example.py.
I just tried to run them.

Thanks in advance.

ModuleNotFoundError: No module named 'keras.engine'

I have tensorflow 2.13, keras==2.13, and keras-ctcmodel installed (using pip in a conda environment). When I try to

from keras_ctcmodel.CTCModel import CTCModel

I get this exception:

    from keras_ctcmodel.CTCModel import CTCModel
  File "/home/adam/anaconda3/envs/simplehtr1/lib/python3.9/site-packages/keras_ctcmodel/CTCModel.py", line 7, in <module>
    from keras.engine import Model
ModuleNotFoundError: No module named 'keras.engine'

Does the package structure used in CTCModel need to be updated?
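
A minimal sketch of a tolerant import, assuming the only problem is that the `Model` class moved between Keras releases. The helper name `import_first_available` and the candidate order are hypothetical and are not part of CTCModel; other internals that CTCModel relies on may also have changed, so this alone may not be enough.

```python
import importlib


def import_first_available(candidates):
    """Return the first attribute that imports cleanly from (module, attribute) pairs."""
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{module_name}.{attr_name}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Assumed candidate locations for Keras' Model class, oldest layout first.
KERAS_MODEL_CANDIDATES = [
    ("keras.engine", "Model"),
    ("keras.models", "Model"),
    ("tensorflow.keras", "Model"),
]

# Usage (requires Keras or TensorFlow to be installed):
# Model = import_first_available(KERAS_MODEL_CANDIDATES)
```

Trying several locations keeps one source file usable across releases, at the cost of hiding which layout is actually in use.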

InvalidArgumentError: sequence_length(0) <= 32 [[{{node CTCloss_10/CTCLoss}}]]

Hello!

I am trying to use the CTCModel at the end of a network containing 5 CNNs and 2 BLSTMs. The model compiles, but when I try to run fit it fails with the error InvalidArgumentError: sequence_length(0) <= 32 [[{{node CTCloss_10/CTCLoss}}]] .

This is how I run fit:
model.fit(x=[xs_train_pad, ys_train_pad, xs_train_len, ys_train_len], y=np.zeros(nb_train), batch_size=PARAM_BATCH_SIZE, epochs=PARAM_EPOCHS)

I am trying to train on a subset of the IAM dataset (9000 images).
My set of training images is padded and is a numpy array of the following shape:
xs_train_pad.shape=(9000, 128, 32, 1) # 9000 128x32 grayscale images.

The labels are also padded and the words are converted to rows of float64s representing the ASCII codes of the characters.
These are the shapes of the rest of the arguments:
ys_train_pad.shape=(9000, 18)
xs_train_len.shape=(9000,)
ys_train_len.shape=(9000,)

This is the network architecture (layer type and output shape):
`
(InputLayer) (None, 128, 32, 1)
(Conv2D) (None, 128, 32, 32)
(BatchNormalization) (None, 128, 32, 32)
(ReLU) (None, 128, 32, 32)
(MaxPooling2D) (None, 64, 16, 32)
(Conv2D) (None, 64, 16, 64)
(BatchNormalization) (None, 64, 16, 64)
(ReLU) (None, 64, 16, 64)
(MaxPooling2D) (None, 32, 8, 64)
(Conv2D) (None, 32, 8, 128)
(BatchNormalization) (None, 32, 8, 128)
(ReLU) (None, 32, 8, 128)
(MaxPooling2D) (None, 32, 4, 128)
(Conv2D) (None, 32, 4, 128)
(BatchNormalization) (None, 32, 4, 128)
(ReLU) (None, 32, 4, 128)
(MaxPooling2D) (None, 32, 2, 128)
(Conv2D) (None, 32, 2, 256)
(BatchNormalization) (None, 32, 2, 256)
(ReLU) (None, 32, 2, 256)
(MaxPooling2D) (None, 32, 1, 256)
(Reshape) (None, 32, 256)
(BidirectionalLSTM) (None, 32, 512)
(BidirectionalLSTM) (None, 32, 512)
(TimeDistributedDense) (None, 32, 80)
(ActivationSoftMax) (None, 32, 80)
labels (InputLayer) (None, None)
input_length (InputLayer) (None, 1)
label_length (InputLayer) (None, 1)

CTCloss (Lambda) (None, 1) SoftMax[0][0]
labels[0][0]
input_length[0][0]
label_length[0][0]
`

Any help would be greatly appreciated!
Thank you!
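
The message sequence_length(0) <= 32 suggests that the first entry of the input lengths is larger than the 32 time steps the network produces. A minimal sketch of a pre-flight check follows, assuming xs_train_len holds raw image widths (up to 128) rather than lengths after pooling. The function names are hypothetical, and the pooling factors are read off the output shapes listed above (only the first two pooling layers halve the first spatial axis).

```python
def time_steps_after_pooling(width, pool_factors):
    """Number of time steps left after applying each pooling factor along the time axis."""
    steps = width
    for factor in pool_factors:
        steps //= factor
    return steps


def check_ctc_lengths(input_lengths, label_lengths, max_time_steps):
    """Return (sample index, message) pairs for lengths the CTC loss cannot accept."""
    problems = []
    for i, (n_in, n_lab) in enumerate(zip(input_lengths, label_lengths)):
        n_in, n_lab = int(n_in), int(n_lab)
        if n_in > max_time_steps:
            problems.append((i, f"input_length {n_in} > {max_time_steps} time steps"))
        if n_lab > n_in:
            # Necessary condition only: repeated labels need extra frames for blanks.
            problems.append((i, f"label_length {n_lab} > input_length {n_in}"))
    return problems


# 128 -> 64 -> 32, then three pooling layers that leave the time axis unchanged.
steps = time_steps_after_pooling(128, [2, 2, 1, 1, 1])
print(steps)
print(check_ctc_lengths([128, 32], [5, 18], steps))
```

If the check reports problems, one option is to pass the downsampled lengths (here at most 32) as the input lengths instead of the raw widths.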

TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'

X_train_len = Input(name='input_length', shape=X_train_length.shape[0], dtype=dtype)  # unpadded length of every x sequence in the batch
Y_train_len = Input(name='label_length', shape=label_length.shape[0], dtype=dtype)  # unpadded length of every y sequence in the batch
y_train = np.ndarray(shape=(num_files, max_y_length), dtype=np.float32)
y_train.fill(1)
## X_train shape is (3,208,224,224,3)
model.fit(x=[X_train, y_train, X_train_len, Y_train_len], y=np.ones((208, 30)), batch_size=1)

I'm getting the error "TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'".
I'm not sure which argument is causing the issue.

Import Error

Hello,

I successfully installed this library using pip but it fails to import: ModuleNotFoundError: No module named 'keras_ctcmodel.CTCModel'

When I check the directory of pip library installations, only this file is there for ctc: keras_ctcmodel-1.1.0.dist-info. The complete library doesn't seem to be installed.

I even tried manually building but no luck. Did anyone face a similar issue?

model.load_model()

Can I save the model? After training the CTCModel I want to keep the model file, but saving raises an error. Could somebody help me? (I know the reason: there is no save method in CTCModel.)

predict function error

When I use the predict function, it returns empty lists:

Prediction : []  -- Label :  [3. 3. 1. 4.]
Prediction : []  -- Label :  [1. 1.]
Prediction : []  -- Label :  [0. 7. 2.]
Prediction : []  -- Label :  [6. 4.]
Prediction : []  -- Label :  [5. 8.]
Prediction : []  -- Label :  [0. 6. 9.]
Prediction : []  -- Label :  [6. 8. 6. 6.]
Prediction : []  -- Label :  [4. 1.]
Prediction : []  -- Label :  [8. 4.]
Prediction : []  -- Label :  [2. 3. 3. 1.]

I trained the model for at least 10 epochs. I am actually working on audio data, but I encounter the same issue with the data provided in the example!
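
One possible explanation, not confirmed in this thread, is that the network predicts the blank class at every frame, which CTC decoding collapses to an empty sequence. The sketch below is a plain-Python greedy (best-path) decoder, not CTCModel's own decoding code; it assumes the blank is the last class, as in TensorFlow's CTC convention.

```python
def greedy_ctc_decode(frame_probs, blank=None):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    num_classes = len(frame_probs[0])
    if blank is None:
        blank = num_classes - 1  # assumption: blank is the last class index
    decoded = []
    previous = None
    for frame in frame_probs:
        best = max(range(num_classes), key=frame.__getitem__)
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded


# Blank (index 2) wins every frame, so nothing survives the collapse.
all_blank = [[0.1, 0.1, 0.8]] * 4
print(greedy_ctc_decode(all_blank))  # → []

# Two frames of class 0, a blank, then class 1.
mixed = [[0.9, 0.05, 0.05], [0.9, 0.05, 0.05], [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]
print(greedy_ctc_decode(mixed))  # → [0, 1]
```

Inspecting the per-frame argmax of the raw softmax output can show whether this is what is happening before looking for bugs in the prediction loop itself.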

AttributeError: module 'tensorflow.python.keras.engine.data_adapter' has no attribute 'DataHandler'

Hi, I got a problem when running CTCModel:
CTCModel-master/keras_ctcmodel/CTCModel.py", line 732, in predict
data_handler = data_adapter.DataHandler(
AttributeError: module 'tensorflow.python.keras.engine.data_adapter' has no attribute 'DataHandler'
It seems that DataHandler cannot be imported successfully.
I'm using keras==2.3.1, tensorflow==2.1.0, and Python 3.6. I think this might be caused by a compatibility issue.
Thank you!

Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label

Hello.

I've been trying to use this library to solve a speech recognition problem. My inputs are 28539 grayscale images with shape (100, 300) and my output is an integer array with 97 positions padded with zeros, where each integer represents a word. I'm getting the following error :

Train on 28539 samples, validate on 2385 samples
Epoch 1/2

---------------------------------------------------------------------------

InvalidArgumentError                      Traceback (most recent call last)

<ipython-input-54-b29b69e334a4> in <module>()
     13               validation_data=([X_test_gs, y_test_flattened, X_test_len, y_test_len], np.zeros(len(X_test_gs))),
     14               epochs = 2,
---> 15               batch_size = 16)

5 frames

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1456         ret = tf_session.TF_SessionRunCallable(self._session._session,
   1457                                                self._handle, args,
-> 1458                                                run_metadata_ptr)
   1459         if run_metadata:
   1460           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 97 labels: 100,965,1937,67,506,100,1365,1185,480,4847,125,4848,469,2011,805,476,558,111,4849,77,476,4850,761,77,100,49,375,1632,46,4849,480,3681,25,46,2866,67,476,68,12,515,131,480,833,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 labels seen so far: 
	 [[{{node CTCloss_9/CTCLoss}}]]
  (1) Invalid argument: Saw a non-null label (index >= num_classes - 1) following a null label, batch: 0 num_classes: 97 labels: 100,965,1937,67,506,100,1365,1185,480,4847,125,4848,469,2011,805,476,558,111,4849,77,476,4850,761,77,100,49,375,1632,46,4849,480,3681,25,46,2866,67,476,68,12,515,131,480,833,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 labels seen so far: 
	 [[{{node CTCloss_9/CTCLoss}}]]
	 [[training_6/RMSprop/gradients/CTCloss_9/CTCLoss_grad/mul/_1710]]
0 successful operations.
0 derived errors ignored.

And this is the code :

from keras.layers import LSTM, TimeDistributed, Dense, Activation, Input
from keras.optimizers import Adam
from keras_ctcmodel.CTCModel import CTCModel  # or: from CTCModel import CTCModel for a local copy

input_layer = Input(shape = (100, 300))
lstm0 = LSTM(128, return_sequences=True)(input_layer)
lstm1 = LSTM(128, return_sequences=True)(lstm0)
dense = TimeDistributed(Dense(seq_size))(lstm1)
output_layer = Activation("softmax")(dense)

model_ctc = CTCModel([input_layer], [output_layer])
model_ctc.compile(Adam(lr=0.001))
model_ctc.fit([X_train_gs, y_train_flattened, X_train_len, y_train_len], np.zeros(len(X_train_gs)), 
              validation_data=([X_test_gs, y_test_flattened, X_test_len, y_test_len], np.zeros(len(X_test_gs))), 
              epochs = 2, 
              batch_size = 16)
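
The error reports num_classes: 97 while the labels go up to 4850, and the CTC loss reserves the last index (num_classes - 1) for the blank. A minimal sketch of a label-range check follows, assuming the output layer was sized with the padded label length (97) rather than the vocabulary size. The function name is hypothetical.

```python
def out_of_range_labels(label_rows, label_lengths, num_classes):
    """Map sample index -> sorted labels that fall outside [0, num_classes - 2]."""
    blank = num_classes - 1  # the last class index is reserved for the CTC blank
    bad = {}
    for i, (row, length) in enumerate(zip(label_rows, label_lengths)):
        # Only the unpadded part of each row is checked.
        values = {int(v) for v in row[: int(length)]}
        offending = sorted(v for v in values if not 0 <= v < blank)
        if offending:
            bad[i] = offending
    return bad


labels = [[100, 965, 67, 0, 0]]
print(out_of_range_labels(labels, [3], num_classes=97))    # → {0: [100, 965]}
print(out_of_range_labels(labels, [3], num_classes=4852))  # → {}
```

With word indices up to 4850, the softmax layer would need at least 4852 units (4851 words plus the blank) for every label to be valid.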

Problem in transfer learning

Hi,

I am trying to use this model to do the following:

1. Fine-tune the model further by loading a previously saved model: I do not understand how to access the layers individually, as model.get_layers('layer_name') isn't working. I am trying to add dropout layers between the layers of the previously saved model.
2. Add more layers at the end by removing the final Dense layer: the include_top parameter doesn't seem to work.

Could you please provide this functionality (especially point 1) or guide me on how to do so?

How to use this in CRNN.

As the RNN receives the CNN output before the FC layer, how do I use CTCModel, which takes an input that I am unable to get? I am working on a handwritten text recognition project.
Please see the code below:

    from keras.models import Sequential
    from keras.layers import Dense, Flatten
    from keras.layers import Conv2D, MaxPooling2D
    from keras.layers import LeakyReLU, BatchNormalization, Reshape, Bidirectional
    from keras.layers import LSTM, TimeDistributed, Dense, Activation, Input
    from keras.optimizers import Adam
    from numpy import zeros
    from CTCModel import CTCModel
    # create model
    model = Sequential()


    # add model CNN layers

    #  First Layer: Conv (5x5) + Pool (2x2) - Output size: 400 x 32 x 64
    model.add(Conv2D(64, kernel_size=(5,5),kernel_initializer='truncated_normal', strides=(1,1), activation='linear',padding='same', input_shape=(800,64,1), name="Conv1"))
    model.add(LeakyReLU(alpha=0.1)) ## relu units can die
    model.add(MaxPooling2D((2,2), strides=(2, 2), padding='valid'))


    # Second Layer: Conv(5*5) - Output size; 400 * 32 * 128
    model.add(Conv2D(128, kernel_size=(5,5), kernel_initializer='truncated_normal', strides=(1,1), activation='linear', padding='same', name='Conv2'))
    model.add(LeakyReLU(alpha=0.1))



    #Third Layer: Conv (3x3) + Pool (2x2) + Simple Batch Norm - Output size: 200 x 16 x 128

    model.add(Conv2D(128, kernel_size=(3,3), kernel_initializer='truncated_normal', strides=(1,1), activation='linear', padding='same', name='Conv3'))
    model.add(BatchNormalization(scale=None, epsilon=0.001)) # batch mean=0, SD=1
    model.add(LeakyReLU(alpha=0.1))
    model.add(MaxPooling2D((2,2), strides=(2, 2),padding='valid'))

    # Fourth Layer: Conv (3x3) - Output size: 200 x 16 x 256
    model.add(Conv2D(256, kernel_size=(3,3), kernel_initializer='truncated_normal', strides=(1,1), activation='linear', padding='same', name='Conv4'))
    model.add(LeakyReLU(alpha=0.1))

    # Fifth Layer: Conv (3x3) - Output size: 200 x 16 x 256
    model.add(Conv2D(256, kernel_size=(3,3), kernel_initializer='truncated_normal', strides=(1,1), activation='linear', padding='same', name='Conv5'))
    model.add(LeakyReLU(alpha=0.1))

    # Sixth Layer: Conv (3x3) + Simple Batch Norm - Output size: 200 x 16 x 512

    model.add(Conv2D(512, kernel_size=(3,3), kernel_initializer='truncated_normal', strides=(1,1), activation='linear', padding='same', name='Conv6'))
    model.add(BatchNormalization(scale=None, epsilon=0.001)) # batch mean=0, SD=1
    model.add(LeakyReLU(alpha=0.1))

    # Seventh Layer: Conv (3x3) + Pool (2x2) - Output size: 100 x 8 x 512

    model.add(Conv2D(512, kernel_size=(3,3), kernel_initializer='truncated_normal', strides=(1,1), activation='linear',padding='same', name='Conv7'))
    model.add(LeakyReLU(alpha=0.1)) ## relu units can die
    model.add(MaxPooling2D((2,2), strides=(2, 2), padding='valid', name='Pool_7'))


    model.add(Reshape((100,4096), input_shape=(100,8,512)))
    model.add(Dense(512, input_shape=(100, 4096), activation=LeakyReLU(alpha=0.1)))
    #model.summary()

    ## Setup RNN
    n_timesteps = 100
    num_hidden = 512

    model.add(Bidirectional(LSTM(units=num_hidden, return_sequences=True),merge_mode='concat', name='blstm_1'))

    ''' LSTM(units=no.of Hidden Nodes,batch_input_shape=(batch_size, timesteps, features))
        return_sequences=True means every hidden cell output is obtained'''
    # model.add(BatchNormalization())

    model.add(TimeDistributed(Dense(80, name='dense_2')))
    y_pred = model.add(Activation('softmax', name='softmax_blstm'))

The problem is that I can't take the output of the last CNN layer and give it as input to CTCModel(). How can I do that? Is there a different approach?
The model summary is:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Conv1 (Conv2D)               (None, 800, 64, 64)       1664      
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 800, 64, 64)       0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 400, 32, 64)       0         
_________________________________________________________________
Conv2 (Conv2D)               (None, 400, 32, 128)      204928    
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 400, 32, 128)      0         
_________________________________________________________________
Conv3 (Conv2D)               (None, 400, 32, 128)      147584    
_________________________________________________________________
batch_normalization_1 (Batch (None, 400, 32, 128)      384       
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 400, 32, 128)      0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 200, 16, 128)      0         
_________________________________________________________________
Conv4 (Conv2D)               (None, 200, 16, 256)      295168    
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU)    (None, 200, 16, 256)      0         
_________________________________________________________________
Conv5 (Conv2D)               (None, 200, 16, 256)      590080    
_________________________________________________________________
leaky_re_lu_5 (LeakyReLU)    (None, 200, 16, 256)      0         
_________________________________________________________________
Conv6 (Conv2D)               (None, 200, 16, 512)      1180160   
_________________________________________________________________
batch_normalization_2 (Batch (None, 200, 16, 512)      1536      
_________________________________________________________________
leaky_re_lu_6 (LeakyReLU)    (None, 200, 16, 512)      0         
_________________________________________________________________
Conv7 (Conv2D)               (None, 200, 16, 512)      2359808   
_________________________________________________________________
leaky_re_lu_7 (LeakyReLU)    (None, 200, 16, 512)      0         
_________________________________________________________________
Pool_7 (MaxPooling2D)        (None, 100, 8, 512)       0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 100, 4096)         0         
_________________________________________________________________
dense_1 (Dense)              (None, 100, 512)          2097664   
_________________________________________________________________
blstm_1 (Bidirectional)      (None, 100, 1024)         4198400   
_________________________________________________________________
time_distributed_1 (TimeDist (None, 100, 80)           82000     
_________________________________________________________________
softmax_blstm (Activation)   (None, 100, 80)           0         
=================================================================
Total params: 11,159,376
Trainable params: 11,158,096
Non-trainable params: 1,280

how to use CTC in keras

Hi,

Could you please help me with this issue?
I am trying to implement a simple CNN-GRU model for speech recognition, but it generates lots of errors.
The latest one is a shape mismatch, shown below:

AssertionError: Could not compute output Tensor("ctc_loss/ctc_loss_1/Identity:0", shape=(None, 36, 27), dtype=float32)

Colab Link

Keras version

Hi,

Thanks for this beautiful work. Just a question: which version of Keras was used for this, please?

Agathe

Error when applying model.compile(optimizer=Adam())

Hi,

I am getting the following error when I execute the code model.compile(optimizer=Adam())

`model = CTCModel(input_data, y_pred)

model.compile(optimizer=Adam())
Traceback (most recent call last):

File "", line 1, in
model.compile(optimizer=Adam())

File "C:\Gireesh\Handwriting-CNN-LSTM\CTCModel.py", line 87, in compile
self.outputs + [labels, input_length, label_length])

File "C:\ProgramData\Anaconda3\envs\python36\lib\site-packages\keras\engine\base_layer.py", line 457, in call
output = self.call(inputs, **kwargs)

File "C:\ProgramData\Anaconda3\envs\python36\lib\site-packages\keras\layers\core.py", line 687, in call
return self.function(inputs, **arguments)

File "C:\Gireesh\Handwriting-CNN-LSTM\CTCModel.py", line 827, in ctc_loss_lambda_func
y_pred, labels, input_length, label_length = args

File "C:\ProgramData\Anaconda3\envs\python36\lib\site-packages\tensorflow\python\framework\ops.py", line 442, in iter
"Tensor objects are only iterable when eager execution is "

TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn.`

TensorBoard Callback not working

Hi,

I get the error below when I use the TensorBoard callback in the network.fit() method:
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval

But if I remove it, it works fine. Do you suggest any way to use TensorBoard with the fit() method?

I have set validation_split=0.1 in model.fit(); val_loss is calculated, but I get the error when I use the TensorBoard callback.

loaded model can not work well

Hi, when I load the trained model, it does not work well. I used loaded = CTCModel(None, None) and loaded.load_model(path, optimizer) to load my model, and loaded.predict([x_test_pad, x_test_len], batch_size, max_value) to get the predictions, but it behaves like an untrained model: the predictions are completely wrong. I do not know how to deal with it; could you please help me?

The model is not working with audio data.

When I try to use the model with audio data, it predicts the same class again and again.

Which versions of TF/Keras to make it work?

Hello,
I've tried the example but I get some errors. I suspect I am not using the exact same TF/Keras versions as you do. Can you indicate the versions you used?
Best

About top-k predictions

I would like to use the top_paths feature to implement a top-k predictor.
I have not succeeded in connecting all of the right things to make this work.

Please add an example of top_paths to your example.py.

Thank you!
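
For intuition about what a top-k CTC predictor returns, here is a toy sketch that ranks label sequences by exhaustively enumerating every frame-level path. It is not how CTCModel or TensorFlow implement top_paths (they rely on beam search), its cost grows as classes ** frames, and the function names are hypothetical. The blank is assumed to be the last class.

```python
from collections import defaultdict
from itertools import product


def collapse(path, blank):
    """Merge repeated symbols, then drop blanks."""
    out = []
    previous = None
    for symbol in path:
        if symbol != previous and symbol != blank:
            out.append(symbol)
        previous = symbol
    return tuple(out)


def top_k_ctc_exhaustive(frame_probs, k, blank=None):
    """Return the k most probable (label sequence, probability) pairs."""
    num_classes = len(frame_probs[0])
    if blank is None:
        blank = num_classes - 1  # assumption: blank is the last class index
    totals = defaultdict(float)
    for path in product(range(num_classes), repeat=len(frame_probs)):
        prob = 1.0
        for t, symbol in enumerate(path):
            prob *= frame_probs[t][symbol]
        # Every path that collapses to the same labels adds to that sequence.
        totals[collapse(path, blank)] += prob
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Two frames, two classes: index 0 is a label, index 1 is the blank.
for labels, prob in top_k_ctc_exhaustive([[0.6, 0.4], [0.6, 0.4]], k=2):
    print(labels, round(prob, 2))
```

In this example three of the four paths collapse to the single label 0, so it ranks first with probability 0.84, ahead of the empty sequence at 0.16.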

update in way to import dependencies?

Hi, I've tried to use the CTCModel and got an import error:

cannot import name 'Model' from 'keras.engine'

I'm new to programming, so maybe there is a way around this, but what I understand is that in your CTCModel, Model has to be imported with from keras import Model instead of from keras.engine import Model?

I hope you can help me with this. I'm trying to implement CTC, and your solution is much easier to use than others I have seen, or than using Keras directly.

Thank you

AttributeError: pred = network.predict([x_test_pad, x_test_len])

Hi
I was trying to explore the example, and everything works fine except the last lines:

`# predict label sequences

pred = network.predict([x_test_pad, x_test_len], batch_size=batch_size, max_value=padding_value)

for i in range(10): # print the 10 first predictions
    print("Prediction :", [j for j in pred[i] if j!=-1], " -- Label : ", y_test[i])`

They raise the following error:

`AttributeError Traceback (most recent call last)
in ()
1 # predict label sequences
2 #pred = network.predict([x_test_pad, x_test_len], batch_size=batch_size, max_value=padding_value)
----> 3 pred = network.predict([x_test_pad, x_test_len])
4 for i in range(10): # print the 10 first predictions
5 print("Prediction :", [j for j in pred[i] if j!=-1], " -- Label : ", y_test[i])

~/Workspace/jupyter/Speech/CTCModel/CTCModel.py in predict(self, x, batch_size, verbose, steps, max_len, max_value)
737 f = self.model_pred.predict_function
738 out = self._predict_loop(f, ins, batch_size=batch_size, max_value=max_value,
--> 739 verbose=verbose, steps=steps, max_len=max_len)
740
741 out_decode = [dec_data[:list(dec_data).index(max_value)] if max_value in dec_data else dec_data for i,dec_data in enumerate(out)]

~/Workspace/jupyter/Speech/CTCModel/CTCModel.py in _predict_loop(self, f, ins, max_len, max_value, batch_size, verbose, steps)
761 (if the model has multiple outputs).
762 """
--> 763 num_samples = self.model_pred._check_num_samples(ins, batch_size,
764 steps,
765 'steps')

AttributeError: 'Model' object has no attribute '_check_num_samples'`

Error when using a 3 dimensional input

I'm using a three dimensional input along with Conv2D layers in my model. I'm getting this error when trying to call the fit function:

tensorflow.python.framework.errors_impl.InvalidArgumentError: transpose expects a vector of size 2. But input(1) is a vector of size 3

I noticed the example program uses a 2 dimensional input and no convolution layers. Does this mean the class does not work with 3 dimensional inputs?
