
keras_attention's People

Contributors

tsterbak


keras_attention's Issues

One to One keras model with Attention in Keras #25

Hello,
I have a Keras model with a sequence of inputs and a sequence of outputs, where each input has an associated output (label), e.g. part-of-speech (POS) tagging:

Seq_in[0][0:3]
array([[15], [28], [23]])

Seq_out[0][0:3]

array([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]],
dtype=float32)
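
For completeness, a guess at how Seq_in and Seq_out above were prepared (hypothetical variable names; only the shapes are given above):

from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
import numpy as np

# word_id_sequences / tag_sequences are hypothetical names for the raw data.
ids = pad_sequences(word_id_sequences, maxlen=500, padding='post')      # (n_samples, 500)
Seq_in = np.expand_dims(ids, axis=-1).astype('float32')                 # (n_samples, 500, 1)
tags = pad_sequences(tag_sequences, maxlen=500, padding='post')         # (n_samples, 500)
Seq_out = np.array([to_categorical(t, num_classes=15) for t in tags])   # (n_samples, 500, 15)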

I want to build attention on top of the LSTM layers. I am following "Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification" (Zhou et al., 2016).
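
For reference, my reading of the paper's attention equations, where H = [h_1, ..., h_T] is the matrix of BiLSTM output state vectors and w is a learned weight vector (equation numbers as in the paper):

M = \tanh(H)                           % (9)
\alpha = \mathrm{softmax}(w^{\top} M)  % (10)
r = H \alpha^{\top}                    % (11)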

from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.layers import (Input, Bidirectional, LSTM, Dropout, Dense,
                          Flatten, Activation, RepeatVector, Permute,
                          Lambda, multiply, concatenate)
from keras import backend as K
from keras import optimizers

X_train, X_val, Y_train, Y_val = train_test_split(Seq_in, Seq_out, test_size=0.20)

TIME_STEPS = 500
INPUT_DIM = 1
lstm_units = 256

inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
activations = Bidirectional(LSTM(lstm_units, return_sequences=True))(inputs)  # first bidirectional layer
activations = Dropout(0.2)(activations)
activations = Bidirectional(LSTM(lstm_units, return_sequences=True))(activations)  # second bidirectional layer
activations = Dropout(0.2)(activations)
attention = Dense(1, activation='tanh')(activations)  # equation (9) in the paper: squash each output state vector to a scalar
attention = Flatten()(attention)
attention = Activation('softmax')(attention)  # equation (10) in the paper
attention = RepeatVector(512)(attention)  # repeat the softmax vector to match the output state dimension (512 = 2 * lstm_units)
attention = Permute([2, 1])(attention)  # back to shape (TIME_STEPS, 512)
sent_representation = multiply([activations, attention])  # multiply the attention weights with the output state vectors element-wise
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-1))(sent_representation)  # sum over the feature axis
sent_representation = RepeatVector(TIME_STEPS)(sent_representation)  # repeat the sentence representation for each time step
sent_representation = concatenate([activations, sent_representation])  # concatenate the sentence representation to the output states
output = Dense(15, activation='softmax')(sent_representation)  # softmax over the 15 labels at each time step

model = Model(inputs=inputs, outputs=output)
sgd = optimizers.SGD(lr=0.1, momentum=0.9, decay=1e-3, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=2, validation_data=(X_val, Y_val), verbose=1)

Layer (type)                     Output Shape       Param #    Connected to
==============================================================================
input_1 (InputLayer)             (None, 500, 1)     0
bidirectional_1 (Bidirectional)  (None, 500, 512)   528384     input_1[0][0]
dropout_1 (Dropout)              (None, 500, 512)   0          bidirectional_1[0][0]
bidirectional_2 (Bidirectional)  (None, 500, 512)   1574912    dropout_1[0][0]
dropout_2 (Dropout)              (None, 500, 512)   0          bidirectional_2[0][0]
dense_1 (Dense)                  (None, 500, 1)     513        dropout_2[0][0]
flatten_1 (Flatten)              (None, 500)        0          dense_1[0][0]
activation_1 (Activation)        (None, 500)        0          flatten_1[0][0]
repeat_vector_1 (RepeatVector)   (None, 512, 500)   0          activation_1[0][0]
permute_1 (Permute)              (None, 500, 512)   0          repeat_vector_1[0][0]
multiply_1 (Multiply)            (None, 500, 512)   0          dropout_2[0][0]
                                                               permute_1[0][0]
lambda_1 (Lambda)                (None, 500)        0          multiply_1[0][0]
repeat_vector_2 (RepeatVector)   (None, 500, 500)   0          lambda_1[0][0]
concatenate_1 (Concatenate)      (None, 500, 1012)  0          dropout_2[0][0]
                                                               repeat_vector_2[0][0]
dense_2 (Dense)                  (None, 500, 15)    15195      concatenate_1[0][0]
==============================================================================
Total params: 2,119,004
Trainable params: 2,119,004
Non-trainable params: 0

I think this code does what the paper describes, except that the concatenate step attaches the same attention-weighted summary to every output state vector; the attention weights do not change per time step, and therefore not per output label.
So I think that, for each time step's output, I have to do something so that the attention weights differ. Am I right?
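
To make this concrete, here is a rough, untested sketch of what I mean by per-time-step attention weights: score every pair of positions so that each output step gets its own softmax distribution over the states. The dot-product scoring and all names here are my own assumptions, not from the paper.

# 'activations' is the (None, TIME_STEPS, 512) tensor from the model above.
scores = Lambda(lambda x: K.batch_dot(x, x, axes=[2, 2]))(activations)  # (None, T, T) pairwise scores
weights = Activation('softmax')(scores)  # softmax over the last axis: one distribution per time step
context = Lambda(lambda x: K.batch_dot(x[0], x[1], axes=[2, 1]))([weights, activations])  # (None, T, 512)
merged = concatenate([activations, context])  # (None, T, 1024): per-step summary joined to the states
output = Dense(15, activation='softmax')(merged)  # per-time-step label softmax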
Any help is appreciated.
Thanks in advance.

Error when drawing attention map

in sparse_categorical_crossentropy
    logits = tf.reshape(output, [-1, int(output_shape[-1])])
TypeError: __int__ returned non-int (type NoneType)

If I print the outputs of the attention layer:
[<tf.Tensor 'attention_weighted_average_2/Tanh_1:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'attention_weighted_average_2/truediv:0' shape=(?, ?) dtype=float32>]
The second output (the attention weights) does not have the expected static shape (None, input_length); its time dimension is unknown.
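
A guess at the cause (not verified against this repo's layer code): when a custom Keras layer returns both the weighted average and the attention weights, the static shape of the second output is only known if compute_output_shape declares it; an undeclared time dimension surfaces as the None that int() chokes on. A minimal sketch with hypothetical names:

from keras import backend as K
from keras.engine.topology import Layer

class AttentionWithWeights(Layer):  # hypothetical name, not the repo's class
    def build(self, input_shape):
        # input_shape: (batch, time_steps, features)
        self.w = self.add_weight(name='att_w',
                                 shape=(input_shape[-1], 1),
                                 initializer='glorot_uniform',
                                 trainable=True)
        super(AttentionWithWeights, self).build(input_shape)

    def call(self, x):
        scores = K.squeeze(K.dot(K.tanh(x), self.w), axis=-1)     # (batch, time_steps)
        weights = K.softmax(scores)                               # attention distribution
        weighted_avg = K.sum(x * K.expand_dims(weights), axis=1)  # (batch, features)
        return [weighted_avg, weights]

    def compute_output_shape(self, input_shape):
        # Declare BOTH output shapes; if the second one is missing or its
        # time dimension is unknown, the weights tensor shows up as (?, ?).
        return [(input_shape[0], input_shape[2]),
                (input_shape[0], input_shape[1])]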
