
keras_attention's People

Contributors

tsterbak


keras_attention's Issues

One to One keras model with Attention in Keras #25

Hello,
I have a Keras model with a sequence of inputs and a sequence of outputs, where each input has an associated output (label), e.g. part-of-speech (POS) tagging:

Seq_in[0][0:3]
array([[15], [28], [23]])

Seq_out[0][0:3]

array([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.]],
dtype=float32)
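
For completeness, a guess at how Seq_in and Seq_out above were prepared (hypothetical variable names; only the shapes are given above):

from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
import numpy as np

# word_id_sequences / tag_sequences are hypothetical names for the raw data.
ids = pad_sequences(word_id_sequences, maxlen=500, padding='post')      # (n_samples, 500)
Seq_in = np.expand_dims(ids, axis=-1).astype('float32')                 # (n_samples, 500, 1)
tags = pad_sequences(tag_sequences, maxlen=500, padding='post')         # (n_samples, 500)
Seq_out = np.array([to_categorical(t, num_classes=15) for t in tags])   # (n_samples, 500, 15)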

I want to build attention on top of the LSTM layers. I am following "Attention-Based Bidirectional Long Short-Term Memory Networks for Relation Classification" (Zhou et al., 2016).
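
For reference, my reading of the paper's attention equations, where H = [h_1, ..., h_T] is the matrix of BiLSTM output state vectors and w is a learned weight vector (equation numbers as in the paper):

M = \tanh(H)                           % (9)
\alpha = \mathrm{softmax}(w^{\top} M)  % (10)
r = H \alpha^{\top}                    % (11)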

from sklearn.model_selection import train_test_split
from keras.models import Model
from keras.layers import (Input, Bidirectional, LSTM, Dropout, Dense,
                          Flatten, Activation, RepeatVector, Permute,
                          Lambda, multiply, concatenate)
from keras import backend as K
from keras import optimizers

X_train, X_val, Y_train, Y_val = train_test_split(Seq_in, Seq_out, test_size=0.20)

TIME_STEPS = 500
INPUT_DIM = 1
lstm_units = 256

inputs = Input(shape=(TIME_STEPS, INPUT_DIM))
activations = Bidirectional(LSTM(lstm_units, return_sequences=True))(inputs)  # first bidirectional layer
activations = Dropout(0.2)(activations)
activations = Bidirectional(LSTM(lstm_units, return_sequences=True))(activations)  # second bidirectional layer
activations = Dropout(0.2)(activations)
attention = Dense(1, activation='tanh')(activations)  # equation (9) in the paper: squash each output state vector to a scalar
attention = Flatten()(attention)
attention = Activation('softmax')(attention)  # equation (10) in the paper
attention = RepeatVector(512)(attention)  # repeat the softmax vector to match the output state dimension (512 = 2 * lstm_units)
attention = Permute([2, 1])(attention)  # back to shape (TIME_STEPS, 512)
sent_representation = multiply([activations, attention])  # multiply the attention weights with the output state vectors element-wise
sent_representation = Lambda(lambda xin: K.sum(xin, axis=-1))(sent_representation)  # sum over the feature axis
sent_representation = RepeatVector(TIME_STEPS)(sent_representation)  # repeat the sentence representation for each time step
sent_representation = concatenate([activations, sent_representation])  # concatenate the sentence representation to the output states
output = Dense(15, activation='softmax')(sent_representation)  # softmax over the 15 labels at each time step

model = Model(inputs=inputs, outputs=output)
sgd = optimizers.SGD(lr=0.1, momentum=0.9, decay=1e-3, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.fit(X_train, Y_train, epochs=2, validation_data=(X_val, Y_val), verbose=1)

Layer (type)                     Output Shape       Param #    Connected to
==============================================================================
input_1 (InputLayer)             (None, 500, 1)     0
bidirectional_1 (Bidirectional)  (None, 500, 512)   528384     input_1[0][0]
dropout_1 (Dropout)              (None, 500, 512)   0          bidirectional_1[0][0]
bidirectional_2 (Bidirectional)  (None, 500, 512)   1574912    dropout_1[0][0]
dropout_2 (Dropout)              (None, 500, 512)   0          bidirectional_2[0][0]
dense_1 (Dense)                  (None, 500, 1)     513        dropout_2[0][0]
flatten_1 (Flatten)              (None, 500)        0          dense_1[0][0]
activation_1 (Activation)        (None, 500)        0          flatten_1[0][0]
repeat_vector_1 (RepeatVector)   (None, 512, 500)   0          activation_1[0][0]
permute_1 (Permute)              (None, 500, 512)   0          repeat_vector_1[0][0]
multiply_1 (Multiply)            (None, 500, 512)   0          dropout_2[0][0]
                                                               permute_1[0][0]
lambda_1 (Lambda)                (None, 500)        0          multiply_1[0][0]
repeat_vector_2 (RepeatVector)   (None, 500, 500)   0          lambda_1[0][0]
concatenate_1 (Concatenate)      (None, 500, 1012)  0          dropout_2[0][0]
                                                               repeat_vector_2[0][0]
dense_2 (Dense)                  (None, 500, 15)    15195      concatenate_1[0][0]
==============================================================================
Total params: 2,119,004
Trainable params: 2,119,004
Non-trainable params: 0

I think this code does what the paper describes, except that the concatenate step attaches the same attention-weighted summary to every output state vector; the attention weights do not change per time step, and therefore not per output label.
So I think that, for each time step's output, I have to do something so that the attention weights differ. Am I right?
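
To make this concrete, here is a rough, untested sketch of what I mean by per-time-step attention weights: score every pair of positions so that each output step gets its own softmax distribution over the states. The dot-product scoring and all names here are my own assumptions, not from the paper.

# 'activations' is the (None, TIME_STEPS, 512) tensor from the model above.
scores = Lambda(lambda x: K.batch_dot(x, x, axes=[2, 2]))(activations)  # (None, T, T) pairwise scores
weights = Activation('softmax')(scores)  # softmax over the last axis: one distribution per time step
context = Lambda(lambda x: K.batch_dot(x[0], x[1], axes=[2, 1]))([weights, activations])  # (None, T, 512)
merged = concatenate([activations, context])  # (None, T, 1024): per-step summary joined to the states
output = Dense(15, activation='softmax')(merged)  # per-time-step label softmax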
Any help is appreciated.
Thanks in advance.

Error when drawing attention map

in sparse_categorical_crossentropy
    logits = tf.reshape(output, [-1, int(output_shape[-1])])
TypeError: __int__ returned non-int (type NoneType)

If I print the outputs of the attention layer:
[<tf.Tensor 'attention_weighted_average_2/Tanh_1:0' shape=(?, 256) dtype=float32>, <tf.Tensor 'attention_weighted_average_2/truediv:0' shape=(?, ?) dtype=float32>]
The second output (the attention weights) does not have the expected static shape (None, input_length); its time dimension is unknown.
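
A guess at the cause (not verified against this repo's layer code): when a custom Keras layer returns both the weighted average and the attention weights, the static shape of the second output is only known if compute_output_shape declares it; an undeclared time dimension surfaces as the None that int() chokes on. A minimal sketch with hypothetical names:

from keras import backend as K
from keras.engine.topology import Layer

class AttentionWithWeights(Layer):  # hypothetical name, not the repo's class
    def build(self, input_shape):
        # input_shape: (batch, time_steps, features)
        self.w = self.add_weight(name='att_w',
                                 shape=(input_shape[-1], 1),
                                 initializer='glorot_uniform',
                                 trainable=True)
        super(AttentionWithWeights, self).build(input_shape)

    def call(self, x):
        scores = K.squeeze(K.dot(K.tanh(x), self.w), axis=-1)     # (batch, time_steps)
        weights = K.softmax(scores)                               # attention distribution
        weighted_avg = K.sum(x * K.expand_dims(weights), axis=1)  # (batch, features)
        return [weighted_avg, weights]

    def compute_output_shape(self, input_shape):
        # Declare BOTH output shapes; if the second one is missing or its
        # time dimension is unknown, the weights tensor shows up as (?, ?).
        return [(input_shape[0], input_shape[2]),
                (input_shape[0], input_shape[1])]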
