
snli-entailment's Introduction

Implementations of an attention model for entailment from this paper, in keras and tensorflow.

Compatible with keras v1.0.6 and tensorflow 0.11.0rc2

I implemented the model to learn the APIs for keras and tensorflow, so I have not really tuned it for performance. The model implemented in keras is a little different, as keras does not expose a method to set an LSTM's state.

To train,

  • Download the snli dataset.
  • Create train, dev and test files with tab-separated text, hypothesis and label (see the example file train10.txt). If you are lazy, you can find snippets for this in reader.py.
  • Train by either running,
python amodel.py -train <TRAIN> -dev <DEV> -test <TEST>

for using the keras implementation, or

python tf_model.py -train <TRAIN> -dev <DEV> -test <TEST>

for using the tensorflow implementation. Look at the get_params() method in both scripts to see how to specify different parameters.
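The tab-separated input format described above can be parsed with a few lines of Python. This is a minimal sketch; `load_snli` is a hypothetical helper for illustration, not a function from reader.py:

```python
def load_snli(path):
    """Read tab-separated lines of (text, hypothesis, label)."""
    examples = []
    with open(path) as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip malformed lines
            text, hypothesis, label = parts
            examples.append((text, hypothesis, label))
    return examples
```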

A log is written to a *.log file via a callback that reports accuracy.
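The core of such an accuracy-logging callback is just appending the epoch's metric to a file. A keras-independent stand-in for the idea (the actual callback in the scripts may differ) looks like:

```python
class AccuracyLogger:
    """Minimal stand-in for a keras Callback that appends validation accuracy to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path

    def on_epoch_end(self, epoch, logs=None):
        # keras passes a dict of metrics here at the end of every epoch
        logs = logs or {}
        with open(self.log_path, "a") as f:
            f.write("epoch %d\tval_acc %.4f\n" % (epoch, logs.get("val_acc", 0.0)))
```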

For comments, improvements, bug-reports and suggestions for tuning, email [email protected]

snli-entailment's People

Contributors

shyamupa


snli-entailment's Issues

encounter an error

Why did I encounter such an error?

  r = Reshape((k, ), name="r")(r_)
  File "/usr/local/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/engine/topology.py", line 485, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/engine/topology.py", line 543, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/engine/topology.py", line 151, in create_node
    output_shapes = to_list(outbound_layer.get_output_shape_for(input_shapes[0]))
  File "/usr/local/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/layers/core.py", line 206, in get_output_shape_for
    return (input_shape[0],) + self._fix_unknown_dimension(input_shape[1:], self.target_shape)
  File "/usr/local/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/layers/core.py", line 201, in _fix_unknown_dimension
    raise ValueError(msg)
ValueError: total size of new array must be unchanged
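The error itself is generic: Reshape, like numpy's reshape, requires the product of the new dimensions to equal the number of elements being reshaped, so it usually means `k` does not match the actual flattened size of `r_`. The constraint can be checked outside keras:

```python
import numpy as np

x = np.zeros((2, 3, 4))      # 3 * 4 == 12 elements per sample
ok = x.reshape((2, 12))      # total size preserved: allowed
try:
    x.reshape((2, 10))       # 3 * 4 != 10: rejected
except ValueError as e:
    # same "total size of new array must be unchanged" class of failure
    pass
```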

Does TimeDistributed(Dense) use different weights in Keras?

At https://github.com/shyamupa/snli-entailment/blob/master/amodel.py#L110
WY = TimeDistributed(Dense(k, W_regularizer=l2(0.01)), name="WY")(Y)

I believe this will use the same dense layer (with the same weights) for all timesteps, thus multiplying Y by the same vector of length k instead of a matrix of shape k x k.

@shyamupa Can you please check and confirm?

TimeDistributed Implementation
https://github.com/fchollet/keras/blob/master/keras/layers/wrappers.py#L91
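For what it's worth, `TimeDistributed(Dense(k))` does reuse one set of weights at every timestep, but that shared kernel is a matrix of shape `(input_dim, k)`, not a length-k vector, so the whole operation is equivalent to a single matrix multiply broadcast over time. A numpy sketch with made-up sizes:

```python
import numpy as np

timesteps, input_dim, k = 5, 3, 4
Y = np.random.rand(timesteps, input_dim)   # one sample: (timesteps, input_dim)
W = np.random.rand(input_dim, k)           # single shared Dense kernel: (input_dim, k)

# TimeDistributed(Dense(k)) applies the same W at every timestep:
per_step = np.stack([Y[t].dot(W) for t in range(timesteps)])
all_at_once = Y.dot(W)                     # identical result as one matmul

assert np.allclose(per_step, all_at_once)  # shared weights == plain matrix product
```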

Attention

Would you be able to explain in the wiki the logic of how you made the attention mechanism in keras?

Alternatively, my email is [email protected]

doubt in output_shape of lambda layer

I am a bit confused about the dimensions. As I understand from here, you get an output of dimension (nb_samples, timesteps, 2 * opts.lstm_units). Then in this line you choose the last element along the time dimension, which should give an output of dimension (nb_samples, 2 * opts.lstm_units), but you specified output_shape=(k,).

It would be really nice if you could help me understand this. Thanks
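The bookkeeping in the question can be checked directly: slicing the last timestep of a `(nb_samples, timesteps, 2 * lstm_units)` tensor does give `(nb_samples, 2 * lstm_units)`, and since keras per-sample `output_shape` omits the batch axis, `output_shape=(k,)` is consistent only if `k == 2 * lstm_units`. A numpy sketch with assumed sizes:

```python
import numpy as np

nb_samples, timesteps, lstm_units = 2, 7, 5
h = np.random.rand(nb_samples, timesteps, 2 * lstm_units)  # e.g. bidirectional LSTM output

last = h[:, -1, :]   # pick the final timestep
# per-sample shape is (2 * lstm_units,), which keras writes as output_shape=(k,)
assert last.shape == (nb_samples, 2 * lstm_units)
```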

The alpha value is not as expected!

Hi @shyamupa,
Thanks for your attention model!
I can extract the alpha values to visualize the model's attention for my task.
But I found a strange phenomenon in the alpha values.
The following picture is the heatmap of the "flat_alpha" layer's output:
attention_flat_alpha_export
It looks fine!
But when I exported the output of the "alpha" layer (after the softmax), I got the following result:
attention_alpha_softmax_export
I know softmax will sharpen and normalize the result, but when I applied softmax to the flat_alpha data locally, the result differed from the output of the "alpha" layer:
attention_alpha_softmax_local
The heatmap shape is (20, 200): 20 sentences, each of length 200.
Do you have any suggestions?
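When comparing a locally computed softmax against a layer's output, it is worth checking the normalization axis and the numerical stabilization, since both commonly differ between hand-rolled and library versions. A numpy softmax with the usual max-shift, normalized per sentence for the (20, 200) shape from the issue:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)  # max-shift avoids exp overflow
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

flat_alpha = np.random.rand(20, 200)  # 20 sentences, length 200 (shapes from the issue)
alpha = softmax(flat_alpha, axis=1)   # normalize over the 200 positions of each sentence

# each sentence's attention weights should sum to 1
assert np.allclose(alpha.sum(axis=1), 1.0)
```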

'module' object has no attribute 'control_flow_ops'

I get this error when trying to run amodel.py. I use keras with the tensorflow backend (keras v1.0.6) and tensorflow v0.12.1 (the 0.11 versions are no longer available).

However, I am able to run tf_model.py. Regarding this code: how can I get the final predicted labels (neutral/entailment/contradiction) on the test data, and not just the test accuracy? I am trying to make simple changes to the code to get the predictions, through tf.argmax(self.pred, 1) and then an eval call, but I am pretty new to tensorflow and am struggling with this. I would be very grateful if you could help me with this as well. What I want is the final prediction array (0/1/2) for all the test instances.
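Turning the model's output probabilities into labels is just an argmax over the class axis, which is what tf.argmax(self.pred, 1) computes. A hedged numpy sketch of the idea; the 0/1/2 label order here is an assumption and should be checked against how tf_model.py encodes labels:

```python
import numpy as np

LABELS = ["neutral", "entailment", "contradiction"]  # assumed index order; verify against the script

probs = np.array([[0.2, 0.7, 0.1],    # stand-in for the softmax output on test data
                  [0.1, 0.2, 0.7]])
pred_ids = probs.argmax(axis=1)       # same operation as tf.argmax(self.pred, 1)
pred_labels = [LABELS[i] for i in pred_ids]
```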

Thank you.
