
snli-entailment's Introduction

Implementations of an attention model for entailment from this paper, in keras and tensorflow.

Compatible with keras v1.0.6 and tensorflow 0.11.0rc2

I implemented the model to learn the APIs for keras and tensorflow, so I have not really tuned it for performance. The model implemented in keras is a little different, as keras does not expose a method to set an LSTM's state.

To train,

  • Download the snli dataset.
  • Create train, dev and test files with tab-separated text, hypothesis and label (see the example file train10.txt). If you are lazy, you can find snippets for this in reader.py.
  • Train by either running,
python amodel.py -train <TRAIN> -dev <DEV> -test <TEST>

for using the keras implementation, or

python tf_model.py -train <TRAIN> -dev <DEV> -test <TEST>

for using the tensorflow implementation. Look at the get_params() method in both scripts to see how to specify different parameters.
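The tab-separated input format described above can be parsed with a few lines of Python. This is a minimal sketch; `load_snli` is a hypothetical helper for illustration, not a function from reader.py:

```python
def load_snli(path):
    """Read tab-separated lines of (text, hypothesis, label)."""
    examples = []
    with open(path) as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 3:
                continue  # skip malformed lines
            text, hypothesis, label = parts
            examples.append((text, hypothesis, label))
    return examples
```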

A log is written to a *.log file via a callback that reports accuracy.
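The core of such an accuracy-logging callback is just appending the epoch's metric to a file. A keras-independent stand-in for the idea (the actual callback in the scripts may differ) looks like:

```python
class AccuracyLogger:
    """Minimal stand-in for a keras Callback that appends validation accuracy to a log file."""

    def __init__(self, log_path):
        self.log_path = log_path

    def on_epoch_end(self, epoch, logs=None):
        # keras passes a dict of metrics here at the end of every epoch
        logs = logs or {}
        with open(self.log_path, "a") as f:
            f.write("epoch %d\tval_acc %.4f\n" % (epoch, logs.get("val_acc", 0.0)))
```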

For comments, improvements, bug-reports and suggestions for tuning, email [email protected]

snli-entailment's People

Contributors

shyamupa


snli-entailment's Issues

encounter an error

Why did I encounter such an error?

  r = Reshape((k, ), name="r")(r_)
  File "/usr/local/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/engine/topology.py", line 485, in __call__
    self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/engine/topology.py", line 543, in add_inbound_node
    Node.create_node(self, inbound_layers, node_indices, tensor_indices)
  File "/usr/local/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/engine/topology.py", line 151, in create_node
    output_shapes = to_list(outbound_layer.get_output_shape_for(input_shapes[0]))
  File "/usr/local/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/layers/core.py", line 206, in get_output_shape_for
    return (input_shape[0],) + self._fix_unknown_dimension(input_shape[1:], self.target_shape)
  File "/usr/local/lib/python2.7/site-packages/Keras-1.0.3-py2.7.egg/keras/layers/core.py", line 201, in _fix_unknown_dimension
    raise ValueError(msg)
ValueError: total size of new array must be unchanged
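The error itself is generic: Reshape, like numpy's reshape, requires the product of the new dimensions to equal the number of elements being reshaped, so it usually means `k` does not match the actual flattened size of `r_`. The constraint can be checked outside keras:

```python
import numpy as np

x = np.zeros((2, 3, 4))      # 3 * 4 == 12 elements per sample
ok = x.reshape((2, 12))      # total size preserved: allowed
try:
    x.reshape((2, 10))       # 3 * 4 != 10: rejected
except ValueError as e:
    # same "total size of new array must be unchanged" class of failure
    pass
```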

Does TimeDistributed(Dense) use different weights in Keras?

At https://github.com/shyamupa/snli-entailment/blob/master/amodel.py#L110
WY = TimeDistributed(Dense(k, W_regularizer=l2(0.01)), name="WY")(Y)

I believe this will use the same dense layer (with the same weights) for all timesteps, thus multiplying Y by the same vector of length k instead of a matrix of shape k x k.

@shyamupa Can you please check and confirm?

TimeDistributed Implementation
https://github.com/fchollet/keras/blob/master/keras/layers/wrappers.py#L91
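For what it's worth, `TimeDistributed(Dense(k))` does reuse one set of weights at every timestep, but that shared kernel is a matrix of shape `(input_dim, k)`, not a length-k vector, so the whole operation is equivalent to a single matrix multiply broadcast over time. A numpy sketch with made-up sizes:

```python
import numpy as np

timesteps, input_dim, k = 5, 3, 4
Y = np.random.rand(timesteps, input_dim)   # one sample: (timesteps, input_dim)
W = np.random.rand(input_dim, k)           # single shared Dense kernel: (input_dim, k)

# TimeDistributed(Dense(k)) applies the same W at every timestep:
per_step = np.stack([Y[t].dot(W) for t in range(timesteps)])
all_at_once = Y.dot(W)                     # identical result as one matmul

assert np.allclose(per_step, all_at_once)  # shared weights == plain matrix product
```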

Attention

Would you be able to explain in the wiki the logic of how you made the attention mechanism in keras?

Alternatively, my email is [email protected]

doubt in output_shape of lambda layer

I am a bit confused about the dimensions. As I understand from here, you get an output of dimension (nb_samples, timesteps, 2 * opts.lstm_units). Then in this line you choose the last element along the time dimension, which should give an output of dimension (nb_samples, 2 * opts.lstm_units), but you specified output_shape=(k,).

It would be really nice if you could help me understand this. Thanks
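The bookkeeping in the question can be checked directly: slicing the last timestep of a `(nb_samples, timesteps, 2 * lstm_units)` tensor does give `(nb_samples, 2 * lstm_units)`, and since keras per-sample `output_shape` omits the batch axis, `output_shape=(k,)` is consistent only if `k == 2 * lstm_units`. A numpy sketch with assumed sizes:

```python
import numpy as np

nb_samples, timesteps, lstm_units = 2, 7, 5
h = np.random.rand(nb_samples, timesteps, 2 * lstm_units)  # e.g. bidirectional LSTM output

last = h[:, -1, :]   # pick the final timestep
# per-sample shape is (2 * lstm_units,), which keras writes as output_shape=(k,)
assert last.shape == (nb_samples, 2 * lstm_units)
```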

The alpha value is not as expected!

Hi @shyamupa,
Thanks for your attention model!
I can extract the alpha values to visualize the model's attention for my task.
But I found a strange phenomenon in the alpha values.
The following picture is the heatmap of the "flat_alpha" layer's output:
attention_flat_alpha_export
It looks fine!
But when I exported the output of the "alpha" layer (after the softmax), I got the following result:
attention_alpha_softmax_export
I know softmax will sharpen and normalize the result, but when I applied softmax to the flat_alpha data locally, the result differed from the output of the "alpha" layer:
attention_alpha_softmax_local
The heatmap shape is (20, 200): 20 sentences, each of length 200.
Do you have any suggestions?
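When comparing a locally computed softmax against a layer's output, it is worth checking the normalization axis and the numerical stabilization, since both commonly differ between hand-rolled and library versions. A numpy softmax with the usual max-shift, normalized per sentence for the (20, 200) shape from the issue:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)  # max-shift avoids exp overflow
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

flat_alpha = np.random.rand(20, 200)  # 20 sentences, length 200 (shapes from the issue)
alpha = softmax(flat_alpha, axis=1)   # normalize over the 200 positions of each sentence

# each sentence's attention weights should sum to 1
assert np.allclose(alpha.sum(axis=1), 1.0)
```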

'module' object has no attribute 'control_flow_ops'

I get this error when trying to run amodel.py. I use keras with the tensorflow backend (keras v1.0.6) and tensorflow v0.12.1 (the 0.11 versions are no longer available).

However, I am able to run tf_model.py. Regarding this code: how can I get the final predicted labels (neutral/entailment/contradiction) on the test data, and not just the test accuracy? I am trying to make simple changes to the code to get the predictions, through tf.argmax(self.pred, 1) and then an eval call, but I am pretty new to tensorflow and am struggling with this. I would be very grateful if you could help me with this as well. What I want is the final prediction array (0/1/2) for all the test instances.
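Turning the model's output probabilities into labels is just an argmax over the class axis, which is what tf.argmax(self.pred, 1) computes. A hedged numpy sketch of the idea; the 0/1/2 label order here is an assumption and should be checked against how tf_model.py encodes labels:

```python
import numpy as np

LABELS = ["neutral", "entailment", "contradiction"]  # assumed index order; verify against the script

probs = np.array([[0.2, 0.7, 0.1],    # stand-in for the softmax output on test data
                  [0.1, 0.2, 0.7]])
pred_ids = probs.argmax(axis=1)       # same operation as tf.argmax(self.pred, 1)
pred_labels = [LABELS[i] for i in pred_ids]
```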

Thank you.
