
keras_snli's Introduction

Keras SNLI baseline example

This repository contains a simple Keras baseline to train a variety of neural networks to tackle the Stanford Natural Language Inference (SNLI) corpus.

The aim is to determine whether a premise sentence entails, is neutral with respect to, or contradicts a hypothesis sentence - e.g. "A soccer game with multiple males playing" entails "Some men are playing a sport", while "A black race car starts up in front of a crowd of people" contradicts "A man is driving down a lonely road".

The model architecture is:

  • Extract a 300D word vector from the fixed GloVe vocabulary
  • Pass the 300D word vector through a ReLU "translation" layer
  • Encode the premise and hypothesis sentences using the same encoder (summation, GRU, LSTM, ...)
  • Concatenate the two resulting 300D sentence embeddings
  • 3 x 600D ReLU layers
  • 3-way softmax

(Figure: a visual depiction of the model architecture.)
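A minimal sketch of this architecture (not the exact snli_rnn.py code; written with Keras 2-style layer names, illustrative sizes, and without the GloVe loading):

```python
from keras import backend as K
from keras.layers import Input, Embedding, TimeDistributed, Dense, Lambda, concatenate
from keras.models import Model

MAX_LEN, VOCAB, EMBED_DIM, HIDDEN = 42, 42391, 300, 600  # illustrative values

premise = Input(shape=(MAX_LEN,), dtype='int32')
hypothesis = Input(shape=(MAX_LEN,), dtype='int32')

# Fixed GloVe embeddings: weights=[glove_matrix] would be passed here and never updated
embed = Embedding(VOCAB, EMBED_DIM, trainable=False)

# ReLU "translation" layer applied independently to every timestep
translate = TimeDistributed(Dense(EMBED_DIM, activation='relu'))

# Shared sentence encoder; the simplest option is summation over timesteps (a GRU or LSTM
# with 300D output could be swapped in here instead)
encode = Lambda(lambda x: K.sum(x, axis=1), output_shape=(EMBED_DIM,))

prem = encode(translate(embed(premise)))
hypo = encode(translate(embed(hypothesis)))

joint = concatenate([prem, hypo])              # 600D joint sentence representation
for _ in range(3):                             # 3 x 600D ReLU layers
    joint = Dense(HIDDEN, activation='relu')(joint)
pred = Dense(3, activation='softmax')(joint)   # 3-way softmax: entailment / neutral / contradiction

model = Model(inputs=[premise, hypothesis], outputs=pred)
```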

Training uses RMSProp and stops after N epochs have passed with no improvement to the validation loss. Following Liu et al. 2016, the GloVe embeddings are not updated during training. Following Munkhdalai & Yu 2016, the out-of-vocabulary embeddings remain zeroed out.
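A hedged sketch of that training setup using the model sketched above (the patience value and the prepared-data variable names are illustrative, not the repository's exact settings):

```python
from keras.callbacks import EarlyStopping

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit([train_prem, train_hypo], train_labels,                  # hypothetical prepared arrays
          validation_data=([val_prem, val_hypo], val_labels),
          epochs=100,
          callbacks=[EarlyStopping(monitor='val_loss', patience=4)])  # stop after N stale epochs
```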

One of the most important aspects when using fixed GloVe embeddings with summation is the "translation" layer. Bowman et al. 2016 use such a layer when moving from the 300D embeddings to the lower dimensional 100D hidden state. This is likely highly important for the summation method as it allows the GloVe space to be shifted before summation. Technically, once training is done, the "translated" GloVe embeddings could be precomputed and this layer removed, decreasing the number of parameters, but ¯\_(ツ)_/¯
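For illustration, that precomputation could look roughly like this (a sketch, assuming the translation layer's weights W and b have been pulled out of the trained model; all names are hypothetical):

```python
import numpy as np

# glove_matrix: (VOCAB x 300) fixed embedding matrix; W: (300 x 300) and b: (300,) are the
# trained "translation" Dense weights. Because the translation is applied per token, it can
# be folded into the embedding table itself once training is finished.
translated_matrix = np.maximum(glove_matrix.dot(W) + b, 0.0)  # ReLU(x W + b) for every row
# translated_matrix could then replace the embedding weights and the translation layer dropped.
```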

The model is relatively simple yet achieves far higher accuracy than other comparable baselines (specifically summation, GRU, and LSTM models) listed on the SNLI page. The summary: don't dismiss well-tuned GloVe bag-of-words models - they can still be competitive and are far faster to train!

| Model | Parameters | Train | Validation | Test |
| --- | --- | --- | --- | --- |
| 300D sum(word vectors) + 3 x 600D ReLU (this code) | 1.2m | 0.831 | 0.823 | 0.825 |
| 300D GRU + 3 x 600D ReLU (this code) | 1.7m | 0.843 | 0.830 | 0.823 |
| 300D LSTM + 3 x 600D ReLU (this code) | 1.9m | 0.855 | 0.829 | 0.823 |
| 300D GRU (recurrent dropout) + 3 x 600D ReLU (this code) | 1.7m | 0.844 | 0.832 | 0.832 |
| 300D LSTM (recurrent dropout) + 3 x 600D ReLU (this code) | 1.9m | 0.852 | 0.836 | 0.827 |
| 300D LSTM encoders (Bowman et al. 2016) | 3.0m | 0.839 | - | 0.806 |
| 1024D GRU w/ unsupervised 'skip-thoughts' pre-training (Vendrov et al. 2015) | 15m | 0.988 | - | 0.814 |
| 300D Tree-based CNN encoders (Mou et al. 2015) | 3.5m | 0.833 | - | 0.821 |
| 300D SPINN-PI encoders (Bowman et al. 2016) | 3.7m | 0.892 | - | 0.832 |
| 600D (300+300) BiLSTM encoders (Liu et al. 2016) | 3.5m | 0.833 | - | 0.834 |

Only the numbers for pure sentential embedding models are shown here. The SNLI homepage shows the full list of models where attentional models perform better. If I've missed including any comparable models, submit a pull request.

All models could benefit from a more thorough evaluation and/or grid search as the existing parameters are guesstimates inspired by various papers (Bowman et al. 2015, Bowman et al. 2016, Liu et al. 2016). Only when the GRUs and LSTMs feature recurrent dropout (dropout_U) do they consistently beat the summation of word embeddings. Further work should be done exploring the hyperparameters of the GRU and LSTM.


keras_snli's Issues

RNN=None causes error at line 171 in snli_rnn.py (Theano)

Verified on Mac OSX and Ubuntu. Running Keras 1.0.8 with Theano backend. RNN=recurrent.GRU works fine, but with RNN=None I get the following:

$ python snli_rnn.py
Using Theano backend.
82
62
59
55
57
30
RNN / Embed / Sent = None, 300, 300
GloVe / Trainable Word Embeddings = True, False
Build model...
Vocab size = 42391
Loading GloVe
Total number of null word embeddings:
4043
Traceback (most recent call last):
File "snli_rnn.py", line 171, in
prem = BatchNormalization()(prem)
File "/Users/bradleyallen/anaconda/envs/dlnotebook/lib/python2.7/site-packages/Keras-1.0.8-py2.7.egg/keras/engine/topology.py", line 494, in call
self.assert_input_compatibility(x)
File "/Users/bradleyallen/anaconda/envs/dlnotebook/lib/python2.7/site-packages/Keras-1.0.8-py2.7.egg/keras/engine/topology.py", line 411, in assert_input_compatibility
str(K.ndim(x)))
Exception: Input 0 is incompatible with layer batchnormalization_1: expected ndim=3, found ndim=2
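One possible workaround (an untested sketch, not a confirmed fix): skip the BatchNormalization calls on the summation path, where the sentence vector is already 2D:

```python
# Hypothetical guard around the failing lines in snli_rnn.py (around line 171),
# assuming the error comes from batch-normalizing the summed 2D sentence vector
if RNN is not None:
    prem = BatchNormalization()(prem)
    hypo = BatchNormalization()(hypo)
```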

AttributeError: 'module' object has no attribute 'control_flow_ops'

Missing from GloVe: surf-skiier
Missing from GloVe: Athekwl
Loading GloVe
Total number of null word embeddings:
4043
Traceback (most recent call last):
File "snli_rnn.py", line 174, in
prem = BatchNormalization()(prem)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 514, in call
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 149, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 160, in call
x_normed = K.in_train_phase(x_normed, x_normed_running)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1304, in in_train_phase
x = tf.python.control_flow_ops.cond(tf.cast(_LEARNING_PHASE, 'bool'),
AttributeError: 'module' object has no attribute 'control_flow_ops'
rzai@rzai00:~/prj/keras_snli$
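A workaround that was commonly suggested for Keras 1.x with TensorFlow releases that removed tf.python.control_flow_ops (assuming the TensorFlow backend is in use) is to re-expose it before importing Keras:

```python
import tensorflow as tf
tf.python.control_flow_ops = tf  # cond() lives at the top level of tf in newer releases
```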

training = prepare_data(training) broken

training = prepare_data(training)
validation = prepare_data(validation)
test = prepare_data(test)

print('Build model...')
print('Vocab size =', VOCAB)

AttributeError Traceback (most recent call last)
in <module>
----> 1 training = prepare_data(training)
2 validation = prepare_data(validation)
3 test = prepare_data(test)
4
5 print('Build model...')

in <lambda>(data)
1 to_seq = lambda X: pad_sequences(tokenizer.texts_to_sequences(X), maxlen=MAX_LEN)
----> 2 prepare_data = lambda data: (to_seq(data[0]), to_seq(data[1]), data[2])

in <lambda>(X)
----> 1 to_seq = lambda X: pad_sequences(tokenizer.texts_to_sequences(X), maxlen=MAX_LEN)
2 prepare_data = lambda data: (to_seq(data[0]), to_seq(data[1]), data[2])

~\Anaconda3\lib\site-packages\keras_preprocessing\text.py in texts_to_sequences(self, texts)
276 A list of sequences.
277 """
--> 278 return list(self.texts_to_sequences_generator(texts))
279
280 def texts_to_sequences_generator(self, texts):

~\Anaconda3\lib\site-packages\keras_preprocessing\text.py in texts_to_sequences_generator(self, texts)
307 self.filters,
308 self.lower,
--> 309 self.split)
310 vect = []
311 for w in seq:

~\Anaconda3\lib\site-packages\keras_preprocessing\text.py in text_to_word_sequence(text, filters, lower, split)
56 translate_dict = dict((c, split) for c in filters)
57 translate_map = maketrans(translate_dict)
---> 58 text = text.translate(translate_map)
59
60 seq = text.split(split)

AttributeError: 'numpy.ndarray' object has no attribute 'translate'

I realize that this is a numpy issue. As a widely used library, numpy does not honor backward compatibility. Can you please write some other code to take care of this problem?

In Java, mvn tells you which version of a jar should be used. It appears to me Python is an open-source wild west. Can you please tell me if there is a workaround?

Thanks
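The traceback suggests the tokenizer is being handed NumPy objects rather than plain Python strings. A hedged workaround (the exact conversion depends on how the data was loaded; names follow the snippet above):

```python
# Sketch only: coerce each premise/hypothesis to a plain Python str before tokenizing
to_seq = lambda X: pad_sequences(tokenizer.texts_to_sequences([str(s) for s in X]), maxlen=MAX_LEN)
prepare_data = lambda data: (to_seq(data[0]), to_seq(data[1]), data[2])
```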

Questions: BatchNorm and Recurrent Dropout

This is awesome! 💯

I know you said you didn't do any real grid-searching, but out of curiosity:

  1. Did you experiment w/o the BatchNormalization? I have yet to have it actually help in an RNN, so I'd be curious to know if you tried it without and if that yielded worse results.
  2. Any particular reason why you used dropout only with the input gates (dropout_W) when using the rnn models? I've always had more success when combining it with dropout on the recurrent units as well (dropout_U), but am curious if you tried the combination and found that it didn't help.
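For reference, the Keras 1.x arguments mentioned in question 2 look like this (the rates are illustrative, not the repository's settings):

```python
from keras.layers import GRU

# dropout_W drops the input connections, dropout_U drops the recurrent connections
rnn = GRU(300, dropout_W=0.2, dropout_U=0.2)
```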

I think this is a typo

I think there is a typo in line 107:
prepare_data = lambda data: (to_seq(data[0]), to_seq(data[1]), data[2])
as data[3] has the labels.
