
keras_snli's Introduction

Keras SNLI baseline example

This repository contains a simple Keras baseline to train a variety of neural networks to tackle the Stanford Natural Language Inference (SNLI) corpus.

The aim is to determine whether a premise sentence entails, is neutral with respect to, or contradicts a hypothesis sentence - e.g. "A soccer game with multiple males playing" entails "Some men are playing a sport", while "A black race car starts up in front of a crowd of people" contradicts "A man is driving down a lonely road".

The model architecture is:

  • Extract a 300D word vector from the fixed GloVe vocabulary
  • Pass the 300D word vector through a ReLU "translation" layer
  • Encode the premise and hypothesis sentences using the same encoder (summation, GRU, LSTM, ...)
  • Concatenate the two resulting 300D sentence embeddings
  • 3 x 600D ReLU layers
  • 3-way softmax

(Figure: a visual depiction of the model architecture.)
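A minimal sketch of this architecture (not the exact snli_rnn.py code; written with Keras 2-style layer names, illustrative sizes, and without the GloVe loading):

```python
from keras import backend as K
from keras.layers import Input, Embedding, TimeDistributed, Dense, Lambda, concatenate
from keras.models import Model

MAX_LEN, VOCAB, EMBED_DIM, HIDDEN = 42, 42391, 300, 600  # illustrative values

premise = Input(shape=(MAX_LEN,), dtype='int32')
hypothesis = Input(shape=(MAX_LEN,), dtype='int32')

# Fixed GloVe embeddings: weights=[glove_matrix] would be passed here and never updated
embed = Embedding(VOCAB, EMBED_DIM, trainable=False)

# ReLU "translation" layer applied independently to every timestep
translate = TimeDistributed(Dense(EMBED_DIM, activation='relu'))

# Shared sentence encoder; the simplest option is summation over timesteps (a GRU or LSTM
# with 300D output could be swapped in here instead)
encode = Lambda(lambda x: K.sum(x, axis=1), output_shape=(EMBED_DIM,))

prem = encode(translate(embed(premise)))
hypo = encode(translate(embed(hypothesis)))

joint = concatenate([prem, hypo])              # 600D joint sentence representation
for _ in range(3):                             # 3 x 600D ReLU layers
    joint = Dense(HIDDEN, activation='relu')(joint)
pred = Dense(3, activation='softmax')(joint)   # 3-way softmax: entailment / neutral / contradiction

model = Model(inputs=[premise, hypothesis], outputs=pred)
```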

Training uses RMSProp and stops after N epochs have passed with no improvement to the validation loss. Following Liu et al. 2016, the GloVe embeddings are not updated during training. Following Munkhdalai & Yu 2016, the out-of-vocabulary embeddings remain zeroed out.
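A hedged sketch of that training setup using the model sketched above (the patience value and the prepared-data variable names are illustrative, not the repository's exact settings):

```python
from keras.callbacks import EarlyStopping

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit([train_prem, train_hypo], train_labels,                  # hypothetical prepared arrays
          validation_data=([val_prem, val_hypo], val_labels),
          epochs=100,
          callbacks=[EarlyStopping(monitor='val_loss', patience=4)])  # stop after N stale epochs
```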

One of the most important aspects when using fixed GloVe embeddings with summation is the "translation" layer. Bowman et al. 2016 use such a layer when moving from the 300D embeddings to the lower dimensional 100D hidden state. This is likely highly important for the summation method as it allows the GloVe space to be shifted before summation. Technically, once training is done, the "translated" GloVe embeddings could be precomputed and this layer removed, decreasing the number of parameters, but ¯\_(ツ)_/¯
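For illustration, that precomputation could look roughly like this (a sketch, assuming the translation layer's weights W and b have been pulled out of the trained model; all names are hypothetical):

```python
import numpy as np

# glove_matrix: (VOCAB x 300) fixed embedding matrix; W: (300 x 300) and b: (300,) are the
# trained "translation" Dense weights. Because the translation is applied per token, it can
# be folded into the embedding table itself once training is finished.
translated_matrix = np.maximum(glove_matrix.dot(W) + b, 0.0)  # ReLU(x W + b) for every row
# translated_matrix could then replace the embedding weights and the translation layer dropped.
```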

The model is relatively simple yet achieves far higher accuracy than other comparable baselines (specifically summation, GRU, and LSTM models) listed on the SNLI page. The summary: don't dismiss well-tuned GloVe bag-of-words models - they can still be competitive and are far faster to train!

| Model | Parameters | Train | Validation | Test |
| --- | --- | --- | --- | --- |
| 300D sum(word vectors) + 3 x 600D ReLU (this code) | 1.2m | 0.831 | 0.823 | 0.825 |
| 300D GRU + 3 x 600D ReLU (this code) | 1.7m | 0.843 | 0.830 | 0.823 |
| 300D LSTM + 3 x 600D ReLU (this code) | 1.9m | 0.855 | 0.829 | 0.823 |
| 300D GRU (recurrent dropout) + 3 x 600D ReLU (this code) | 1.7m | 0.844 | 0.832 | 0.832 |
| 300D LSTM (recurrent dropout) + 3 x 600D ReLU (this code) | 1.9m | 0.852 | 0.836 | 0.827 |
| 300D LSTM encoders (Bowman et al. 2016) | 3.0m | 0.839 | - | 0.806 |
| 1024D GRU w/ unsupervised 'skip-thoughts' pre-training (Vendrov et al. 2015) | 15m | 0.988 | - | 0.814 |
| 300D Tree-based CNN encoders (Mou et al. 2015) | 3.5m | 0.833 | - | 0.821 |
| 300D SPINN-PI encoders (Bowman et al. 2016) | 3.7m | 0.892 | - | 0.832 |
| 600D (300+300) BiLSTM encoders (Liu et al. 2016) | 3.5m | 0.833 | - | 0.834 |

Only the numbers for pure sentential embedding models are shown here. The SNLI homepage shows the full list of models where attentional models perform better. If I've missed including any comparable models, submit a pull request.

All models could benefit from a more thorough evaluation and/or grid search as the existing parameters are guesstimates inspired by various papers (Bowman et al. 2015, Bowman et al. 2016, Liu et al. 2016). Only when the GRUs and LSTMs feature recurrent dropout (dropout_U) do they consistently beat the summation of word embeddings. Further work should be done exploring the hyperparameters of the GRU and LSTM.


keras_snli's Issues

RNN=None causes error at line 171 in snli_rnn.py (Theano)

Verified on Mac OSX and Ubuntu. Running Keras 1.0.8 with Theano backend. RNN=recurrent.GRU works fine, but with RNN=None I get the following:

$ python snli_rnn.py
Using Theano backend.
82
62
59
55
57
30
RNN / Embed / Sent = None, 300, 300
GloVe / Trainable Word Embeddings = True, False
Build model...
Vocab size = 42391
Loading GloVe
Total number of null word embeddings:
4043
Traceback (most recent call last):
File "snli_rnn.py", line 171, in
prem = BatchNormalization()(prem)
File "/Users/bradleyallen/anaconda/envs/dlnotebook/lib/python2.7/site-packages/Keras-1.0.8-py2.7.egg/keras/engine/topology.py", line 494, in call
self.assert_input_compatibility(x)
File "/Users/bradleyallen/anaconda/envs/dlnotebook/lib/python2.7/site-packages/Keras-1.0.8-py2.7.egg/keras/engine/topology.py", line 411, in assert_input_compatibility
str(K.ndim(x)))
Exception: Input 0 is incompatible with layer batchnormalization_1: expected ndim=3, found ndim=2
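One possible workaround (an untested sketch, not a confirmed fix): skip the BatchNormalization calls on the summation path, where the sentence vector is already 2D:

```python
# Hypothetical guard around the failing lines in snli_rnn.py (around line 171),
# assuming the error comes from batch-normalizing the summed 2D sentence vector
if RNN is not None:
    prem = BatchNormalization()(prem)
    hypo = BatchNormalization()(hypo)
```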

AttributeError: 'module' object has no attribute 'control_flow_ops'

Missing from GloVe: surf-skiier
Missing from GloVe: Athekwl
Loading GloVe
Total number of null word embeddings:
4043
Traceback (most recent call last):
File "snli_rnn.py", line 174, in
prem = BatchNormalization()(prem)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 514, in call
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 572, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/local/lib/python2.7/dist-packages/keras/engine/topology.py", line 149, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/local/lib/python2.7/dist-packages/keras/layers/normalization.py", line 160, in call
x_normed = K.in_train_phase(x_normed, x_normed_running)
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1304, in in_train_phase
x = tf.python.control_flow_ops.cond(tf.cast(_LEARNING_PHASE, 'bool'),
AttributeError: 'module' object has no attribute 'control_flow_ops'
rzai@rzai00:~/prj/keras_snli$
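A workaround that was commonly suggested for Keras 1.x with TensorFlow releases that removed tf.python.control_flow_ops (assuming the TensorFlow backend is in use) is to re-expose it before importing Keras:

```python
import tensorflow as tf
tf.python.control_flow_ops = tf  # cond() lives at the top level of tf in newer releases
```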

training = prepare_data(training) broken

training = prepare_data(training)
validation = prepare_data(validation)
test = prepare_data(test)

print('Build model...')
print('Vocab size =', VOCAB)

AttributeError Traceback (most recent call last)
in <module>
----> 1 training = prepare_data(training)
2 validation = prepare_data(validation)
3 test = prepare_data(test)
4
5 print('Build model...')

in <lambda>(data)
1 to_seq = lambda X: pad_sequences(tokenizer.texts_to_sequences(X), maxlen=MAX_LEN)
----> 2 prepare_data = lambda data: (to_seq(data[0]), to_seq(data[1]), data[2])

in <lambda>(X)
----> 1 to_seq = lambda X: pad_sequences(tokenizer.texts_to_sequences(X), maxlen=MAX_LEN)
2 prepare_data = lambda data: (to_seq(data[0]), to_seq(data[1]), data[2])

~\Anaconda3\lib\site-packages\keras_preprocessing\text.py in texts_to_sequences(self, texts)
276 A list of sequences.
277 """
--> 278 return list(self.texts_to_sequences_generator(texts))
279
280 def texts_to_sequences_generator(self, texts):

~\Anaconda3\lib\site-packages\keras_preprocessing\text.py in texts_to_sequences_generator(self, texts)
307 self.filters,
308 self.lower,
--> 309 self.split)
310 vect = []
311 for w in seq:

~\Anaconda3\lib\site-packages\keras_preprocessing\text.py in text_to_word_sequence(text, filters, lower, split)
56 translate_dict = dict((c, split) for c in filters)
57 translate_map = maketrans(translate_dict)
---> 58 text = text.translate(translate_map)
59
60 seq = text.split(split)

AttributeError: 'numpy.ndarray' object has no attribute 'translate'

I realize that this is a numpy issue. As a widely used library, numpy does not honor backward compatibility. Can you please write some other code to take care of this problem?

In Java, mvn tells you which version of a jar should be used. It appears to me Python is an open-source wild west. Can you please tell me if there is a workaround?

Thanks
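The traceback suggests the tokenizer is being handed NumPy objects rather than plain Python strings. A hedged workaround (the exact conversion depends on how the data was loaded; names follow the snippet above):

```python
# Sketch only: coerce each premise/hypothesis to a plain Python str before tokenizing
to_seq = lambda X: pad_sequences(tokenizer.texts_to_sequences([str(s) for s in X]), maxlen=MAX_LEN)
prepare_data = lambda data: (to_seq(data[0]), to_seq(data[1]), data[2])
```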

Questions: BatchNorm and Recurrent Dropout

This is awesome! 💯

I know you said you didn't do any real grid-searching, but out of curiosity:

  1. Did you experiment w/o the BatchNormalization? I have yet to have it actually help in an RNN, so I'd be curious to know if you tried it without and if that yielded worse results.
  2. Any particular reason why you used dropout only with the input gates (dropout_W) when using the rnn models? I've always had more success when combining it with dropout on the recurrent units as well (dropout_U), but am curious if you tried the combination and found that it didn't help.
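For reference, the Keras 1.x arguments mentioned in question 2 look like this (the rates are illustrative, not the repository's settings):

```python
from keras.layers import GRU

# dropout_W drops the input connections, dropout_U drops the recurrent connections
rnn = GRU(300, dropout_W=0.2, dropout_U=0.2)
```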

I think this is a typo

I think there is a typo in line 107:
prepare_data = lambda data: (to_seq(data[0]), to_seq(data[1]), data[2])
as data[3] has the labels.
