esim's People

Contributors

coetaur0


esim's Issues

ImportError in train_snli.py

(Edit: sorry, stupid question, found it myself.)

If I try to run train_snli.py, an error occurs:

Traceback (most recent call last):
  File "train_snli.py", line 19, in <module>
    from utils import train, validate
ImportError: cannot import name 'train'

This seems expected, because esim/esim/utils.py does not include train or validate.
Is this import referring to another script, or what should be done?
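
Update, in case it helps others: train and validate appear to be defined in the utils.py that sits next to the training scripts, not in esim/utils.py, so the import presumably only resolves when the script's own directory comes first on the module search path. A minimal sketch of a workaround, assuming that layout:

# Workaround sketch, assuming train/validate live in the utils.py next to
# train_snli.py rather than in the esim package: make sure the script's own
# directory is searched before any other module named utils.
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from utils import train, validate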

Getting Segmentation fault while training on MNLI

==================== Training ESIM model on device: cuda:0 ====================

  • Training epoch 1:
    Avg. batch proc. time: 0.0299s, loss: 0.8712: 100%|█████████████████████████████████████████████████████████████| 49088/49088 [26:49<00:00, 30.49it/s]
    Segmentation fault (core dumped)
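
A generic way to localize such a crash (not specific to this repo) is Python's built-in faulthandler module, which prints a Python-level traceback when the process receives a fatal signal:

# Generic diagnostic sketch: enabling faulthandler at the top of train_mnli.py makes
# Python print a traceback for the code that was running when the segfault hit.
import faulthandler

faulthandler.enable()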

training loss is not reduced and accuracy is not improved during training

Dear author,

When I was running train_mnli.py and train_snli.py, I met the same device problem as in #15. I then set the device of idx_range (lines 40-41 in esim/utils.py) to the correct device, which solved it.
[screenshot of the change]

(I don't know whether this change causes the following problem, so I list it here.)
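
For reference, the change amounted to something like this (a sketch with hypothetical tensor names, based on the description above):

# Sketch of the fix described above (tensor names are hypothetical): create
# idx_range on the same device as the batch instead of defaulting to the CPU.
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
sequences_batch = torch.zeros(8, 20, device=device)  # stand-in for a real batch

idx_range = torch.arange(0, sequences_batch.size(0),
                         device=sequences_batch.device)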

But then I hit a new problem: the training loss does not decrease and the accuracy does not improve during training.
[screenshot: training log with flat loss and accuracy]

Could you please help me with this? Thanks a lot!

What is the BNLI dataset?

What is the Breaking NLI (BNLI) dataset, and where can I download it? I cannot find it by searching for "Breaking NLI (BNLI)" or "Breaking NLI (BNLI) dataset" on Google.
Thank you very much!

Testing on MNLI a model trained on SNLI

Hi,
For research purposes, I am trying to train the ESIM model on the SNLI dataset and then evaluate the classifier on the MNLI dataset (e.g. on the dev set). I tried to do this by calling the test_snli.py script and giving it the (matched) MNLI dev set (after preprocessing) as the test set. I get the following error:

Traceback (most recent call last):
  File "test_snli.py", line 132, in <module>
    args.batch_size)
  File "test_snli.py", line 113, in main
    batch_time, total_time, accuracy = test(model, test_loader)
  File "test_snli.py", line 54, in test
    hypotheses_lengths)
  File "/specific/netapp5/joberant/nlp_fall_2020/liaderez/nlp_project/ESIM/esim_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/lib/python3.7/site-packages/esim/model.py", line 128, in forward
    premises_lengths)
  File "/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/lib/python3.7/site-packages/esim/layers.py", line 117, in forward
    batch_first=True)
  File "/lib/python3.7/site-packages/torch/nn/utils/rnn.py", line 223, in pack_padded_sequence
    lengths = torch.as_tensor(lengths, dtype=torch.int64)
RuntimeError: CUDA error: device-side assert triggered

I looked at both datasets and the data seems to be formatted in the same way, so I do not know why there is an issue.
Does the code support training a model on one dataset and testing it on another? If not, any suggestions on how I should do it?
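
One check I could run (a sketch; the attribute and batch-key names are assumptions): since a device-side assert is often an out-of-range index, verify that no token index in the preprocessed MNLI data exceeds the vocabulary size the SNLI-trained embedding expects. Running with CUDA_LAUNCH_BLOCKING=1 should also pin down the failing op.

# Sanity-check sketch (attribute and key names are assumptions): make sure no token
# index in the MNLI dev set exceeds the vocabulary the SNLI-trained model expects,
# since a CUDA device-side assert frequently means an out-of-range index.
vocab_size = model._word_embedding.num_embeddings  # assumed attribute name

for batch in test_loader:
    premises, hypotheses = batch["premise"], batch["hypothesis"]  # assumed keys
    assert int(premises.max()) < vocab_size, "premise token index out of range"
    assert int(hypotheses.max()) < vocab_size, "hypothesis token index out of range"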

Thanks!

No such file or directory: worddict.pkl

Hello! I just want to know where worddict.pkl is supposed to come from when I run 'preprocess_bnli.py --config ...bnli_preprocessing.json':
File "preprocess_bnli.py", line 73, in preprocess_BNLI_data
with open(worddict, 'rb') as pkl:
FileNotFoundError: [Errno 2] No such file or directory: '../data/preprocessed/SNLI/worddict.pkl'
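
The path in the traceback suggests worddict.pkl is an artifact of the SNLI preprocessing step, so a quick check before running the BNLI script might be (a sketch; the path is taken from the traceback):

# Sketch: worddict.pkl appears to be produced by SNLI preprocessing, so check for
# it (path from the traceback) before running preprocess_bnli.py.
import os

worddict = "../data/preprocessed/SNLI/worddict.pkl"
if not os.path.isfile(worddict):
    print("Not found; presumably preprocess_snli.py must be run first.")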

Validation loss lower than training loss?

Throughout my first 7 epochs, the loss is always lower on the validation set than on the training set. Did anything go wrong, or is the validation set just too easy?

  • Training epoch 2:
    Avg. batch proc. time: 0.0477s, loss: 0.4858: 100%|█████████████████████████████████████████████████████████| 17168/17168 [14:15<00:00, 20.07it/s]
    -> Training time: 855.3887s, loss = 0.4858, accuracy: 81.1468%

  • Validation for epoch 2:
    -> Valid. time: 3.0217s, loss: 0.3845, accuracy: 85.4095%

  • Training epoch 3:
    Avg. batch proc. time: 0.0476s, loss: 0.4385: 100%|█████████████████████████████████████████████████████████| 17168/17168 [14:13<00:00, 20.12it/s]
    -> Training time: 853.2559s, loss = 0.4385, accuracy: 83.2263%

  • Validation for epoch 3:
    -> Valid. time: 2.9044s, loss: 0.3668, accuracy: 86.1613%

  • Training epoch 4:
    Avg. batch proc. time: 0.0477s, loss: 0.4120: 100%|█████████████████████████████████████████████████████████| 17168/17168 [14:14<00:00, 20.08it/s]
    -> Training time: 854.8605s, loss = 0.4120, accuracy: 84.4212%

  • Validation for epoch 4:
    -> Valid. time: 2.9331s, loss: 0.3626, accuracy: 86.4966%

  • Training epoch 5:
    Avg. batch proc. time: 0.0477s, loss: 0.3917: 100%|█████████████████████████████████████████████████████████| 17168/17168 [14:14<00:00, 20.08it/s]
    -> Training time: 854.8143s, loss = 0.3917, accuracy: 85.3156%

  • Validation for epoch 5:
    -> Valid. time: 2.9344s, loss: 0.3559, accuracy: 86.7608%

  • Training epoch 6:
    Avg. batch proc. time: 0.0476s, loss: 0.3766: 100%|█████████████████████████████████████████████████████████| 17168/17168 [14:14<00:00, 20.10it/s]
    -> Training time: 854.1151s, loss = 0.3766, accuracy: 85.9788%

  • Validation for epoch 6:
    -> Valid. time: 2.9510s, loss: 0.3426, accuracy: 87.2892%

  • Training epoch 7:
    Avg. batch proc. time: 0.0477s, loss: 0.3639: 100%|█████████████████████████████████████████████████████████| 17168/17168 [14:15<00:00, 20.08it/s]
    -> Training time: 855.0051s, loss = 0.3639, accuracy: 86.5372%

  • Validation for epoch 7:
    -> Valid. time: 2.9162s, loss: 0.3464, accuracy: 87.7058%
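
One plausible explanation (not from the repo): dropout is active while the training loss is computed but disabled during validation, and the training loss is averaged over the whole epoch while validation runs on the final weights. A minimal sketch with a hypothetical toy model showing the dropout effect alone:

# Sketch (hypothetical toy model, unrelated to the repo): after a little fitting,
# the loss measured with dropout active (train mode) typically exceeds the loss
# measured with dropout disabled (eval mode) on the very same data.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 2))
x, y = torch.randn(256, 10), torch.randint(0, 2, (256,))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

model.train()
for _ in range(200):  # brief fitting so the model is no longer random
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    train_mode_loss = torch.stack([criterion(model(x), y) for _ in range(100)]).mean()
    model.eval()  # dropout disabled, as during validation
    eval_mode_loss = criterion(model(x), y)
print(f"train mode: {train_mode_loss.item():.4f}  eval mode: {eval_mode_loss.item():.4f}")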

ModuleNotFoundError: No module named 'esim'

No matter which way I run it, I get the error "ModuleNotFoundError: No module named 'esim'". Thank you very much!

➜  preprocessing git:(master) pwd
~/tmp/ESIM/scripts/preprocessing
➜  preprocessing git:(master) python preprocess_snli.py
Traceback (most recent call last):
  File "preprocess_snli.py", line 12, in <module>
    from esim.data import Preprocessor
ModuleNotFoundError: No module named 'esim'
➜  scripts git:(master) pwd
~/tmp/ESIM/scripts
➜  scripts git:(master) python preprocessing/preprocess_snli.py
Traceback (most recent call last):
  File "preprocessing/preprocess_snli.py", line 12, in <module>
    from esim.data import Preprocessor
ModuleNotFoundError: No module named 'esim'
➜  ESIM git:(master) pwd
~/tmp/ESIM
➜  ESIM git:(master) python scripts/preprocessing/preprocess_snli.py
Traceback (most recent call last):
  File "scripts/preprocessing/preprocess_snli.py", line 12, in <module>
    from esim.data import Preprocessor
ModuleNotFoundError: No module named 'esim'
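
Since the esim package lives at the repository root, the usual fixes are to install it (e.g. pip install . from the root, if the repo provides a setup.py) or to put the root on sys.path. A sketch of the latter, assuming the directory layout shown above:

# Workaround sketch: add the repository root (the directory containing the esim/
# package) to sys.path at the top of preprocess_snli.py. Paths assume the layout
# shown in the terminal session above.
import os
import sys

repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))
sys.path.insert(0, repo_root)

from esim.data import Preprocessor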

SciTail dataset

Hi,
Is there no preprocess_scitail.py script?
Did you not test on the SciTail dataset?

ESIM using Keras

Hi,
Since I don't have access to a GPU, I can't run your code, but there is another implementation on GitHub that builds your model with the Keras library. Can you confirm whether the following code is correct?

"""
Implementation of ESIM (Enhanced LSTM for Natural Language Inference)
https://arxiv.org/abs/1609.06038
"""
import numpy as np
from keras.layers import *
from keras.activations import softmax
from keras.models import Model

def StaticEmbedding(embedding_matrix):
    in_dim, out_dim = embedding_matrix.shape
    return Embedding(in_dim, out_dim, weights=[embedding_matrix], trainable=False)

def subtract(input_1, input_2):
    minus_input_2 = Lambda(lambda x: -x)(input_2)
    return add([input_1, minus_input_2])

def aggregate(input_1, input_2, num_dense=300, dropout_rate=0.5):
    feat1 = concatenate([GlobalAvgPool1D()(input_1), GlobalMaxPool1D()(input_1)])
    feat2 = concatenate([GlobalAvgPool1D()(input_2), GlobalMaxPool1D()(input_2)])
    x = concatenate([feat1, feat2])
    x = BatchNormalization()(x)
    x = Dense(num_dense, activation='relu')(x)
    x = BatchNormalization()(x)
    x = Dropout(dropout_rate)(x)
    x = Dense(num_dense, activation='relu')(x)
    x = BatchNormalization()(x)
    x = Dropout(dropout_rate)(x)
    return x

def align(input_1, input_2):
    attention = Dot(axes=-1)([input_1, input_2])
    w_att_1 = Lambda(lambda x: softmax(x, axis=1))(attention)
    w_att_2 = Permute((2, 1))(Lambda(lambda x: softmax(x, axis=2))(attention))
    in1_aligned = Dot(axes=1)([w_att_1, input_1])
    in2_aligned = Dot(axes=1)([w_att_2, input_2])
    return in1_aligned, in2_aligned

def build_model(embedding_matrix, num_class=1, max_length=30, lstm_dim=300):
    q1 = Input(shape=(max_length,))
    q2 = Input(shape=(max_length,))

    # Embedding
    embedding = StaticEmbedding(embedding_matrix)
    q1_embed = BatchNormalization(axis=2)(embedding(q1))
    q2_embed = BatchNormalization(axis=2)(embedding(q2))

    # Encoding
    encode = Bidirectional(LSTM(lstm_dim, return_sequences=True))
    q1_encoded = encode(q1_embed)
    q2_encoded = encode(q2_embed)

    # Alignment
    q1_aligned, q2_aligned = align(q1_encoded, q2_encoded)

    # Compare
    q1_combined = concatenate([q1_encoded, q2_aligned, subtract(q1_encoded, q2_aligned), multiply([q1_encoded, q2_aligned])])
    q2_combined = concatenate([q2_encoded, q1_aligned, subtract(q2_encoded, q1_aligned), multiply([q2_encoded, q1_aligned])])
    compare = Bidirectional(LSTM(lstm_dim, return_sequences=True))
    q1_compare = compare(q1_combined)
    q2_compare = compare(q2_combined)

    # Aggregate
    x = aggregate(q1_compare, q2_compare)
    x = Dense(num_class, activation='sigmoid')(x)

    return Model(inputs=[q1, q2], outputs=x)

GitHub gist: https://gist.github.com/namakemono/b74547e82ef9307da9c29057c650cdf1
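
For completeness, a hypothetical usage sketch (the embedding matrix below is a random stand-in for real pretrained vectors such as GloVe):

# Hypothetical usage sketch: embedding_matrix is a random placeholder; a real run
# would load pretrained word vectors instead.
import numpy as np

embedding_matrix = np.random.rand(20000, 300).astype("float32")
model = build_model(embedding_matrix, num_class=1, max_length=30, lstm_dim=300)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()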

Buffered data was truncated after reaching the output size limit.

Hi
Thanks for the implementation.
Has this code been run on Google Colab?

I ran it on Google Colab, and at epoch 15 I got:

Buffered data was truncated after reaching the output size limit.

Do I need to run until epoch 64?
Around which epoch does the model perform best?
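
If the message is only Colab truncating the cell output (rather than a real crash), one workaround (a sketch, not part of the repo) is to send the training output to a log file instead of the notebook cell:

# Sketch: write print() output to a file so Colab's output-size limit does not
# truncate the training log. (Progress bars written to stderr, e.g. by tqdm,
# would need their own redirection.)
import sys

sys.stdout = open("training.log", "w")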

50% train/dev accuracy for a binary classification task

Hi, here is my question: I tried to use ESIM for a binary classification task. I got more than 90% train/dev accuracy with other models, but only 50% accuracy with ESIM. Has anyone met a similar issue before? Any suggestions would be great! Thanks a lot.

Complete ESIM implementation

Can you please state the reason why HIM (Hybrid Inference Model) is not implemented in many ESIM implementations?
Is there no visible improvement when the Tree-LSTM is added?
Which is advisable, plain ESIM or HIM (considering the time needed for inference, too)?
Or is BiMPM the best of the three?
