brmson / dataset-sts
Semantic Text Similarity Dataset Hub
We may be a little careless in how we train the more complex models (CNN, RNN, etc.): we should carefully check the gradients and also the actual weight matrices. For CNNs, some papers renormalize weights whose norm grows too large; RNNs might need something similar. Since we are getting reasonable-looking results, I suspect this is not a crucial issue, but we might still get some improvements from deeply understanding how training actually progresses in our models.
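A minimal plain-Python sketch of the two checks mentioned above: max-norm renormalization of weight rows and a global gradient norm worth logging per batch. The function names and the max_norm threshold are hypothetical and would need tuning; in a real model this would operate on the framework's weight tensors.

```python
import math

def renorm_rows(weights, max_norm=3.0):
    """Rescale every row (e.g. one CNN filter's flattened weights) whose L2 norm exceeds max_norm."""
    out = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max_norm / norm if norm > max_norm else 1.0
        out.append([w * scale for w in row])
    return out

def grad_norm(grads):
    """Global L2 norm over all gradient entries; logging it per batch helps spot exploding or vanishing gradients."""
    return math.sqrt(sum(g * g for row in grads for g in row))

W = [[3.0, 4.0], [0.3, 0.4]]
W = renorm_rows(W, max_norm=1.0)  # first row (norm 5.0) is scaled down to norm 1.0; second row is left as is
```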
We should implement and benchmark the attentive pooling model of http://arxiv.org/abs/1602.03609. I wonder whether some DimShuffles will be enough for the "horizontal maxpooling" in the final composition step.
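A plain-Python sketch of attentive pooling as we understand it from the paper: a bilinear soft-alignment matrix G = tanh(Q^T U A), row-wise ("horizontal") and column-wise max pooling over G, a softmax, and attention-weighted sums. All names are ours. In tensor form the horizontal pooling is just a max over the answer axis of G, so whether an extra DimShuffle is needed depends on the tensor layout.

```python
import math

def _matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def _softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_pooling(Q, A, U):
    """Q: question token vectors, A: answer token vectors (all of dim d), U: d x d matrix.
    Returns pooled (r_q, r_a) vectors of dim d."""
    # G[i][j] = tanh(q_i^T U a_j): soft alignment between question token i and answer token j
    UA = [_matvec(U, a) for a in A]
    G = [[math.tanh(sum(x * y for x, y in zip(q, ua))) for ua in UA] for q in Q]
    # "horizontal" (row-wise) max pooling: best match of each question token
    g_q = [max(row) for row in G]
    # column-wise max pooling: best match of each answer token
    g_a = [max(G[i][j] for i in range(len(Q))) for j in range(len(A))]
    s_q, s_a = _softmax(g_q), _softmax(g_a)
    d = len(Q[0])
    r_q = [sum(s_q[i] * Q[i][k] for i in range(len(Q))) for k in range(d)]
    r_a = [sum(s_a[j] * A[j][k] for j in range(len(A))) for k in range(d)]
    return r_q, r_a

I2 = [[1.0, 0.0], [0.0, 1.0]]
r_q, r_a = attentive_pooling(Q=[[1.0, 0.0], [0.0, 1.0]], A=[[1.0, 0.0]], U=I2)
```

The two pooled vectors would then be compared (e.g. by cosine) as in the rest of our anssel models.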
Hi,
I am evaluating existing algorithms for paraphrasing and entailment. I wanted to run para.py, but I am unable to do so.
Keras has removed LambdaMerge, so your code doesn't work anymore. I am new to Keras and to the topic.
Here is the link that discusses how to do a lambda merge without it:
keras-team/keras#2342
Can you update your code?
thanks,
It is becoming popular to preinitialize matrices (especially projection matrices and MLP matrices) with identity. This is recommended e.g. by the Maluuba guys in "A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data".
Another part of this is taking a more serious look at relu as a transfer function again. With tanh(), the identity will of course be somewhat skewed, even though repeated tanh() near zero has little effect.
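A small sketch (hypothetical helper names) of what identity initialization means for stacked layers: with tanh the pass-through is only approximate, staying close for small activations and shrinking larger ones, while with relu it is exact for positive inputs but clips negative ones.

```python
import math

def identity_init(n):
    """n x n identity matrix, used as the initial weights of a projection or MLP layer."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def dense(W, x, act):
    """One fully connected layer without bias: act(W x)."""
    return [act(sum(w * xi for w, xi in zip(row, x))) for row in W]

def relu(v):
    return max(0.0, v)

x = [0.05, -0.1, 0.8]
W = identity_init(3)
h_tanh, h_relu = x, x
for _ in range(3):  # three stacked identity-initialized layers
    h_tanh = dense(W, h_tanh, math.tanh)
    h_relu = dense(W, h_relu, relu)
# h_tanh: the small entries barely move, but 0.8 shrinks to roughly 0.52
# h_relu: positive entries pass through exactly, the negative one is clipped to 0.0
```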
We want fineval for all tasks, which is getting bothersome (I really don't want to implement a separate one for Ubuntu too...). The ubuntu_transfer_* scripts also show clear scaling limitations, and we'd want a universal transfer script instead. Let's make a class interface and implement each task in it.
Work will happen in f/tasksep.
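One possible shape for such an interface, as a sketch only: the class, method and registry names are hypothetical and not the project's API, and the TSV layout in load_set is an assumption. Each task implements loading and evaluation once, and generic scripts look tasks up by name.

```python
from abc import ABC, abstractmethod

class AbstractTask(ABC):
    """Hypothetical per-task interface; the method names are illustrative, not the project's API."""
    name = None

    @abstractmethod
    def load_set(self, path):
        """Parse one dataset file into (sentence pairs, labels)."""

    @abstractmethod
    def evaluate(self, labels, predictions):
        """Return a dict of task-specific metrics."""

TASKS = {}

def register(cls):
    """Class decorator: make a task available to generic train/fineval/transfer scripts by name."""
    TASKS[cls.name] = cls
    return cls

@register
class ParaTask(AbstractTask):
    name = 'para'

    def load_set(self, path):
        # assumed file layout: label<TAB>sentence0<TAB>sentence1
        pairs, labels = [], []
        with open(path, encoding='utf-8') as f:
            for line in f:
                label, s0, s1 = line.rstrip('\n').split('\t')
                pairs.append((s0.split(), s1.split()))
                labels.append(float(label))
        return pairs, labels

    def evaluate(self, labels, predictions):
        correct = sum((p > 0.5) == (y > 0.5) for y, p in zip(labels, predictions))
        return {'accuracy': correct / len(labels)}

def fineval(task_name, labels, predictions):
    """Single generic entry point instead of one fineval script per task."""
    return TASKS[task_name]().evaluate(labels, predictions)
```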
Skip connections (TODO: what's the reference?) mean feeding both the input and the previous layer's output into the inner layers of the RNN. rnnlevels>1 (introduced for anssel in (Wang+Nyberg, 2015)) didn't work great for us, but this might help.
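A minimal sketch of a multi-level RNN with skip connections, assuming a plain Elman RNN with random weights (all names hypothetical): every level above the first receives [x_t; h_t] rather than h_t alone.

```python
import math
import random

def simple_rnn(inputs, W, hdim):
    """Elman RNN: h_t = tanh(W [x_t; h_{t-1}]); returns the hidden state at every step."""
    h = [0.0] * hdim
    states = []
    for x in inputs:
        z = x + h  # list concatenation [x_t; h_{t-1}]
        h = [math.tanh(sum(w * v for w, v in zip(row, z))) for row in W]
        states.append(h)
    return states

def stacked_rnn_with_skips(inputs, levels, hdim, seed=0):
    """Multi-level RNN where every level above the first sees both the original input and the previous level's states."""
    rng = random.Random(seed)
    layer_in = inputs
    for _ in range(levels):
        in_dim = len(layer_in[0]) + hdim
        W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hdim)]
        states = simple_rnn(layer_in, W, hdim)
        # skip connection: the next level gets [x_t; h_t] instead of h_t alone
        layer_in = [x + h for x, h in zip(inputs, states)]
    return states

out = stacked_rnn_with_skips([[0.1, 0.2, 0.3] for _ in range(4)], levels=3, hdim=2)
```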
We still depend on Keras 0.3.2 and its Graph model interface. Porting to the Keras 1.0 functional interface is top priority.
Accidentally discovered that mctest Ddim=2 results are much better than the Ddim=0 default.
We should include the model of http://arxiv.org/abs/1602.07019. It should be a relatively easy one, probably with some custom lambdas in the decomposition step.
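To give a feel for what such a custom lambda might compute: a projection-based split of a word vector s into a part parallel to its matched vector s_hat (the "similar" component) and the orthogonal remainder (the "dissimilar" component). We believe the paper offers an orthogonal decomposition among several variants, but the exact formulation should be checked against the paper; s_hat is assumed to be computed elsewhere from the other sentence.

```python
def orthogonal_decompose(s, s_hat):
    """Split word vector s into a component parallel to its matched vector s_hat ("similar")
    and the orthogonal remainder ("dissimilar")."""
    dot = sum(a * b for a, b in zip(s, s_hat))
    norm2 = sum(b * b for b in s_hat)
    coef = dot / norm2 if norm2 > 0 else 0.0
    similar = [coef * b for b in s_hat]
    dissimilar = [a - p for a, p in zip(s, similar)]
    return similar, dissimilar

sim, dis = orthogonal_decompose([3.0, 4.0], [1.0, 0.0])
```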
So far, we only support parameter tuning using random search (tools/anssel_tune.py). Since this search is pretty high-dimensional, what we sometimes do is look at what works in general and then manually focus the parameters on that. But this is rather inexact, and quite tiresome and boring too. We should use a smarter way to tune stuff!
I don't know if there's a better choice than https://github.com/JasperSnoek/spearmint .
(Using software allowing commercial usage etc. and ideally without CLA is pretty important to me.)
We should come back to attn1511 and try a bilinear form for attention rather than the dot product or elementwise weighted sum. This is basically an analog of our projection layer; it is what MemNNs use for memory-level attention, and what Danqi Chen, Jason Bolton and Christopher D. Manning report as quite helpful for the CNN/Daily Mail Reading Comprehension Task.
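A plain-Python sketch of bilinear attention, score_i = q^T W a_i followed by a softmax (names are ours). With W = I it reduces to dot-product attention; a learned W can also relate different dimensions of q and a_i, which the dot product cannot.

```python
import math

def bilinear_attention(q, items, W):
    """score_i = q^T W a_i, softmax-normalized; returns (weights, attention-weighted sum of items)."""
    scores = []
    for a in items:
        Wa = [sum(w * x for w, x in zip(row, a)) for row in W]
        scores.append(sum(x * y for x, y in zip(q, Wa)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(items[0])
    context = [sum(wt * a[k] for wt, a in zip(weights, items)) for k in range(dim)]
    return weights, context

items = [[1.0, 0.0], [0.0, 1.0]]
w_dot, _ = bilinear_attention([1.0, 0.0], items, W=[[1.0, 0.0], [0.0, 1.0]])   # W = I: plain dot product
w_swap, _ = bilinear_attention([1.0, 0.0], items, W=[[0.0, 1.0], [1.0, 0.0]])  # W mixes dimensions
```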
In the yoda datasets of anssel task, we have extra supervision in the form of binary markers for tokens that actually denote the answer. We should try to make use of this supervision during training by just passing that as another set of NLP-style token flags. It should be pretty easy (except that it'll of course break transfer learning; but there's a way to keep transfer learning working if we make some changes that include fixing N to original embedding size).
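A trivial sketch (hypothetical helper) of passing the answer markers as one more per-token flag. It also shows why transfer learning breaks: the per-token input width grows from N to N+1, so weights trained on N-wide inputs no longer fit unless N is fixed.

```python
def add_token_flags(token_vectors, flags):
    """Append one binary feature per token: 1.0 if the token is marked as (part of) the answer, else 0.0."""
    if len(token_vectors) != len(flags):
        raise ValueError("expected exactly one flag per token")
    return [list(vec) + [1.0 if f else 0.0] for vec, f in zip(token_vectors, flags)]

tokens = [[0.2, 0.1], [0.5, -0.3], [0.0, 0.9]]
flagged = add_token_flags(tokens, [0, 1, 1])
```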
Model wrapper for relevance modelling in hypev selection.
In the pearsonobj function (implemented in objectives.py), class-to-score conversion is applied to both y_true and y_pred, and the function is used as the loss for the STS dataset. But in the STS dataset the true values are floating-point numbers, so is the class-to-score conversion of y_true necessary here? If I am going wrong somewhere, could you clarify the approach pearsonobj takes in the context of the STS dataset?
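One possible reading, not checked against objectives.py: if the pipeline stores each gold STS score as a distribution over the integer classes 0..5 (the encoding used e.g. in the Tree-LSTM paper by Tai et al.), then y_true reaches the loss as a class vector rather than a float, and class-to-score conversion simply recovers the original score. The sketch illustrates that round trip and a negative-Pearson loss; all names are hypothetical.

```python
import math

def score_to_class(score, n_classes=6):
    """Encode a real-valued score in [0, n_classes-1] as a distribution over integer classes."""
    probs = [0.0] * n_classes
    lo = int(math.floor(score))
    if lo >= n_classes - 1:
        probs[n_classes - 1] = 1.0
    else:
        probs[lo] = lo + 1 - score
        probs[lo + 1] = score - lo
    return probs

def class_to_score(probs):
    """Expected class index, i.e. the inverse of score_to_class."""
    return sum(i * p for i, p in enumerate(probs))

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pearson_loss(true_probs, pred_probs):
    """Negative Pearson r between the expected scores of gold and predicted distributions."""
    return -pearson([class_to_score(p) for p in true_probs],
                    [class_to_score(p) for p in pred_probs])
```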
This should be a hard one! Let's use our chios infrastructure to generate it from what AI2 publicly released.
Future plans: the winning models are now on GitHub; use their output instead.
Hi,
Thanks for this useful project.
I think there is an issue in the evaluation of the termfreq model.
I'm running:
python3 tools/train.py termfreq anssel ./data/anssel/wang/train.csv ./data/anssel/wang/test.csv inp_e_dropout=1/2 nb_epoch=1
However, I get this error:
"... tools/pysts/eval.py", line 28, in binclass_accuracy
    rawacc = np.sum((ypred > 0.5) == (y > 0.5)) / ypred.shape[0]
TypeError: unorderable types: dict() > float()
The origin of this issue is that y_pred should be replaced by y_pred['score'] for this particular task (as the function predict in termfreq.py returns a dictionary). This is also the case in the other function, aggregate_s0.
Even after fixing this, I get another issue:
... /tools/pysts/eval.py", line 122, in mrr
    if yy[1] in ysd:
TypeError: unhashable type: 'numpy.ndarray'
I appreciate your feedback on this issue or on whether I am running something incorrectly.
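A possible workaround, as a pure-Python sketch (the real eval.py works on NumPy arrays, and unwrap_scores is a hypothetical helper, not part of pysts): normalize predictions to a flat list of floats before evaluating. The second error suggests the predictions are (n, 1)-shaped, so each yy[1] is a 1-element array, which cannot be used as a dict key; flattening addresses that too.

```python
def unwrap_scores(ypred):
    """Normalize model predictions to a flat list of floats before evaluation.

    termfreq's predict() returns a dict with a 'score' entry, other models an array;
    (n, 1)-shaped predictions yield 1-element rows, which are not hashable."""
    if isinstance(ypred, dict):
        ypred = ypred['score']
    flat = []
    for v in ypred:
        if isinstance(v, (list, tuple)) or getattr(v, 'ndim', 0) > 0:
            v = v[0]  # assumes a single score per row
        flat.append(float(v))
    return flat

def binclass_accuracy(y, ypred):
    """Illustrative variant of the eval function that unwraps predictions first."""
    ypred = unwrap_scores(ypred)
    correct = sum((p > 0.5) == (t > 0.5) for t, p in zip(y, ypred))
    return correct / len(ypred)
```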
2015.test.tsv has 12250 pairs while 2015.train.tsv has 3000. Is that correct?
"Inner Attention based Recurrent Neural Networks for Answer Selection": attention applied before the RNN is awesome, apparently. We could also try to "sandwich" attention between two layers of a multi-level RNN.
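A sketch of attention applied before the RNN, following our understanding of the paper's word-level variant: each answer word vector x_t is gated by alpha_t = sigmoid(r_q^T M x_t), where r_q is a question representation and M a learned matrix, and the gated vectors are then fed to the answer RNN. Names are ours; verify the exact formulation against the paper.

```python
import math

def inner_attention(question_rep, answer_tokens, M):
    """Gate each answer word vector by sigmoid(r_q^T M x_t) before it enters the RNN."""
    gated = []
    for x in answer_tokens:
        Mx = [sum(m * v for m, v in zip(row, x)) for row in M]
        score = sum(q * v for q, v in zip(question_rep, Mx))
        alpha = 1.0 / (1.0 + math.exp(-score))
        gated.append([alpha * v for v in x])
    return gated

gated = inner_attention([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0]], [[1.0, 0.0], [0.0, 1.0]])
```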
Add support for easy exploration of whether individual neurons are learning specific concepts: say, similar to the heatmap table, but with extra JavaScript code that lets you quickly flip through highlighting based on individual dimensions rather than the whole norm.
See also http://arxiv.org/pdf/1506.02078.pdf
We should have some non-neural baselines.
anything else?
Thanks for the wonderful tool! However, I got some errors when I tried one command from the README: python tools/train.py cnn para data/para/msr/msr-para-train.tsv data/para/msr/msr-para-val.tsv
With the latest version of Keras (1.0.7), I get the following error:
ImportError: cannot import name LambdaMerge
I replaced all occurrences of LambdaMerge with Merge in blocks.py and reran the command, and another error appeared:
Exception: Layer e0[0] does not support masking, but was passed an input_mask: Elemwise{neq,no_inplace}.0
Then I uninstalled it and installed Keras 0.3.2, but yet another error occurred:
AssertionError: Keyword argument not understood: dropout
Appreciate the help!
For the SNLI task, models are doing well. The basic idea is that we have two RNNs for the two sentences, but the second one is initialized by the output of the first one. On top of that, there is one-directional attention.
The models seem to be pretty incremental tweaks of each other, so it would probably be easiest to implement this as a single model with configurable features. Not sure how to coerce Keras into initializing the second RNN this way, though; it might require Keras modifications.
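A minimal sketch of the initialization idea, assuming a plain Elman RNN with random weights (all names hypothetical, attention omitted): the hypothesis RNN starts from the premise RNN's final state instead of zeros. As far as we know, later Keras versions accept an initial_state argument when calling a recurrent layer, which might remove the need for Keras modifications.

```python
import math
import random

def encode(tokens, W, h0):
    """Elman RNN h_t = tanh(W [x_t; h_{t-1}]) starting from h0; returns the final state."""
    h = list(h0)
    for x in tokens:
        z = x + h
        h = [math.tanh(sum(w * v for w, v in zip(row, z))) for row in W]
    return h

rng = random.Random(1)
xdim, hdim = 3, 4
W_premise = [[rng.uniform(-0.5, 0.5) for _ in range(xdim + hdim)] for _ in range(hdim)]
W_hypothesis = [[rng.uniform(-0.5, 0.5) for _ in range(xdim + hdim)] for _ in range(hdim)]
premise = [[0.1, 0.4, -0.2], [0.3, -0.1, 0.5]]
hypothesis = [[0.2, 0.2, 0.0]]

premise_state = encode(premise, W_premise, [0.0] * hdim)
# conditional encoding: the hypothesis RNN starts from the premise's final state instead of zeros
conditioned = encode(hypothesis, W_hypothesis, premise_state)
unconditioned = encode(hypothesis, W_hypothesis, [0.0] * hdim)
```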
Somehow, the AskUbuntu (asku) task is broken and the models don't get trained:
RunID: asku-avg--5e692e270bddb64b-00 ({"Ddim": "1", "balance_class": "False", "batch_size": "192", "deep": "0", "e_add_flags": "True", "embdim": "300", "epoch_fract": "0.25", "f_add_kw": "False", "fix_layers": "[]", "inp_e_dropout": "0.333333333333", "inp_w_dropout": "0", "l2reg": "1e-05", "loss": "<function ranknet at 0x995b5f0>", "mlpsum": "sum", "nb_epoch": "16", "nb_runs": "4", "nnact": "relu", "nninit": "glorot_uniform", "opt": "adam", "pact": "tanh", "pdim": "1", "prescoring": "None", "prescoring_input": "None", "prescoring_prune": "None", "project": "True", "ptscorer": "<function mlp_ptscorer at 0x99631b8>", "wact": "linear", "wdim": "1", "wproject": "False"})
Model
Training
Epoch 1/16
323637/323518 [==============================] - 384s - loss: 0.6949 val mrr 0.471191
Epoch 2/16
323543/323518 [==============================] - 373s - loss: 0.6933 val mrr 0.463537
Epoch 3/16
323620/323518 [==============================] - 374s - loss: 0.6932 val mrr 0.455145
Epoch 4/16
323677/323518 [==============================] - 384s - loss: 0.6932 val mrr 0.454692
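The loss plateau at 0.6932 above is telling: ln 2 ≈ 0.6931 is exactly what a pairwise logistic loss gives when the model scores both members of every pair identically, i.e. when it does no better than chance. Assuming our ranknet objective has this logistic pairwise form (not verified here), the sketch shows where that number comes from.

```python
import math

def ranknet_loss(s_pos, s_neg):
    """Pairwise logistic (RankNet-style) loss: -log sigmoid(s_pos - s_neg)."""
    return math.log1p(math.exp(-(s_pos - s_neg)))
```

Any score pair with s_pos == s_neg yields ln 2, so a loss stuck there means the scores carry no ranking signal.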
In the anssel task, it is semi-standard practice to ensemble the NN-based scores with TF-IDF-based scores in an additional logreg-like layer. That should be easy for us to do with the termfreq model.
Another approach that hasn't been done before, but is eminently important from a practical point of view, is to prerank candidates by a TF-IDF-like measure and then pass just the top N on to NN scoring.
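A self-contained sketch of both ideas, not using the termfreq model itself: TF-IDF cosine scores, preranking that keeps the top N candidates, and a logistic combination of NN and TF-IDF scores. The IDF smoothing and the ensemble weights are assumptions; the weights would be fitted on held-out data.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) for tokenized candidate sentences, plus the IDF table."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(d).items()} for d in docs], idf

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def prerank(question, candidates, top_n):
    """Return (tfidf_score, index) for the top_n candidates; only these would go to the NN scorer."""
    vecs, idf = tfidf_vectors(candidates)
    qvec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(question).items()}
    scored = sorted(((cosine(qvec, v), i) for i, v in enumerate(vecs)), reverse=True)
    return scored[:top_n]

def ensemble(nn_score, tfidf_score, w_nn=1.0, w_tfidf=1.0, bias=0.0):
    """Logistic-regression-style combination; the weights are placeholders to be fitted on dev data."""
    return 1.0 / (1.0 + math.exp(-(w_nn * nn_score + w_tfidf * tfidf_score + bias)))

q = 'what is a cat'.split()
cands = ['a cat is an animal'.split(), 'stocks fell'.split(), 'the dog barked'.split()]
top = prerank(q, cands, 1)
```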
Is there some dataset that has sentence pairs annotated for both semantic similarity and semantic relatedness?
Similarity here is 'same meaning', whereas relatedness is more general with similarity being one of the relationships between concepts.
I know there exists one for word pairs (WordSim), but is there such a dataset with sentence pairs?
We didn't have much success with multi-level RNNs, but skip-layer connections represent an important innovation in that regard.
Implement a model that uses skip-thoughts (sentence-wide embeddings) to generate aggregate sentence representations. We probably want to just rely on an external component to precompute the representations. Not sure if we can meaningfully combine this with other architectures, but it should serve at least as a really strong baseline.
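A sketch of the baseline side, assuming the sentence embeddings are precomputed by an external component: elementwise |u - v| and u * v pair features (what we recall the skip-thoughts paper feeding to a simple classifier for relatedness; treat that as an assumption), plus plain cosine similarity as the simplest possible baseline. Names are hypothetical.

```python
import math

def skipthought_pair_features(u, v):
    """Pair features from two precomputed sentence embeddings: |u - v| and u * v, elementwise."""
    return [abs(a - b) for a, b in zip(u, v)] + [a * b for a, b in zip(u, v)]

def cosine_baseline(u, v):
    """Simplest possible baseline: cosine similarity of the two embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)
```

The feature vector would then go into a logistic regression or a small MLP trained on the task labels.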
A common practice in neural NLP models is to have the embedding matrix adaptable, but only the portion of it that covers randomly initialized rather than preinitialized word embeddings. This might help reduce overfitting.
Unfortunately, this is not completely straightforward in Keras. A possible idea would be to transform word indices to index tuples and have two embedding matrices, one fixed and another trainable. Or modify Keras to allow per-row trainability, but I don't know how hard that would be.
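A plain-Python sketch of the two-matrix idea, assuming the vocabulary is ordered so that all words with pretrained embeddings come first (helper names are hypothetical): indices below the pretrained count map to a frozen matrix, the rest to a trainable one, and updates only ever touch the trainable rows.

```python
def lookup(index, fixed, trainable):
    """Indices below len(fixed) map to the frozen pretrained matrix; the rest to the trainable one."""
    n_fixed = len(fixed)
    if index < n_fixed:
        return fixed[index]
    return trainable[index - n_fixed]

def sgd_update(trainable, n_fixed, grads, lr):
    """Apply gradients only to rows of the trainable matrix; grads maps word index -> gradient vector."""
    for index, g in grads.items():
        if index >= n_fixed:
            row = trainable[index - n_fixed]
            trainable[index - n_fixed] = [w - lr * gi for w, gi in zip(row, g)]
        # pretrained rows (index < n_fixed) are intentionally left untouched

fixed = [[1.0, 1.0]]
trainable = [[0.0, 0.0]]
sgd_update(trainable, len(fixed), {0: [1.0, 1.0], 1: [1.0, 1.0]}, lr=0.5)
```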
Progress tracker for the conversion outlined below:
We need that for proper reporting, split from the training set.