Breaking NLI

NLI test set with lexical inferences

Overview

This dataset consists of 8193 premise-hypothesis sentence pairs, each annotated as entailment, contradiction, or neutral. Each hypothesis is identical to its premise except for a single word or phrase that was replaced. The dataset is intended for testing models trained on the natural language inference (NLI) task, and reasonable performance on it requires some lexical and world knowledge.
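Because each hypothesis differs from its premise only in the replaced word or phrase, the replaced span can be recovered by stripping the longest shared token prefix and suffix. The following is a minimal Python sketch, assuming whitespace tokenization; the function name `replaced_span` is illustrative and not part of the dataset release. Punctuation attached to a word stays in that token, so a replacement at the very end of a sentence may come back with its trailing period.

```python
from __future__ import annotations


def replaced_span(premise: str, hypothesis: str) -> tuple[str, str]:
    """Return (premise_span, hypothesis_span): the tokens where the sentences differ.

    Tokens are split on whitespace; the shared prefix and suffix are removed.
    """
    p, h = premise.split(), hypothesis.split()
    shortest = min(len(p), len(h))
    # Count tokens shared at the start of both sentences.
    start = 0
    while start < shortest and p[start] == h[start]:
        start += 1
    # Count tokens shared at the end, without overlapping the shared prefix.
    end = 0
    while end < shortest - start and p[-1 - end] == h[-1 - end]:
        end += 1
    return " ".join(p[start:len(p) - end]), " ".join(h[start:len(h) - end])
```

On an illustrative pair, `replaced_span("A man is holding a red umbrella.", "A man is holding a blue umbrella.")` returns `("red", "blue")`; multi-word replacements such as "kitchen" vs. "living room" are returned whole.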

Fields

  • sentence1: The premise sentence.
  • sentence2: The hypothesis sentence, which is the same as the premise except for one word/phrase that was replaced.
  • annotator_labels: The individual labels assigned by each of the three annotators.
  • gold_label: This is the label chosen by the majority of annotators.
  • pairID: A unique identifier for each sentence1--sentence2 pair.
  • category: The semantic category of the replaced word/phrase (e.g., antonyms, colors, countries).
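Assuming the data is stored as JSON Lines, with one object per pair and keys named exactly as the fields above, a record can be parsed and its `gold_label` checked against a majority vote over `annotator_labels` as sketched below. The class `BreakingNLIPair` and the function `majority_label` are hypothetical helpers, and returning `None` when no label wins a majority is our own choice rather than a documented dataset convention.

```python
from __future__ import annotations

import json
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class BreakingNLIPair:
    """One premise-hypothesis pair (hypothetical helper; field names follow the README)."""
    sentence1: str
    sentence2: str
    annotator_labels: list[str]
    gold_label: str
    pairID: str
    category: str

    @classmethod
    def from_json(cls, line: str) -> BreakingNLIPair:
        """Parse one JSON Lines record; assumes the keys listed under Fields."""
        obj = json.loads(line)
        return cls(
            sentence1=obj["sentence1"],
            sentence2=obj["sentence2"],
            annotator_labels=list(obj["annotator_labels"]),
            gold_label=obj["gold_label"],
            pairID=str(obj["pairID"]),  # normalize to str in case IDs are stored as numbers
            category=obj["category"],
        )


def majority_label(annotator_labels: list[str]) -> Optional[str]:
    """Return the label given by more than half of the annotators, or None if none is."""
    if not annotator_labels:
        return None
    label, count = Counter(annotator_labels).most_common(1)[0]
    return label if count > len(annotator_labels) / 2 else None
```

For example, the labels `["contradiction", "contradiction", "neutral"]` yield `"contradiction"`, while three different labels yield `None`.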

Data Source

The premise sentences are taken from the Stanford Natural Language Inference corpus:

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015.
A large annotated corpus for learning natural language inference. 
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP).
@inproceedings{snli:emnlp2015,
		Author = {Bowman, Samuel R. and Angeli, Gabor and Potts, Christopher and Manning, Christopher D.},
		Booktitle = {Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
		Publisher = {Association for Computational Linguistics},
		Title = {A large annotated corpus for learning natural language inference},
		Year = {2015}}

Statistics

Sentence pairs: 8193
Labels: {'entailment': 982, 'neutral': 47, 'contradiction': 7164}
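The label distribution is heavily skewed: contradictions account for 7164 of the 8193 pairs (about 87.4%), so a model that always predicts contradiction would already reach about 87.4% accuracy. The counts above can be recomputed with a sketch like the following, assuming JSON Lines input with a `gold_label` key; the function name `label_counts` is hypothetical and the file name in the usage comment is a placeholder.

```python
import json
from collections import Counter
from typing import Iterable


def label_counts(lines: Iterable[str]) -> Counter:
    """Count gold_label values in JSON Lines records, skipping blank lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line)["gold_label"]] += 1
    return counts


# Usage ("dataset.jsonl" is a placeholder path, not a documented file name):
# with open("dataset.jsonl", encoding="utf-8") as f:
#     print(label_counts(f))
```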

Categories: {
'antonyms': 1147, 
'synonyms': 894, 
'cardinals': 759, 
'nationalities': 755, 
'drinks': 731, 
'antonyms_wordnet': 706, 
'colors': 699, 
'ordinals': 663, 
'countries': 613, 
'rooms': 595, 
'materials': 397, 
'vegetables': 109, 
'instruments': 65, 
'planets': 60}
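Since the categories differ widely in size (from 1147 antonym pairs down to 60 planet pairs), per-category accuracy can reveal weaknesses that a single overall number hides. A minimal sketch, assuming model predictions are available as a mapping from `pairID` to a predicted label; the function name `accuracy_by_category` is hypothetical, and pairs without a prediction are counted as errors.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def accuracy_by_category(records: Iterable[Mapping[str, str]],
                         predictions: Mapping[str, str]) -> Dict[str, float]:
    """Return the fraction of correctly predicted pairs in each category.

    records: dicts with 'pairID', 'category', and 'gold_label' keys.
    predictions: maps pairID to a predicted label; the key type must match
    the pairID values in records.
    """
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for rec in records:
        cat = rec["category"]
        total[cat] += 1
        # A missing prediction returns None here, so it never matches gold_label.
        if predictions.get(rec["pairID"]) == rec["gold_label"]:
            correct[cat] += 1
    return {cat: correct[cat] / total[cat] for cat in total}
```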

Citation

If you use this dataset in any published research, please cite:

@InProceedings{glockner_acl18,
  author    = {Glockner, Max and Shwartz, Vered and Goldberg, Yoav},
  title     = {Breaking NLI Systems with Sentences that Require Simple Lexical Inferences},
  booktitle = {The 56th Annual Meeting of the Association for Computational Linguistics (ACL)},
  month     = {July},
  year      = {2018},
  address   = {Melbourne, Australia}
}

License

This dataset is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Comments

  • Our method for estimating human performance follows Gong et al. (2018); its description is available only in an earlier version of that paper.
