Giter Club home page Giter Club logo

training-data's Introduction

training-data

Training Data for the NLPContributionGraph Shared Task 11 at SemEval-2021

The repository is organized as follows:

README.md                            
[task-name-folder]/                                # natural_language_inference, paraphrase_generation, question_answering, relation_extraction, topic_models
    ├── [article-counter-folder]/                  # ranges between 0 to 100 since we annotated varying numbers of articles per task
    │   ├── [articlename].pdf                      # scholarly article pdf
    │   ├── [articlename]-Grobid-out.txt           # plaintext output from the [Grobid parser](https://github.com/kermitt2/grobid)
    │   ├── [articlename]-Stanza-out.txt           # plaintext preprocessed output from [Stanza](https://github.com/stanfordnlp/stanza)
    │   ├── sentences.txt                          # annotated Contribution sentences in the file
    │   ├── entities.txt                           # annotated entities in the Contribution sentences
    │   └── info-units/                            # the folder containing information units in JSON format
    │   │   └── research-problem.json              # `research problem` mandatory information unit in json format
    │   │   └── model.json                         # `model` information unit in json format; in some articles it is called `approach`
    │   │   └── ...                                # there are 12 information units in all and each article may be annotated by 3 or 6
    │   └── triples/                               # the folder containing information unit triples one per line
    │   │   └── research-problem.txt               # `research problem` triples (one research problem statement per line)
    │   │   └── model.txt                          # `model` triples (one statement per line)
    │   │   └── ...                                # there are 12 information units in all and each article may be annotated by 3 or 6
    │   └── ...                                    # there are between 1 to 100 articles annotated for each task, so this repeats for the remaining annotated articles
    └── ...                                        # there are 24 tasks selected overall, so this repeats 23 more times

Data Element Counts

Total Papers Annotated: 237

Tasks info-units sentences entities total triples total unique triples subject predicate object
1 natural_language_inference 427 2168 12657 7969 7330 3171 1251 5242
2 negation_scope_resolution 4 28 163 94 94 50 42 80
3 paraphrase_generation 9 44 293 177 175 99 77 160
4 part-of-speech_tagging 36 144 804 501 479 249 156 401
5 passage_re-ranking 8 32 214 126 123 63 66 103
6 phrase_grounding 5 29 172 102 102 58 53 94
7 prosody_prediction 5 31 172 105 103 58 43 97
8 query_wellformedness 5 11 54 35 35 22 25 33
9 question_answering 30 194 1059 665 640 332 203 547
10 question_generation 7 34 133 87 87 45 44 74
11 question_similarity 4 16 82 51 51 30 26 49
12 relation_extraction 69 346 1923 1154 1084 552 372 922
13 sarcasm_detection 10 40 225 138 136 77 73 116
14 semantic_parsing 12 60 275 183 180 91 74 157
15 semantic_role_labeling 22 100 545 338 318 163 137 288
16 sentence_classification 15 85 513 300 297 167 134 273
17 sentence_compression 19 77 426 260 248 138 104 223
18 sentiment_analysis 240 1275 7452 4517 4086 1864 940 2967
19 smile_recognition 3 17 85 54 54 29 34 49
20 temporal_information_extraction 8 26 152 94 93 58 62 85
21 text-to-speech_synthesis 13 69 316 197 192 103 98 174
22 text_generation 24 129 704 431 420 222 165 351
23 text_summarization 70 347 1777 1077 1010 513 346 825
24 topic_models 5 18 75 48 48 30 28 48

Note

For system training, participants are encouraged to merge the 50 files additionally from the trial-data release.

training-data's People

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.