
💡 SemEval-2022 Task 7: Identifying Plausible Clarifications of Implicit and Underspecified Phrases in Instructional Texts

Starter kit with a format checker, a scorer and baselines

Contents

The Task

The goal of this shared task is to evaluate the ability of NLP systems to distinguish between plausible and implausible clarifications of an instruction.

For details on the task, see the task website and the CodaLab page.

Repo Setup

After cloning this repository, install the dependencies by running the following command:

$ pip install -r requirements.txt

To download the training data and the corresponding labels, go to the task website. You will need:

  • one file with the training instances
  • one reference file with the gold class labels for these instances (classification subtask)
  • one reference file with the gold plausibility scores for these instances (ranking subtask)

Format Checker

Each submission file with predictions has to be a TSV (tab-separated values) file.

The requirements for a classification task submission are:

  • there should be two columns, one for identifiers and another for the class labels
  • the id is a string:
    • it starts with an integer representing the id of the instance
    • next, there is an underscore
    • finally, the id of the filler (1 to 5)
    • e.g. "42_1" stands for the sentence with id 42 and filler 1
  • the class label is a string from the label set "IMPLAUSIBLE", "NEUTRAL", "PLAUSIBLE"

The requirements for a ranking task submission are:

  • there should be two columns, one for identifiers and another for the plausibility scores
  • the id looks like the one for the classification task
  • the plausibility score is a float

To see an example for these format requirements, have a look at the files with gold class labels and plausibility scores published on the task website.
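The format rules above can be sketched as a small validator. This is only an illustration of the requirements, not the repo's official `format_checker_for_submission.py`; the function name and structure here are assumptions:

```python
import csv
import re

# Illustrative sketch of the submission format (not the official checker):
# each row is "<instance_id>_<filler_id>\t<prediction>", tab-separated.
ID_PATTERN = re.compile(r"^\d+_[1-5]$")
CLASS_LABELS = {"IMPLAUSIBLE", "NEUTRAL", "PLAUSIBLE"}

def check_submission(path, subtask):
    """Validate a TSV submission for the classification or ranking subtask."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            identifier, prediction = row  # exactly two columns per line
            assert ID_PATTERN.match(identifier), f"bad id: {identifier}"
            if subtask == "classification":
                assert prediction in CLASS_LABELS, f"bad label: {prediction}"
            else:  # ranking: the score must parse as a float
                float(prediction)
```

For example, a valid classification row would be `42_1<TAB>PLAUSIBLE`, and a valid ranking row `42_1<TAB>3.5`.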

There is a format checker for submission files. You can call it with the following flags:

# for the classification task
$ python format_checker_for_submission.py --path_to_predictions submission_classification.tsv --subtask classification
# for the ranking task
$ python format_checker_for_submission.py --path_to_predictions submission_ranking.tsv --subtask ranking

Scorer

The scorer takes

  • a submission file with predictions in the format described above
  • a reference file with gold labels that is in the same format as the submission file

You can download the reference files for the two subtasks from the task website.

For the classification task, the scorer calculates the accuracy:

$ python scorer.py --path_to_predictions submission_classification.tsv --path_to_labels reference_classification.tsv --subtask classification

For the ranking task, the scorer computes Spearman's rank correlation coefficient:

$ python scorer.py --path_to_predictions submission_ranking.tsv --path_to_labels reference_ranking.tsv --subtask ranking
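The two metrics the scorer reports can be sketched as follows. This is a simplified illustration, assuming the prediction and reference rows are already aligned by id; the repo's `scorer.py` may differ in details:

```python
from scipy.stats import spearmanr

def accuracy(predicted_labels, gold_labels):
    """Classification metric: fraction of predicted labels matching gold."""
    assert len(predicted_labels) == len(gold_labels)
    return sum(p == g for p, g in zip(predicted_labels, gold_labels)) / len(gold_labels)

def ranking_score(predicted_scores, gold_scores):
    """Ranking metric: Spearman's rank correlation with the gold scores."""
    return spearmanr(predicted_scores, gold_scores).correlation
```

Note that Spearman's coefficient only depends on the ranking of the scores, so predictions on a different scale than the gold scores are not penalized.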

Baselines

We provide two very simple baseline models here to help you get started with the shared task. The baseline for the classification subtask combines a tf-idf feature extractor with a Naive Bayes classifier. The baseline for the ranking subtask uses a tf-idf feature extractor and a linear regression model.

After being trained on the training set, these models achieve the following performance on the development set:

  • multi-class classification (accuracy): 33.44%
  • ranking (Spearman's rank correlation coefficient): +0.0590
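The two baselines described above can be sketched with scikit-learn pipelines. The actual implementation lives in `main.py` and may differ in preprocessing and hyperparameters; the function names here are assumptions:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LinearRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def classification_baseline():
    """Sketch of the classification baseline: tf-idf + Naive Bayes."""
    return make_pipeline(TfidfVectorizer(), MultinomialNB())

def ranking_baseline():
    """Sketch of the ranking baseline: tf-idf + linear regression."""
    return make_pipeline(TfidfVectorizer(), LinearRegression())
```

Both pipelines take raw instance texts as input; `fit` trains on the training split, and `predict` produces class labels or plausibility scores for the development split.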

You can use the script main.py to reproduce these training and evaluation steps:

# for the classification subtask
$ python main.py --path_to_train train_data.tsv --path_to_training_labels train_labels.tsv --path_to_dev dev_data.tsv --path_to_dev_labels dev_labels.tsv --path_to_predictions pred_dev_class.tsv --classification_baseline bag-of-words
# for the ranking subtask
$ python main.py --path_to_train train_data.tsv --path_to_training_labels train_scores.tsv --path_to_dev dev_data.tsv --path_to_dev_labels dev_scores.tsv --path_to_predictions pred_dev_rank.tsv --ranking_baseline bag-of-words

The script produces a TSV file with predictions (under the path specified with the flag path_to_predictions) and then hands that file to the scorer script.

Questions? Need a Clarification? :)

If you have technical trouble with the code in this repo, please open an Issue. You can also ask questions about the task in general in our Google group.
