This repository holds the code for “Politics, BERTed: Automatic Attribution of Speech Events in German Parliamentary Debates”, submitted to the KONVENS 2023 Shared Task on Speaker Attribution, Task 1.
The accompanying paper can be found here: PDF (Full Proceedings).
The goal of the shared task is to automatically identify speech events in political debates and attribute them to their respective speakers, essentially identifying who says what to whom in parliamentary debates.
The task is divided into two subtasks:
- Task 1a is the full task: predicting both cue spans and their associated role spans.
- Task 1b is the role prediction task only, where gold cue spans are already given.
If you use the software and/or models, please consider citing the accompanying publication:
Ehrmanntraut, Anton. 2023. “Politics, BERTed: Automatic Attribution of Speech Events in German Parliamentary Debates.” In Proceedings of the GermEval 2023 Shared Task on Speaker Attribution in Newswire and Parliamentary Debates (SpkAtt-2023), edited by Ines Rehbein, Fynn Petersen-Frey, Annelen Brunner, Josef Ruppenhofer, Chris Biemann, and Simone Paolo Ponzetto, 22–30. Ingolstadt, Germany.
@inproceedings{ehrmanntraut_politics_2023,
location = {Ingolstadt, Germany},
title = {Politics, {BERTed}: Automatic Attribution of Speech Events in German Parliamentary Debates},
pages = {22--30},
booktitle = {Proceedings of the {GermEval} 2023 Shared Task on Speaker Attribution in Newswire and Parliamentary Debates ({SpkAtt}-2023)},
author = {Ehrmanntraut, Anton},
editor = {Rehbein, Ines and Petersen-Frey, Fynn and Brunner, Annelen and Ruppenhofer, Josef and Biemann, Chris and Ponzetto, Simone Paolo},
date = {2023-09-18}
}
| Used Base Model | SpkAtt-F1 (test set) | Match-F1 (dev set) | Download |
|---|---|---|---|
| aehrm/gepabert | 82.8 | 84.8 | Link |
| deepset/gbert-large | (not evaluated) | 84.4 | Link |
| deepset/gbert-base | (not evaluated) | 81.2 | Link |
The project uses poetry for dependency management. To install all dependencies, run:

```
poetry install
```

You can open a shell with all required Python packages and the interpreter via `poetry shell`. Alternatively, you can run scripts with the project's Python interpreter using `poetry run python <script.py>`.
Before inference, you either need to download the published models and place them into the `models/` folder, or train the models yourself (see below). After the `models/` folder has been populated, you can run the full inference (Task 1a) like this:
```
# e.g., download GePaBERT models
(cd models; wget https://github.com/aehrm/spkatt_gepade/releases/download/konvens/gepabert_models.tar; tar xf gepabert_models.tar;)

# adjust if needed
#export PEFT_MODEL_DIR=./models
poetry run ./predict.sh 1a input_dir [output_dir]
```
The `input_dir` should hold tokenized speeches as JSON files, like in the GePaDe test dataset (the one provided for the shared task).
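Since inference expects well-formed JSON input, an optional sanity check is to verify that every file in the input directory parses before running the pipeline. The following is a sketch; `./my_input_dir` is a placeholder path:

```shell
# Optional sanity check: report any file in the input directory that does
# not parse as JSON. The directory path below is a placeholder; adjust it.
input_dir=./my_input_dir
for f in "$input_dir"/*.json; do
    python -m json.tool "$f" > /dev/null 2>&1 || echo "invalid JSON: $f"
done
```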
E.g., to reproduce the results, run:

```
wget https://github.com/umanlp/SpkAtt-2023/archive/refs/heads/master.zip -O gepade.zip
unzip gepade.zip
poetry run ./predict.sh 1a SpkAtt-2023-master/data/dev/task1 [output_dir]
```
Alternatively, you can run subtask 1b (role prediction from gold cues) as follows, e.g., on the GePaDe dev dataset. Make sure the `input_dir` JSON files contain annotation objects with cue spans.

```
poetry run ./predict.sh 1b path/to/spkatt_data/dev/task1 [output_dir]
```
After downloading the full GePaDe dataset into the `data` folder, you can run the training like this:
```
# adjust if needed
#export BASE_MODEL_NAME=aehrm/gepabert
#export PEFT_MODEL_DIR=./models
#export TRAIN_FILES='./data/train/task1'
#export DEV_FILES='./data/dev/task1'
poetry run python ./train_cue_detector.py
poetry run python ./train_cue_joiner.py
poetry run python ./train_role_detector.py
```
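For example, to train all three components with `deepset/gbert-large` (one of the base models evaluated above) instead of the default, you could set the environment variables explicitly and run the scripts in sequence. This is a sketch; the output folder name is hypothetical:

```shell
# Sketch: train cue detector, cue joiner, and role detector in sequence
# with an alternative base model. The environment variables are the ones
# listed (commented out) above; PEFT_MODEL_DIR here is a hypothetical name.
export BASE_MODEL_NAME=deepset/gbert-large
export PEFT_MODEL_DIR=./models_gbert_large
export TRAIN_FILES='./data/train/task1'
export DEV_FILES='./data/dev/task1'

for script in train_cue_detector.py train_cue_joiner.py train_role_detector.py; do
    poetry run python "./$script"
done
```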