The pragtag from lilys012

Enhancing Cross-Domain Generalization through Data Augmentation with Reduced Uncertainty

This repository contains the code for our submission to ArgMining @ EMNLP 2023.

The task is set up under three conditions: full, low, and zero. Each subdirectory has its own README.md explaining how to run the experiments. See the pragtag folder for further details.

Some data are shared across all conditions and are stored in public_dat. See the Data section below for specific instructions.

Apart from the data augmentation strategies, finetuning, prediction, and evaluation are each performed in separate directories.

Finetune

finetune_baseline.py finetunes an encoder classifier. Run it as follows:

python3 finetune_baseline.py <train data path> <output path> <model name>

The hyperparameters vary between experiments. We have commented every hyperparameter needed for each experiment in the file, so feel free to adjust the following:

seed, num_training_epochs, learning_rate, per_device_train_batch_size, callbacks

Note that in the zero condition only four labels are classified, so the following definitions must be changed accordingly:

LABELS in load.py
CLASS_MAP and iCLASS_MAP in utils.py
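
The two maps are presumably inverse lookups between label names and class indices. Under that assumption, a minimal sketch of deriving both from a single label list, so that they cannot drift apart when the label set changes, might look like the following. The four label names are placeholders, not the actual zero-condition labels, and the real definitions in load.py and utils.py may be structured differently.

```python
# Placeholder names only: substitute the four labels used in the zero condition.
LABELS = ["label_a", "label_b", "label_c", "label_d"]

# Assumed meaning: CLASS_MAP maps label name -> class index,
# and iCLASS_MAP is its inverse (class index -> label name).
CLASS_MAP = {label: idx for idx, label in enumerate(LABELS)}
iCLASS_MAP = {idx: label for label, idx in CLASS_MAP.items()}
```

Deriving the maps from one list means that switching between conditions only requires editing LABELS.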

Prediction

predict_baseline.py uses a trained model to predict labels for the given data. Run it as follows:

python3 predict_baseline.py <test input data path> <model checkpoint path> <output path>
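
The finetuning and prediction commands can be chained from a small driver script. The sketch below only assembles the two command lines in the argument order shown in this README. It assumes that the output path passed to finetune_baseline.py can be reused directly as the model checkpoint path for predict_baseline.py, which may not hold if checkpoints are saved into subdirectories. All paths and the model name are hypothetical examples.

```python
import subprocess
import sys


def build_commands(train_path, test_path, model_name, checkpoint_dir, prediction_path):
    """Return the finetuning and prediction command lines, in run order."""
    finetune = [sys.executable, "finetune_baseline.py", train_path, checkpoint_dir, model_name]
    # Assumption: the finetuning output directory doubles as the checkpoint path.
    predict = [sys.executable, "predict_baseline.py", test_path, checkpoint_dir, prediction_path]
    return [finetune, predict]


def run_pipeline(commands):
    """Run each command in turn, raising CalledProcessError on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths and model name, for illustration only.
    cmds = build_commands(
        train_path="data/train.json",
        test_path="data/test.json",
        model_name="roberta-base",
        checkpoint_dir="out/model",
        prediction_path="out/predicted.json",
    )
    for cmd in cmds:
        print(" ".join(cmd))
    # run_pipeline(cmds)  # uncomment to actually execute both steps
```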

Evaluation

main.py computes the F1-scores across all domains.

python3 main.py <input_path> <output_path>

Here, the input path should point to a directory containing a folder ref with the true labels (or training data with labels) under the name test_labels.json, and a folder res with the predicted labels under the name predicted.json.
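
The expected directory layout can be assembled with a few lines of Python. The following is a minimal sketch rather than part of the repository: the helper name prepare_eval_dir and its arguments are hypothetical, and it simply copies an existing label file and prediction file into the ref and res folders under the required names.

```python
import shutil
from pathlib import Path


def prepare_eval_dir(labels_file, predictions_file, input_dir):
    """Copy labels and predictions into the ref/ and res/ layout main.py expects."""
    input_dir = Path(input_dir)
    ref_dir = input_dir / "ref"
    res_dir = input_dir / "res"
    ref_dir.mkdir(parents=True, exist_ok=True)
    res_dir.mkdir(parents=True, exist_ok=True)
    # File names are fixed by the evaluation script's conventions.
    shutil.copyfile(labels_file, ref_dir / "test_labels.json")
    shutil.copyfile(predictions_file, res_dir / "predicted.json")
    return input_dir
```

The returned directory can then be passed to main.py as the input path.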

Data

Rename the dataset and place it at the location described in the respective README.md.

Below are links to obtain the dataset.

Public Data

Obtain the data from the workshop page and rename the files accordingly.

Auxiliary Data

Click on the link provided in the shared task and request the data. After confirmation (which requires prior registration with the shared task), you will receive the auxiliary data. To load it conveniently, check out the associated GitHub repository.
