pragtag's Introduction

Enhancing Cross-Domain Generalization through Data Augmentation with Reduced Uncertainty

This is a repository containing the code of our submission to ArgMining @ EMNLP 2023.

The task is set up under three conditions: full, low, and zero. Each subdirectory has its own README.md with instructions for running the experiments. See the pragtag folder for further details.

Some data are shared across all conditions and are stored in public_dat. See the Data section below for specific instructions.

Besides the data augmentation strategies, fine-tuning, prediction, and evaluation are performed in separate directories.

Finetune

finetune_baseline.py fine-tunes an encoder classifier. Run it as follows:

python3 finetune_baseline.py <train data path> <output path> <model name>

The hyperparameters differ between experiments. All hyperparameters needed for each experiment are commented in the file, so feel free to adjust the following there:

seed, num_training_epochs, learning_rate, per_device_train_batch_size, callbacks

Please be aware that in the zero condition only four labels are classified. The following have to be changed accordingly:

  1. LABELS in load.py
  2. CLASS_MAP and iCLASS_MAP in utils.py
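The two changes above have to stay in sync: CLASS_MAP and iCLASS_MAP should be inverse mappings over the same four labels as LABELS. The following is a minimal sketch of that relationship, not the repository's actual code. It assumes LABELS is a list of label strings, CLASS_MAP maps a label to an integer id, and iCLASS_MAP maps the id back to the label. The label names are placeholders, and maps_are_consistent is a hypothetical helper.

```python
# Placeholder label names (assumption); substitute the four labels
# actually used in the zero condition.
LABELS = ["label_a", "label_b", "label_c", "label_d"]

# Assumed structure: label -> integer id, and its inverse id -> label.
CLASS_MAP = {label: idx for idx, label in enumerate(LABELS)}
iCLASS_MAP = {idx: label for label, idx in CLASS_MAP.items()}


def maps_are_consistent(labels, class_map, iclass_map):
    """Return True if both maps cover exactly `labels` and invert each other."""
    return (
        set(class_map) == set(labels)
        and len(class_map) == len(iclass_map) == len(labels)
        and all(iclass_map.get(idx) == label for label, idx in class_map.items())
    )
```

Running such a check after editing load.py and utils.py can catch a label that was updated in one file but not the other.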

Prediction

predict_baseline.py uses a trained model to predict labels for the given data. Run it as follows:

python3 predict_baseline.py <test input data path> <model checkpoint path> <output path>
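The fine-tuning and prediction commands can be chained from a small driver script. The sketch below only assembles the two command lines in the argument order documented above. The helper names and the dry_run flag are hypothetical, and it assumes that the output path passed to finetune_baseline.py can be used directly as the model checkpoint path for predict_baseline.py, which may not hold if checkpoints are written to a subdirectory.

```python
import subprocess


def finetune_cmd(train_path, output_path, model_name):
    """Build the documented fine-tuning command as an argument list."""
    return ["python3", "finetune_baseline.py", train_path, output_path, model_name]


def predict_cmd(test_path, checkpoint_path, output_path):
    """Build the documented prediction command as an argument list."""
    return ["python3", "predict_baseline.py", test_path, checkpoint_path, output_path]


def run_pipeline(train_path, model_dir, model_name, test_path, pred_path, dry_run=True):
    """Fine-tune, then predict. With dry_run=True the commands are only printed."""
    commands = [
        finetune_cmd(train_path, model_dir, model_name),
        # Assumption: the fine-tuning output path doubles as the checkpoint path.
        predict_cmd(test_path, model_dir, pred_path),
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

Calling run_pipeline with dry_run=True prints both commands without executing them, which is a cheap way to check the paths before starting a long training run.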

Evaluation

main.py evaluates the F1-scores across all domains.

python3 main.py <input_path> <output_path>

Here, the input path should point to a directory containing a folder ref with the true labels (or training data with labels) under the name test_labels.json, and a folder res with the predicted labels under the name predicted.json.
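Before running the evaluation, it can help to verify that the input directory has the layout described above. The following is a minimal sketch of such a check. It relies only on the two file locations named in this section, and missing_eval_files is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# File locations that main.py expects under <input_path>, as described above.
EXPECTED_FILES = (
    Path("ref") / "test_labels.json",  # true labels
    Path("res") / "predicted.json",    # predicted labels
)


def missing_eval_files(input_path):
    """Return the expected files (relative, POSIX-style) absent under input_path."""
    root = Path(input_path)
    return [rel.as_posix() for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

An empty list means both files are in place and main.py can be pointed at the directory.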

Data

Please rename each dataset and place it where the respective README.md describes.

The datasets can be obtained as follows.

Public Data

Obtain the data from the workshop page and rename the files accordingly.

Auxiliary Data

Click on the link provided in the shared task and request the data. After confirmation (which requires prior registration with the shared task), you will receive the auxiliary data. To load it conveniently, check out the associated GitHub repo.
