DAGA

This repository contains the source code for the method proposed in our paper "DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks", accepted at EMNLP 2020.

Examples

flair_seq_tagger: sequence tagging model

cd flair_seq_tagger;

python train_tagger.py \
  --data_dir PATH/TO/TRAIN_DIR \
  --train_file  train.txt \
  --dev_file  dev.txt \
  --data_columns text ner \
  --model_dir ./model \
  --comment_symbol "__label__" \
  --embeddings_file PATH/TO/emb \
  --optim adam \
  --learning_rate 0.001 --min_learning_rate 0.00001 \
  --patience 2 \
  --max_epochs 100 \
  --hidden_size 512 \
  --mini_batch_size 32 \
  --gpuid 0

lstm-lm: LSTM language model

  • train lstm-lm on linearized sequences

cd lstm-lm;

python train.py \
  --train_file PATH/TO/train.linearized.txt \
  --valid_file PATH/TO/dev.linearized.txt \
  --model_file PATH/TO/model.pt \
  --emb_dim 300 \
  --rnn_size 512 \
  --gpuid 0

  • generate linearized sequences

cd lstm-lm;

python generate.py \
  --model_file PATH/TO/model.pt \
  --out_file PATH/TO/out.txt \
  --num_sentences 10000 \
  --temperature 1.0 \
  --seed 3435 \
  --max_sent_length 32 \
  --gpuid 0
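
The --temperature and --seed flags above suggest that generation samples tokens from the language model's output distribution. As a rough illustration of what such flags usually control, here is a minimal sketch of temperature-scaled sampling with a seeded random generator. It is an assumption about the general technique, not the code in generate.py; the function name sample_with_temperature and the toy logits are hypothetical.

```python
import math
import random
from typing import List


def sample_with_temperature(logits: List[float], temperature: float,
                            rng: random.Random) -> int:
    """Sample an index from softmax(logits / temperature).

    Hypothetical helper for illustration only; generate.py may differ.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [1.0, 5.0, 2.0]   # made-up scores for a 3-token vocabulary
    rng = random.Random(3435)      # fixed seed makes the draws repeatable
    print([sample_with_temperature(toy_logits, 1.0, rng) for _ in range(10)])
```

Lower temperatures concentrate the probability mass on the highest-scoring token, while a temperature of 1.0 samples from the unmodified softmax distribution. Fixing the seed makes a run reproducible.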

tools: tools for data processing

  • preprocess.py: sequence linearization
  • line2cols.py: convert linearized sequence back to two-column format
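
To make the two tools above more concrete, the following is a minimal sketch of sequence linearization and its inverse. It assumes the scheme described in the paper, where each non-O tag is inserted before its word, and it is not the code in preprocess.py or line2cols.py. The function names linearize and delinearize, the explicit tag set, and the example sentence are hypothetical.

```python
from typing import List, Set, Tuple


def linearize(tokens: List[str], tags: List[str], outside: str = "O") -> str:
    """Insert each non-O tag before its token and join into one line."""
    out = []
    for token, tag in zip(tokens, tags):
        if tag != outside:
            out.append(tag)
        out.append(token)
    return " ".join(out)


def delinearize(line: str, tag_set: Set[str],
                outside: str = "O") -> List[Tuple[str, str]]:
    """Convert a linearized line back to (token, tag) pairs.

    Assumes tags are recognized by membership in tag_set; a tag that is
    not followed by a word (possible in generated text) is dropped.
    """
    pairs = []
    pending = outside
    for item in line.split():
        if item in tag_set:
            pending = item            # remember the tag for the next word
        else:
            pairs.append((item, pending))
            pending = outside         # untagged words default to O
    return pairs


if __name__ == "__main__":
    tokens = ["Jose", "Valentin", "lives", "in", "London"]
    tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
    line = linearize(tokens, tags)
    print(line)
    # Print the recovered pairs in two-column format, one token per line.
    for token, tag in delinearize(line, set(tags) - {"O"}):
        print(f"{token} {tag}")
```

In this sketch, a language model trained on such lines sees tags and words as one token stream, so sampled lines can be mapped back to tagged training data.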

Requirements

  • flair_seq_tagger/requirements.txt
  • lstm-lm/requirements.txt

Citation

Please cite our paper if you find the resources in this repository useful.

@inproceedings{ding-etal-2020-daga,
    title = "{DAGA}: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks",
    author = "Ding, Bosheng  and
      Liu, Linlin  and
      Bing, Lidong  and
      Kruengkrai, Canasai  and
      Nguyen, Thien Hai  and
      Joty, Shafiq  and
      Si, Luo  and
      Miao, Chunyan",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.488",
    doi = "10.18653/v1/2020.emnlp-main.488",
    pages = "6045--6057",
}
