

TAB (Tuning with Auxiliary Branch)

Code and pretrained models for the ACL 2023 main conference short paper "Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation".

Environment Configuration

conda create -n tab python=3.6
conda activate tab
conda install pytorch==1.8.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
# cudatoolkit=10.2 is also acceptable.

mkdir ~/TAB && cd ~/TAB && mkdir data checkpoints
git clone https://github.com/hannlp/tab Fairseq-S2T
cd Fairseq-S2T
pip install --upgrade pip
pip install --editable ./
pip install sacrebleu==1.4.13 pandas sacremoses sentencepiece configargparse gpustat tensorboard editdistance

Examples on MuST-C en-de

cd ~/TAB/Fairseq-S2T/egs/mustc/st

Data Preparation

ln -s path/to/mustc_root/ende ~/TAB/data/mustc.en-de

bash run.sh --tgt_lang de
# It may take a bit of time.

Once you have pre-processed the data, the directory should look like this:

ls ~/TAB/data/mustc.en-de/st
config_share.yaml  dev.tsv      spm_unigram10000_st_share.model  spm_unigram10000_st_share.vocab  tst-COMMON.en
dev.en             fbank80.zip  spm_unigram10000_st_share.txt    train.tsv                        tst-COMMON.tsv
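To confirm that pre-processing finished, the listing above can be checked with a small script. This is a minimal sketch: `check_st_data` is a hypothetical helper that is not part of the repository, and the file names are copied from the directory listing shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): report which of the
# files from the listing above are missing from a pre-processed data directory.
check_st_data() {
    local dir="$1"
    local missing=0
    local f
    for f in config_share.yaml dev.tsv dev.en train.tsv \
             tst-COMMON.tsv tst-COMMON.en fbank80.zip \
             spm_unigram10000_st_share.model \
             spm_unigram10000_st_share.vocab \
             spm_unigram10000_st_share.txt; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every expected file was found.
    [ "$missing" -eq 0 ]
}
```

After sourcing the function, `check_st_data ~/TAB/data/mustc.en-de/st && echo "all expected files present"` prints either the confirmation or one `missing:` line per absent file.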

Tuning with auxiliary branch

  1. Download the pre-trained ASR and MT models, then edit conf/tab.yaml so that the load-pretrained-xxx-from parameters point to the downloaded models.
  2. Other parameters from the paper can also be adjusted: use-auxiliary-branch, consistency-type, consistency-weight, replacement-probability-strategy, replacement-probability, uncertainty-gamma.
  3. Run the following command to start training. Note that gpu_num * max_tokens * update_freq should equal 320k (for example, 4 * 20000 * 4 in the command below); exp_tag is a free-form label for telling experiments apart.
bash run.sh --stage 1 --stop_stage 1 --tgt_lang de --train_config tab --exp_tag uncertainty_gamma0.5_alpha5 --gpu_num 4 --max_tokens 20000 --update_freq 4

The trained models will be saved at ~/TAB/checkpoints/mustc.en-de/st/{DATE}_tab_{EXP_TAG}
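The 320k constraint from step 3 is plain arithmetic, so it is easy to check before launching a run on different hardware. The sketch below assumes that 320k means 320,000 tokens per update, which matches the 4 * 20000 * 4 configuration in the training command; `tokens_per_update` and `required_update_freq` are hypothetical helpers, not options of run.sh.

```shell
#!/usr/bin/env bash
# Tokens consumed per parameter update: gpu_num * max_tokens * update_freq.
tokens_per_update() {
    echo $(( $1 * $2 * $3 ))
}

# Hypothetical helper: the update_freq needed to reach a target token
# budget for a given gpu_num and max_tokens. Fails if the budget cannot
# be met exactly with an integer update_freq.
required_update_freq() {
    local target="$1" gpu_num="$2" max_tokens="$3"
    local per_step=$(( gpu_num * max_tokens ))
    if [ $(( target % per_step )) -ne 0 ]; then
        echo "target $target is not a multiple of $per_step" >&2
        return 1
    fi
    echo $(( target / per_step ))
}

tokens_per_update 4 20000 4           # prints 320000 (the command above)
required_update_freq 320000 8 20000   # prints 2 (same budget on 8 GPUs)
```

With the second result, an 8-GPU run would pass `--gpu_num 8 --max_tokens 20000 --update_freq 2` to keep the same budget.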

Evaluate

bash run.sh --stage 2 --stop_stage 2 --tgt_lang de --exp_name {DATE}_tab_{EXP_TAG} --gpu_num 1 --test_subset dev,tst-COMMON

Contact and Citation

To keep things simple, we have not included the scripts for the pre-training stage. If you have any questions, please do not hesitate to contact me at [email protected]. If you find this repository helpful, please cite it as:

@inproceedings{han-etal-2023-modality,
    title = "Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation",
    author = "Han, Yuchen  and
      Xu, Chen  and
      Xiao, Tong  and
      Zhu, Jingbo",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-short.115",
    pages = "1340--1348",
    abstract = "Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonplace {''}modality gap{''} between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning, but does not have a major impact on the final performance. On the other hand, we find that there has another gap, which we call the {''}capacity gap{''}: high resource tasks (such as ASR and MT) always require a large model to fit, when the model is reused for a low resource task (E2E ST), it will get a sub-optimal performance due to the over-fitting. In a case study, we find that the regularization plays a more important role than the well-designed modality adaption method, which achieves 29.0 for en-de and 40.3 for en-fr on the MuST-C dataset.",
}
