
Active Instruction Tuning

  • This repo releases our implementation of ActiveIT from the paper Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks.
  • Active Instruction Tuning aims to select novel tasks from a large task pool and continuously improve the model's performance by training on this subset of tasks.
  • In this work, we propose the following three components:
    • Active Instruction Tuning: A framework that actively selects novel tasks to improve an instruction-tuned model.
    • Prompt Uncertainty: A measure of a task's prompt sensitivity with respect to a model, used to identify novel tasks (see the sketch after this list).
    • Task Map: A task categorization tool that identifies Easy, Difficult, and Ambiguous tasks without the need for data labels.
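
Conceptually, prompt uncertainty perturbs a task's instruction and measures how much the model's prediction likelihood shifts under those perturbations. The following is a minimal sketch of that idea only: the function names, the token-dropout perturbation, and the mean-absolute-deviation aggregation are simplifying assumptions, not the exact implementation in this repo.

import random

def perturb_prompt(prompt, drop_rate=0.2, rng=random):
    # Drop each token independently with probability `drop_rate`
    # (cf. the 0.2 perturbation rate in --AL_type below).
    kept = [tok for tok in prompt.split() if rng.random() > drop_rate]
    return " ".join(kept) if kept else prompt

def prompt_uncertainty(score_fn, prompt, instance, n_perturb=10):
    # `score_fn(prompt, instance)` is assumed to return the model's
    # prediction likelihood for `instance` given `prompt`.
    base = score_fn(prompt, instance)
    deviations = [abs(score_fn(perturb_prompt(prompt), instance) - base)
                  for _ in range(n_perturb)]
    # High average deviation = prompt-sensitive (novel) task;
    # low deviation = the model is robust to that task's prompt.
    return sum(deviations) / n_perturb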

About This Repo

Requirements

Our main experiments and analysis were conducted in the following environment:

  • CUDA (11.3)
  • cuDNN (8.2.0.53)
  • PyTorch (1.10.0)
  • Transformers (4.17.0)
  • DeepSpeed

Install the Python dependencies with:

pip install -r requirements.txt
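
As a quick sanity check (our suggestion, not part of the repo's scripts), you can confirm that the installed versions match the ones listed above:

import torch, transformers

# Expect 1.10.0, 4.17.0, and 11.3 for the environment above.
print(torch.__version__, transformers.__version__, torch.version.cuda)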

Preparation

Before running experiments, you should download the natural-instructions-v2 dataset and copy the task information from our previous experiments for reproduction. Run the following script to prepare the files.

sh prepare.sh

Reproduce Experiment Results

To reproduce the experiment results from the paper, we use the task split obtained from our experiments. The following experiment script will not update the task pool at each ActiveIT iteration; instead, it uses the one from our experiments.

cd ActiveIT
sh reproduce.sh
  • Note the following:

    • For reproducibility, we do not update the task pool but use our previously selected tasks. This is done by specifying --no_update_task_list in ActiveIT/reproduce.sh.
    • We do this because of the large variance during task selection. Although we fix the selected tasks to match our previous experiments, the data randomly sampled from each task still differs, which can make the results deviate from our reported scores.
    • For a fair performance comparison, always run the experiment with multiple random seeds and average the scores (a minimal averaging sketch follows this list).
  • After running the ActiveIT pipeline, you can check your scores via ActiveIT/ActiveIT_README.ipynb. You can also check the Task Map of the trained model via this file.

    • However, as mentioned above, we do not update the training task list but use the old one for reproducibility, so the selected tasks plotted in the Task Map are the old ones, which can make them look inconsistent with the task selection algorithm.
    • For example, while the prompt uncertainty method should select the tasks with the highest prompt uncertainty (the rightmost points), this will not be the case in the Task Map plotted here: the prompt uncertainty and prediction probability of all tasks are measured with the newly trained model, whereas the selected tasks come from our previous experiments, so the selected tasks plotted here might not all be the rightmost points.
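
As a minimal sketch of that seed-averaging step (the result-file layout and the "rougeL" key are hypothetical assumptions, not the repo's actual output format):

import glob, json, statistics

# Hypothetical layout: one JSON score file per random seed.
scores = [json.load(open(path))["rougeL"]               # assumed metric key
          for path in glob.glob("results/seed*.json")]  # assumed file naming
print(f"mean={statistics.mean(scores):.2f} stdev={statistics.stdev(scores):.2f}")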

Running New Experiments

Create New Task Pool

To run new experiments, you can follow ActiveIT/ActiveIT_README.ipynb to create new random splits (task pools), which include a new random set of training tasks for iteration 0. The newly created splits can be found at natural-instructions/splits/.
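
If you prefer scripting it, a split is essentially a directory of task-name lists under natural-instructions/splits/. The sketch below follows the natural-instructions convention of one task name per line; the split name, file names, and task counts are illustrative assumptions, so defer to the notebook for the exact format.

import os, random

random.seed(0)
tasks = sorted(f[:-len(".json")] for f in os.listdir("natural-instructions/tasks")
               if f.endswith(".json"))
random.shuffle(tasks)

split_dir = "natural-instructions/splits/my_new_split"  # hypothetical split name
os.makedirs(split_dir, exist_ok=True)
with open(os.path.join(split_dir, "train_tasks.txt"), "w") as f:
    f.write("\n".join(tasks[:10]))     # iteration-0 training tasks (count assumed)
with open(os.path.join(split_dir, "test_tasks.txt"), "w") as f:
    f.write("\n".join(tasks[10:130]))  # held-out tasks (count assumed)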

Run New Experiments

To run a new experiment, run the following script in the ActiveIT/ folder:

python3 my_scripts/TLAL/TLAL_pipeline.py \
  --AL_type FCGRatioPI-NL-I10-TDTE-High-0.2-Mean-abs-bald \
  --gpus $GPUS \
  --split_dir ../natural-instructions/splits/TLAL_Exp0_all_10 \
  --max_iter 5 \
  --fix_cls_gen_ratio 0.356 \
  --base_script my_scripts/TLAL/TLAL_base_script_v4.sh \
  --perturb_num 10

  • Here we explain what each argument does:
  • --AL_type: Specifies the task selection strategy.
    • *Random*: Random sampling, used if the string contains the keyword 'Random'.
    • HighPerplexity: Selects tasks with high perplexity if the string is exactly 'HighPerplexity'.
    • LowPerplexity: Selects tasks with low perplexity if the string is exactly 'LowPerplexity'.
    • *PI-NL-I10-TDTE-High-0.2-Mean-abs-bald: Selects tasks based on prompt uncertainty. The AL_type here defines the active IT configuration, so it needs to follow a specific pattern (see the parsing sketch after this list):
      • PI-: Must be specified to use prompt uncertainty. An arbitrary prefix can be added before it; in this example we add FCGRatio, which doesn't affect the pipeline config.
      • NL: 'No Labels': we don't use labels when measuring prompt uncertainty. Alternatively, specify HL ('Have Labels') to use the ground-truth data as the original prediction when measuring prompt uncertainty.
      • I10: The number of instances to use for measuring prompt uncertainty. In this example, we sample 10 examples from a task, measure prompt uncertainty on each, and then average.
      • TDTE: The prompt perturbation target. The choices are [TD, TE, TDTE]: TD perturbs the task definition, TE perturbs the provided examples, and TDTE perturbs both.
      • High: What to select after computing prompt uncertainty. [High, Low] select tasks with high/low prompt uncertainty, respectively.
      • 0.2: The prompt perturbation rate, a number in [0, 1]. Here we specify 0.2, which assigns each token a 0.2 dropping probability.
      • Mean-abs-bald: Must be specified at the end. This selects the prompt uncertainty measuring method described in our paper.
    • Make sure to specify all of these fields in --AL_type following the above format.
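
To make the pattern concrete, here is a minimal sketch of how such a string decomposes into its fields (this parser is illustrative, not the repo's actual parsing code):

al_type = "FCGRatioPI-NL-I10-TDTE-High-0.2-Mean-abs-bald"

# Everything before the "PI-" marker is an arbitrary, ignored prefix.
prefix, config = al_type.split("PI-", 1)
fields = config.split("-")
labels, instances, target, select, rate = fields[:5]
method = "-".join(fields[5:])   # rejoin the trailing "Mean-abs-bald"

print(prefix)              # FCGRatio -> ignored by the pipeline
print(labels)              # NL       -> no labels
print(int(instances[1:]))  # 10       -> instances per task
print(target)              # TDTE     -> perturb task definition and examples
print(select)              # High     -> keep high prompt uncertainty tasks
print(float(rate))         # 0.2      -> per-token dropping probability
print(method)              # Mean-abs-bald -> uncertainty measure from the paper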

Evaluation

Getting Pipeline Curve

To get the pipeline results, refer to ActiveIT/ActiveIT_README.ipynb, which plots the pipeline curve (performance across ActiveIT iterations).

Eval Single Model

The following script evaluates the model with task definition + 2 positive examples as instructions:

./scripts/eval_tk_instruct.sh

Task Map Visualization

To visualize the Task Map, refer to ActiveIT/ActiveIT_README.ipynb.

The tasks are plotted on this Task Map with Prediction Probability and Prompt Uncertainty measured by the model trained with the first task selection method; in this example, FCGRatioPI-NL-I10-TDTE-High-0.2-Mean-abs-bald. Note that the uncertainty scores measured from different models (or the same model with different random seeds) can differ due to randomness. Also, when plotting the Task Map for multiple ActiveIT iterations (total_run > 1), make sure to specify only one uncertainty method, since the task pool can differ after iteration 2 for each task selection method (the tasks selected at iteration 1 can differ).
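
For orientation, a Task Map is essentially a scatter plot of tasks in the (Prompt Uncertainty, Prediction Probability) plane. Below is a minimal matplotlib sketch under that assumption; the per-task scores are placeholder values, not results from the paper.

import matplotlib.pyplot as plt

# Placeholder per-task scores: (prompt_uncertainty, prediction_probability).
tasks = {"task_a": (0.05, 0.90), "task_b": (0.30, 0.20), "task_c": (0.28, 0.60)}

xs = [u for u, _ in tasks.values()]
ys = [p for _, p in tasks.values()]
plt.scatter(xs, ys)
for name, (u, p) in tasks.items():
    plt.annotate(name, (u, p))
plt.xlabel("Prompt Uncertainty")      # rightmost points = most prompt-sensitive
plt.ylabel("Prediction Probability")  # top points = confidently predicted tasks
plt.title("Task Map (sketch)")
plt.savefig("task_map.png")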

Citation

@article{kung2023active,
  title={Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks},
  author={Kung, Po-Nien and Yin, Fan and Wu, Di and Chang, Kai-Wei and Peng, Nanyun},
  journal={arXiv preprint arXiv:2311.00288},
  year={2023}
}
