AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs

This repo is an official implementation of AdvPrompter (arxiv:2404.16873).

Please 🌟star🌟 this repo and cite our paper 📜 if you like (and/or use) our work, thank you!

0. Installation

  • Option 1. Use Singularity to build the container defined in advprompter.def
  • Option 2. Install Python 3.11 and the packages in requirements.txt manually:
  conda create -n advprompter python=3.11.4
  conda activate advprompter
  pip install -r requirements.txt

1. Running AdvPrompter

We use hydra as a configuration management tool. The main config files are ./conf/{train,eval,eval_suffix_dataset,base}.yaml. The AdvPrompter and the TargetLLM are specified in conf/base.yaml; various options are already implemented.

The codebase optionally supports wandb, which is enabled by setting the corresponding options in conf/base.yaml.

Note on hardware specifications: all the experiments we conducted used two NVIDIA A100 GPUs, one for the AdvPrompter and one for the TargetLLM. You can manage devices in conf/target_llm/base_target_llm.yaml and conf/prompter/base_prompter.yaml.

1.1 Evaluation

Run

python3 main.py --config-name=eval

to test the performance of the specified AdvPrompter against the TargetLLM on a given dataset. You'll have to specify the TargetLLM and the AdvPrompter in conf/base.yaml. If the AdvPrompter was fine-tuned before, you may also want to specify the path to its PEFT checkpoint via lora_checkpoint:

# see conf/prompter/llama2.yaml
lora_params:
  warmstart: true
  lora_checkpoint: "path_to_peft_checkpoint"

The suffixes generated during evaluation are saved to a new dataset under the run-directory in ./exp/.../suffix_dataset for later use. Such a dataset can also be useful for evaluating baselines or hand-crafted suffixes against a TargetLLM, and it can be evaluated by running

python3 main.py --config-name=eval_suffix_dataset

after populating the suffix_dataset_pth_dct in eval_suffix_dataset.yaml.

1.2 Training

Run

python3 main.py --config-name=train

to train the specified AdvPrompter against the TargetLLM. Training automatically performs the evaluation described above at regular intervals, and it also saves intermediate versions of the AdvPrompter to the run-directory under ./exp/.../checkpoints for later warmstart. A checkpoint can be specified with the lora_checkpoint parameter in the model configs (as illustrated in 1.1 Evaluation). For each epoch, training also saves the target suffixes generated with AdvPrompterOpt to ./exp/.../suffix_opt_dataset. This allows pretraining on such a dataset of suffixes by specifying the corresponding path under pretrain in train.yaml.
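To see what a finished run produced, a small helper can list the output subdirectories named above. This is a minimal sketch, not part of the codebase: the function name list_run_artifacts is hypothetical, the run directory has to be supplied by you (the layout under ./exp/ is not spelled out here), and only the three subdirectory names mentioned in this README are assumed.

```python
from pathlib import Path


def list_run_artifacts(run_dir):
    """Return the entry names found in each known output subdirectory of a run.

    `run_dir` is the run-directory of a single experiment (somewhere under ./exp/).
    This helper is illustrative only and is not shipped with the repository.
    """
    run_dir = Path(run_dir)
    # Subdirectory names as given in this README; anything else is ignored.
    subdirs = ("checkpoints", "suffix_dataset", "suffix_opt_dataset")
    artifacts = {}
    for name in subdirs:
        folder = run_dir / name
        # A subdirectory may be absent, e.g. no checkpoints after an eval-only run.
        if folder.is_dir():
            artifacts[name] = sorted(p.name for p in folder.iterdir())
        else:
            artifacts[name] = []
    return artifacts
```

Calling list_run_artifacts on a run-directory returns a dict with one list per subdirectory, which makes it easy to pick a checkpoint path to pass as lora_checkpoint for a later warmstart.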

Some important hyperparameters to consider in conf/train.yaml: [epochs, lr, top_k, num_chunks, lambda_val]

Examples

Note: you may want to replace target_llm.llm_params.checkpoint with a local path.

  • Example 1: AdvPrompter on Vicuna-7B:

     python3 main.py --config-name=train target_llm=vicuna_chat target_llm.llm_params.model_name=vicuna-7b-v1.5
  • Example 2: AdvPrompter on Vicuna-13B:

     python3 main.py --config-name=train target_llm=vicuna_chat target_llm.llm_params.model_name=vicuna-13b-v1.5 target_llm.llm_params.checkpoint=lmsys/vicuna-13b-v1.5 train.q_params.num_chunks=2
  • Example 3: AdvPrompter on Mistral-7B-chat:

     python3 main.py --config-name=train target_llm=mistral_chat
  • Example 4: AdvPrompter on Llama2-7B-chat:

     python3 main.py --config-name=train target_llm=llama2_chat train.q_params.lambda_val=150
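The four commands above differ only in their key=value overrides, so they can also be generated from a table. The following is a rough sketch under stated assumptions: build_command and EXAMPLES are hypothetical names that do not exist in the repository, and the override keys are copied verbatim from the examples above rather than checked against the config files.

```python
import shlex


def build_command(config_name, overrides=None):
    """Build the argument list for one main.py run with key=value overrides."""
    args = ["python3", "main.py", f"--config-name={config_name}"]
    for key, value in (overrides or {}).items():
        # Each override is passed as a single "key=value" token.
        args.append(f"{key}={value}")
    return args


# Overrides copied from Examples 1-4; the dict keys are just labels.
EXAMPLES = {
    "vicuna-7b": {
        "target_llm": "vicuna_chat",
        "target_llm.llm_params.model_name": "vicuna-7b-v1.5",
    },
    "vicuna-13b": {
        "target_llm": "vicuna_chat",
        "target_llm.llm_params.model_name": "vicuna-13b-v1.5",
        "target_llm.llm_params.checkpoint": "lmsys/vicuna-13b-v1.5",
        "train.q_params.num_chunks": 2,
    },
    "mistral-7b-chat": {"target_llm": "mistral_chat"},
    "llama2-7b-chat": {
        "target_llm": "llama2_chat",
        "train.q_params.lambda_val": 150,
    },
}

for name, overrides in EXAMPLES.items():
    # Print the shell command instead of running it.
    print(name, "->", shlex.join(build_command("train", overrides)))
```

The returned list can be handed to subprocess.run(args, check=True) to launch a run from the repository root; printing it first is a cheap way to review the exact command.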

2. Contributors

Anselm Paulus*, Arman Zharmagambetov*, Chuan Guo, Brandon Amos**, Yuandong Tian**

(* = Equal 1st authors, ** = Equal advising)

3. License

Our source code is released under the CC-BY-NC 4.0 license.
