CodeAttack 🧑‍💻🐞

CodeAttack is a framework for systematically investigating the safety vulnerabilities of LLMs in the code domain.

RESEARCH USE ONLY✅ NO MISUSE❌

LOVE💗 and Peace🌊

🆙 Updates

  • An enhanced version of CodeAttack, highly effective against the latest GPT-4 and Claude-3 series models, will be available next week!

👉 Paper

For more details, please refer to our ACL 2024 paper.

🛠️ Usage

✨An example run:

python3 main.py --num_sample 3 \
--prompt-type code_python \
--target-model=gpt-4-1106-preview \
--judge-model=gpt-4-1106-preview \
--exp_name=python_stack_full \
--data_key=code_wrapped_plain_attack \
--target-max-n-tokens=1000 \
--judge \
--multi-thread \
--temperature 0 \
--start_idx 0 --end_idx -1

Experiments

  1. The 'data' folder contains the CodeAttack datasets curated from AdvBench. There are three versions of CodeAttack, each using a different input encoding: data_python_string_full.json, data_python_list_full.json, and data_python_stack_full.json.
  2. We provide templates for our CodeAttack in both C++ and Go, named code_C_string.txt and code_go_string.txt, respectively.
  3. Because Llama-2-7B-chat has limited ability to follow instructions in the code domain, we evaluate it with a simplified CodeAttack template, code_python_list_llama.txt. For this model, the paper reports manual evaluation results instead of using GPT-4 as the evaluator.
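
Before launching a run, it can help to confirm that the three dataset files are present and to see how many entries each one holds. The following is a minimal sketch, not part of the repository: the helper name summarize_json is hypothetical, the files are assumed to sit directly under the 'data' folder, and nothing is assumed about their internal schema beyond being valid JSON.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return (top-level type name, number of top-level entries) for a JSON file."""
    with Path(path).open(encoding="utf-8") as f:
        data = json.load(f)
    # Lists and dicts report their length; any other JSON value counts as one entry.
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size


if __name__ == "__main__":
    # Assumption: the files live directly in the 'data' folder named in the README.
    for name in (
        "data_python_string_full.json",
        "data_python_list_full.json",
        "data_python_stack_full.json",
    ):
        path = Path("data") / name
        if path.exists():
            print(name, summarize_json(path))
        else:
            print(name, "not found")
```

The helper only reports the top-level container type and its length, so it stays valid whatever fields the entries actually contain.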

💡Framework

Overview of CodeAttack. CodeAttack constructs a code template in three steps: (1) input encoding, which encodes the harmful text-based query with common data structures; (2) task understanding, which applies a decode() function so that the LLM can extract the target task from various kinds of input; (3) output specification, which lets the LLM fill the output structure with the user's desired content.

Citation

If you find our paper and tool interesting and useful, please feel free to give us a star and cite us:

@inproceedings{
Ren2024codeattack,
title={Exploring Safety Generalization Challenges of Large Language Models via Code},
author={Qibing Ren and Chang Gao and Jing Shao and Junchi Yan and Xin Tan and Wai Lam and Lizhuang Ma},
booktitle={The 62nd Annual Meeting of the Association for Computational Linguistics},
year={2024},
url={https://arxiv.org/abs/2403.07865}
}
