sherdencooper / gptfuzz Goto Github PK
View Code? Open in Web Editor NEWOfficial repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
License: MIT License
Official repo for GPTFUZZER : Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
License: MIT License
The paper shows the following formulas with the term + 1
whereas in the code we always seem to do + 0.01
here
GPTFuzz/gptfuzzer/fuzzer/selection.py
Line 103 in 745d5fc
GPTFuzz/gptfuzzer/fuzzer/selection.py
Line 115 in 745d5fc
Am I missing something or is there a particular reason for this difference?
CC @gseetha04 who discovered this.
Hi could you please have a look at the codebase and try to run the scripts from scratch? It seems like there are multiple errors and dependencies missing.
FileNotFoundError: [Errno 2] No such file or directory: './datasets/prompts_generated/multi_single/multi_single_chatglm2-6b_random.csv'
Out of curiosity, did you try experimenting with alpha and beta before arriving at the values used in this repo? It doesn't seem to be mentioned in the paper.
CC @gseetha04
Thank you!
Thanks for making the code public available. I am trying to understand codebase to see how GPTFuzzer interact with target LLM models. The paper shows some attack results on commercial LLMs like Bard and Claude2. However, I didn't find any code attacking Bard/Claude2/PaLM2 in the current repo. It is understandable since authors already explained in the paper: "we did not have the API accesses to some commercial models. Therefore, we conducted attacks via web inference for Claude2, PaLM2, and Bard"
The code below shows that currently only OpenAI and open-source models are supported.
GPTFuzz/fuzz_single_question_single_model.py
Lines 96 to 98 in 0cb85c0
GPTFuzz/llm_utils/creat_model.py
Lines 21 to 25 in 0cb85c0
I try to locate the code to interact with LLM and it seems that OpenAI models are called through function openai_request
, while open-source models are locally inferenced.
Lines 417 to 425 in 0cb85c0
But it seems that openai_request
hardcodes model='gpt-3.5-turbo'
and MODEL_TARGET
is never used. So I think the current code will always use 'gpt-3.5-turbo' no matter which target_model
is specified. If it's indeed a bug, then a possible fix would be passing an argument to specify model when calling openai.ChatCompletion.create
.
Lines 327 to 340 in 0cb85c0
I wonder how to fuzz close sourced LLMs with API available. If model can be specified by user, then it would be possible to fuzz any close sourced LLMs served with OpenAI-compatible API by setting OPENAI_API_BASE
env.
The paper says introduce five specialized mutation operators, but only four were introduced: Crossover, Expand, Shorten, Rephrase. The Generate was left behind.
There have been many incompatibility issues. My CUDA version is 12.1. Follow your steps to report various errors.
If I want to use the roberta model as an evaluator, what should I input to the model.
The response from LLM only or Q&A together.
如题
As the title suggests, which conference is the author currently submitting to? I feel that the author's writing is quite good.
In the second code block, questions_set = pd.read_csv(seed_path)['question_path'].tolist()
seems to be wrong.
Maybe questions_set = pd.read_csv(path_path)['text'].tolist()
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.