sherdencooper / gptfuzz

Official repo for GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

License: MIT License

Python 75.50% Shell 5.32% Jupyter Notebook 19.19%

gptfuzz's Issues

Mismatch in formula between paper and code

The formulas in the paper use the term + 1, whereas the code always seems to use + 0.01, here:

GPTFuzz/gptfuzzer/fuzzer/selection.py, line 103 in 745d5fc

    (pn.visited_num + 0.01))

and here:

GPTFuzz/gptfuzzer/fuzzer/selection.py, line 115 in 745d5fc

    (pn.visited_num + 0.01))

Am I missing something or is there a particular reason for this difference?

CC @gseetha04 who discovered this.
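
To make the difference concrete, here is a minimal sketch, assuming a UCB-style score of the form reward / (n + eps) + ratio * sqrt(2 * ln(step) / (n + eps)). The function name ucb_score, its parameters, the ratio default, and the overall formula shape are assumptions for illustration and are not taken from the repository; only the (pn.visited_num + 0.01) denominator comes from the lines quoted above.

```python
import math


def ucb_score(reward, visited_num, step, eps, ratio=0.5):
    """UCB-style score with a smoothing constant `eps` in both denominators.

    Hypothetical helper: the formula shape and the `ratio` default are
    assumptions, not the repository's implementation.
    """
    exploit = reward / (visited_num + eps)
    explore = ratio * math.sqrt(2 * math.log(step) / (visited_num + eps))
    return exploit + explore


# Compare the two constants for an unvisited node (visited_num = 0) at step 10.
score_paper = ucb_score(reward=0.0, visited_num=0, step=10, eps=1.0)
score_code = ucb_score(reward=0.0, visited_num=0, step=10, eps=0.01)
print(f"eps=1.0 : {score_paper:.3f}")
print(f"eps=0.01: {score_code:.3f}")
```

Under these assumptions, the smaller constant gives an unvisited node with zero reward an exploration bonus ten times larger (sqrt(1 / 0.01) = 10), so the choice of constant may noticeably change how strongly unvisited seeds are favored.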

Unable to run the scripts

Hi, could you please take a look at the codebase and try running the scripts from scratch? There seem to be multiple errors and missing dependencies.

CUDA out of memory

Hi team,
I tried to run the example code on an A100 (40 GB), but it shows a CUDA out-of-memory error.

How can I fix it?

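Some questions about CUDA

Hello, which version of CUDA needs to be installed? Did you install VS when installing CUDA? When installing with pip, I ran into NameError: name 'nvcc_cuda_version' is not defined, and I am not sure whether the two are related. Also, when installing vllm by cloning the repository, is using a proxy enough? Even with a proxy, I ran into a problem.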

How to get multi_single_chatglm2-6b_random.csv

FileNotFoundError: [Errno 2] No such file or directory: './datasets/prompts_generated/multi_single/multi_single_chatglm2-6b_random.csv'

Experimentation with alpha and beta

Out of curiosity, did you try experimenting with alpha and beta before arriving at the values used in this repo? It doesn't seem to be mentioned in the paper.

CC @gseetha04

Thank you!

How to fuzz closed source LLMs and possible bug when calling OpenAI model

Thanks for making the code publicly available. I am trying to understand the codebase to see how GPTFuzzer interacts with target LLMs. The paper shows attack results on commercial LLMs such as Bard and Claude2. However, I could not find any code attacking Bard/Claude2/PaLM2 in the current repo. This is understandable, since the authors already explain in the paper: "we did not have the API accesses to some commercial models. Therefore, we conducted attacks via web inference for Claude2, PaLM2, and Bard"

The code below shows that currently only OpenAI and open-source models are supported.

GPTFuzz/fuzz_single_question_single_model.py, lines 96 to 98 in 0cb85c0

    args_target.model_path = args.target_model
    args_target.temperature = 0.01 #some models need to have strict positive temperature
    MODEL_TARGET, TOK_TARGET = prepare_model_and_tok(args_target)

GPTFuzz/llm_utils/creat_model.py, lines 21 to 25 in 0cb85c0

    def create_model_and_tok(args, model_path):
        # Note that 'moderation' is only used for classification and cannot be used for generation
        openai_model_list = ['gpt-3.5-turbo-0613', 'gpt-3.5-turbo', 'gpt-3.5-turbo-0301', 'gpt-4-0613', 'gpt-4', 'gpt-4-0301', 'moderation']
        open_sourced_model_list = ['lmsys/vicuna-7b-v1.3', 'lmsys/vicuna-33b-v1.3', 'meta-llama/Llama-2-7b-chat-hf', 'lmsys/vicuna-13b-v1.3', 'THUDM/chatglm2-6b', 'meta-llama/Llama-2-13b-chat-hf', 'meta-llama/Llama-2-70b-chat-hf','baichuan-inc/Baichuan-13B-Chat']
        supported_model_list = openai_model_list + open_sourced_model_list

I tried to locate the code that interacts with the LLM. It seems that OpenAI models are called through the function openai_request, while open-source models run inference locally.

GPTFuzz/fuzz_utils.py, lines 417 to 425 in 0cb85c0

    if TOK_TARGET == None: #openai model
        with concurrent.futures.ThreadPoolExecutor() as executor:
            futures = {executor.submit(openai_request, prompt): prompt for prompt in inputs}

            for future in concurrent.futures.as_completed(futures):
                try:
                    data.append(future.result()['choices'][0]['message']['content'])
                except:
                    data.append(future.result())

However, openai_request hardcodes model='gpt-3.5-turbo', and MODEL_TARGET is never used. So I think the current code will always use 'gpt-3.5-turbo' no matter which target_model is specified. If this is indeed a bug, a possible fix would be to pass an argument specifying the model when calling openai.ChatCompletion.create.

GPTFuzz/fuzz_utils.py, lines 327 to 340 in 0cb85c0

    def openai_request(prompt, temperature=0, n=1):
        response = "Sorry, I cannot help with this request. The system is busy now."
        max_trial = 50
        for i in range(max_trial):
            try:
                response = openai.ChatCompletion.create(
                    model='gpt-3.5-turbo',
                    messages=[
                        {"role": "system", "content": "You are a helpful assistant."},
                        {"role": "user", "content": prompt},
                    ],
                    temperature=temperature,
                    n = n,
                )

I wonder how to fuzz closed-source LLMs that have an API available. If the model could be specified by the user, it would be possible to fuzz any closed-source LLM served through an OpenAI-compatible API by setting the OPENAI_API_BASE environment variable.
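
The fix proposed above could look roughly like the following sketch, which threads the model name through to the request instead of hardcoding it. The model, create_fn, and retry_wait parameters, the retry handling in the except branch, and the call-site change in the trailing comment are all assumptions for illustration; the quoted snippet stops before its error handling, so this is not the repository's implementation. It also assumes the legacy (pre-1.0) openai package that the quoted code uses.

```python
import time


def openai_request(prompt, model, temperature=0, n=1, max_trial=50,
                   create_fn=None, retry_wait=1.0):
    """Send one chat request to `model`, retrying on errors.

    `model`, `create_fn`, and `retry_wait` are hypothetical additions;
    the quoted original hardcodes model='gpt-3.5-turbo'.
    """
    if create_fn is None:
        # Assumes the legacy (pre-1.0) openai package, as in the quoted code.
        import openai
        create_fn = openai.ChatCompletion.create

    # Fallback text from the quoted snippet, returned if every trial fails.
    response = "Sorry, I cannot help with this request. The system is busy now."
    for _ in range(max_trial):
        try:
            response = create_fn(
                model=model,  # the user-selected target instead of a constant
                messages=[
                    {"role": "system", "content": "You are a helpful assistant."},
                    {"role": "user", "content": prompt},
                ],
                temperature=temperature,
                n=n,
            )
            break
        except Exception as exc:  # assumed retry policy, not from the repo
            print(f"request failed, retrying: {exc}")
            time.sleep(retry_wait)
    return response


# Hypothetical call-site change: forward the target model name, e.g.
#   executor.submit(openai_request, prompt, args.target_model)
```

Passing create_fn explicitly makes the function easy to exercise without network access. With the model no longer fixed, pointing the client at an OpenAI-compatible endpoint through OPENAI_API_BASE, as suggested above, would presumably be enough to target other hosted models.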

Correcting an error in the paper

The paper says it introduces five specialized mutation operators, but only four are described: Crossover, Expand, Shorten, and Rephrase. Generate is left out.

Can you package your dependencies?

There have been many incompatibility issues. My CUDA version is 12.1. Following your steps produces various errors.

vllm could not be used because of CUDA kernel

Hi, I ran into a problem when trying to use vllm.

My current PyTorch version is:

and my GPU is a P100 with NVIDIA driver 470.141. Could you please look into this problem? Thanks.
