===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
warn(msg)
/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 7.0
CUDA SETUP: Detected CUDA version 117
/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
warn(msg)
CUDA SETUP: Loading binary /home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/home/ubuntu/.local/lib/python3.8/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
No compiled kernel found.
Compiling kernels : /home/ubuntu/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 /home/ubuntu/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/quantization_kernels_parallel.c -shared -o /home/ubuntu/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/quantization_kernels_parallel.so
Load kernel : /home/ubuntu/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/quantization_kernels_parallel.so
Setting CPU quantization kernel threads to 5
Using quantization cache
Applying quantization to glm layers
Traceback (most recent call last):
File "finetune.py", line 137, in <module>
main()
File "finetune.py", line 90, in main
model = AutoModel.from_pretrained(
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 479, in from_pretrained
return model_class.from_pretrained(
File "/home/ubuntu/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2675, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 1061, in __init__
self.quantize(self.config.quantization_bit, self.config.quantization_embeddings, use_quantization_cache=True, empty_init=True)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/modeling_chatglm.py", line 1439, in quantize
self.transformer = quantize(self.transformer, bits, use_quantization_cache=use_quantization_cache, empty_init=empty_init, **kwargs)
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/quantization.py", line 501, in quantize
layer.attention.query_key_value = QuantizedLinearWithPara(
File "/home/ubuntu/.cache/huggingface/modules/transformers_modules/chatglm-6b-int8/quantization.py", line 374, in __init__
self.weight = Parameter(self.weight.to(kwargs["device"]), requires_grad=False)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1632, in __setattr__
self.register_parameter(name, value)
File "/home/ubuntu/.local/lib/python3.8/site-packages/accelerate/big_modeling.py", line 108, in register_empty_parameter
module._parameters[name] = param_cls(module._parameters[name].to(device), **kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/nn/parameter.py", line 36, in __new__
return torch.Tensor._make_subclass(cls, data, requires_grad)
RuntimeError: Only Tensors of floating point and complex dtype can require gradients
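For reference, the final error can be reproduced outside ChatGLM with a few lines of PyTorch. This is a minimal sketch, not the model's actual code: the tensor shape and variable names are made up for illustration. It only shows that wrapping an int8 tensor in `Parameter` fails unless `requires_grad=False` is passed. One possible reading of the traceback, stated here as an assumption, is that the accelerate hook at `big_modeling.py` line 108 rebuilds the parameter without forwarding the `requires_grad=False` that `quantization.py` line 374 had set.

```python
import torch
from torch.nn import Parameter

# Stand-in for a quantized int8 weight (hypothetical shape, illustration only).
int8_weight = torch.zeros(4, 4, dtype=torch.int8)

# Works: gradients are explicitly disabled, as quantization.py line 374 does.
ok = Parameter(int8_weight, requires_grad=False)

# Fails: requires_grad defaults to True, which integer dtypes do not support.
try:
    Parameter(int8_weight)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

print("with requires_grad=False:", ok.dtype, ok.requires_grad)
print("with the default:", error_message)
```

If that reading is right, the failure would be independent of the bitsandbytes warnings earlier in the log.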
Environment: Ubuntu
GPU: Tesla V100 SXM2 32GB (the GPU has 32 GB of memory)
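Since the log reports that `libcudart.so` was not found and that the CPU-only bitsandbytes binary was loaded, a quick check of what the environment exposes may help narrow things down. The sketch below uses only the Python standard library. The helper names `find_cudart` and `existing_dirs` are made up for this example, and this is not necessarily how bitsandbytes itself searches for the library.

```python
import ctypes.util
import os


def find_cudart():
    """Return the library name the system linker resolves for cudart, or None."""
    return ctypes.util.find_library("cudart")


def existing_dirs(env_var="LD_LIBRARY_PATH"):
    """List the directories named in env_var that actually exist on disk."""
    raw = os.environ.get(env_var, "")
    return [p for p in raw.split(os.pathsep) if p and os.path.isdir(p)]


print("libcudart:", find_cudart())
print("existing LD_LIBRARY_PATH dirs:", existing_dirs())
```

A result of `None` for the first line would be consistent with the "No libcudart.so found" warning in the log.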
config.json
{
"_name_or_path": "THUDM/chatglm-6b-int8",
"architectures": [
"ChatGLMModel"
],
"auto_map": {
"AutoConfig": "configuration_chatglm.ChatGLMConfig",
"AutoModel": "modeling_chatglm.ChatGLMForConditionalGeneration",
"AutoModelForSeq2SeqLM": "modeling_chatglm.ChatGLMForConditionalGeneration"
},
"bos_token_id": 130004,
"eos_token_id": 130005,
"gmask_token_id": 130001,
"hidden_size": 4096,
"inner_hidden_size": 16384,
"layernorm_epsilon": 1e-05,
"mask_token_id": 130000,
"max_sequence_length": 2048,
"model_type": "chatglm",
"num_attention_heads": 32,
"num_layers": 28,
"pad_token_id": 3,
"position_encoding_2d": true,
"quantization_bit": 0,
"quantization_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.27.1",
"use_cache": true,
"vocab_size": 130528
}