
chatglm-tuning's People

Contributors

kinghuin, mymusise, oedosoldier, ypwhs


chatglm-tuning's Issues

ModuleNotFoundError on Colab: No module named 'modeling_chatglm'


ModuleNotFoundError Traceback (most recent call last)
in <module>
1 from transformers import AutoTokenizer, AutoModel, TrainingArguments, AutoConfig
----> 2 from modeling_chatglm import ChatGLMForConditionalGeneration
3 import torch
4 import torch.nn as nn
5 from peft import get_peft_model, LoraConfig, TaskType

ModuleNotFoundError: No module named 'modeling_chatglm'


NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

Question-answering dataset

A sincere question: if I want to fine-tune on a question-answering dataset, roughly which parts do I need to modify?
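For reference, a hedged conversion sketch: the context/target JSON format used elsewhere in this thread (see the issue "The code basically works" below) can be produced from plain QA pairs like this; the qa dict here is a hypothetical input record.

import json

qa = {"question": "你的名字", "answer": "我叫大山。"}
row = {
    "context": f"[Round 0]\n问:{qa['question']}\n答:",
    "target": qa["answer"] + "\n",
}
print(json.dumps(row, ensure_ascii=False))  # one JSON object per line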

finetune errors out when --per_device_train_batch_size is greater than 1

Traceback (most recent call last):
File "finetune.py", line 93, in
main()
File "finetune.py", line 85, in main
trainer.train()
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/transformers/trainer.py", line 1872, in _inner_training_loop
for step, inputs in enumerate(epoch_iterator):
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 628, in next
data = self._next_data()
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 671, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/envs/alpa/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 61, in fetch
return self.collate_fn(data)
File "finetune.py", line 35, in data_collator
"input_ids": torch.stack([
RuntimeError: stack expects each tensor to be equal size, but got [51] at entry 0 and [55] at entry 1
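A hedged collator sketch: pad every example to the longest sequence in the batch before stacking, so torch.stack no longer sees ragged tensors (tokenizer is assumed to be the ChatGLM tokenizer already in scope).

import torch

def data_collator(features: list) -> dict:
    longest = max(len(f["input_ids"]) for f in features)
    input_ids = []
    for f in features:
        ids = list(f["input_ids"])
        ids += [tokenizer.pad_token_id] * (longest - len(ids))  # right-pad
        input_ids.append(torch.LongTensor(ids))
    return {"input_ids": torch.stack(input_ids)}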

int8 error at inference time

At inference time (with load_in_8bit=True):
expected scalar type Float but found Half

Also, how do I use int8 or int4? I only have a 1080 Ti, which does not have enough VRAM.
Many thanks!
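A hedged sketch, assuming the upstream THUDM/chatglm-6b checkpoint, which exposes a quantize() helper for int8/int4 inference and sidesteps load_in_8bit entirely:

from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = (
    AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
    .half()
    .quantize(8)  # or .quantize(4) for int4 on small GPUs
    .cuda()
)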

Environment configuration question

Must CUDA be newer than 11.6?
My environment has CUDA 11.2, and fine-tuning fails with: ImportError: cannot import name 'skip_init' from 'torch.nn.utils'
Is the skip_init function only available on torch 2.0?
Could someone kindly clarify? Many thanks!
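For what it's worth, torch.nn.utils.skip_init was added in PyTorch 1.10, not 2.0, so upgrading torch (independently of CUDA) should be enough. A minimal guarded-import sketch if you must stay on an older torch:

try:
    from torch.nn.utils import skip_init
except ImportError:  # torch < 1.10 has no skip_init
    def skip_init(module_cls, *args, **kwargs):
        # Fallback: construct normally (weights get initialized, then
        # overwritten when the checkpoint is loaded).
        return module_cls(*args, **kwargs)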

Which .pt file should be used when loading the model?

If only intermediate training files exist and there is no chatglm-lora.pt, which file should be loaded when loading the model?
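A hedged sketch, assuming the intermediate files are standard HF Trainer checkpoints: load the checkpoint's state dict and keep only the LoRA tensors, mirroring how the repo filters "lora_" keys when saving chatglm-lora.pt (checkpoint-XXXX is a placeholder for your actual step directory).

import torch

state = torch.load("output/checkpoint-XXXX/pytorch_model.bin")
lora_state = {k: v for k, v in state.items() if "lora_" in k}  # LoRA weights only
torch.save(lora_state, "output/chatglm-lora.pt")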

I removed the LoRA part and fine-tuned the original model; it keeps failing with ValueError: Attempting to unscale FP16 gradients.

I looked into the code: it fails at the spot below because allow_fp16 cannot be set to true; forcing it to true lets the run pass. Where should I configure this? Could anyone advise?

/home/ubuntu/venv/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py:285 in unscale_

  282   inv_scale = self._scale.double().reciprocal().float()
  283   found_inf = torch.full((1,), 0.0, dtype=torch.float32, device=self._scale.device
  284
❱ 285   optimizer_state["found_inf_per_device"] = self._unscale_grads_(optimizer, inv_sc
  286   optimizer_state["stage"] = OptState.UNSCALED
  287
  288   def _maybe_opt_step(self, optimizer, optimizer_state, *args, **kwargs):

/home/ubuntu/venv/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py:213 in
_unscale_grads_

  210   continue
  211   #allow_fp16 = True
  212   if (not allow_fp16) and param.grad.dtype == torch.float16:
❱ 213   raise ValueError("Attempting to unscale FP16 gradients.")
  214   if param.grad.is_sparse:
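A hedged workaround sketch: GradScaler refuses to unscale fp16 gradients, so when full-finetuning without LoRA either drop --fp16 or keep the trainable parameters themselves in float32 (activations can stay half):

for param in model.parameters():
    if param.requires_grad:
        # fp32 master weights -> fp32 grads, which the scaler accepts
        param.data = param.data.float()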

How to continue training from a previous run

1. The previous run had not converged well, and I want to continue training from its best checkpoint.
2. The data has changed and the new fine-tuning set is smaller, so I want to continue fine-tuning from the previous checkpoint.
Specifying --overwrite_output_dir True with resume_from_checkpoint=True does not work: the loss jumps back to a very large value (although training does resume from the previous epoch and continues the earlier learning-rate schedule). How should this be adjusted?
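A hedged sketch: passing the concrete checkpoint directory instead of a bare True makes the Trainer restore optimizer and scheduler state from that exact step (checkpoint-XXXX is a placeholder):

trainer.train(resume_from_checkpoint="output/checkpoint-XXXX")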

bug: _IncompatibleKeys(missing_keys:.......................

Thanks to the author for the great work.
I ran into a bug at inference time; I am not sure which step I got wrong, so please correct me.

I trained a LoRA with https://github.com/mymusise/ChatGLM-Tuning/blob/master/finetune.py and loaded the LoRA file in https://github.com/mymusise/ChatGLM-Tuning/blob/master/infer.ipynb, which produced the message below.
[screenshot]
Is the problem in the LoRA training or in how it is loaded at inference?

The training command:

CUDA_VISIBLE_DEVICES=2 python finetune.py \
    --dataset_path data/need_demo \
    --lora_rank 8 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --max_steps 50000 \
    --save_steps 10000 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --fp16 \
    --logging_steps 50 \
    --output_dir output

The inference code:
[screenshot]
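A hedged check sketch: when only LoRA tensors are loaded into the PEFT-wrapped model with strict=False, missing_keys for the frozen base weights is expected and harmless; what matters is that no lora_* key is missing or unexpected.

import torch

result = model.load_state_dict(torch.load("output/chatglm-lora.pt"), strict=False)
print([k for k in result.missing_keys if "lora_" in k])  # should be empty
print(result.unexpected_keys)                            # should be empty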

The eos token is not generated correctly

After training, the model cannot correctly generate the eos token at inference time. An earlier issue raised this but was closed.
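A hedged sketch, echoing the workaround reported in the issue "The code basically works" below: make sure the eos token appears in the training inputs themselves during preprocessing, not only in the labels (prompt_ids and target_ids as in tokenize_dataset_rows.py):

input_ids = prompt_ids + target_ids + [tokenizer.eos_token_id]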

Error at inference time

Traceback (most recent call last):
File "infer.py", line 27, in
model = get_peft_model(model, peft_config)
File "/root/miniconda3/envs/torch_1.13/lib/python3.8/site-packages/peft/mapping.py", line 143, in get_peft_model
peft_config = _prepare_lora_config(peft_config, model_config)
File "/root/miniconda3/envs/torch_1.13/lib/python3.8/site-packages/peft/mapping.py", line 118, in _prepare_lora_config
raise ValueError("Please specify target_modules in peft_config")
ValueError: Please specify target_modules in peft_config
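A hedged config sketch: peft has no built-in target-module mapping for chatglm, so target_modules must be given explicitly; the LoRA here attaches to the fused QKV projection, whose module name (query_key_value) appears in the tracebacks elsewhere in this thread.

from peft import LoraConfig, TaskType, get_peft_model

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["query_key_value"],  # chatglm's fused QKV linear
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model = get_peft_model(model, peft_config)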

Out-of-memory error when training on a 3090

File "L:\PycharmProjects\chatglm\venv\lib\site-packages\bitsandbytes\functional.py", line 361, in get_transform_buffer
return init_func((rows, cols), dtype=dtype, device=device), state
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 24.00 GiB total capacity; 22.76 GiB already allocated; 0 bytes free; 23.20 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/52000 [00:09<?, ?it/s]
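A hedged mitigation sketch: reduce fragmentation (as the error message itself suggests) and shrink per-step memory; the env var must be set before CUDA is initialized.

import os

# Cap split sizes to fight fragmentation; 128 is a tunable guess.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# Then trade batch size for accumulation steps, e.g.
#   --per_device_train_batch_size 1 --gradient_accumulation_steps 4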

Training finishes in about 5 hours, but the loss is always 0. Is that normal?

{'loss': 0.0, 'learning_rate': 1.9230769230769234e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.730769230769231e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.5384615384615387e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.3461538461538464e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 1.153846153846154e-07, 'epoch': 0.99}
{'loss': 0.0, 'learning_rate': 9.615384615384617e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 7.692307692307694e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 5.76923076923077e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 3.846153846153847e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 1.9230769230769234e-08, 'epoch': 1.0}
{'loss': 0.0, 'learning_rate': 0.0, 'epoch': 1.0}

@mymusise
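A hedged sanity-check sketch, assuming the newer finetune.py whose collator emits labels: if every position in labels is masked with -100, the loss is exactly 0, so inspect one collated batch.

batch = data_collator([train_dataset[0], train_dataset[1]])
print((batch["labels"] != -100).sum())  # should be well above 0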

RuntimeError: GET was unable to find an engine to execute this computation

Traceback (most recent call last):
File "/home/guangzhao/ChatGLM-Tuning-master/finetune.py", line 162, in
main()
File "/home/guangzhao/ChatGLM-Tuning-master/finetune.py", line 153, in main
trainer.train()
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/transformers/trainer.py", line 1633, in train
return inner_training_loop(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/transformers/trainer.py", line 1902, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/transformers/trainer.py", line 2645, in training_step
loss = self.compute_loss(model, inputs)
File "/home/guangzhao/ChatGLM-Tuning-master/finetune.py", line 100, in compute_loss
return model(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/peft/peft_model.py", line 529, in forward
return self.base_model(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 1033, in forward
transformer_outputs = self.transformer(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 878, in forward
layer_ret = layer(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 573, in forward
attention_outputs = self.attention(
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/guangzhao/ChatGLM-Tuning-master/modeling_chatglm.py", line 398, in forward
mixed_raw_layer = self.query_key_value(hidden_states)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/peft/tuners/lora.py", line 613, in forward
after_B = self.lora_B(after_A.transpose(-2, -1)).transpose(-2, -1)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 313, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/guangzhao/anaconda3/envs/chatglm/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
return F.conv1d(input, weight, bias, self.stride,
RuntimeError: GET was unable to find an engine to execute this computation

I am using CUDA 11.2. Is the error caused by CUDA?
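A quick environment-check sketch: this error usually indicates a mismatch between the torch wheel's CUDA/cuDNN build and the installed driver/runtime, so print what torch was built against before blaming CUDA 11.2.

import torch

print(torch.__version__, torch.version.cuda)   # wheel's CUDA build
print(torch.backends.cudnn.version())          # bundled cuDNN
print(torch.cuda.is_available())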

Error when preprocessing the dataset

Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
52002it [00:20, 2492.40it/s]
Traceback (most recent call last):
File "A:\ChatGLM-Tuning-master\tokenize_dataset_rows.py", line 45, in
main()
File "A:\ChatGLM-Tuning-master\tokenize_dataset_rows.py", line 38, in main
arr = np.array(all_tokenized)


With bs set to 1 or 3, the loss is always 0. Is something wrong?

{'loss': 0.0, 'learning_rate': 1.9e-05, 'epoch': 0.15}
{'loss': 0.0, 'learning_rate': 1.8e-05, 'epoch': 0.3}
{'loss': 0.0, 'learning_rate': 1.7e-05, 'epoch': 0.45}
{'loss': 0.0, 'learning_rate': 1.6000000000000003e-05, 'epoch': 0.6}
{'loss': 0.0, 'learning_rate': 1.5000000000000002e-05, 'epoch': 0.75}
{'loss': 0.0, 'learning_rate': 1.4e-05, 'epoch': 0.9}

The code basically works, but the output includes unrelated questions

@mymusise First, thank you very much for the code; I have basically gotten it running. A V100 cannot do int8 inference, so I only set batch_size=2.
The JSON format of my data is as follows:
{"context": "[Round 0]\n问:你的名字\n答:", "target": "我叫大山,是一个代表我的虚拟身份的名称。\n"}
i.e., it copies chatglm's input format.

I used a tiny dataset of only 16 dialogues. In my tests there is one small problem: although finetune.py appends an extra eos_token_id to the labels, inference on questions from the training set produces not only the correct answer but also other questions and answers appended to it.

So I added one more eos_token_id on line 16 of tokenize_dataset_rows.py, i.e.:

input_ids = prompt_ids + target_ids + [tokenizer.eos_token_id]*2

After that the output was normal, though I do not know why. Perhaps the original program never predicted eos.

key error 'seq_len'

With the latest finetune code, I hit this error.

dataset:

Dataset({ features: ['input_ids', 'seq_len'], num_rows: 52002 })

def data_collator(features: list) -> dict:
    len_ids = [len(feature["input_ids"]) for feature in features]
    longest = max(len_ids) + 1
    input_ids = []
    attention_mask_list = []
    position_ids_list = []
    labels_list = []
    for ids_l, feature in sorted(zip(len_ids, features), key=lambda x: -x[0]):
        ids = feature["input_ids"]
        seq_len = feature["seq_len"]
        labels = (
            [-100] * (seq_len - 1)
            + ids[(seq_len - 1):]
            + [tokenizer.eos_token_id]
            + [-100] * (longest - ids_l - 1)
        )

File "finetune.py", line 71, in data_collator
seq_len = feature["seq_len"]
KeyError: 'seq_len'
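A hedged sketch of what each example needs to contain: the newer collator reads seq_len (the prompt length, used to mask the prompt with -100), so the tokenized dataset has to be regenerated with a tokenization step that emits it, roughly:

feature = {
    "input_ids": prompt_ids + target_ids + [tokenizer.eos_token_id],
    "seq_len": len(prompt_ids),  # prompt length consumed by data_collator
}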

ValueError: Please specify `target_modules` in `peft_config`

Traceback (most recent call last):
File "finetune.py", line 96, in
main()
File "finetune.py", line 76, in main
model = get_peft_model(model, peft_config)
File "/home/bocheng/softinstalled/anaconda3/envs/py38/lib/python3.8/site-packages/peft/mapping.py", line 142, in get_peft_model
peft_config = _prepare_lora_config(peft_config, model_config)
File "/home/bocheng/softinstalled/anaconda3/envs/py38/lib/python3.8/site-packages/peft/mapping.py", line 117, in _prepare_lora_config
raise ValueError("Please specify target_modules in peft_config")
ValueError: Please specify target_modules in peft_config

requirements question

Could you pip list the exact versions of torch, peft, and so on? peft hard-requires torch>=1.13.

Is a lower torch version simply not going to work?

Error when running finetune.py

CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {WindowsPath('/usr/local/cuda/lib64')}
warn(msg)
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\main.py:136: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
warn(msg)
CUDA SETUP: Loading binary C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: WARNING! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
CUDA SETUP: Loading binary C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.so...
argument of type 'WindowsPath' is not iterable
CUDA SETUP: Problem: The main issue seems to be that the main CUDA library was not detected.
CUDA SETUP: Solution 1): Your paths are probably not up-to-date. You can update them via: sudo ldconfig.
CUDA SETUP: Solution 2): If you do not have sudo rights, you can do the following:
CUDA SETUP: Solution 2a): Find the cuda library via: find / -name libcuda.so 2>/dev/null
CUDA SETUP: Solution 2b): Once the library is found add it to the LD_LIBRARY_PATH: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:FOUND_PATH_FROM_2a
CUDA SETUP: Solution 2c): For a permanent solution add the export from 2b into your .bashrc file, located at ~/.bashrc
Traceback (most recent call last):
File "A:\ChatGLM-Tuning-master\finetune.py", line 6, in <module>
from peft import get_peft_model, LoraConfig, TaskType
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\__init__.py", line 22, in <module>
from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING, PEFT_TYPE_TO_CONFIG_MAPPING, get_peft_config, get_peft_model
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\mapping.py", line 16, in <module>
from .peft_model import (
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\peft_model.py", line 31, in <module>
from .tuners import LoraModel, PrefixEncoder, PromptEmbedding, PromptEncoder
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\tuners\__init__.py", line 20, in <module>
from .lora import LoraConfig, LoraModel
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\tuners\lora.py", line 36, in <module>
import bitsandbytes as bnb
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\__init__.py", line 7, in <module>
from .autograd._functions import (
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\autograd\__init__.py", line 1, in <module>
from ._functions import undo_layout, get_inverse_transform_indices
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\autograd\_functions.py", line 9, in <module>
import bitsandbytes.functional as F
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\functional.py", line 17, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "C:\Users\Ge Yunxiang\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cextension.py", line 22, in <module>
raise RuntimeError('''
RuntimeError:
CUDA Setup failed despite GPU being available. Inspect the CUDA SETUP outputs above to fix your environment!
If you cannot find any issues and suspect a bug, please open an issue with detals about your environment:
https://github.com/TimDettmers/bitsandbytes/issues

Questions about the fine-tuning code

The source code is as follows:

class ModifiedTrainer(Trainer):

    def compute_loss(self, model, inputs, return_outputs=False):
        return model(
            input_ids=inputs["input_ids"],
            attention_mask=torch.ones_like(inputs["input_ids"]).bool(),
            labels=inputs["input_ids"],
        ).loss

Question 1: shouldn't the attention mask here be lower-triangular, or a UniLM-style mask?
Question 2: shouldn't part of the labels be set to -100?
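On question 2, a hedged illustration mirroring the repo's newer data_collator (quoted in the "key error 'seq_len'" issue above), where prompt positions are masked with -100 so only target tokens are scored:

def build_labels(ids, seq_len, longest, eos_token_id):
    # ids: prompt + target token ids; seq_len: prompt length
    return (
        [-100] * (seq_len - 1)
        + ids[(seq_len - 1):]
        + [eos_token_id]
        + [-100] * (longest - len(ids) - 1)
    )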

Data preprocessing error

Using the default configuration:
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.

52002it [00:27, 1911.56it/s]
Traceback (most recent call last):
File "/home/dspwasc/Public/chat/ChatGLM-Tuning/tokenize_dataset_rows.py", line 45, in
main()
File "/home/dspwasc/Public/chat/ChatGLM-Tuning/tokenize_dataset_rows.py", line 38, in main
arr = np.array(all_tokenized)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (52002,) + inhomogeneous part.
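A hedged fix sketch: the tokenized rows have different lengths, so NumPy refuses to build a rectangular ndarray; an object array keeps the ragged lists as-is (all_tokenized as in tokenize_dataset_rows.py):

import numpy as np

arr = np.array(all_tokenized, dtype=object)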

Must one finetune before infer?

[Errno 2] No such file or directory: 'output/chatglm-lora.pt'
Does inference require that finetune has first saved the model to output/chatglm-lora.pt?

Error during inference

  334   return _sentencepiece.SentencePieceProcessor__EncodeAsImmutableProtoBatch(self,
  335
  336   def _DecodeIds(self, ids):
❱ 337   return _sentencepiece.SentencePieceProcessor__DecodeIds(self, ids)
  338
  339   def _DecodePieces(self, pieces):
  340   return _sentencepiece.SentencePieceProcessor__DecodePieces(self, pieces)

IndexError: Out of range: piece id is out of range.
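A hedged guard sketch: sentencepiece raises this IndexError when a generated id falls outside the tokenizer vocabulary, so filter ids before decoding (output_ids is a hypothetical list of generated token ids):

valid_ids = [i for i in output_ids if 0 <= i < tokenizer.vocab_size]
text = tokenizer.decode(valid_ids)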

bitsandbytes error when training on the test data. Does anyone know what is going on?

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain libcudart.so as expected! Searching further paths...
warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/cuda/lib64')}
warn(msg)
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: No libcudart.so found! Install CUDA or the cudatoolkit package (anaconda)!
warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 116
CUDA SETUP: Loading binary /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
/opt/conda/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
Overriding torch_dtype=None with torch_dtype=torch.float16 due to requirements of bitsandbytes to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 0%| | 0/8 [00:01<?, ?it/s]
Traceback (most recent call last):
File "/home/python/finetune.py", line 162, in
main()
File "/home/python/finetune.py", line 121, in main
model = ChatGLMForConditionalGeneration.from_pretrained(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2646, in from_pretrained
) = cls._load_pretrained_model(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2969, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 676, in _load_state_dict_into_meta_model
set_module_8bit_tensor_to_device(model, param_name, param_device, value=param)
File "/opt/conda/lib/python3.10/site-packages/transformers/utils/bitsandbytes.py", line 70, in set_module_8bit_tensor_to_device
new_value = bnb.nn.Int8Params(new_value, requires_grad=False, has_fp16_weights=has_fp16_weights).to(device)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 196, in to
return self.cuda(device)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 160, in cuda
CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1616, in double_quant
row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
File "/opt/conda/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1505, in get_colrow_absmax
lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
File "/opt/conda/lib/python3.10/ctypes/init.py", line 387, in getattr
func = self.getitem(name)
File "/opt/conda/lib/python3.10/ctypes/init.py", line 392, in getitem
func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /opt/conda/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats

Error when loading the pretrained model

AttributeError: /root/anaconda3/envs/big_project/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
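A hedged diagnostic sketch for both this issue and the one above: undefined symbol: cget_col_row_stats means the CPU-only libbitsandbytes was loaded, which lacks the int8 kernels; confirm which binary was picked up and whether the torch wheel even has CUDA.

import torch
import bitsandbytes

print(torch.version.cuda)      # None means a CPU-only torch wheel
print(bitsandbytes.__file__)   # the CUDA SETUP log above shows which .so was chosen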
