hiyouga / chatglm-efficient-tuning Goto Github PK

View Code? Open in Web Editor NEW

3.6K 32.0 458.0 198.18 MB

Fine-tuning ChatGLM-6B with PEFT | 基于 PEFT 的高效 ChatGLM 微调

License: Apache License 2.0

Python 100.00%

chatglm chatgpt fine-tuning lora alpaca peft huggingface language-model transformers pytorch

chatglm-efficient-tuning's Introduction

ChatGLM Efficient Tuning

Fine-tuning 🤖ChatGLM-6B model with 🤗PEFT.

👋 Join our WeChat.

[ English | 中文 ]

If you have any questions, please refer to our Wiki📄.

Notice

This repo will not be maintained in the future. Please follow LLaMA-Factory for fine-tuning the language models (including ChatGLM2-6B).

Changelog

[23/07/15] Now we develop an all-in-one Web UI for training, evaluation and inference. Try train_web.py to fine-tune ChatGLM-6B model in your Web browser. Thank @KanadeSiina and @codemayq for their efforts in the development.

[23/07/09] Now we release FastEdit⚡🩹, an easy-to-use package for editing the factual knowledge of large language models efficiently. Please follow FastEdit if you are interested.

[23/06/25] Now we align the demo API with the OpenAI's format where you can insert the fine-tuned model in arbitrary ChatGPT-based applications.

[23/06/25] Now we support fine-tuning the ChatGLM2-6B model with our framework!

[23/06/05] Now we support 4-bit LoRA training (aka QLoRA). Try --quantization_bit 4 argument to work with 4-bit quantized model. (experimental feature)

[23/06/01] We implemented a framework supporting the efficient tuning of LLaMA and BLOOM models. Please follow LLaMA-Efficient-Tuning if you are interested.

[23/05/19] Now we support using the development set to evaluate the model while training. Try --dev_ratio argument to specify the size of development set.

[23/04/29] Now we support training ChatGLM with Reinforcement Learning with Human Feedback (RLHF) ! We provide several examples to run RLHF training, please refer to the examples folder for details.

[23/04/20] Our repo achieved 100 stars within 12 days! Congratulations!

[23/04/19] Now we support merging the weights of fine-tuned models trained by LoRA! Try --checkpoint_dir checkpoint1,checkpoint2 argument for continually fine-tuning the models.

[23/04/18] Now we support training the quantized models using three fine-tuning methods! Try quantization_bit argument for training the model in 4/8 bits.

[23/04/12] Now we support training from checkpoints! Use --checkpoint_dir argument to specify the checkpoint model to fine-tune from.

[23/04/11] Now we support training with combined datasets! Try --dataset dataset1,dataset2 argument for training with multiple datasets.

Datasets

For supervised fine-tuning:
For reward modelling:

Please refer to data/README.md for details.

Some datasets require confirmation before using them, so we recommend logging in with your Hugging Face account using these commands.

pip install --upgrade huggingface_hub
huggingface-cli login

Fine-Tuning Methods

Our script now supports the following fine-tuning methods:

LoRA
- Fine-tuning the low-rank adapters of the model.
P-Tuning V2
- Fine-tuning the prefix encoder of the model.
Freeze
- Fine-tuning the MLPs in the last n blocks of the model.
Full Tuning
- Fine-tuning all the parameters of the model.

Requirement

Python 3.8+ and PyTorch 1.13.1+
🤗Transformers, Datasets, Accelerate, PEFT and TRL
fire, protobuf, cpm-kernels and sentencepiece
jieba, rouge-chinese and nltk (used at evaluation)
gradio and matplotlib (used in train_web.py)
uvicorn, fastapi and sse-starlette (used in api_demo.py)

And powerful GPUs!

Getting Started

Data Preparation (optional)

Please refer to data/example_dataset for checking the details about the format of dataset files. You can either use a single .json file or a dataset loading script with multiple files to create a custom dataset.

Note: please update data/dataset_info.json to use your custom dataset. About the format of this file, please refer to data/README.md.

Dependence Installation (optional)

git lfs install
git clone https://github.com/hiyouga/ChatGLM-Efficient-Tuning.git
conda create -n chatglm_etuning python=3.10
conda activate chatglm_etuning
cd ChatGLM-Efficient-Tuning
pip install -r requirements.txt

If you want to enable the quantized LoRA (QLoRA) on the Windows platform, you will be required to install a pre-built version of bitsandbytes library, which supports CUDA 11.1 to 12.1.

pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.39.1-py3-none-win_amd64.whl

All-in-one Web UI

CUDA_VISIBLE_DEVICES=0 python src/train_web.py

Currently the web UI only supports training on a single GPU.

Fine-tuning with a Single GPU

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_chatglm_model \
    --do_train \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --output_dir path_to_sft_checkpoint \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Please refer to our Wiki about the details of the arguments.

Distributed Fine-tuning with Multiple GPUs

accelerate config # configure the environment
accelerate launch src/train_bash.py # arguments (same as above)

Training Reward Model

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage rm \
    --model_name_or_path path_to_your_chatglm_model \
    --do_train \
    --dataset comparison_gpt4_en \
    --finetuning_type lora \
    --resume_lora_training False \
    --checkpoint_dir path_to_sft_checkpoint \
    --output_dir path_to_rm_checkpoint \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

Training with RLHF

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage ppo \
    --model_name_or_path path_to_your_chatglm_model \
    --do_train \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --resume_lora_training False \
    --checkpoint_dir path_to_sft_checkpoint \
    --reward_model path_to_rm_checkpoint \
    --output_dir path_to_ppo_checkpoint \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 1e-5 \
    --num_train_epochs 1.0 \
    --plot_loss

Evaluation (BLEU and ROUGE_CHINESE)

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_chatglm_model \
    --do_eval \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_eval_result \
    --per_device_eval_batch_size 8 \
    --max_samples 50 \
    --predict_with_generate

Predict

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --model_name_or_path path_to_your_chatglm_model \
    --do_predict \
    --dataset alpaca_gpt4_en \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_predict_result \
    --per_device_eval_batch_size 8 \
    --max_samples 100 \
    --predict_with_generate

If you want to predict the samples with empty responses, please kindly fill the response column with dummy tokens to ensure the sample will not be discarded throughout the preprocessing phase.

API Demo

python src/api_demo.py \
    --model_name_or_path path_to_your_chatglm_model \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Visit http://localhost:8000/docs for API documentation.

CLI Demo

python src/cli_demo.py \
    --model_name_or_path path_to_your_chatglm_model \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Web Demo

python src/web_demo.py \
    --model_name_or_path path_to_your_chatglm_model \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint

Export model

python src/export_model.py \
    --model_name_or_path path_to_your_chatglm_model \
    --finetuning_type lora \
    --checkpoint_dir path_to_checkpoint \
    --output_dir path_to_export

Hardware Requirements

Fine-tune method	Batch size	Mode	GRAM	Speed
LoRA (r=8)	16	FP16	28GB	8ex/s
LoRA (r=8)	8	FP16	24GB	8ex/s
LoRA (r=8)	4	FP16	20GB	8ex/s
LoRA (r=8)	4	INT8	10GB	8ex/s
LoRA (r=8)	4	INT4	8GB	8ex/s
P-Tuning (p=16)	4	FP16	20GB	8ex/s
P-Tuning (p=16)	4	INT8	16GB	8ex/s
P-Tuning (p=16)	4	INT4	12GB	8ex/s
Freeze (l=3)	4	FP16	24GB	8ex/s

RM method	Batch size	Mode	GRAM	Speed
LoRA (r=8) + rm	4	FP16	22GB	-
LoRA (r=8) + rm	1	INT8	11GB	-

RLHF method	Batch size	Mode	GRAM	Speed
LoRA (r=8) + ppo	4	FP16	23GB	-
LoRA (r=8) + ppo	1	INT8	12GB	-

Note: r is the lora rank, p is the number of prefix tokens, l is the number of trainable layers, ex/s is the examples per second at training. The gradient_accumulation_steps is set to 1. All are evaluated on a single Tesla V100 (32G) GPU, they are approximated values and may vary in different GPUs.

Fine-tuning ChatGLM: A Case

Training Results

We use the whole alpaca_gpt4_zh dataset to fine-tune the ChatGLM model with LoRA (r=8) for one epoch, using the default hyper-parameters. The loss curve during training is presented below.

Evaluation Results

We select 100 instances in the alpaca_gpt4_zh dataset to evaluate the fine-tuned ChatGLM model and compute the BLEU and ROUGE scores. The results are presented below.

Score	Original	FZ (l=2)	PT (p=16)	LoRA (r=8)
BLEU-4	15.75	16.85	16.06	17.01 (+1.26)
Rouge-1	34.51	36.62	34.80	36.77 (+2.26)
Rouge-2	15.11	17.04	15.32	16.83 (+1.72)
Rouge-l	26.18	28.17	26.35	28.86 (+2.68)
Params (%)	/	4.35%	0.06%	0.06%

FZ: freeze tuning, PT: P-Tuning V2 (we use pre_seq_len=16 for fair comparison with LoRA), Params: the percentange of trainable parameters.

Projects

SupritYoung/RLHF-Label-Tool: A tool for ranking the responses of LLMs to generate annotated samples used in RLHF training.

Compared with Existing Implementations

THUDM/ChatGLM-6B
- Official implementation of fine-tuning ChatGLM with P-Tuning v2 on the ADGEN dataset.
- Our fine-tuning script is largely depend on it. We further implement the LoRA tuning method. Additionally, we dynamically pad the inputs to the longest sequence in the batch instead of the maximum length, to accelerate the fine-tuning.
mymusise/ChatGLM-Tuning
- An unoffical implementation of fine-tuning ChatGLM with LoRA on the Stanford Alpaca dataset.
- We borrowed some ideas from it. Our fine-tuning script integrates the data pre-processing part into the training procedure, so we need not generate a pre-processed dataset before training.
ssbuild/chatglm_finetuning
- An unofficial implementation of fine-tuning ChatGLM with several PEFT methods on the Stanford Alpaca dataset.
- Our fine-tuning script is implemented purely with Hugging Face transformers and is independent of the deep_training framework.
lich99/ChatGLM-finetune-LoRA
- An unofficial implementation of fine-tuning ChatGLM with LoRA on the Stanford Alpaca dataset.
- We use the Hugging Face PEFT to provide the state-of-the-art PEFT methods.
liucongg/ChatGLM-Finetuning
- An unofficial implementation of fine-tuning ChatGLM with several methods including Freeze, LoRA and P-Tuning on the industrial dataset.
- We are aim to incorporate more instruction-following datasets for fine-tuning the ChatGLM model.
yanqiangmiffy/InstructGLM
- An unofficial implementation of fine-tuning ChatGLM that explores the ChatGLM's ability on the instruction-following datasets.
- Our fine-tuning script integrates the data pre-processing part in to the training procedure.

TODO

License

This repository is licensed under the Apache-2.0 License. Please follow the Model License to use ChatGLM-6B model.

Citation

If this work is helpful, please cite as:

@Misc{chatglm-efficient-tuning,
  title = {ChatGLM Efficient Tuning},
  author = {hiyouga},
  howpublished = {\url{https://github.com/hiyouga/ChatGLM-Efficient-Tuning}},
  year = {2023}
}

Acknowledgement

This repo benefits from ChatGLM-6B, ChatGLM-Tuning and yuanzhoulvpi2017/zero_nlp. Thanks for their wonderful works.

Star History

chatglm-efficient-tuning's People

Contributors

Stargazers

Watchers

Forkers

deltavml lyzkf dongdong9 iamleon121 noesis-yu rayjue ninehills tian64873493 daxiajames skyroot guoswang savokiss sam-at-git anyz01 maeganyork lebronhe felixzhang7 wangjianxiong-maker michaeloo0 qinyuenlp cyjack henryhesz ai-jie01 misterchangray yueyedeai liuyanyi makoofficial nerohin xusenlinzy swimtobird xujunrt tistergit brightxiaohan zero506 doodlebears waynedeng zhongpei km1994 catlove006 sysuhys yijiantx xu-wave janglichao wzsage qqr1 charlessl wesleyhuang2014 dumpmemory xsun15 ambier mesosxzan xiaoyichao shiyybua aibihub woshihj kuangjunwei1 www516717402 weimch zgctmac smilesmith liwenju0 asdlei99 huiguorou12 aceanan linhuall legichat mxcyixuan jiluojiluo cuiyc threestonessl songjx010 immortal5655 haojiepan1 weiwancheng zhanglv0209 iq-scm zyzyzhou cfireworks zhangnn520 fuxiaoyi githungdang aflyhat tianyuso corteam pyxsqbs flaviadeutsch dfqytcom garyfanhku wtwong316 swartzmss mmrbun wangshengyang2004 lcmd65 gshan4056 booniesfx youly172 chunweixu tonywang-sh tamanna18 linkmancheng

chatglm-efficient-tuning's Issues

Occurs some loss = nan steps, is it rational?

I am using your example to train ppo, while the logging show some loss = nan, I am curious about that if it is rational?

some plots:

{'loss': 0.3290, 'reward': -2.0304, 'learning_rate': 5e-05}
0%| | 1/13000 [00:10<36:34:36, 10.13s/it]{'loss': nan, 'reward': 5.7646, 'learning_rate': 5e-05}
0%| | 2/13000 [00:28<54:33:11, 15.11s/it{'loss': 0.2527, 'reward': 1.2237, 'learning_rate': 5e-05}
0%| | 3/13000 [00:39<47:01:51, 13.03s/it]{'loss': 0.1512, 'reward': 10.7681, 'learning_rate': 5e-05}
0%| | 4/13000 [01:01<60:36:15, 16.79s/it]{'loss': 0.0769, 'reward': 7.0280, 'learning_rate': 5e-05}
0%| | 5/13000 [01:20<63:06:18, 17.48s/it]{'loss': 0.1685, 'reward': 12.2049, 'learning_rate': 5e-05}
0%| | 6/13000 [01:28<51:51:58, 14.37s/it]

如何本地加载模型

支持多卡+lora+8bit量化训练吗？

lora训练后如何加速预测

lora fp16方式训练后，如何加速预测呢

大佬你好，请问通过lora训练方式得到的文件夹如何通过transformers的方式加载到模型中

得到的文件：

想要加载的位置（类似）：
config = AutoConfig.from_pretrained("chatglm-6b", trust_remote_code=True, pre_seq_len=128)
model = AutoModel.from_pretrained("chatglm-6b", config=config, trust_remote_code=True).half().cuda()

OOM Error when saving models trained in INT8 mode

A bug report from the WeChat group:

04/20/2023 11:04:39 - INFO - utils .common - Saving model checkpoint to path_to_checkpoint
Traceback (most recent call last):
  File "/src/finetune.py", line 73, in <module>
    main()
  File "/src/finetune.py", line 55, in main
    trainer.save_model()
  File "/site-packages/transformers/trainer.py", line 2830, in save_model
    self._save(output_dir)
  File "src/utils/common.py", line 462, in _save
    self.model.save_pretrained(output_dir) # only save peft weights with the built-in method
  File "/peft/src/peft/peft_model.py", line 116, in save_pretrained
    output_state_dict = get_peft_model_state_dict(
  File "/peft/src/peft/utils/save_and_load.py", line 32, in get_peft_model_state_dict
    state_dict = model.state_dict()
  File "/torch/nn/modules/module.py", line 1818, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/torch/nn/modules/module.py", line 1818, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/torch/nn/modules/module.py", line 1818, in state_dict
      [Previous line repeated 4 more times]
  File "/torch/nn/modules/module.py", line 1815, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/bitsandbytes/nn/modules.py", line 268, in _save to_state_dict
    self.weight.data = undo_layout(self.state.CxB, self.state.tile_indices)
  File "/bitsandbytes/autograd/_functions.py", line 96, in undo_layout
    outputs = torch.empty_like(tensor) # note: not using .index_copy because it was slower on cuda
torch.cuda.OutofMemoryError: CUDA out of memory, Tried to allocate 64.00 MiB (GPU 0: 14,76 GiB total capacity; 13.82 GiB already allocated; 47.75 MiB free; 14.02 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

A similar report can be found at: huggingface/peft#335

I suppose the failure is caused by state_dict = model.state_dict().

请问--checkpoint_dir是只支持一个epoch的断点么？每1000次保存的没法继续吗

如图，我打算加载1000次的断点，但这里报错找不到文件，我看了一下文件名，这个finetuning_args.bin在1000step自动保存的文件夹下面是没有的

训练的时候出现RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

显卡可能低一些，2070super 16G，请问怎么样修改

怎么使用int4量化版本模型进行finetune,使用quantization_bit 4 后loss 一直为0

Is train_rm not supported by multiple GPUs?

ValueError: FP16 Mixed precision training with AMP or APEX (`--fp16`) and FP16 half precision evaluation (`--fp16_full_eval`) can only be used on CUDA devices.

请问这是什么问题

mode怎么改成int4,显存不够，只有12G

大神，能出个懒人包吗？

在b站看到你的这个项目，期待你出个懒人包喔，这样方便我们这些小白使用

训练语料长度超出2048token

训练语料长度超出时候，只有warnning，但是训练的时候会报错，我要改成超长语料剔除的话，在哪里修改代码？

支持其他llm吗

python3.7是否可以运行呢

为啥照着步骤来，加载dataset example，始终无法加载自定义内容

請問要怎麼加入--lora_rank 8這個參數

請問要怎麼加入--lora_rank 8這個參數
以下為例子

CUDA_VISIBLE_DEVICES=0 python src/finetune.py \
    --do_train \
    --dataset alpaca_gpt4_zh \
    --finetuning_type lora \
    --output_dir path_to_checkpoint \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --lora_rank 8 \
    --fp16

直接加在倒數第二行可以嗎?
我是個程式新手，請多包涵。謝謝。

自己内部的知识库数据，必须整成问答集吗？

求教
有没有直接用纯文本微调的方法？
或者方便从文本自动生成QA的方法？

您好，在单卡训练时报错RuntimeError: mixed dtype (CPU): expect input to have scalar type of BFloat16

File "/root/.cache/huggingface/modules/transformers_modules/models/modeling_chatglm.py", line 624, in forward
attention_input = self.input_layernorm(hidden_states)
File "/opt/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/anaconda3/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 190, in forward
return F.layer_norm(
File "/opt/anaconda3/lib/python3.9/site-packages/torch/nn/functional.py", line 2515, in layer_norm
return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: mixed dtype (CPU): expect input to have scalar type of BFloat16

The latest code has some problems while training

Traceback (most recent call last):
File "/mnt/task_runtime/ChatGLM-Efficient-Tuning/src/train_ppo.py", line 125, in
main()
File "/mnt/task_runtime/ChatGLM-Efficient-Tuning/src/train_ppo.py", line 93, in main
responses_with_queries = ppo_trainer.generate(queries, length_sampler=output_length_sampler, **gen_kwargs)
File "/mnt/task_runtime/ChatGLM-Efficient-Tuning/src/utils/ppo.py", line 162, in generate
response = self.accelerator.unwrap_model(self.model).generate(
File "/mnt/miniconda/envs/py310/lib/python3.10/site-packages/trl/models/modeling_value_head.py", line 195, in generate
return self.pretrained_model.generate(*args, **kwargs)
File "/mnt/miniconda/envs/py310/lib/python3.10/site-packages/peft/peft_model.py", line 731, in generate
outputs = self.base_model.generate(**kwargs)
File "/mnt/miniconda/envs/py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/mnt/miniconda/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 1454, in generate
logits_processor = self._get_logits_processor(
File "/mnt/miniconda/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 935, in _get_logits_processor
processors = self._merge_criteria_processor_list(processors, logits_processor)
File "/mnt/miniconda/envs/py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 957, in _merge_criteria_processor_list
if len(custom_list) == 0:
TypeError: object of type 'InvalidScoreLogitsProcessor' has no len()

怎么不支持边训练边验证保存最佳模型再做测试？

现在训练和验证分开的，不利于保存最佳模型，是否能改进一下

请问在多卡运行文件fine_tuning_chatglm6b.py时报错是什么原因呢？

alpaca_dataset.py", line 28, in init
self.eop = tokenizer.eop_token_id
AttributeError: 'ChatGLMTokenizer' object has no attribute 'eop_token_id'

用lora微调，为什么没有readme的效果

评估结果是：

***** eval metrics *****
eval_bleu-4 = 15.0234
eval_rouge-1 = 35.197
eval_rouge-2 = 15.4659
eval_rouge-l = 26.9888
eval_runtime = 0:02:57.77
eval_samples_per_second = 0.563
eval_steps_per_second = 0.073

跟原版模型几乎一样BLEU还下降了？

调了个寂寞，倒是没什么灾难性遗忘

lora训练完毕后参数加回原模型

请教如何将lora训练完毕后参数加回原模型
这样在部署的时候可以采用ChatGLM的加速方案，如https://github.com/wangzhaode/ChatGLM-MNN

RLHF 训练时报错

File "D:\Software\anaconda3\envs\chatglm\lib\site-packages\transformers\generation\utils.py", line 924, in _merge_criteria_processor_list
if len(custom_list) == 0:
TypeError: object of type 'InvalidScoreLogitsProcessor' has no len()
是transformers版本问题吗？试了几个版本都不行

Dataset doesn't exist.

FileNotFoundError: Couldn't find a dataset script at 
/content/JosephusCheung/GuanacoDataset/GuanacoDataset.py or any data file in the
same directory. Couldn't find 'JosephusCheung/GuanacoDataset' on the Hugging 
Face Hub either: FileNotFoundError: Dataset 'JosephusCheung/GuanacoDataset' 
doesn't exist on the Hub. If the repo is private or gated, make sure to log in 
with `huggingface-cli login`.

在不破坏原有对话能力的前提下，现在有pTuning后实际效果比较明显的例子吗？

第三步出现错误：RuntimeError: probability tensor contains either inf, nan or element < 0

File "/output/ChatGLM-Efficient-Tuning/src/train_ppo.py", line 114, in
main()
File "/output/ChatGLM-Efficient-Tuning/src/train_ppo.py", line 82, in main
responses_with_queries = ppo_trainer.generate(queries, length_sampler=output_length_sampler, **gen_kwargs)
File "/output/ChatGLM-Efficient-Tuning/src/utils/ppo.py", line 162, in generate
response = self.accelerator.unwrap_model(self.model).generate(
File "/usr/local/envs/chatglm_etuning/lib/python3.10/site-packages/trl/models/modeling_value_head.py", line 195, in generate
return self.pretrained_model.generate(*args, **kwargs)
File "/usr/local/envs/chatglm_etuning/lib/python3.10/site-packages/peft/peft_model.py", line 731, in generate
outputs = self.base_model.generate(**kwargs)
File "/usr/local/envs/chatglm_etuning/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/generation/utils.py", line 1485, in generate
return self.sample(
File "/usr/local/envs/chatglm_etuning/lib/python3.10/site-packages/transformers/generation/utils.py", line 2560, in sample
next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0

哥要不出个懒人包吧，折腾了两天环境都没弄好总是报错

微信群人数超200了，无法直接加入

找不到arguments模块

CUDA_VISIBLE_DEVICES=0 python3 infer.py --checkpoint_dir ../output/checkpoint-2000
执行后，提示如下错误：
ModuleNotFoundError: No module named 'arguments'
请问一下，这个arguments是PIP3库里的arguments吗？还是本地的

Ryb

多卡如何进行分布式部署推理

作者大大，想问下，如果完全按照你们的数据，代码，脚本参数，可以复现出和你们一样的效果吗～

我在尝试lora时报错

我的服务器有4个32G的GPU显卡，lora时报错如下：
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper__index_select)
我需要改哪里的代码才能指定一个卡呢？

关于数据集中单个example的长度

数据集是多轮对话的，一个example的history可能比较长，是最好要限制在2048个token以内吗？

全参数微调

请教一下，有全参数微调的版本吗

BELLE中的实验表明全参数微调的效果会更好 https://github.com/LianjiaTech/BELLE

TypeError: argument of type 'NoneType' is not iterable

报错'TypeError: argument of type 'NoneType' is not iterable'

ValueError: ChatGLMForConditionalGeneration does not support gradient checkpointing.

大神怎么解决？

在chatglm-6b-int4模型上微调会出现以下错误，想问下作者，是否不支持直接在int4模型上微调？

Error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (812x4096 and 2048x12288)

大家有遇到这样的问题吗？

Using Accelerate train

Thank you for open this repo.
In example for DDP train:

accelerate config # configure the environment
accelerate launch src/finetune.py # arguments (same as above)

However. No accelerate library found in your repo.

+ from accelerate import Accelerator
+ accelerator = Accelerator()

+ model, optimizer, training_dataloader, scheduler = accelerator.prepare(
+     model, optimizer, training_dataloader, scheduler
+ )
...

4.14最新代码，尝试跑demo报数据集sum校验异常

04/15/2023 22:48:30 - INFO - utils - Loading dataset DatasetInfo(load_from='file', dataset_name=None, file_name='alpaca_gpt4_data_zh.json', file_sha1='736d3a9d0fcbb252d1e8f902920961ecfd310e41')...
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ C:\Users\tians\PycharmProjects\ChatGLM-Efficient-Tuning\finetune_chatglm.py:67 in │
│ │
│ 64 │
│ 65 │
│ 66 if name == "main": │
│ ❱ 67 │ main() │
│ 68 │
│ │
│ C:\Users\tians\PycharmProjects\ChatGLM-Efficient-Tuning\finetune_chatglm.py:22 in main │
│ │
│ 19 │ │
│ 20 │ # Prepare pretrained model and dataset │
│ 21 │ model_args, data_args, training_args, finetuning_args = prepare_args() │
│ ❱ 22 │ dataset = prepare_data(model_args, data_args, training_args) │
│ 23 │ model, tokenizer = load_pretrained(model_args, finetuning_args, is_trainable=trainin │
│ 24 │ dataset = preprocess_data(dataset, tokenizer, data_args, training_args) │
│ 25 │ data_collator = DataCollatorForChatGLM(tokenizer, model, data_args.ignore_pad_token_ │
│ │
│ C:\Users\tians\PycharmProjects\ChatGLM-Efficient-Tuning\utils.py:246 in prepare_data │
│ │
│ 243 │ │ │ data_file = os.path.join(data_args.dataset_dir, dataset_info.file_name) │
│ 244 │ │ │ extension = dataset_info.file_name.split(".")[-1] │
│ 245 │ │ │ if dataset_info.file_sha1 is not None: │
│ ❱ 246 │ │ │ │ checksum(data_file, dataset_info.file_sha1) │
│ 247 │ │ │ else: │
│ 248 │ │ │ │ logger.warning("Checksum failed: missing SHA-1 hash value in dataset_inf │
│ 249 │ │ │ raw_datasets = load_dataset( │
│ │
│ C:\Users\tians\PycharmProjects\ChatGLM-Efficient-Tuning\utils.py:228 in checksum │
│ │
│ 225 │ │ │ binary_data = datafile.read() │
│ 226 │ │ sha1 = hashlib.sha1(binary_data).hexdigest() │
│ 227 │ │ if sha1 != hash: │
│ ❱ 228 │ │ │ raise ValueError("Checksum failed for {}.".format(file_path)) │
│ 229 │ │
│ 230 │ max_samples = data_args.max_train_samples if training_args.do_train else data_args.m │
│ 231 │ all_datasets = [] # support multiple datasets │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Checksum failed for data\alpaca_gpt4_data_zh.json.

日志如上，请问要如何解决这个问题呢？

small dataset get 1.5s/iter
large dataset get 5.6s/iter
Why get this appearence,

AssertionError: No inf checks were recorded for this optimizer.

当运行

    $ CUDA_VISIBLE_DEVICES=0 python src/finetune.py  --do_train  --dataset alpaca_gpt4_zh  --finetuning_type freeze  --output_dir path_to_checkpoint  --per_device_train_batch_size 2  --gradient_accumulation_steps 2  --lr_scheduler_type cosine  --logging_steps 10  --save_steps 1000  --learning_rate 5e-5  --num_train_epochs 1.0   --quantization_bit=8 --fp16

出现

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /ChatGLM-Efficient-Tuning/src/finetune.py:88 in <module>               │
│                                                                                                  │
│   85                                                                                             │
│   86                                                                                             │
│   87 if __name__ == "__main__":                                                                  │
│ ❱ 88 │   main()                                                                                  │
│   89                                                                                             │
│                                                                                                  │
│ /ChatGLM-Efficient-Tuning/src/finetune.py:60 in main                   │
│                                                                                                  │
│   57 │                                                                                           │
│   58 │   # Training                                                                              │
│   59 │   if training_args.do_train:                                                              │
│ ❱ 60 │   │   train_result = trainer.train()                                                      │
│   61 │   │   trainer.log_metrics("train", train_result.metrics)                                  │
│   62 │   │   trainer.save_metrics("train", train_result.metrics)                                 │
│   63 │   │   trainer.save_state() # along with the loss values                                   │
│                                                                                                  │
│ /anaconda3/envs/py38_chat_peft/lib/python3.8/site-packages/transformers/trainer.py:16 │
│ 62 in train                                                                                      │
│                                                                                                  │
│   1659 │   │   inner_training_loop = find_executable_batch_size(                                 │
│   1660 │   │   │   self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size  │
│   1661 │   │   )                                                                                 │
│ ❱ 1662 │   │   return inner_training_loop(                                                       │
│   1663 │   │   │   args=args,                                                                    │
│   1664 │   │   │   resume_from_checkpoint=resume_from_checkpoint,                                │
│   1665 │   │   │   trial=trial,                                                                  │
│                                                                                                  │
│ /anaconda3/envs/py38_chat_peft/lib/python3.8/site-packages/transformers/trainer.py:19 │
│ 91 in _inner_training_loop                                                                       │
│                                                                                                  │
│   1988 │   │   │   │   │   │   │   xm.optimizer_step(self.optimizer)                             │
│   1989 │   │   │   │   │   elif self.do_grad_scaling:                                            │
│   1990 │   │   │   │   │   │   scale_before = self.scaler.get_scale()                            │
│ ❱ 1991 │   │   │   │   │   │   self.scaler.step(self.optimizer)                                  │
│   1992 │   │   │   │   │   │   self.scaler.update()                                              │
│   1993 │   │   │   │   │   │   scale_after = self.scaler.get_scale()                             │
│   1994 │   │   │   │   │   │   optimizer_was_run = scale_before <= scale_after                   │
│                                                                                                  │
│ /anaconda3/envs/py38_chat_peft/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler │
│ .py:368 in step                                                                                  │
│                                                                                                  │
│   365 │   │   if optimizer_state["stage"] is OptState.READY:                                     │
│   366 │   │   │   self.unscale_(optimizer)                                                       │
│   367 │   │                                                                                      │
│ ❱ 368 │   │   assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were rec   │
│   369 │   │                                                                                      │
│   370 │   │   retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)         │
│   371                                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: No inf checks were recorded for this optimizer.

请问是什么问题呢？