
chatglm2-6b's Introduction

ChatGLM2-6B

🤗 HF Repo • 🐦 Twitter • 📃 [GLM@ACL 22] [GitHub] • 📃 [GLM-130B@ICLR 23] [GitHub]

👋 Join our Slack and WeChat

📍 Try the larger-scale ChatGLM model at chatglm.cn.

Read this in English

The new-generation open-source model ChatGLM3-6B has been released, featuring the strongest base model under 10B parameters and supporting tool calling (Function Call), code execution (Code Interpreter), and agent tasks.

Introduction

ChatGLM2-6B is the second-generation version of the open-source bilingual (Chinese-English) chat model ChatGLM-6B. It retains the smooth conversation flow and low deployment threshold of the first-generation model, while introducing the following new features:

  1. Stronger Performance: Building on the development experience of the first-generation ChatGLM model, we fully upgraded the base model of ChatGLM2-6B. ChatGLM2-6B uses the hybrid objective function of GLM and has been pre-trained on 1.4T Chinese and English tokens and aligned with human preferences. Evaluation results show that, compared with the first-generation model, ChatGLM2-6B achieves large improvements on datasets such as MMLU (+23%), CEval (+33%), GSM8K (+571%), and BBH (+60%), making it highly competitive among open-source models of the same size.
  2. Longer Context: Based on FlashAttention, we extended the context length of the base model from 2K in ChatGLM-6B to 32K, and trained with an 8K context length during the dialogue stage. For even longer contexts, we released the ChatGLM2-6B-32K model. LongBench results show that ChatGLM2-6B-32K has a clear competitive advantage among open-source models of a comparable size.
  3. More Efficient Inference: Based on Multi-Query Attention, ChatGLM2-6B has faster inference and lower GPU memory usage: under the official implementation, inference speed is 42% higher than the first generation, and under INT4 quantization the dialogue length supported by 6GB of GPU memory increases from 1K to 8K.
  4. More Open License: ChatGLM2-6B weights are completely open for academic research, and free commercial use is also permitted after completing the registration questionnaire.

The ChatGLM2-6B open-source model aims to advance large-model technology together with the open-source community. We kindly ask developers and users to comply with the open-source license and not to use the open-source model, code, or derivatives of this project for any purpose that may harm the country or society, or for any service that has not undergone safety evaluation and registration. Currently, this project team has not developed any application based on ChatGLM2-6B, including web, Android, Apple iOS, or Windows apps.

Although every stage of training strives to ensure data compliance and accuracy, ChatGLM2-6B is relatively small and its output is affected by probabilistic randomness, so the accuracy of its output cannot be guaranteed and the model can easily be misled. This project assumes no responsibility for data security or public-opinion risks caused by the open-source model and code, nor for any risks or liabilities arising from the model being misled, misused, disseminated, or improperly exploited.

Updates

[2023/07/31] Released the ChatGLM2-6B-32K model, improving understanding of long texts.

[2023/07/25] Released the CodeGeeX2 model, built on ChatGLM2-6B with additional code pre-training for comprehensively improved coding capability.

[2023/07/04] Released P-Tuning v2 and full-parameter fine-tuning scripts; see P-Tuning.

Related Projects

Open-source projects that accelerate ChatGLM2:

  • fastllm: a cross-platform accelerated inference solution that reaches 10,000+ tokens/s with batched inference on a single GPU and runs in real time on mobile devices with as little as 3GB of memory (about 4-5 tokens/s on a Snapdragon 865)
  • chatglm.cpp: a llama.cpp-style quantized CPU inference solution that enables real-time chat on a MacBook
  • ChatGLM2-TPU: a TPU-accelerated inference solution that runs in real time at about 5 tokens/s on the Sophgo edge chip BM1684X (16T@FP16, 16GB memory)

Open-source projects based on or using ChatGLM2-6B:

  • Chuanhu Chat: an attractive, easy-to-use, feature-rich, and quick-to-deploy user interface for major LLMs and online model APIs; supports ChatGLM2-6B.

Example projects supporting online training of ChatGLM-6B and related applications:

Evaluation Results

We selected some typical Chinese and English datasets for evaluation. Below are the results of ChatGLM2-6B on MMLU (English), C-Eval (Chinese), GSM8K (math), and BBH (English). A script for evaluating on C-Eval is provided in evaluation.

MMLU

Model Average STEM Social Sciences Humanities Others
ChatGLM-6B 40.63 33.89 44.84 39.02 45.71
ChatGLM2-6B (base) 47.86 41.20 54.44 43.66 54.46
ChatGLM2-6B 45.46 40.06 51.61 41.23 51.24
ChatGLM2-12B (base) 56.18 48.18 65.13 52.58 60.93
ChatGLM2-12B 52.13 47.00 61.00 46.10 56.05

Chat models are evaluated with zero-shot CoT (Chain-of-Thought); Base models are evaluated with few-shot answer-only prompting.

C-Eval

Model Average STEM Social Sciences Humanities Others
ChatGLM-6B 38.9 33.3 48.3 41.3 38.0
ChatGLM2-6B (base) 51.7 48.6 60.5 51.3 49.8
ChatGLM2-6B 50.1 46.4 60.4 50.6 46.9
ChatGLM2-12B (base) 61.6 55.4 73.7 64.2 59.4
ChatGLM2-12B 57.0 52.1 69.3 58.5 53.2

Chat models are evaluated with zero-shot CoT; Base models are evaluated with few-shot answer-only prompting.

GSM8K

Model Accuracy Accuracy (Chinese)*
ChatGLM-6B 4.82 5.85
ChatGLM2-6B (base) 32.37 28.95
ChatGLM2-6B 28.05 20.45
ChatGLM2-12B (base) 40.94 42.71
ChatGLM2-12B 38.13 23.43

All models are evaluated with few-shot CoT; the CoT prompt comes from http://arxiv.org/abs/2201.11903

* We translated 500 GSM8K questions and the CoT prompt using a translation API and manually proofread them.

BBH

Model Accuracy
ChatGLM-6B 18.73
ChatGLM2-6B (base) 33.68
ChatGLM2-6B 30.00
ChatGLM2-12B (base) 36.02
ChatGLM2-12B 39.98

All models are evaluated with few-shot CoT; the CoT prompts come from https://github.com/suzgunmirac/BIG-Bench-Hard/tree/main/cot-prompts

Inference Performance

ChatGLM2-6B uses Multi-Query Attention, which improves generation speed. The average speed of generating 2000 characters is compared below.

Model Inference Speed (characters/s)
ChatGLM-6B 31.49
ChatGLM2-6B 44.62

Using the official implementation, with batch size = 1, max length = 2048, bf16 precision; tested on an A100-SXM4-80G with PyTorch 2.0.1.

Multi-Query Attention also reduces the GPU memory occupied by the KV Cache during generation. In addition, ChatGLM2-6B is trained for dialogue with a Causal Mask, so the KV Cache of previous turns can be reused in multi-turn conversations, further reducing memory usage. As a result, when running INT4-quantized inference on a 6GB GPU, the first-generation ChatGLM-6B could generate at most 1119 characters before running out of memory, while ChatGLM2-6B can generate at least 8192 characters.

Quantization Level  Minimum GPU memory to encode 2048 tokens  Minimum GPU memory to generate 8192 tokens
FP16 / BF16 13.1 GB 12.8 GB
INT8 8.2 GB 8.1 GB
INT4 5.5 GB 5.1 GB

ChatGLM2-6B uses torch.nn.functional.scaled_dot_product_attention, introduced in PyTorch 2.0, for efficient attention computation. If your PyTorch version is older, it falls back to a naive attention implementation, and GPU memory usage may be higher than in the table above. A quick environment check is sketched below.
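As a minimal sketch (this is not part of the official repo), you can check whether your environment exposes the fused attention kernel; the assumption is simply that its presence indicates PyTorch 2.0 or later:

import torch
import torch.nn.functional as F

# True means the efficient scaled_dot_product_attention path is available;
# False means the model will fall back to the naive attention implementation.
print(torch.__version__)
print(hasattr(F, "scaled_dot_product_attention"))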

We also tested the impact of quantization on model performance. The results show that the impact is within an acceptable range.

Quantization Level Accuracy (MMLU) Accuracy (C-Eval dev)
BF16 45.47 53.57
INT4 43.13 50.30

ChatGLM2-6B Examples

Compared with the first-generation model, ChatGLM2-6B has improved along multiple dimensions. Below are some comparison examples. More possibilities with ChatGLM2-6B are waiting for you to explore!

Mathematics and Logic

Knowledge Reasoning

Long Document Understanding

Getting Started

Environment Setup

First, clone this repository:

git clone https://github.com/THUDM/ChatGLM2-6B
cd ChatGLM2-6B

Then install the dependencies with pip:

pip install -r requirements.txt

The recommended transformers version is 4.30.2, and torch 2.0 or above is recommended for the best inference performance.

Code Usage

You can generate a conversation with the ChatGLM2-6B model using the following code:

>>> from transformers import AutoTokenizer, AutoModel
>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True)
>>> model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='cuda')
>>> model = model.eval()
>>> response, history = model.chat(tokenizer, "你好", history=[])
>>> print(response)
你好👋!我是人工智能助手 ChatGLM2-6B,很高兴见到你,欢迎问我任何问题。
>>> response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
>>> print(response)
晚上睡不着可能会让你感到焦虑或不舒服,但以下是一些可以帮助你入睡的方法:

1. 制定规律的睡眠时间表:保持规律的睡眠时间表可以帮助你建立健康的睡眠习惯,使你更容易入睡。尽量在每天的相同时间上床,并在同一时间起床。
2. 创造一个舒适的睡眠环境:确保睡眠环境舒适,安静,黑暗且温度适宜。可以使用舒适的床上用品,并保持房间通风。
3. 放松身心:在睡前做些放松的活动,例如泡个热水澡,听些轻柔的音乐,阅读一些有趣的书籍等,有助于缓解紧张和焦虑,使你更容易入睡。
4. 避免饮用含有咖啡因的饮料:咖啡因是一种刺激性物质,会影响你的睡眠质量。尽量避免在睡前饮用含有咖啡因的饮料,例如咖啡,茶和可乐。
5. 避免在床上做与睡眠无关的事情:在床上做些与睡眠无关的事情,例如看电影,玩游戏或工作等,可能会干扰你的睡眠。
6. 尝试呼吸技巧:深呼吸是一种放松技巧,可以帮助你缓解紧张和焦虑,使你更容易入睡。试着慢慢吸气,保持几秒钟,然后缓慢呼气。

如果这些方法无法帮助你入睡,你可以考虑咨询医生或睡眠专家,寻求进一步的建议。

Load the Model Locally

The code above automatically downloads the model implementation and weights via transformers. The full model implementation is on the Hugging Face Hub. If your network is poor, downloading the model weights may take a long time or even fail. In that case, you can download the model locally first and then load it from the local path.

To download the model from the Hugging Face Hub, first install Git LFS, then run

git clone https://huggingface.co/THUDM/chatglm2-6b

If downloading the checkpoint from the Hugging Face Hub is slow, you can download only the model implementation:

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/THUDM/chatglm2-6b

Then manually download the model weight files from here and place the downloaded files into the local chatglm2-6b directory.

After downloading the model locally, replace THUDM/chatglm2-6b in the code above with the path of your local chatglm2-6b folder to load the model from disk.
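For example, assuming the checkpoint was cloned into ./chatglm2-6b (the path here is only illustrative), the loading code becomes:

from transformers import AutoTokenizer, AutoModel

# "./chatglm2-6b" is a placeholder for wherever you cloned the weights.
tokenizer = AutoTokenizer.from_pretrained("./chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("./chatglm2-6b", trust_remote_code=True, device='cuda')
model = model.eval()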

The model implementation is still evolving. If you want to pin the model implementation for compatibility, add the revision="v1.0" argument to the from_pretrained call. v1.0 is the latest version number; for a full list of versions, see the Change Log.
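A pinned load, following the revision argument described above, might look like this sketch:

# Pin the remote model implementation to the v1.0 revision for reproducibility.
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True,
                                  revision="v1.0", device='cuda')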

Web Demo

You can launch a Gradio-based web demo with the following command:

python web_demo.py


You can launch a Streamlit-based web demo with the following command:

streamlit run web_demo2.py

The web demo runs a web server and prints its address. Open the printed address in a browser to use it. In our testing, the Streamlit-based web demo is smoother.

CLI Demo


Run cli_demo.py in the repository:

python cli_demo.py

The program holds an interactive conversation in the command line. Type your prompt and press Enter to generate a reply; type clear to clear the conversation history, and type stop to exit the program.

API Deployment

First install the additional dependencies with pip install fastapi uvicorn, then run api.py in the repository:

python api.py

By default, the API is served on local port 8000 and is called via POST:

curl -X POST "http://127.0.0.1:8000" \
     -H 'Content-Type: application/json' \
     -d '{"prompt": "你好", "history": []}'

The returned value is:

{
  "response":"你好👋!我是人工智能助手 ChatGLM2-6B,很高兴见到你,欢迎问我任何问题。",
  "history":[["你好","你好👋!我是人工智能助手 ChatGLM2-6B,很高兴见到你,欢迎问我任何问题。"]],
  "status":200,
  "time":"2023-03-23 21:38:40"
}
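The same endpoint can also be called from Python; the sketch below uses the requests library and assumes the default host and port shown above:

import requests

# Assumes api.py is running locally on port 8000 (see above).
resp = requests.post(
    "http://127.0.0.1:8000",
    json={"prompt": "你好", "history": []},
)
data = resp.json()
print(data["response"])    # the model's reply
history = data["history"]  # pass this back in the next request for multi-turn chat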

Thanks to @hiyouga for implementing an OpenAI-format streaming API deployment, which can serve as the backend for any ChatGPT-based application, such as ChatGPT-Next-Web. It can be deployed by running openai_api.py in the repository:

python openai_api.py

Example code for calling the API:

import openai
if __name__ == "__main__":
    openai.api_base = "http://localhost:8000/v1"
    openai.api_key = "none"
    for chunk in openai.ChatCompletion.create(
        model="chatglm2-6b",
        messages=[
            {"role": "user", "content": "你好"}
        ],
        stream=True
    ):
        if hasattr(chunk.choices[0].delta, "content"):
            print(chunk.choices[0].delta.content, end="", flush=True)
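If you do not need token-by-token streaming, a non-streaming call would look like the sketch below; this assumes the deployed server also serves the regular (non-streaming) completion path, which is an assumption rather than something stated above:

import openai

openai.api_base = "http://localhost:8000/v1"
openai.api_key = "none"
# Non-streaming variant: the full reply comes back in a single response object.
response = openai.ChatCompletion.create(
    model="chatglm2-6b",
    messages=[{"role": "user", "content": "你好"}],
)
print(response.choices[0].message.content)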

Low-Cost Deployment

Model Quantization

By default, the model is loaded in FP16 precision, and running the code above requires about 13GB of GPU memory. If your GPU memory is limited, you can try loading the model in quantized form as follows:

model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4",trust_remote_code=True).cuda()

Model quantization causes some performance loss, but our tests show that ChatGLM2-6B can still generate naturally and fluently under 4-bit quantization. The weight files of the quantized model can also be downloaded manually from here.
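As an alternative to downloading the pre-quantized checkpoint, the ChatGLM model class exposes a quantize() method; the one-liner below is a sketch that assumes quantizing the full-precision checkpoint at load time works the same way as it did for the first-generation model:

# Sketch: load the FP16 checkpoint and quantize it to 4 bits on the fly (assumed interface).
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).quantize(4).cuda()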

CPU Deployment

If you have no GPU hardware, you can also run inference on the CPU, although it will be much slower. Use it as follows (about 32GB of RAM is required):

model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True).float()

If your RAM is insufficient, you can also use the quantized model:

model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4",trust_remote_code=True).float()

Running the quantized model on the CPU requires gcc and openmp. Most Linux distributions have them installed by default. On Windows, check openmp when installing TDM-GCC. The Windows test environment uses TDM-GCC 10.3.0, and the Linux one gcc 11.3.0. On macOS, please refer to Q1.
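If CPU inference only uses part of your cores, you can experiment with PyTorch's thread settings; this is a general PyTorch knob rather than anything specific to ChatGLM2-6B, and the number below is illustrative only:

import torch

# Illustrative: raise the intra-op thread count toward your physical core count.
torch.set_num_threads(8)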

Mac Deployment

For Macs with Apple Silicon or AMD GPUs, you can use the MPS backend to run ChatGLM2-6B on the GPU. Follow Apple's official instructions to install PyTorch-Nightly (the correct version number looks like 2.x.x.dev2023xxxx, not 2.x.x).

Currently, only loading the model from a local path is supported on macOS. Change the model loading in the code to load locally and use the mps backend:

model = AutoModel.from_pretrained("your local path", trust_remote_code=True).to('mps')

Loading the half-precision ChatGLM2-6B model requires about 13GB of RAM. Machines with less memory (such as a 16GB MacBook Pro) will fall back to virtual memory on disk when free RAM runs out, which severely slows down inference. In that case, you can use the quantized model chatglm2-6b-int4. Because the GPU quantization kernels are written in CUDA, they cannot be used on macOS, so inference can only run on the CPU. To make full use of CPU parallelism, OpenMP must be installed separately.

Inference on a Mac can also be done with ChatGLM.cpp.

Multi-GPU Deployment

If you have multiple GPUs but none of them has enough memory to hold the whole model, you can split the model across them. First install accelerate with pip install accelerate, then load the model as follows:

from utils import load_model_on_gpus
model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2)

This deploys the model on two GPUs for inference. You can change num_gpus to the number of GPUs you want to use. The model is split evenly by default; you can also pass a device_map argument to specify the split yourself, as sketched below.
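A hand-written split might look like the following sketch; the device_map keys shown are hypothetical and depend on how utils.load_model_on_gpus names the ChatGLM2 submodules, so treat them as illustrative only:

from utils import load_model_on_gpus

# Hypothetical mapping: pin the embedding and output layer to GPU 0 and spread
# encoder layers across GPUs 0 and 1. Adjust the keys to the actual module names.
device_map = {
    "transformer.embedding": 0,
    "transformer.output_layer": 0,
    "transformer.encoder.layers.0": 0,
    "transformer.encoder.layers.1": 1,
    # ... remaining layers ...
}
model = load_model_on_gpus("THUDM/chatglm2-6b", num_gpus=2, device_map=device_map)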

License

The code in this repository is open-sourced under the Apache-2.0 license. Use of the ChatGLM2-6B model weights must follow the Model License. ChatGLM2-6B weights are completely open for academic research, and free commercial use is also permitted after completing the registration questionnaire.

Citation

If you find our work helpful, please consider citing the following papers. The ChatGLM2-6B paper will be released soon, stay tuned!

@article{zeng2022glm,
  title={Glm-130b: An open bilingual pre-trained model},
  author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
  journal={arXiv preprint arXiv:2210.02414},
  year={2022}
}
@inproceedings{du2022glm,
  title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
  author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
  booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  pages={320--335},
  year={2022}
}

chatglm2-6b's People

Contributors

davidlvxin, duzx16, eltociear, harmonyhu, hiyouga, jsl9208, li-plus, mougua, qinjx, stanislas0, yidadaa


chatglm2-6b's Issues

Why is v2 worse than v1 after fine-tuning?

Is your feature request related to a problem? Please describe.

No response

Solutions

Why is v2 worse than v1 after fine-tuning?

Additional context

No response

[BUG/Help] Suspected test data leakage; the C-Eval leaderboard results may not be reliable

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Suspected test data leakage; the LLM leaderboard results may not be reliable.

Expected Behavior

No response

Steps To Reproduce

Suspected test data leakage; the LLM leaderboard results may not be reliable.

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

[BUG/Help] RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

How does this error occur and how can it be resolved? For ChatGLM, it was caused by the ice_text.model file not being downloaded correctly, but I don't see that file in the Hugging Face chatglm2 repo.

Expected Behavior

How to resolve this?

Steps To Reproduce

chatglm2

Environment

- OS:
- Python:3
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

How to resolve this?

[Help] How to increase CPU core utilization when running the model on CPU

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

After the model starts on the CPU, every chat shows machine CPU utilization of exactly 50%.

For example, on a server with 12 CPU cores, only 6 cores are computing at the same time during a chat.

Expected Behavior

Which parameter should be adjusted to increase CPU utilization during chat?

Steps To Reproduce

  1. Run the demo or API following the tutorial
  2. Start a chat
  3. Observe the server's CPU utilization

Environment

- OS:CentOS
- Python:3.9
- Transformers:4.30.2
- PyTorch:2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :False

Anything else?

No response

Evaluation

Is your feature request related to a problem? Please describe.

Is there a plan to open-source a demo of the evaluation script?

Solutions

Open-sourcing the evaluation script would help everyone get started with evaluation quickly.

Additional context

No response

[BUG/Help] <title>

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Is it a bug?

Expected Behavior

as above

Steps To Reproduce

as above

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

[BUG/Help] Torch not compiled with CUDA enabled

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Traceback (most recent call last):
File "F:\ChatGLM2-6B\web_demo.py", line 6, in
model = AutoModel.from_pretrained("THUDM/chatglm2-6b", trust_remote_code=True, device='cuda')
File "C:\Users\admin\anaconda3\envs\chatglm2\lib\site-packages\transformers\models\auto\auto_factory.py", line 466, in from_pretrained
return model_class.from_pretrained(
File "C:\Users\admin\anaconda3\envs\chatglm2\lib\site-packages\transformers\modeling_utils.py", line 2498, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "C:\Users\admin/.cache\huggingface\modules\transformers_modules\THUDM\chatglm2-6b\a6d54fac46dff2db65d53416c207a4485ca6bd40\modeling_chatglm.py", line 767, in init
self.transformer = ChatGLMModel(config, empty_init=empty_init, device=device)
File "C:\Users\admin/.cache\huggingface\modules\transformers_modules\THUDM\chatglm2-6b\a6d54fac46dff2db65d53416c207a4485ca6bd40\modeling_chatglm.py", line 690, in init
self.embedding = init_method(Embedding, config, **init_kwargs)
File "C:\Users\admin\anaconda3\envs\chatglm2\lib\site-packages\torch\nn\utils\init.py", line 52, in skip_init
return module_cls(*args, **kwargs).to_empty(device=final_device)
File "C:\Users\admin\anaconda3\envs\chatglm2\lib\site-packages\torch\nn\modules\module.py", line 1024, in to_empty
return self._apply(lambda t: torch.empty_like(t, device=device))
File "C:\Users\admin\anaconda3\envs\chatglm2\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "C:\Users\admin\anaconda3\envs\chatglm2\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "C:\Users\admin\anaconda3\envs\chatglm2\lib\site-packages\torch\nn\modules\module.py", line 1024, in
return self.apply(lambda t: torch.empty_like(t, device=device))
File "C:\Users\admin\anaconda3\envs\chatglm2\lib\site-packages\torch\cuda_init
.py", line 239, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Expected Behavior

No response

Steps To Reproduce

1、conda create --name chatglm2 python=3.10
2、conda activate chatglm2
3、pip install -r requirements.txt

Environment

- OS:windows10
- Python:3.10
- Transformers:transformers==4.27.1
- PyTorch:2.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
(chatglm2) F:\ChatGLM2-6B>python -c "import torch; print(torch.cuda.is_available())"
False

Anything else?

No response

[BUG/Help] cli_demo.py seems to have an unused import readline

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The import readline on line 5 seems unused and causes a ModuleNotFoundError

Expected Behavior

No response

Steps To Reproduce

just run the cli_demo.py

Environment

- OS:win 10
- Python:3.11
- Transformers:4.27.1
- PyTorch:2.0.1+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :true

Anything else?

It seems a lapsus calami, or am I missing the function of readline?

lm-evaluation-harness cannot run the evaluation

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

lm-evaluation-harness/lm_eval/models/huggingface.py", line 340, in _create_auto_tokenizer
tokenizer.pad_token = tokenizer.eos_token
AttributeError: can't set attribute

Expected Behavior

No response

Steps To Reproduce

python main.py --model hf-causal-experimental --model_args pretrained=/nvme/syx/chatglm2-6b,use_accelerate=True --tasks lambada_standard --batch_size 4 --num_fewshot 0 --no_cache --write_out --output_path sf_results/hendrycksTest.json --device auto

Environment

- OS:Ubuntu
- Python:3.9.16
- Transformers:4.26.1
- PyTorch:1.12
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True

Anything else?

No response

[BUG/Help] chatglm2-6b model already downloaded, but python web_demo.py throws a path error

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

With:
tokenizer = AutoTokenizer.from_pretrained("chatglm2-6b/", trust_remote_code=True) model = AutoModel.from_pretrained("chatglm2-6b/", trust_remote_code=True, device='cuda')
the following error is reported:
PS C:\Users\joven\source\Github\ChatGLM2-6B-main\ChatGLM2-6B-main> python web_demo.py Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision. Traceback (most recent call last): File "C:\Users\joven\source\Github\ChatGLM2-6B-main\ChatGLM2-6B-main\web_demo.py", line 5, in <module> tokenizer = AutoTokenizer.from_pretrained("chatglm2-6b/", trust_remote_code=True) File "C:\Users\joven\miniconda3\lib\site-packages\transformers\models\auto\tokenization_auto.py", line 663, in from_pretrained tokenizer_class = get_class_from_dynamic_module( File "C:\Users\joven\miniconda3\lib\site-packages\transformers\dynamic_module_utils.py", line 399, in get_class_from_dynamic_module return get_class_in_module(class_name, final_module.replace(".py", "")) File "C:\Users\joven\miniconda3\lib\site-packages\transformers\dynamic_module_utils.py", line 157, in get_class_in_module shutil.copy(f"{module_dir}/{module_file_name}", tmp_dir) File "C:\Users\joven\miniconda3\lib\shutil.py", line 417, in copy copyfile(src, dst, follow_symlinks=follow_symlinks) File "C:\Users\joven\miniconda3\lib\shutil.py", line 254, in copyfile with open(src, 'rb') as fsrc: FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\joven\\.cache\\huggingface\\modules\\transformers_modules\\chatglm2-6b/chatglm2-6b/tokenization_chatglm.py'

I've tried forward slashes, backslashes, and absolute paths. Hoping someone can help.

Expected Behavior

No response

Steps To Reproduce

Followed the documented steps.
chatglm2-6b model size: 23.2 GB

Environment

- OS:Windows11
- Python:py 3.10.10
- Transformers:4.27.1
- PyTorch:2.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

Error when loading the model on an M1 Mac

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Loading via the mps backend:
tokenizer = AutoTokenizer.from_pretrained("/本地路径示例/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("/本地路径示例/chatglm-6b", trust_remote_code=True).half().to('mps')

Warning: UserWarning: MPS: no support for int64 min/max ops, casting it to int32
Then the system just hangs with no output.

mps is installed.

Expected Behavior

No response

Steps To Reproduce

111

Environment

- OS:MacOS Ventura
- Python:3.10
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :False

Anything else?

No response

[BUG/Help] When I save tokenizer to local directory, I got an error.

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

When I save tokenizer to local directory, I got an error as follows:

My code:

from transformers import AutoTokenizer, AutoModel

model_name = 'THUDM/chatglm2-6b-int4'
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

model_path = './model'
tokenizer.save_pretrained(model_path)
model.save_pretrained(model_path)

Error:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:7                                                                                    │
│                                                                                                  │
│   4 current_directory = os.path.dirname(current_file_path)                                       │
│   5 model_path = os.path.join(current_directory, 'models', model_name)                           │
│   6                                                                                              │
│ ❱ 7 tokenizer.save_pretrained(model_path)                                                        │
│   8 model.save_pretrained(model_path)                                                            │
│   9                                                                                              │
│                                                                                                  │
│ /home/yaofeng/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2205   │
│ in save_pretrained                                                                               │
│                                                                                                  │
│   2202 │   │                                                                                     │
│   2203 │   │   file_names = (tokenizer_config_file, special_tokens_map_file)                     │
│   2204 │   │                                                                                     │
│ ❱ 2205 │   │   save_files = self._save_pretrained(                                               │
│   2206 │   │   │   save_directory=save_directory,                                                │
│   2207 │   │   │   file_names=file_names,                                                        │
│   2208 │   │   │   legacy_format=legacy_format,                                                  │
│                                                                                                  │
│ /home/yaofeng/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2253   │
│ in _save_pretrained                                                                              │
│                                                                                                  │
│   2250 │   │   │   │   f.write(out_str)                                                          │
│   2251 │   │   │   │   logger.info(f"added tokens file saved in {added_tokens_file}")            │
│   2252 │   │                                                                                     │
│ ❱ 2253 │   │   vocab_files = self.save_vocabulary(save_directory, filename_prefix=filename_pref  │
│   2254 │   │                                                                                     │
│   2255 │   │   return file_names + vocab_files + (added_tokens_file,)                            │
│   2256                                                                                           │
│                                                                                                  │
│ /home/yaofeng/.cache/huggingface/modules/transformers_modules/THUDM/chatglm2-6b-int4/3cbefb15043 │
│ a18dfb3f773eba7de8c6b67c69dd6/tokenization_chatglm.py:137 in save_vocabulary                     │
│                                                                                                  │
│   134 │   │   else:                                                                              │
│   135 │   │   │   vocab_file = save_directory                                                    │
│   136 │   │                                                                                      │
│ ❱ 137 │   │   with open(self.vocab_file, 'rb') as fin:                                           │
│   138 │   │   │   proto_str = fin.read()                                                         │
│   139 │   │                                                                                      │
│   140 │   │   with open(vocab_file, "wb") as writer:                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'ChatGLMTokenizer' object has no attribute 'vocab_file'

What should I do next?

Expected Behavior

No response

Steps To Reproduce

from transformers import AutoTokenizer, AutoModel

model_name = 'THUDM/chatglm2-6b-int4'
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

model_path = './model'
tokenizer.save_pretrained(model_path)
model.save_pretrained(model_path)

Environment

- OS: Ubuntu 20.04 (WSL2)
- Python: 3.10.6
- Transformers: 4.30.2
- PyTorch: 1.13.1+cu117
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True

Anything else?

No response

Sometimes the output is strange: it's all commas

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior


Expected Behavior

No response

Steps To Reproduce

python web_demo.py

Chat normally

Environment

- OS:Ubuntu 20.04
- Python:3.10
- Transformers:4.29.1
- PyTorch:2.01
- CUDA Support : True

Anything else?

No response

[Feature] Question about fine-tuning

Is your feature request related to a problem? Please describe.

Is ChatGLM2-6B currently fine-tuned the same way as ChatGLM-6B? When will the fine-tuning code be released?

Solutions

None

Additional context

None

[BUG/Help] With long texts, an instruction placed at the beginning gets ignored

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

At the beginning I write "summarize the article below".

At the end I write a question.

The model actually answers the question and ignores the instruction at the top to summarize the article.

Expected Behavior

Or is there a technique to keep the source material and the instruction separate?

Steps To Reproduce

Same as above

Environment

- OS:linux
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

none

[BUG/Help] <ConnectionResetError: [Errno 104] Connection reset by peer>

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior


Expected Behavior

ConnectionResetError: [Errno 104] Connection reset by peer
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
Three errors were reported, as shown above!

Steps To Reproduce

1、git clone https://github.com/THUDM/ChatGLM2-6B
2、cd ChatGLM2-6B
3、pip install -r requirements.txt
4、python web_demo.py

I previously installed ChatGLM-6B and it worked fine.

Please help!

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

[BUG/Help] numpy.object

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Issue with the deprecated np.object alias in numpy

Expected Behavior

No response

Steps To Reproduce

When running the following python 3.8 script:
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained(".\THUDM\chatglm2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained(".\THUDM\chatglm2-6b", trust_remote_code=True, device='cuda')
model = model.eval()
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
response, history = model.chat(tokenizer, "晚上睡不着应该怎么办", history=history)
print(response)

I got:

Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
C:\Users\polyt\AppData\Roaming\Python\Python38\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py:569: FutureWarning: In the future np.object will be defined as the corresponding NumPy scalar.
(np.object, string),
Traceback (most recent call last):
File "test.py", line 3, in
model = AutoModel.from_pretrained(".\THUDM\chatglm2-6b", trust_remote_code=True, device='cuda')
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\site-packages\transformers\models\auto\auto_factory.py", line 462, in from_pretrained
model_class = get_class_from_dynamic_module(
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\site-packages\transformers\dynamic_module_utils.py", line 399, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\site-packages\transformers\dynamic_module_utils.py", line 177, in get_class_in_module
module = importlib.import_module(module_path)
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\importlib_init_.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1014, in _gcd_import
File "", line 991, in find_and_load
File "", line 975, in find_and_load_unlocked
File "", line 671, in load_unlocked
File "", line 843, in exec_module
File "", line 219, in call_with_frames_removed
File "C:\Users\polyt/.cache\huggingface\modules\transformers_modules\chatglm2-6b\modeling_chatglm.py", line 21, in
from transformers.modeling_utils import PreTrainedModel
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\site-packages\transformers\modeling_utils.py", line 83, in
from accelerate import version as accelerate_version
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\site-packages\accelerate_init
.py", line 3, in
from .accelerator import Accelerator
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\site-packages\accelerate\accelerator.py", line 40, in
from .tracking import LOGGER_TYPE_TO_CLASS, GeneralTracker, filter_trackers
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\site-packages\accelerate\tracking.py", line 42, in
from torch.utils import tensorboard
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\site-packages\torch\utils\tensorboard_init
.py", line 12, in
from .writer import FileWriter, SummaryWriter # noqa: F401
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\site-packages\torch\utils\tensorboard\writer.py", line 13, in
from tensorboard.summary.writer.event_file_writer import EventFileWriter
File "C:\Users\polyt\AppData\Roaming\Python\Python38\site-packages\tensorboard\summary_init.py", line 22, in
from tensorboard.summary import v1 # noqa: F401
File "C:\Users\polyt\AppData\Roaming\Python\Python38\site-packages\tensorboard\summary\v1.py", line 23, in
from tensorboard.plugins.histogram import summary as histogram_summary
File "C:\Users\polyt\AppData\Roaming\Python\Python38\site-packages\tensorboard\plugins\histogram\summary.py", line 35, in
from tensorboard.plugins.histogram import summary_v2
File "C:\Users\polyt\AppData\Roaming\Python\Python38\site-packages\tensorboard\plugins\histogram\summary_v2.py", line 35, in
from tensorboard.util import tensor_util
File "C:\Users\polyt\AppData\Roaming\Python\Python38\site-packages\tensorboard\util\tensor_util.py", line 20, in
from tensorboard.compat.tensorflow_stub import dtypes, compat, tensor_shape
File "C:\Users\polyt\AppData\Roaming\Python\Python38\site-packages\tensorboard\compat\tensorflow_stub_init
.py", line 22, in
from .dtypes import as_dtype # noqa
File "C:\Users\polyt\AppData\Roaming\Python\Python38\site-packages\tensorboard\compat\tensorflow_stub\dtypes.py", line 569, in
(np.object, string),
File "C:\Users\polyt.conda\envs\ChatGLM2-6B\lib\site-packages\numpy_init.py", line 305, in getattr
raise AttributeError(former_attrs[attr])
AttributeError: module 'numpy' has no attribute 'object'.
np.object was a deprecated alias for the builtin object. To avoid this error in existing code, use object by itself. Doing this will not modify any behavior and is safe.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

Environment

- OS: Windows 11
- Python: 3.8.16
- Transformers: 4.27.1
- PyTorch: 2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True

Anything else?

No response

[BUG] A logical reasoning question answered incorrectly

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Suppose there is a pond with an unlimited amount of water. There are two empty jugs with capacities of 5 liters and 6 liters. The question is how to obtain exactly 3 liters of water from the pond using only these two jugs.

You can obtain 3 liters of water from the pond using only these two jugs with the following steps:

  1. Fill the 6-liter jug with water.
  2. Pour the water from the 6-liter jug into the 5-liter jug; the 5-liter jug now holds 1 liter of water.
  3. Pour pond water into the 6-liter jug; the 6-liter jug now holds 1 liter of water.
  4. The 6-liter jug still has 1 liter of water left.
  5. Pour pond water into the 6-liter jug; the 6-liter jug now holds 2 liters of water.
  6. The 6-liter jug still has 1 liter of water left.
  7. Pour pond water into the 6-liter jug; the 6-liter jug now holds 3 liters of water.
  8. The 6-liter jug now has no water left.
    Through the steps above, we can obtain 3 liters of water from the pond using only these two jugs.

Expected Behavior

Step 2 is wrong; it should be the 6-liter jug that has 1 liter remaining.

Steps To Reproduce

input the question on the

Environment

- OS:linux
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

RuntimeError: "arange_cpu" not implemented for 'Half'

Is your feature request related to a problem? Please describe.

Has anyone else seen this error? Is it caused by the wrong torch version?

Solutions

Has anyone else seen this error? Is it caused by the wrong torch version?

Additional context

Has anyone else seen this error? Is it caused by the wrong torch version?

[Help] About the README description of optimized GPU memory usage

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

How should this sentence be understood:
"In addition, ChatGLM2-6B is trained for dialogue with a Causal Mask, so the KV Cache of previous turns can be reused in multi-turn conversations, further reducing memory usage."

Why does training with a Causal Mask and reusing the KV Cache of previous turns in multi-turn conversations reduce memory usage?
Thanks!

Expected Behavior

No response

Steps To Reproduce

How should this sentence be understood:
"In addition, ChatGLM2-6B is trained for dialogue with a Causal Mask, so the KV Cache of previous turns can be reused in multi-turn conversations, further reducing memory usage."

Why does training with a Causal Mask and reusing the KV Cache of previous turns in multi-turn conversations reduce memory usage?
Thanks!

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

SFT training data volume and type distribution

Is your feature request related to a problem? Please describe.

No response

Solutions

Is there a technical report on the SFT training data volume and type distribution?

Additional context

No response

[BUG/Help] The introduction says CPU is supported, but it actually isn't

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The introduction says CPU is supported, but it actually isn't; the relevant files are missing entirely.
Copying the relevant files over from the first-generation model still doesn't work.
It turns out quantization.py contains no CPU-related code at all.
After replacing quantization.py with the old one, running it reports:

AttributeError: 'GLMTransformer' object has no attribute 'device'

Expected Behavior

Support pure CPU execution

Steps To Reproduce

In web_demo.py, change the loading line to
model = AutoModel.from_pretrained("THUDM/chatglm2-6b-int4", trust_remote_code=True).float()
then run
python .\web_demo.py

Environment

- OS: Windows 11
- Python: 3.10.0
- Transformers: 4.27.1
- PyTorch: 2.0.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : False

Anything else?

No response

[BUG/Help] What prompt can make the model write antonymous sentences?

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The prompt given is:
"""请写出下面句子的5个反义句
民生银行应向客户事先送达有关机关的协助冻结、扣划存款通知书,方有权办理冻结、扣划手续。
"""

The following reply was obtained:

  1. 事先送达有关机关的协助冻结、扣划存款通知书不是民生银行的义务。
  2. 民生银行没有权利办理冻结、扣划手续,除非事先送达有关机关的协助冻结、扣划存款通知书。
  3. 向客户送达有关机关的协助冻结、扣划存款通知书不是民生银行的职责。
  4. 民生银行不能办理冻结、扣划手续,除非事先得到有关机关的协助。
  5. 事先送达有关机关的协助冻结、扣划存款通知书是民生银行的限制条件。

It doesn't feel like real antonyms were produced. What kind of prompt would make this work better?

Expected Behavior

No response

Steps To Reproduce

temperature = 0.95
max_length=600

original_llm = load_model_on_gpus(original_model_path, config)
original_llm.eval()

original_response, _ = original_llm.chat(tokenizer=tokenizer, query=query,max_length=max_length, temperature=temperature)

Environment

- OS: centos 7.4
- Python: 3.9
- Transformers:4.29.2
- PyTorch:1.13.1+cu117
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : yes

Anything else?

No response

Why does the chatglm2-6b-int4 model need more than 12GB of GPU memory?

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

chatglm-6b-int4 can run with only 7GB of GPU memory,
but chatglm2-6b-int4 in practice needs more than 12GB of GPU memory.
Is there a way to bring it down to about 7GB of GPU memory?

Expected Behavior

No response

Steps To Reproduce

===web_demo.py===
tokenizer = AutoTokenizer.from_pretrained("/data/chatglm/chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("/data/chatglm/chatglm2-6b-int4", trust_remote_code=True, device='cuda')
===web_demo.py===

python web_demo.py

Environment

- OS:Ubuntu 22.04.1 LTS
- Python:3.9.17
- Transformers:4.27.1
- PyTorch:2.0.1+cu117
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

[Help] How to run P-Tuning training

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Training fails with: AttributeError: 'ChatGLMModel' object has no attribute 'prefix_encoder'

Expected Behavior

No response

Steps To Reproduce

Ran the P-Tuning code from the previous ChatGLM version for training, with the base model changed.

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

[Feature] CUDA out of memory even with 11GB of VRAM

Is your feature request related to a problem? Please describe.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 108.00 MiB (GPU 0; 11.00 GiB total capacity; 10.29 GiB already allocated; 0 bytes free; 10.29 GiB reserved in total by PyTorch)

Solutions

What settings or code changes are needed to get it running? Thanks!

Additional context

No response

[BUG/Help] model.get_input_embeddings() cannot be called

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

model.get_input_embeddings() cannot be called; the cause is AttributeError: 'ChatGLMModel' object has no attribute 'transformer'. Because Hugging Face passes a default value to getattr(), the error message is sparse and only a NotImplementedError is returned; removing the default reveals the real cause.
It looks like get_input_embeddings() walks back to the embedding via the transformer attribute, but the new model no longer has that path, so the embedding cannot be retrieved. Please fix!

Expected Behavior

No response

Steps To Reproduce

Reproducing the error is simple: load a ChatGLM2 model and call model.get_input_embeddings().

Environment

- OS: linux
- Python: 3.9
- Transformers: 4.30.1
- PyTorch: 2+
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

[Feature] Hoping for P-Tuning support

Is your feature request related to a problem? Please describe.

Tried fine-tuning with the old official P-Tuning code and got an error, presumably because the model structure changed. The error is as follows:

│   130 │   if model_args.pre_seq_len is not None:                                                 │
│   131 │   │   # P-tuning v2                                                                      │
│   132 │   │   model = model.half()                                                               │
│ ❱ 133 │   │   model.transformer.prefix_encoder.float()                                           │
│   134 │   else:                                                                                  │
│   135 │   │   # Finetune                                                                         │
│   136 │   │   model = model.float()                       

Solutions

Hoping the team can release P-Tuning code for this version later. Thanks.

Additional context

No response

[BUG/Help] Suspected test data leakage; the LLM leaderboard results may not be reliable

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior


Expected Behavior

No response

Steps To Reproduce

Suspected test data leakage; the LLM leaderboard results may not be reliable.

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?


[BUG/Help] What is the input format for fine-tuning?

Current Behavior

(1) For ChatGLM I used "input[gMASK] <s>output</s>". Does ChatGLM2 no longer need '[gMask]', or even <s>? Could you provide a template for reference? For example, if I want "instruct: requirement; input: input information; output: output", how should the template be designed to match the original training procedure?
(2) Also, tokenizer.bos_token_id currently cannot retrieve the bos token (same for other special tokens), even though bos is actually in the vocabulary. Could this be fixed?
Thanks!

Expected Behavior

No response

Steps To Reproduce

print(tokenizer.get_vocab()['<s>'])
print(tokenizer.bos_token_id)

[Help] What is the difference between ChatGLM2-6B and ChatGLM2-6B (base)?

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Looking at the experimental comparison, I have a question.

What is the difference between ChatGLM2-6B and ChatGLM2-6B (base)? The one released this time is ChatGLM2-6B.

Expected Behavior

No response

Steps To Reproduce

None.

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

cli_demo.py throws a TypeError after several rounds of dialogue

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Traceback (most recent call last):
File "/root/ChatGLM-6B-main/cli_demo.py", line 58, in
main()
File "/root/ChatGLM-6B-main/cli_demo.py", line 43, in main
for response, history in model.stream_chat(tokenizer, query, history=history):
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 964, in stream_chat
inputs = self.build_inputs(tokenizer, query, history=history)
File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b/modeling_chatglm.py", line 917, in build_inputs
inputs = tokenizer([prompt], return_tensors="pt")
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2548, in call
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2634, in _call_one
return self.batch_encode_plus(
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2825, in batch_encode_plus
return self._batch_encode_plus(
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 733, in _batch_encode_plus
first_ids = get_input_ids(ids)
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 700, in get_input_ids
tokens = self.tokenize(text, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 547, in tokenize
tokenized_text.extend(self._tokenize(token))
File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 104, in _tokenize
return self.tokenizer.tokenize(text)
File "/root/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 32, in tokenize
return self.sp_model.EncodeAsPieces(s)
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/sentencepiece/init.py", line 545, in EncodeAsPieces
return self.Encode(input=input, out_type=str, **kwargs)
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/sentencepiece/init.py", line 531, in Encode
return self._EncodeAsPieces(input, enable_sampling, nbest_size,
File "/root/miniconda3/envs/myconda/lib/python3.9/site-packages/sentencepiece/init.py", line 316, in _EncodeAsPieces
return _sentencepiece.SentencePieceProcessor__EncodeAsPieces(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece)
TypeError: not a string

Expected Behavior

No response

Steps To Reproduce

Ran cli_demo.py directly; the first two rounds of dialogue work normally, and then a TypeError appears.

Environment

- OS:
- Python:3.9.16
- Transformers:4.29.1
- PyTorch:2.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

Problem loading the model on an M1 Pro Mac

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Asking for advice.

Expected Behavior

No response

Steps To Reproduce

The cli_demo and web_demo code was adjusted to:
tokenizer = AutoTokenizer.from_pretrained("model/chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("model/chatglm2-6b-int4", trust_remote_code=True).to('mps')
(The latest PyTorch-Nightly for Mac and transformers=4.30 are already installed.)
The error is CUDA-related, as follows:
Failed to load cpm_kernels:Unknown platform: darwin
Traceback (most recent call last):
File "/Users/lwo2002/ChatGLM2-6B/cli_demo.py", line 8, in
model = AutoModel.from_pretrained("model/chatglm2-6b-int4", trust_remote_code=True).to('mps')
File "/Users/lwo2002/miniforge3/envs/chatglm2/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 479, in from_pretrained
return model_class.from_pretrained(
File "/Users/lwo2002/miniforge3/envs/chatglm2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2675, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
File "/Users/lwo2002/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/modeling_chatglm.py", line 772, in init
self.quantize(self.config.quantization_bit, empty_init=True)
File "/Users/lwo2002/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/modeling_chatglm.py", line 1105, in quantize
self.transformer.encoder = quantize(self.transformer.encoder, bits, empty_init=empty_init, device=device,
File "/Users/lwo2002/.cache/huggingface/modules/transformers_modules/chatglm2-6b-int4/quantization.py", line 157, in quantize
weight=layer.self_attention.query_key_value.weight.to(torch.cuda.current_device()),
File "/Users/lwo2002/miniforge3/envs/chatglm2/lib/python3.10/site-packages/torch/cuda/init.py", line 707, in current_device
_lazy_init()
File "/Users/lwo2002/miniforge3/envs/chatglm2/lib/python3.10/site-packages/torch/cuda/init.py", line 258, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
Any guidance would be appreciated, thanks.

Environment

- OS:mac m1 pro 
- Python:3.10
- Transformers:4.30.2
- PyTorch:PyTorch-Nightly
- mps Support:

Anything else?

No response

no special tokens

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

eos token is none

Expected Behavior

No response

Steps To Reproduce

SFT model

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

cat tokenizer_config.json
{
"name_or_path": "THUDM/chatglm2-6b",
"bos_token": "",
"eop_token": "",
"eos_token": "",
"gmask_token": "[gMASK]",
"mask_token": "[MASK]",
"pad_token": "",
"unk_token": "",
"remove_space": false,
"do_lower_case": false,
"tokenizer_class": "ChatGLMTokenizer",
"num_image_tokens": 0,
"auto_map": {
"AutoTokenizer": [
"tokenization_chatglm.ChatGLMTokenizer",
null
]
}
}

Implemented LoRA fine-tuning for the chatglm2-6b model

Implemented LoRA fine-tuning for ChatGLM2-6B, which can be used for domain fine-tuning. Its SFT fine-tuning procedure is basically the same as for ChatGLM; it only needs changes to the special tokens, lm_head, and enable_input_require_grads to adapt (already done in the code below).

Project supporting THUDM/chatglm2-6b fine-tuning: https://github.com/shibing624/MedicalGPT

The project also implements GPT model training, including continued pre-training, supervised fine-tuning, reward modeling, and reinforcement learning training.

Run the following command to perform instruction-tuning on the BELLE dataset:

CUDA_VISIBLE_DEVICES=0 python3 supervised_finetuning.py \
    --model_type chatglm \
    --model_name_or_path THUDM/chatglm2-6b \
    --train_file_dir ./data/finetune \
    --validation_file_dir ./data/finetune \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 1 \
    --do_train \
    --do_eval \
    --use_peft True \
    --fp16 \
    --max_train_samples 1000 \
    --max_eval_samples 10 \
    --num_train_epochs 1 \
    --learning_rate 2e-5 \
    --warmup_ratio 0.05 \
    --weight_decay 0.05 \
    --logging_strategy steps \
    --logging_steps 10 \
    --eval_steps 50 \
    --evaluation_strategy steps \
    --save_steps 500 \
    --save_strategy steps \
    --save_total_limit 3 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 1 \
    --max_source_length 128 \
    --max_target_length 128 \
    --output_dir outputs-sft-chatglm2-6b-v1 \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --target_modules query_key_value \
    --lora_rank 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --torch_dtype float16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True

eval_loss decreases steadily, and model prediction tests look OK.

Additional context

No response

[BUG/Help] Is the loss on the input portion backpropagated during CausalLM training?

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

@Sengxian @cenyk1230 During CausalLM training, is the loss on the input portion backpropagated? Is loss also computed on the input portion? For multi-turn dialogue, if only a single complete sample is passed, is the entire loss backpropagated as well?

Expected Behavior

No response

Steps To Reproduce

During CausalLM training, is the loss on the input portion backpropagated? Is loss also computed on the input portion? For multi-turn dialogue, if only a single complete sample is passed, is the entire loss backpropagated as well?

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

[Help] Does GLM training still use the self-attention mask from the original ACL 2022 paper?

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

https://arxiv.org/pdf/2103.10360.pdf
The special self-attention mask pre-training task design of GLM (ACL 2022)

Expected Behavior

No response

Steps To Reproduce

Great work! Does ChatGLM's current training still use the special self-attention mask pre-training task design from the ACL 2022 GLM paper?

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response

When extracting names, results from the prompt history are sometimes returned as well

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

history = [("张三出生于北京。","姓名:[张三]"),
("李四接见五五","姓名:[李四,王五]")]

When such a prompt history is passed in, 张三, 李四, and 王五 are also returned among the extracted names.

Expected Behavior

No response

Steps To Reproduce

text = """
这里是长度为2k的文本
"""
query_cn = f"""
{text}\n\n
你是一个优秀的语言学家,任务是从给定的句子中标记多个人实体。你从上面的文本中,提取人的姓名,以例子中的格式输出。
"""
history = [("张三出生于北京。","姓名:[张三]"),
("李四接见五五","姓名:[李四,王五]")]
data = {"query": query_cn, "history": history}

The returned results contain 张三, 李四, and 王五 in addition to the names in text.

Environment

- OS:Ubuntu 20.04
- Python:3.8
- Transformers:4.27.1
- PyTorch:2.0.0
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :True

Anything else?

No response

Still not enough memory; can anyone share a quantized model?

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Here is the earlier chatglm-6b-int4 code:

# By default, the model is loaded in FP16 precision; running the code above needs about 13GB of GPU memory.
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

# Modify as needed; currently only 4/8-bit quantization is supported.
# model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).quantize(4).half().cuda()

# If memory is insufficient, you can directly load the quantized model, which needs only about 5.2GB of memory:
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()

Expected Behavior

No response

Steps To Reproduce

[placeholder]

Environment

- OS: Linux 5.15.107+
- Python: 3.10.12
- Transformers: 4.27.1
- PyTorch: 2.0.1+cu118
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : True

Anything else?

[placeholder]
