kmno4-zx / huanhuan-chat

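Chat-甄嬛 (Chat-Zhenhuan) is a chat language model that imitates Zhen Huan's manner of speaking. It was obtained by LoRA fine-tuning ChatGLM2 on all of the lines and sentences relating to Zhen Huan in the script of 《甄嬛传》 (Empresses in the Palace).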

Python 4.61% Shell 0.11% Jupyter Notebook 95.28%

huanhuan-chat's Introduction

About Me

🌱 I'm a graduate student at Henan Polytechnic University (HPU), graduating soon.
💬 My research focuses on global discrete grids and methods for generating Voronoi diagrams.
⭐ Datawhale member; InternLM community IOPMC SIG RAG Manager.

Interest 👨🏽‍💻

NLP: large language models, prompt engineering, and other NLP topics.
Language: Python, PyTorch, and Markdown.
Theory: Voronoi diagram algorithms.

Open Source Experience 👯

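As a manager:

self-llm: 《开源大模型食用指南》 (a guide to using open-source LLMs), a large-model deployment guide tailored to Chinese beginners. 4.3k stars on GitHub, and it has topped GitHub Trending several times!
huanhuan-chat: A chatbot based on ChatGLM2 that talks like Zhen Huan.
AMchat: AM (Advanced Mathematics) chat is a large language model that integrates mathematical knowledge with advanced mathematics exercises and their solutions.
d2l-ai-solutions-manual: Solutions to the end-of-chapter exercises in Dive into Deep Learning (动手学深度学习).
tiny-universe: 《大模型白盒子构建指南》 (a white-box guide to building large models), dedicated to hand-building LLM-related tasks from scratch, such as RAG, Agent, and Eval.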

As a member:

prompt-engineering-for-developers: A Chinese tutorial on ChatGPT for developers, covering three courses by Andrew Ng.
Datawhale NLP Summer Camp Baseline: A baseline for the iFLYTEK algorithm competition, a core class in the Datawhale AI summer camp.
InternLM-tutorial: The InternLM (书生·浦语) LLM hands-on camp; lead for Lesson 2, 《轻松分钟玩转书生·浦语大模型趣味 Demo》 (playing with fun InternLM demos).

Competition

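Chat-Huanhuan (chat嬛嬛): Top 12 and the Creative Application Award in the InternLM (书生·浦语) LLM Challenge (Spring).
AMchat: Top 12 and the Creative Application Award in the InternLM (书生·浦语) LLM Challenge (Spring).
iFLYTEK algorithm competition, Text Classification and Keyword Extraction from Paper Abstracts Challenge: NLP competition of iFLYTEK, Top 3.
iFLYTEK algorithm competition, Person-Job Matching Challenge: NLP competition of iFLYTEK, Top 3.
Spark Cup Cognitive LLM Scenario Innovation Competition: LLM competition of iFLYTEK, Top 50, still in progress.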

huanhuan-chat's Issues

Why is the input text written in "Instruction" instead of "input"?
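
For context, instruction-tuning datasets in the Alpaca style usually store each sample under three fields. The sketch below assumes that layout; the field names, the sample text, and the helper `build_prompt` are illustrative and not confirmed by this repository. It shows how a single-turn chat line can sit in `instruction` while `input` stays empty.

```python
# Hypothetical Alpaca-style record; the field names are an assumption,
# not taken from the repository.
record = {
    "instruction": "Who are you?",  # the user's utterance
    "input": "",                    # optional extra context, often empty
    "output": "(reply in character)",
}


def build_prompt(sample: dict) -> str:
    """Join the instruction and the optional input into one prompt string."""
    if sample["input"]:
        return f"{sample['instruction']}\n{sample['input']}"
    return sample["instruction"]


prompt = build_prompt(record)
```

With an empty `input`, the prompt is just the instruction text, which may be why a dataset of single-turn dialogue lines leaves `input` unused.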

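Fine-tuning had no effect

Hi, I fine-tuned for 2400 steps with the default LoRA fine-tuning script and then deployed with the GUI. When I ask it "你是谁" ("Who are you?"), it answers that it is a chatbot from Tsinghua University, whereas the repository's pretrained LoRA model answers that it is Huanhuan. Does this mean the training had no effect?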

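pip install -r requirements.txt needs the following changes to install

Installing the requirements.txt exactly as downloaded fails with dependency errors for several packages.
Changing the version information of the following packages lets the installation succeed: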

diff requirements.txt requirements.txt.bak
59,61c59,61
< mkl-fft
< mkl-random
< mkl-service
---
> mkl-fft==1.3.6
> mkl-random==1.2.2
> mkl-service==2.4.0
69c69
< numpy==1.24.4
---
> numpy==1.25.2
128,129c128,129
< torchaudio
< torchvision
---
> torchaudio==0.12.1
> torchvision==0.13.1
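
The same change can be scripted. In `diff A B` output the `<` lines come from the first file, so the working requirements.txt drops the pins on the mkl packages, torchaudio, and torchvision, and moves numpy to 1.24.4. The sketch below is one possible way to apply that; the helper `relax_requirements` and the `tqdm` line in the sample are hypothetical.

```python
import re

# Changes read off the diff; None means "drop the version pin".
OVERRIDES = {
    "mkl-fft": None,
    "mkl-random": None,
    "mkl-service": None,
    "numpy": "1.24.4",
    "torchaudio": None,
    "torchvision": None,
}


def relax_requirements(lines):
    """Return requirement lines with the overrides applied."""
    result = []
    for line in lines:
        # The package name is everything before the first version operator.
        name = re.split(r"[=<>!~ ]", line.strip(), maxsplit=1)[0]
        if name in OVERRIDES:
            version = OVERRIDES[name]
            line = name if version is None else f"{name}=={version}"
        result.append(line.strip())
    return result


# Sample lines for illustration only; tqdm is not taken from the issue.
original = ["mkl-fft==1.3.6", "numpy==1.25.2", "torchvision==0.13.1", "tqdm==4.65.0"]
fixed = relax_requirements(original)
```

Lines for packages that are not listed in `OVERRIDES` pass through unchanged.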

Can a 24 GB GPU be used for classification in this project?

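Question about an error when fine-tuning huanhuan-chat

I'm new to large models and there is a lot I don't understand yet. I'd like to ask about the error NotImplementedError: Loading a dataset cached in a LocalFileSystem is not supported. that is raised during fine-tuning. Would you have time to help? Thanks!

I have not changed the training parameters. The error is as follows:
Failed to load the dataset from ../dataset/train/lora/huanhuan.json

NotImplementedError("Loading a dataset cached in a LocalFileSystem is not supported.")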
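
This message is often reported when the installed `datasets` and `fsspec` packages are out of step with each other, although that is only a guess for this case. Before changing package versions, it can help to confirm that the JSON file itself is readable. The sketch below uses only the standard library; the assumption that the file holds a JSON list of objects with `instruction` and `output` keys is mine, and `check_dataset` is a hypothetical helper.

```python
import json
import os
import tempfile


def check_dataset(path, required_keys=("instruction", "output")):
    """Load a JSON list and return the indices of records missing required keys."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of records")
    return [i for i, r in enumerate(records)
            if not all(k in r for k in required_keys)]


# Demo on a throwaway file; in the issue the path would be
# ../dataset/train/lora/huanhuan.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump([{"instruction": "hi", "output": "hello"}, {"instruction": "x"}], f)
    bad = check_dataset(demo_path)
```

If the file parses and no records are flagged, the data is probably fine and the failure more likely lies in the library versions or the cache.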

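Building a mini program / website

Hello, founder of huanhuan-chat! I don't use GitHub much, so I don't know how to send a private message. Would you be interested in packaging this into a full-stack project? At the moment, getting this model running locally takes a fair amount of computer knowledge, and most people who watch 《甄嬛传》 (Empresses in the Palace) probably don't have it. Building a website or mini program around it would let it reach a much wider audience. If you're interested, feel free to contact me!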

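A few questions

Hi, this is a great open-source project. I have recently become interested in this area too, and I have a few questions:

I see that the training data resamples the two questions "你是谁" ("Who are you?") and "你是" ("You are"). Is that to force the model to learn this knowledge? Could the persona be controlled through prompts instead?
After fine-tuning, does the quality of the in-character tone depend heavily on the question? For example, are some questions answered noticeably worse?
Is there catastrophic forgetting after SFT?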
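
To make the first question concrete, resampling here presumably means repeating the identity questions so that they appear more often during training. A minimal sketch of that idea, assuming records with an `instruction` key; the helper `oversample`, the factor, and the sample records are all hypothetical, and the repository may do this differently.

```python
import random


def oversample(records, is_identity, factor=3, seed=0):
    """Return a shuffled copy in which identity records appear `factor` times."""
    out = list(records)
    identity = [r for r in records if is_identity(r)]
    # Each identity record is already present once, so add factor - 1 copies.
    out.extend(identity * (factor - 1))
    random.Random(seed).shuffle(out)
    return out


# Illustrative records, not taken from the dataset.
records = [{"instruction": "你是谁"}, {"instruction": "今天天气如何"}]
boosted = oversample(records, lambda r: r["instruction"].startswith("你是"))
identity_count = sum(r["instruction"] == "你是谁" for r in boosted)
```

A system prompt that states the persona is the alternative the question raises; it needs no retraining, but the model may follow it less reliably.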

error: the following arguments are required: --output_dir

How should output_dir be set for train.py?
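
The wording of this error is the one `argparse` prints when a required command-line flag is missing, so the usual fix is to pass `--output_dir` when launching the script instead of editing train.py. The sketch below reproduces that behaviour with the standard library; it assumes train.py uses an argparse-based parser, and the parser definition and the `./output` path are illustrative.

```python
import argparse


def build_parser():
    """Build a parser with a required --output_dir flag (illustrative only)."""
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--output_dir", required=True,
                        help="directory where checkpoints are written")
    return parser


# Equivalent to running: python train.py --output_dir ./output
args = build_parser().parse_args(["--output_dir", "./output"])
```

Calling `parse_args([])` on the same parser exits with the message quoted in the issue title.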

Error Report: pip install no such file or directory

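pip install -r requirements.txt fails with "no such file or directory".

requirements.txt pulls in many static local paths, some of which are shown below. Was this run in a virtual machine or on a cloud platform, or somewhere else?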
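
Requirement files exported from a conda environment often contain entries of the form `name @ file:///...` that point at build paths existing only on the original machine. One workaround is to reduce those entries to the bare package name. The sketch below assumes that entry format; the helper `strip_local_paths` and the sample lines are hypothetical, and the versions pip then resolves may differ from the author's.

```python
def strip_local_paths(lines):
    """Replace 'name @ file://...' entries with the bare package name."""
    cleaned = []
    for line in lines:
        name, sep, target = line.partition(" @ ")
        if sep and target.strip().startswith("file://"):
            cleaned.append(name.strip())
        else:
            cleaned.append(line.strip())
    return cleaned


# Sample lines for illustration only.
sample = ["numpy @ file:///tmp/build/numpy.whl", "torch==2.0.1"]
cleaned = strip_local_paths(sample)
```

Writing `cleaned` back to a new requirements file leaves ordinary pinned entries untouched.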

bash generation.sh fails: Error communicating with OpenAI:exceeded with url: vl/chat/completions (Caused by SSLError(CertificateError("hostnameapi.openai.com'

cd ~/huanhuan-chat/generation_dataset
bash generation.sh
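The error is as follows:

Does this project use the ChatGPT API? Is a Plus subscription required to use it?
Also, even with Plus, what should I do if the network still cannot connect?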

When fine-tuning on multiple GPUs, I get an error saying the data is not on the same GPU:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument target in method wrapper_CUDA_nll_loss_forward)
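
The message says that the loss targets and the model outputs ended up on different GPUs. One way to check whether training itself works is to expose a single GPU to the process through the `CUDA_VISIBLE_DEVICES` environment variable. This is a workaround sketch, not a fix for the multi-GPU path; the helper `single_gpu_env` and the command in the comment are hypothetical.

```python
import os


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


env = single_gpu_env(0)
# A launcher could then run, for example:
# subprocess.run([sys.executable, "train.py", "--output_dir", "./output"], env=env)
```

The variable has to be set before CUDA is initialised, which is why the sketch passes it to a fresh process.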