llm_rlhf's Introduction

llm reinforcement learning

Reinforcement learning training for gpt2, llama, bloom, cpm-ant, and other models.

update information

    06-13 fix llama resize_token_embeddings
    06-01 support LoRA training with DeepSpeed; merge 0.1.9 and 0.1.10
    05-27 add qlora transformers>=4.30

install

python >= 3.10
  • pip install -U -r requirements.txt
  • If installation fails, switch to the official index: pip install -i https://pypi.org/simple -U -r requirements.txt

weights: select the one that suits you

Supported weights include, but are not limited to, the following

data sample

{
	"prompt": "\n\nHuman:你能描述如何清除笔记本计算机屏幕上的指纹和污垢的步骤吗?\n\nAssistant:是的,也许。为了清洁你的屏幕,你首先需要使用微纤维布或软溼布,轻轻地擦拭屏幕表面。接下来,你会想抓住一个软,无毛巾,微纤维清洁布,轻轻地擦回屏幕上去除指纹和污垢。\n\nHuman:我可以将伊索普罗素酒精喷射到布上,并以这种方式清洗它吗?\n\nAssistant:",
	"response": " 是的,你可以这样做,以帮助布从屏幕上收集更多的污垢。 请确保始终使用干净的软布,而不是一块粗糙的、粗糙的或结构化的材料,并确保它是无毛布的。",
	"chosen": " 是的,你可以这样做,以帮助布从屏幕上收集更多的污垢。 请确保始终使用干净的软布,而不是一块粗糙的、粗糙的或结构化的材料,并确保它是无毛布的。",
	"rejected": " 是的,你可以直接将它喷射到布上。"
}
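
A record like the one above can be checked before training. The following is a minimal sketch, assuming records are stored one JSON object per line; the `validate_record` helper and the JSON Lines layout are illustrative assumptions, not part of this repository.

```python
import json

# Field names taken from the data sample above.
REQUIRED_FIELDS = ("prompt", "response", "chosen", "rejected")


def validate_record(line: str) -> dict:
    """Parse one JSON line and check that every required field is a non-empty string.

    Hypothetical helper for illustration; the repository's own loader may differ.
    """
    record = json.loads(line)
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"field {field!r} is missing or empty")
    return record


if __name__ == "__main__":
    sample = json.dumps({
        "prompt": "\n\nHuman: How do I clean a laptop screen?\n\nAssistant:",
        "response": " Wipe it gently with a microfiber cloth.",
        "chosen": " Wipe it gently with a microfiber cloth.",
        "rejected": " Use any rough towel.",
    })
    print(sorted(validate_record(sample)))
```

Running a check like this over the training file before `data_utils.py` surfaces malformed records early instead of partway through dataset construction.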

infer

# infer_finetuning.py: inference with the fine-tuned model
# infer_lora_finetuning.py: inference with the LoRA fine-tuned model
# infer_ptuning.py: inference with the p-tuning-v2 fine-tuned model
python infer_finetuning.py

training

    # build the dataset
    python data_utils.py
    # Note: num_process_worker sets the number of processes used to build the dataset; for large datasets, raise it toward the number of CPU cores
    dataHelper.make_dataset_with_args(data_args.train_file,mixed_data=False, shuffle=True,mode='train',num_process_worker=0)

    # train
    python train.py
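
The `num_process_worker` note above can be turned into a small rule of thumb. This is a sketch under assumptions: the `choose_num_workers` helper and the 10,000-sample threshold are hypothetical, and only `num_process_worker` itself comes from the call above.

```python
import os


def choose_num_workers(num_samples: int, threshold: int = 10_000) -> int:
    """Return a candidate value for num_process_worker.

    Hypothetical heuristic: keep 0 (the value used in the call above) for
    small datasets, otherwise use one worker per CPU core.
    """
    if num_samples < threshold:
        return 0
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(choose_num_workers(500))        # small dataset
    print(choose_num_workers(1_000_000))  # large dataset
```

The returned value would then be passed as `num_process_worker=...` in `make_dataset_with_args`.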

training parameters

related links

Pure and clean code

llm_rlhf's People

Contributors

ssbuild

Forkers

yyht

llm_rlhf's Issues

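Question about the principle

The reward model is a scoring model; can it be replaced by human scoring?
If so, is it enough to build triples with their corresponding scores and then train the model with a reinforcement learning approach?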

conflicting dependencies

Requirement already satisfied: py-cpuinfo in /home/liuhaiying/anaconda3/envs/ss/lib/python3.10/site-packages (from deepspeed->-r requirements.txt (line 3)) (9.0.0)
Requirement already satisfied: pydantic<2.0.0 in /home/liuhaiying/anaconda3/envs/ss/lib/python3.10/site-packages (from deepspeed->-r requirements.txt (line 3)) (1.10.7)
Requirement already satisfied: torch in /home/liuhaiying/anaconda3/envs/ss/lib/python3.10/site-packages (from deepspeed->-r requirements.txt (line 3)) (2.0.1)
INFO: pip is looking at multiple versions of aigc-zoo to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install -r requirements.txt (line 7) and deep_training<0.1.12 and >=0.1.11 because these package versions have conflicting dependencies.

The conflict is caused by:
The user requested deep_training<0.1.12 and >=0.1.11
aigc-zoo 0.2.0.post1 depends on deep_training~=0.2.0.post0

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

[notice] A new release of pip is available: 23.1.2 -> 23.2.1
[notice] To update, run: pip install --upgrade pip
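
The conflict reported above comes down to two version ranges for `deep_training` that share no version. A minimal sketch of that check follows, assuming plain numeric release tuples and ignoring `.post` suffixes; the `release` and `ranges_overlap` helpers are hypothetical and much simpler than pip's resolver.

```python
import re


def release(version: str) -> tuple:
    """Return the numeric release part, e.g. '0.2.0.post0' -> (0, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def ranges_overlap(a: tuple, b: tuple) -> bool:
    """Each range is (inclusive lower bound, exclusive upper bound)."""
    return max(a[0], b[0]) < min(a[1], b[1])


# requirements.txt: deep_training>=0.1.11,<0.1.12
requested = (release("0.1.11"), release("0.1.12"))
# aigc-zoo 0.2.0.post1: deep_training~=0.2.0.post0, roughly >=0.2.0,<0.3
needed = (release("0.2.0.post0"), release("0.3"))

if __name__ == "__main__":
    print(ranges_overlap(requested, needed))  # False: no version satisfies both
```

Because the two ranges are disjoint, one of the pins has to change, which is what pip's first suggestion (loosening the specified range) refers to.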

zero_to_fp32.py weight conversion fails

│ /data1/yrh/llm_rlhf/ilql_rlhf/zero_to_fp32.py:498 in get_fp32_state_dict_from_zero_checkpoint │
│ │
│ 495 │ │ │ with open(latest_path, 'r') as fd: │
│ 496 │ │ │ │ tag = fd.read().strip() │
│ 497 │ │ else: │
│ ❱ 498 │ │ │ raise ValueError(f"Unable to find 'latest' file at {latest_path}") │
│ 499 │ │
│ 500 │ ds_checkpoint_dir = os.path.join(checkpoint_dir, tag)

ValueError: Unable to find 'latest' file

After changing it to latest:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 64: invalid start byte
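
The traceback above shows `zero_to_fp32.py` opening the `latest` file in text mode and using its contents as the checkpoint tag, and the `UnicodeDecodeError` suggests that the file renamed to `latest` was not plain text. A minimal sketch of that lookup with clearer error reporting follows; `resolve_checkpoint_tag` and the `global_step100` tag are hypothetical names used for illustration.

```python
import os
import tempfile


def resolve_checkpoint_tag(checkpoint_dir: str) -> str:
    """Read the tag stored in <checkpoint_dir>/latest and check its directory exists."""
    latest_path = os.path.join(checkpoint_dir, "latest")
    if not os.path.isfile(latest_path):
        raise ValueError(f"Unable to find 'latest' file at {latest_path}")
    try:
        with open(latest_path, "r", encoding="utf-8") as fd:
            tag = fd.read().strip()
    except UnicodeDecodeError as exc:
        # A binary file (for example a renamed weight file) is not a valid 'latest'.
        raise ValueError(f"{latest_path} is not a text file holding a tag") from exc
    if not tag or not os.path.isdir(os.path.join(checkpoint_dir, tag)):
        raise ValueError(f"Tag directory {tag!r} not found in {checkpoint_dir}")
    return tag


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ckpt:
        os.mkdir(os.path.join(ckpt, "global_step100"))  # hypothetical tag directory
        with open(os.path.join(ckpt, "latest"), "w", encoding="utf-8") as fd:
            fd.write("global_step100")
        print(resolve_checkpoint_tag(ckpt))
```

If the `latest` file is missing, an alternative to renaming another file is to create a one-line text file that names the existing tag directory.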

About Qlora

I have implemented QLoRA for SFT and the reward model, but I am unsure how to apply QLoRA to PPO. Do you plan to integrate PPO into the repo?
