
alpaca-rlhf's People

Contributors

l294265421


alpaca-rlhf's Issues

Fix pad_token_id bug

Thank you very much for your code!
Regarding
alpaca_rlhf/deepspeed_chat/training/utils/data/data_utils.py#DataCollatorRLHF#__call__
Fix pad_token_id bug
I have one point of confusion. Looking at the last line of class PromptDataset(Dataset) in data_utils.py, the step-3 __getitem__ returns
self.prompt_dataset[idx]["input_ids"], self.prompt_dataset[idx]["attention_mask"], self.pad_token_id
so data[-1][-1] should already be self.pad_token_id, and the original author's code should not have a bug.
I hope the author can confirm whether the bug here is what I understand it to be.
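
For reference, a minimal sketch of the pieces under discussion, simplified from the snippet quoted above (the DataCollatorRLHF body is reduced to the single expression in question; this is not the repo's full code):

from torch.utils.data import Dataset

class PromptDataset(Dataset):
    def __init__(self, prompt_dataset, pad_token_id):
        self.prompt_dataset = prompt_dataset
        self.pad_token_id = pad_token_id

    def __len__(self):
        return len(self.prompt_dataset)

    def __getitem__(self, idx):
        # step-3 branch: each sample is (input_ids, attention_mask, pad_token_id)
        return (self.prompt_dataset[idx]["input_ids"],
                self.prompt_dataset[idx]["attention_mask"],
                self.pad_token_id)

# Inside DataCollatorRLHF.__call__, `data` is a list of such tuples, so
# data[-1][-1] picks the third element of the last sample, i.e. self.pad_token_id.
def pad_token_id_from_batch(data):
    return data[-1][-1]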

A question about setting tokens

Why is tokenizer.pad_token_id set to 0?
In the LLaMA vocabulary, the proposed pad token "<0x00>" has id 3 and the unk token "<unk>" has id 0.
Why not set it to 3 here?
I think it should be set to tokenizer.pad_token_id = 3.
I hope someone can answer this for me, thanks.
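
For reference, a short sketch of the two choices being discussed (assumption: the decapoda-research/llama-7b-hf checkpoint and its SentencePiece vocabulary, where id 0 is the unk token and id 3 is the byte token "<0x00>"):

from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
print(tokenizer.convert_ids_to_tokens([0, 3]))  # expected: ['<unk>', '<0x00>'] for the LLaMA vocab

# Choice used in the code under discussion: reuse the unk token (id 0) as padding.
tokenizer.pad_token_id = 0

# Choice proposed in this issue: use the "<0x00>" byte token instead.
# tokenizer.pad_token_id = 3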

v100 step3 oom

Hello, when I run step 3 I keep getting an OOM error. The details are as follows:
File "/opt/conda/envs/mplug_owl/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 250.00 MiB (GPU 6; 31.75 GiB total capacity; 30.95 GiB already allocated; 21.75 MiB free; 30.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am using 8 V100 GPUs. I have set every batch-size-related argument to 1, reduced max_seq_len, set lora_dim to 1, and set lora_module_name to q_proj. Is there anything else I can do to reduce memory usage?
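
One thing the error message itself suggests is setting max_split_size_mb to reduce allocator fragmentation. A minimal sketch of how to do that (it only mitigates fragmentation; it does not add memory):

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # must be set before CUDA is initialized

import torch  # import torch (and deepspeed) only after the variable is set

Exporting the same variable in the shell before the deepspeed launch command has the same effect.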

Some questions about deepspeed.initialize

Hello, I'd like to ask a question. I noticed that once deepspeed.initialize is called, the model's weights appear to be empty. Is this expected? It is especially confusing because _generate_sequence is executed at one point before training even starts.
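
A hedged sketch of one likely explanation: if ZeRO stage 3 is enabled, DeepSpeed partitions the parameters across ranks during deepspeed.initialize, so each rank's parameter tensors look empty afterwards; generation (e.g. _generate_sequence) still works because the engine gathers the needed shards on the fly. The model and ds_config names below stand for whatever is built in the repo's step 3:

import deepspeed

engine, *_ = deepspeed.initialize(model=model, config=ds_config,
                                  model_parameters=model.parameters())

p = next(engine.module.parameters())
print(p.shape)  # often torch.Size([0]) under ZeRO-3: the weight is sharded, not lost

with deepspeed.zero.GatheredParameters([p]):
    print(p.shape)  # full shape while temporarily gathered on this rank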

Steps

Hi, how are you? First of all, thank you for providing this repo. I have a few questions about the steps.

Do we run every step here one by one, in order?

Or do we pick just one of these steps and evaluate the results accordingly?

Also, I want to build a chatbot in a conversational-AI style. How should the data be formatted for this? The generated turns are kept as history, but how do we represent that history in the data? I have a rough idea in my head; can you help me with this too? (See the sketch below for one possible format.)
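
A hypothetical sketch of one way to flatten conversation history into a single prompt (the "Human:"/"Assistant:" markers are illustrative, not necessarily the exact format this repo's data pipeline expects):

def build_prompt(history, user_message):
    """history: list of (user_turn, assistant_turn) pairs from earlier in the conversation."""
    parts = []
    for user_turn, assistant_turn in history:
        parts.append(f"Human: {user_turn}")
        parts.append(f"Assistant: {assistant_turn}")
    parts.append(f"Human: {user_message}")
    parts.append("Assistant:")
    return "\n".join(parts)

print(build_prompt([("Hi!", "Hello, how can I help?")], "Tell me a joke."))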

If there is anything I haven't thought of, or anything you would like to add, I would appreciate it.

Thank you for everything.

stop at step2 evaluation_reward

Firstly, thank you for your contributions. Training consistently pauses (but does not exit) at evaluation_reward during step 2, so I am wondering whether something is wrong. Perhaps the condition args.global_rank == 0 is unnecessary? Any suggestions would be greatly appreciated. Thank you.
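
A hedged sketch of the failure mode the question hints at (assumption: evaluation_reward performs a collective such as dist.all_reduce to average metrics across ranks; that is not confirmed from the listing above):

import torch
import torch.distributed as dist

def evaluation_reward(model, dataloader, device):
    total = torch.zeros(1, device=device)
    # ... accumulate rewards from `model` over `dataloader` into `total` ...
    dist.all_reduce(total)  # every rank must reach this line
    return (total / dist.get_world_size()).item()

# Hangs: only rank 0 calls the collective, the other ranks never match it.
#   if args.global_rank == 0:
#       score = evaluation_reward(rm_model, eval_dataloader, device)
#
# Safe: all ranks evaluate, and only rank 0 logs the result.
#   score = evaluation_reward(rm_model, eval_dataloader, device)
#   if args.global_rank == 0:
#       print("eval reward:", score)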

GPU memory OOM when training on V100

Hello, when I train the SFT and RM steps on V100s, I run out of GPU memory and cannot proceed. The error message is:
OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 1; 31.75 GiB total capacity; 29.88 GiB already allocated; 11.75 MiB free; 29.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have already reduced per_device_train_batch_size and per_device_eval_batch_size to 1, but it still reports insufficient GPU memory. Is there any way to solve this?
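
Beyond batch size 1, one common lever on 32 GB V100s is ZeRO optimizer-state CPU offload. A hedged sketch of a generic DeepSpeed config (not this repo's exact ds_config builder):

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,  # recover the effective batch size via accumulation
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "gradient_clipping": 1.0,
}
# Pass this dict (or the equivalent command-line zero-stage/offload options) to deepspeed.initialize.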

How good are the training results?

Hi, author. After RLHF, how well does the model perform, and in what respects does it improve?

Reward model training hangs on V100

step: 82 loss:0.83251953125, correct_predictions: 0.0, reward: -0.50390625 r_reward: -0.487060546875
step: 83 loss:0.76611328125, correct_predictions: 0.0, reward: -0.492919921875 r_reward: -0.492431640625
step: 84 loss:0.7578125, correct_predictions: 0.0, reward: -0.5439453125 r_reward: -0.5361328125
step: 85 loss:0.83251953125, correct_predictions: 1.0, reward: -0.464111328125 r_reward: -0.467529296875
step: 86 loss:1.537109375, correct_predictions: 1.0, reward: -0.509765625 r_reward: -0.51708984375
step: 87 loss:0.6142578125, correct_predictions: 0.0, reward: -0.5087890625 r_reward: -0.48291015625
step: 88 loss:0.5380859375, correct_predictions: 0.0, reward: -0.451171875 r_reward: -0.44921875
[2023-05-17 14:28:38,358] [INFO] [logging.py:96:log_dist] [Rank 0] step=90, skipped=7, lr=[0.0004994156634161006], mom=[(0.9, 0.95)]
[2023-05-17 14:28:38,359] [INFO] [timer.py:199:stop] epoch=0/micro_step=90/global_step=90, RunningAvgSamplesPerSec=14.808713117576154, CurrSamplesPerSec=14.958361511223531, MemAllocated=12.34GB, MaxMemAllocated=22.87GB
step: 89 loss:0.67333984375, correct_predictions: 1.0, reward: -0.435302734375 r_reward: -0.43994140625
step: 90 loss:0.35107421875, correct_predictions: 1.0, reward: -0.421875 r_reward: -0.457275390625
step: 91 loss:0.7763671875, correct_predictions: 1.0, reward: -0.439453125 r_reward: -0.442138671875
step: 92 loss:0.69091796875, correct_predictions: 1.0, reward: -0.440185546875 r_reward: -0.46826171875
step: 93 loss:0.355712890625, correct_predictions: 1.0, reward: -0.432373046875 r_reward: -0.455078125
step: 94 loss:0.607421875, correct_predictions: 1.0, reward: -0.425537109375 r_reward: -0.427734375
step: 95 loss:0.87060546875, correct_predictions: 0.0, reward: -0.4775390625 r_reward: -0.468017578125
step: 96 loss:0.7841796875, correct_predictions: 1.0, reward: -0.39013671875 r_reward: -0.404541015625
step: 97 loss:1.23828125, correct_predictions: 0.0, reward: -0.40869140625 r_reward: -0.36572265625
step: 98 loss:0.87890625, correct_predictions: 0.0, reward: -0.445556640625 r_reward: -0.42333984375
[2023-05-17 14:28:43,804] [INFO] [logging.py:96:log_dist] [Rank 0] step=100, skipped=7, lr=[0.0004992664502959351], mom=[(0.9, 0.95)]
[2023-05-17 14:28:43,805] [INFO] [timer.py:199:stop] epoch=0/micro_step=100/global_step=100, RunningAvgSamplesPerSec=14.80846616069343, CurrSamplesPerSec=14.749032318751523, MemAllocated=12.34GB, MaxMemAllocated=22.87GB
step: 99 loss:0.7666015625, correct_predictions: 0.0, reward: -0.384033203125 r_reward: -0.382080078125


Hello, sorry to bother you. When training the reward model on V100s, the run always gets stuck at step 99 and stops making progress; the program neither raises an error nor exits.
nvidia-smi shows the GPUs are still fully occupied. Could you please take a look?
P.S. The program runs quite fast up to that point, but after step 99 it no longer makes any progress.

This is my launch command:
nohup deepspeed --num_gpus 8 /home/rlhf/alpaca_rlhf/deepspeed_chat/training/step2_reward_model_finetuning/main.py --data_output_path /home/rlhf/alpaca_rlhf/deepspeed_chat/training/step2_reward_model_finetuning/data_output --model_name_or_path decapoda-research/llama-7b-hf --num_padding_at_beginning 0 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --learning_rate 5e-4 --num_train_epochs 1 --gradient_accumulation_steps 1 --num_warmup_steps 0 --zero_stage 2 --deepspeed --output_dir /home/rlhf/alpaca_rlhf/deepspeed_chat/training/step2_reward_model_finetuning/data_output --lora_dim 2 --lora_module_name q_proj,k_proj --only_optimize_lora > nohup1.txt &
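
For a run that hangs silently like this, one way to see where each worker is stuck is the standard-library faulthandler module (assumption: you can add a couple of lines near the top of main.py; this is a debugging aid, not a fix):

import faulthandler
import signal

# After this, `kill -USR1 <worker pid>` dumps every thread's Python stack to stderr,
# which usually shows whether the process is blocked inside a collective op.
faulthandler.register(signal.SIGUSR1, all_threads=True)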

Should tokens after EOS in the generated answer be masked in Step 3?

Hello! Thank you very much for your open-source contribution!
While reading your code, I noticed that in the RLHF step you mask out all tokens that come after the EOS generated by the model. I have a question about this:

For the critic model, even though it knows that everything after EOS is padding, it still passes the pad positions' hidden states through the FC head and outputs a score for them, rather than skipping the score computation;
for the actor model, it only sees the value assigned to each token and is supervised with the value and the reward, but it does not know whether that token was masked or whether it is padding;
so wouldn't supervising the actor with scores computed on pad tokens make things less accurate?
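
For reference, a hedged sketch of generic PPO-style masking (not necessarily this repo's exact code): if the critic scores every position, a mask over the post-EOS/pad positions can zero their contribution to both losses, so pad scores never supervise the actor or the critic.

import torch

def masked_mean(x, mask):
    return (x * mask).sum() / mask.sum().clamp(min=1)

values = torch.randn(2, 8)       # critic scores for every position, pads included
advantages = torch.randn(2, 8)
log_ratio = torch.randn(2, 8)    # actor log-prob ratio (simplified, unclipped surrogate)
action_mask = torch.tensor([[1, 1, 1, 0, 0, 0, 0, 0],
                            [1, 1, 1, 1, 1, 0, 0, 0]], dtype=torch.float)  # 0 after EOS

actor_loss = masked_mean(-advantages * log_ratio, action_mask)
returns = advantages + values.detach()
critic_loss = masked_mean((values - returns) ** 2, action_mask)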

Step 3: Actor model and Reward model use different tokenizers

Hi author, first of all thanks for open-sourcing this.
When training stage 3 on a 40 GB GPU, I cannot load both actor model = llama-7b and reward model = llama-7b without OOM, so I tried switching the reward model to the smaller bloom-1.7b. However, the two models do not share a tokenizer. In step 3, at the create-model stage, different tokenizers are loaded; when computing critic_loss, do I need to convert the data into the critic tokenizer's representation first, and only then compute the critic loss? Or is it fine to compute the critic loss directly on data processed with the actor tokenizer?
Thanks again!
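
For reference, a hedged sketch of one way to bridge mismatched vocabularies: decode the actor's generated token ids back to text and re-tokenize with the critic/reward tokenizer before the critic forward pass (the checkpoint paths below are placeholders):

from transformers import AutoTokenizer

actor_tok = AutoTokenizer.from_pretrained("path/to/llama-7b-actor")     # hypothetical path
critic_tok = AutoTokenizer.from_pretrained("path/to/bloom-1b7-critic")  # hypothetical path

def to_critic_inputs(actor_token_ids):
    # actor token ids are meaningless in the critic's vocabulary, so go through text
    texts = actor_tok.batch_decode(actor_token_ids, skip_special_tokens=True)
    return critic_tok(texts, return_tensors="pt", padding=True)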

element 0 of tensors does not require grad and does not have a grad_fn

My launch script is as follows:
CUDA_VISIBLE_DEVICES=0,1,2,3 deepspeed /data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py --data_path /data/bill.bi/RLHFDataset --data_output_path /data/bill.bi/tmp/ --actor_model_name_or_path decapoda-research/llama-7b-hf --tokenizer_name_or_path /data/bill.bi/tmp/rlhf/critic --critic_model_name_or_path /data/bill.bi/tmp/rlhf/critic --num_padding_at_beginning 0 --per_device_train_batch_size 4 --actor_learning_rate 9.85e-6 --critic_learning_rate 5e-6 --ppo_epochs 1 --gradient_accumulation_steps 1 --num_warmup_steps 0 --actor_zero_stage 2 --critic_zero_stage 2 --deepspeed --critic_gradient_checkpointing --actor_gradient_checkpointing --output_dir /data/bill.bi/tmp/rlhf/final --actor_lora_dim 8 --actor_lora_module_name q_proj,k_proj,gate_proj,up_proj --critic_lora_dim 8 --critic_lora_module_name q_proj,k_proj,gate_proj,up_proj --only_optimize_lora --max_prompt_seq_len 1024 1>train_step3.log 2>&1

When running step 3, I hit this error. The full stack trace is below:

Traceback (most recent call last):
File "/data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py", line 563, in
main()
File "/data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/main.py", line 476, in main
actor_loss, critic_loss = trainer.train_rlhf(exp_data)
File "/data/bill.bi/alpaca-rlhf/alpaca_rlhf/deepspeed_chat/training/step3_rlhf_finetuning/ppo_trainer.py", line 187, in train_rlhf
self.actor_model.backward(actor_loss)
File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1862, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1901, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/data/bill.bi/miniconda3/envs/deepspeed/lib/python3.10/site-packages/torch/autograd/init.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
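
One commonly reported cause of this exact error when gradient checkpointing is combined with a mostly frozen base model (--only_optimize_lora) is that the checkpointed inputs no longer require grad, so the loss ends up without a grad_fn. A hedged sketch of the usual workaround (an assumption about the cause, not a confirmed diagnosis for this repo):

# `model` stands for the actor/critic transformers model created in step 3.
model.enable_input_require_grads()  # transformers PreTrainedModel helper

# Equivalent manual hook:
# def make_inputs_require_grad(module, inputs, output):
#     output.requires_grad_(True)
# model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)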

Training question

When the model is loaded, is any model parallelism applied? I find that even a 7B model run directly in deepspeed-chat exhausts GPU memory, and my GPU is an A100 80G.
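
A hedged note: ZeRO stages 1 and 2 only shard optimizer states and gradients, so every GPU still holds the full model; ZeRO stage 3 additionally partitions the parameters across GPUs, which is usually what is needed when even batch size 1 does not fit. A generic DeepSpeed config sketch (not this repo's exact ds_config):

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                                # partition parameters across GPUs
        "offload_param": {"device": "cpu"},        # optional: also push params to CPU
        "offload_optimizer": {"device": "cpu"},
    },
}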
