ablustrund / loramoe Goto Github PK

LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment

Python 99.32% Shell 0.01% C++ 0.05% Cuda 0.59% C 0.02% Cython 0.01%

loramoe's Introduction

Hi there 👋

I am a student at Fudan NLP Group.

Currently, I focus on Natural Language Processing, particularly in reward modeling, model alignment (RLHF) and pre-trained language models.

✨Please click here to find out more information about me 👉 http://shihandou.com

Contact: [email protected]

loramoe's People

Contributors

Stargazers

Watchers

Forkers

fcr09 liukairong2023 apollohuang1 mok0102 xb-chang yuxiang-guo zj56 wandugu jie311 seujung tarrett honghuangneu

loramoe's Issues

请问训练完毕，如何进行generation

我看repo并没有修改generate函数，是在评测时不需要task_type吗？这样会导致报错吧
希望能有eval的代码

os.environ["CUDA_VISIBLE_DEVICES"] = "2"
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaForCausalLM, LlamaTokenizer

output_dir = "./output/0308_debug_format_for_opensource/checkpoint-561"
peft_model_path = os.path.join(output_dir, "sft_lora_model")  # change checkpoint path

peftconfig = PeftConfig.from_pretrained(peft_model_path)

model_base = LlamaForCausalLM.from_pretrained(peftconfig.base_model_name_or_path,
                                             device_map = "auto",
                                            )

tokenizer = LlamaTokenizer.from_pretrained(peftconfig.base_model_name_or_path,
                                          add_bos_token = True,
                                          add_eos_token = False  # always False for inference
                                          )

new_model = PeftModel.from_pretrained(model_base, peft_model_path)

print("Peft model loaded")

import torch
def generate_response(prompt, model):
  encoded_input = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
  task_types=0
  task_types = torch.tensor(data = task_types)
  encoded_input["task_types"]=task_types
  model_inputs = encoded_input.to('cuda')
  print(model_inputs)
  generated_ids = model.generate(**model_inputs,
                                 max_new_tokens=20,
                                 min_new_tokens=1,
                                 do_sample=False,
                                 pad_token_id=tokenizer.eos_token_id)

  decoded_output = tokenizer.batch_decode(generated_ids)

  return decoded_output[0].replace(prompt, "")

prompt = """Spanish: Período de validez después de abierto el envase: 10 horas.
English:"""

generate_response(prompt, new_model)

Error info:

{'input_ids': tensor([[    1, 10432, 29901,  2431, 29983,  8144,   316,   659,   680, 29920,
         11006,   316,   633, 25449,   560,  8829,   559, 29901, 29871, 29896,
         29900,  4029,   294, 29889,    13, 24636, 29901]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1]], device='cuda:0'), 'task_types': tensor(0, device='cuda:0')}
Output exceeds the [size limit](command:workbench.action.openSettings?[). Open the full output data [in a text editor](command:workbench.action.openLargeOutput?4edebf64-8b8d-479e-88d4-531631ec5757)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[4], line 22
     17   return decoded_output[0].replace(prompt, "")
     19 prompt = """Spanish: Período de validez después de abierto el envase: 10 horas.
     20 English:"""
---> 22 generate_response(prompt, new_model)

Cell In[4], line 9, in generate_response(prompt, model)
      7 model_inputs = encoded_input.to('cuda')
      8 print(model_inputs)
----> 9 generated_ids = model.generate(**model_inputs,
     10                                max_new_tokens=20,
     11                                min_new_tokens=1,
     12                                do_sample=False,
     13                                pad_token_id=tokenizer.eos_token_id)
     15 decoded_output = tokenizer.batch_decode(generated_ids)
     17 return decoded_output[0].replace(prompt, "")

File ~/LoRAMoE/peft/peft_model.py:587, in PeftModelForCausalLM.generate(self, **kwargs)
    585 self.base_model.prepare_inputs_for_generation = self.prepare_inputs_for_generation
    586 try:
--> 587     outputs = self.base_model.generate(**kwargs)
    588 except:
    589     self.base_model.prepare_inputs_for_generation = self.base_model_prepare_inputs_for_generation
...
-> 2349 next_token_logits = outputs.logits[:, -1, :]
   2351 # pre-process distribution
   2352 next_tokens_scores = logits_processor(input_ids, next_token_logits)

AttributeError: 'str' object has no attribute 'logits'

训练数据中的"task_type"的作用

{
     "instruction": "Passage: The * storekeepers * stayed in town to run their stores and lived in the rooms behind # them # .\nDoes the pronoun # them # refer to * storekeepers *?\nA. Yes\nB. No\nAnswer: ",
     "input": "",
     "output": "B",
     "task_type": 0
}

你好，请问"task_type"的作用是什么？假设有8个Lora专家，那么"task_type": 0表示这条数据，倾向于被第一个专家处理？

训练保存下来的模型不是完整的模型，无法使用opencompass评估

您好，我最近复现了您的代码，我看您所说的评估结果是使用opencompass评估框架得到的，但是我发现我最终训练保存下来的模型似乎不是完整的模型，只是相当于lora模块，并没有把最终训练好的lora模块给合并在一起，因此我无法直接使用opencompass框架进行评估，请问可以提供一下完整的评估流程吗？

作者您好，首先感谢您的开源。基于您的代码，我在约2.5M的多任务数据集上进行SFT，并期望获得如LoRAMoE文献中那样显著强于朴素LoRA的效果。
然而事实上，我发现当没有设置L_{lbc}损失时，LoRAMoE和LoRA在我构建的多任务benchmark上并没有显著差异，具体表现为难分伯仲，想问问作者是否遇到过类似情况呢。
此外，目前moe普遍使用topk策略，因此我非常有兴趣想知道您出于什么目的并未使用这一策略，它拥有更少的激活参数和更快的推理速度。
再次感谢您对于该项目的分享！

[bug] 代码运行出错！！急！急！非常感谢！

代码修改：1. 将模型换成了llama3

2. 修改了device

报错情况：Traceback (most recent call last):
File "/home/xsong/SiameseModelTrain/LoRAMoE/run_loramoe.py", line 569, in
main()
File "/home/xsong/SiameseModelTrain/LoRAMoE/run_loramoe.py", line 345, in main
tokenizer = LlamaTokenizer.from_pretrained(model_args.tokenizer_name_or_path, **tokenizer_kwargs)
File "/home/xsong/SiameseModelTrain/LoRAMoE/transformers/tokenization_utils_base.py", line 1825, in from_pretrained
return cls._from_pretrained(
File "/home/xsong/SiameseModelTrain/LoRAMoE/transformers/tokenization_utils_base.py", line 1988, in _from_pretrained
tokenizer = cls(*init_inputs, init_kwargs)
File "/home/xsong/SiameseModelTrain/LoRAMoE/transformers/models/llama/tokenization_llama.py", line 96, in init
self.sp_model.Load(vocab_file)
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/sentencepiece/init.py", line 961, in Load
return self.LoadFromFile(model_file)
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/sentencepiece/init.py", line 316, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1415452) of binary: /home/xsong/anaconda3/envs/newloramoe/bin/python
Traceback (most recent call last):
File "/home/xsong/anaconda3/envs/newloramoe/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init**.py", line 346, in wrapper
return f(*args, kwargs)
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call**
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

run_loramoe.py FAILED
报错情况：Traceback (most recent call last):
File "/home/xsong/SiameseModelTrain/LoRAMoE/run_loramoe.py", line 569, in
main()
File "/home/xsong/SiameseModelTrain/LoRAMoE/run_loramoe.py", line 345, in main
tokenizer = LlamaTokenizer.from_pretrained(model_args.tokenizer_name_or_path, **tokenizer_kwargs)
File "/home/xsong/SiameseModelTrain/LoRAMoE/transformers/tokenization_utils_base.py", line 1825, in from_pretrained
return cls._from_pretrained(
File "/home/xsong/SiameseModelTrain/LoRAMoE/transformers/tokenization_utils_base.py", line 1988, in _from_pretrained
tokenizer = cls(*init_inputs, init_kwargs)
File "/home/xsong/SiameseModelTrain/LoRAMoE/transformers/models/llama/tokenization_llama.py", line 96, in init
self.sp_model.Load(vocab_file)
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/sentencepiece/init.py", line 961, in Load
return self.LoadFromFile(model_file)
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/sentencepiece/init.py", line 316, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1415452) of binary: /home/xsong/anaconda3/envs/newloramoe/bin/python
Traceback (most recent call last):
File "/home/xsong/anaconda3/envs/newloramoe/bin/torchrun", line 33, in
sys.exit(load_entry_point('torch==2.0.1', 'console_scripts', 'torchrun')())
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/init**.py", line 346, in wrapper
return f(*args, kwargs)
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in call**
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/xsong/anaconda3/envs/newloramoe/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

run_loramoe.py FAILED

请问微调实验需要多少显存

使用opencompass评估模型出错

我按照您的指示，修改opencmpass两个文件后，评估在LoRAMOE上面微调过后的模型在mmlu上面的表现，但是出现了以下错误：

我该如何解决上面的问题。

Potential Bug in Paper or Code

Hi, thanks for your awesome work.

When I am reading the code plus the paper, I have noticed a potential bug, in the paper, the localized balancing constraint loss is defined as $var^2/mean$.