optimalscale / lmflow Goto Github PK

View Code? Open in Web Editor NEW

8.0K 72.0 807.0 24.94 MB

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

Home Page: https://optimalscale.github.io/LMFlow/

License: Apache License 2.0

Shell 6.49% Python 93.44% Dockerfile 0.07%

chatgpt deep-learning instruction-following language-model pretrained-models pytorch transformer

lmflow's Introduction

LMFlow

English | 简体中文 | Español | 日本語 | 한국어 | हिंदी

An extensible, convenient, and efficient toolbox for finetuning large machine learning models, designed to be user-friendly, speedy and reliable, and accessible to the entire community.

Latest News

[2024-03-27] 🚀 Support LISA, enabling 7B training in 24G memory without offloading! 🚀
[2023-09-11] Support speculative decoding. Check out speculative_decoding for the usage and acceleration details.
[2023-08-14] Support long context inference with position interpolation (Linear & NTK scaling ) for LLaMA models. Check out postion_interpolation for more details.
[2023-08-07] Support Flash Attention-2. Check out flash_attention for more details.
[2023-08-02] Support Llama2, ChatGLM2, and Baichuan models.

More news...

[2023-07-23] LMFlow multimodal chatbot is now available! Support multimodal inputs of images and texts. Online Demo is also provided (We hold the service on a single GPU, hence one may experience "queuing" or "application busy" sometimes when multiple users are accessing at the same time, please wait and attempt again later when such event happens)
[2023-06-22] LMFlow paper is out! Check out our implementation details at https://arxiv.org/abs/2306.12420
[2023-06-16] Our finetuned Robin-33B-V2 scored an impressive 64.1 on the Huggingface LLM leaderboard in our offline evaluation, outperforming major open-source LLMs! All checkpoints (7B, 13B, 33B, and 65B) are released! Checkout the performance here.
[2023-06-07] LMFlow is now officially available on PyPI! Install it with pip install lmflow-finetune!
[2023-05-30] Release Robin-13B-v2 and Robin-33B-v2!
[2023-05-15] Release LMFlow-data, the training dataset of Robin-7B-v2. A new test data is also released.
[2023-05-09] Release Robin-7B-v2, achieving competitive performance on chitchat, commonsense reasoning and instruction-following tasks. Refer to our comprehensive study.
[2023-05-08] Release LMFlow Benchmark, an automatic evaluation framework for open-source chat-style LLMs. Benchmark results on 31 popular models are reported. Participate in LMFlow Benchmark.
[2023-04-21] Release Robin-7B (based on LLaMA-7B), and two models for commercial use: Parakeets-2.7B (based on GPT-NEO-2.7B) and Cokatoo-7B (based on StableLM-7B) Download here
[2023-04-15] Inference: Support streaming output and ChatGLM.
[2023-04-10] We propose a new alignment algorithm: Reward rAnked FineTuning (RAFT), which is more efficient than conventional (PPO-based) RLHF. [Paper]
[2023-04-02] Web service is online!
[2023-04-01] Release three instruction-tuned checkpoints and three medical checkpoints in model zoo: LLaMA-7B-tuned, LLaMA-13B-tuned, LLaMA-33B-tuned, LLaMA-7B-medical, LLaMA-13B-medical, and LLaMA-33B-medical.
[2023-03-27] Support full tuning and lora tuning for all decoder models.
[2023-03-27] Tasked tuned model beats ChatGPT on medical domain.
[2023-03-27] Release code and checkpoints - version 0.0.1! Our tasked-tuned model beats ChatGPT on medical domain.

LMFlow

Quick Start

Setup

Our package has been tested on Linux OS (Ubuntu 20.04). Other OS platforms (MacOS, Windows) are not fully tested, where you may encounter unexpected errors. If you are using LMFlow for the first time, we recommend you to try on a Linux machine or Google Colab.

CUDA versions 10.3-11.7 are supported in versions v0.0.5 or older. For CUDA versions greater than 11.7, one can use our stable branch >= v0.0.6.

git clone https://github.com/OptimalScale/LMFlow.git
cd LMFlow
conda create -n lmflow python=3.9 -y
conda activate lmflow
conda install mpi4py
bash install.sh

Prepare Dataset

Please refer to our doc.

Finetuning (Full)

Full training updates all the parameters to finetune a language model. Here is an example to finetune a GPT-2 base model.

cd data && ./download.sh alpaca && cd -

./scripts/run_finetune.sh \
  --model_name_or_path gpt2 \
  --dataset_path data/alpaca/train \
  --output_model_path output_models/finetuned_gpt2

Finetuning (LISA)

LISA is a memory-efficient finetuning algorithm that allows tradeoff between memory and the number of randomly unfreezed layers. This script currently is only tested in single gpus. Please stay tuned for our latest updates 😄

cd data && ./download.sh alpaca && cd -

./scripts/run_finetune_with_lisa.sh \
  --model_name_or_path meta-llama/Llama-2-7b-hf \
  --dataset_path data/alpaca/train \
  --output_model_path output_models/finetuned_llama \
  --lisa_activated_layers 1 \
  --lisa_interval_steps 20

Finetuning (LoRA)

LoRA is a parameter-efficient finetuning algorithm and is more efficient than full finetuning.

cd data && ./download.sh alpaca && cd -

# Saves lora only
./scripts/run_finetune_with_lora.sh \
  --model_name_or_path facebook/galactica-1.3b \
  --dataset_path data/alpaca/train \
  --output_lora_path output_models/finetuned_galactica_lora

# Saves lora and merges into original model
./scripts/run_finetune_with_lora_save_aggregated_weights.sh \
  --model_name_or_path facebook/galactica-1.3b \
  --dataset_path data/alpaca/train \
  --output_model_path output_models/finetuned_galactica

Inference

After finetuning, you can run the following command to chat with the model.

./scripts/run_chatbot.sh output_models/finetuned_gpt2

Deployment

If you want to deploy your own model locally, we provide a gradio-based UI for building chatbots. Running the following command will launch the demo for robin-7b:

pip install gradio
python ./examples/chatbot_gradio.py --deepspeed configs/ds_config_chatbot.json --model_name_or_path YOUR-LLAMA  --lora_model_path ./robin-7b --prompt_structure "A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions.###Human: {input_text}###Assistant:"       --end_string "#" --max_new_tokens 200

Evaluation

LMFlow Benchmark is an automatic evaluation framework for open-source large language models. We use negative log likelihood (NLL) as the metric to evaluate different aspects of a language model: chitchat, commonsense reasoning, and instruction following abilities.

You can directly run the LMFlow benchmark evaluation to obtain the results to participate in the LLM comparision. For example, to run GPT2 XL, one may execute

./scripts/run_benchmark.sh --model_name_or_path gpt2-xl

--model_name_or_path is required, you may fill in huggingface model name or local model path here.

To check the evaluation results, you may check benchmark.log in ./output_dir/gpt2-xl_lmflow_chat_nll_eval, ./output_dir/gpt2-xl_all_nll_eval and ./output_dir/gpt2-xl_commonsense_qa_eval.

Supported Features

Finetune Acceleration & Memory Optimization

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning

LISA is a novel and memory-efficient training strategy for large language models that outperforms existing methods like LoRA by selectively freezing layers during optimization. Check out LISA for more details.
In LMFLow, activate LISA using --use_lisa 1 in your training command. Control the number of activation layers with --lisa_activated_layers 2, and adjust the freezing layers interval using --lisa_step_interval 20.
LoRA

LoRA is a parameter-efficient finetuning algorithm and is more efficient than full finetuning. Check out finetuning-lora for more details.
FlashAttention

LMFlow supports both FlashAttention-1 and the latest FlashAttention-2. Check out flash_attention for more details.
Gradient Checkpointing

Gradient checkpointing is a memory optimization technique that trades compute for memory. It is useful when the model is too large to fit into GPU memory. Use it by just adding --gradient_checkpointing to your training command.
Deepspeed Zero3

LMFlow supports Deepspeed Zero-3 Offload. We provide an example deepspeed config, and you can directly use it.

Inference Acceleration

LLaMA Inference on CPU

Thanks to the great efforts of llama.cpp. It is possible for everyone to run their LLaMA models on CPU by 4-bit quantization. We provide a script to convert LLaMA LoRA weights to .pt files. You only need to use convert-pth-to-ggml.py in llama.cpp to perform quantization.
FlashAttention

LMFlow supports both FlashAttention-1 and the latest FlashAttention-2. Check out flash_attention for more details.

Long Context

Position Interpolation for LLaMA Models

Now LMFlow supports the latest Linear & NTK (Neural Kernel theory) scaling techniques for LLaMA models. Check out postion_interpolation for more details.

Model Customization

Vocabulary Extension

Now you can train your own sentencepiece tokenizer and merge it with model's origin hf tokenizer. Check out vocab_extension for more details.

Multimodal

Multimodal Chatbot

LMFlow supports multimodal inputs of images and texts. Check out our LMFlow multimodal chatbot.

Support

If you need any help, please submit a Github issue.

License

The code included in this project is licensed under the Apache 2.0 license. If you wish to use the codes and models included in this project for commercial purposes, please sign this document to obtain authorization.

Citation

If you find this repository useful, please consider giving ⭐ and citing our paper:

@article{diao2023lmflow,
  title={Lmflow: An extensible toolkit for finetuning and inference of large foundation models},
  author={Diao, Shizhe and Pan, Rui and Dong, Hanze and Shum, Ka Shun and Zhang, Jipeng and Xiong, Wei and Zhang, Tong},
  journal={arXiv preprint arXiv:2306.12420},
  year={2023}
}

@article{dong2023raft,
  title={Raft: Reward ranked finetuning for generative foundation model alignment},
  author={Dong, Hanze and Xiong, Wei and Goyal, Deepanshu and Pan, Rui and Diao, Shizhe and Zhang, Jipeng and Shum, Kashun and Zhang, Tong},
  journal={arXiv preprint arXiv:2304.06767},
  year={2023}
}

@article{pan2024lisa,
  title={LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning}, 
  author={Pan, Rui and Liu, Xiang and Diao, Shizhe and Pi, Renjie and Zhang, Jipeng and Han, Chi and Zhang, Tong},
  journal={arXiv preprint arXiv:2403.17919},
  year={2024}
}

lmflow's People

Contributors

Stargazers

Watchers

Forkers

dumpmemory xlsean machinelearningsystem rayjue meme-virus seanxml closegoingaway vocabvictor pckennethma fskeo blue0rigin judgementc people-art huoshanchizi yibit balcklive michaelyuancb miaojinshuai kxgong huahuady arcomu romitavia beyondchenlin userzhongjieli tengyuantuohai-113 apollohuang1 ii0 bytebility haikuoxin benliao qianyouliang t-bagwell efjerryyang baibaomen brduizhang wxaaron qiuli848 jiutian12 lannerate wjn1996 rose-cao wangzhi723 wohall2016 ddingwang12 noqsan zhiyuzone dpamk zgq91 codingonion tonywang-sh sxyseo janglichao overbestfitting zhoulei163 johngeng-xj catherinezhou ycz886 integritynoble learning-group1 dansonc chenyangbin llcing o0windseed0o charleschen2006 anjing137 mxy218 a5225662 shw2018 asclepiusinformatica bobthedev rhkdgh255 ccc-ai0 carabob xiaobaishu0097 kzke bise86 xiaochendan acproject guoxs duanzhihua bzp92 intothephone singlag edenbuaa candys-yang stackcn hongwen-sun wccccp nanqiai felixgrey zk524 cluluxiu dianziman messixpro jayjayhust alitarmy shanhedian2017 wozaimalubian enddlesswm petercao

lmflow's Issues

请问下这个错误是什么意思？win10系统，在运行 pip install -e .时发生

(lmflow) PS C:\dev\LMFlow> pip install -e .
Obtaining file:///C:/dev/LMFlow
Preparing metadata (setup.py) ... done
Collecting peft@ git+https://github.com/huggingface/peft@df0e1fb
Cloning https://github.com/huggingface/peft (to revision df0e1fb) to c:\users\zhengshuchang\appdata\local\temp\pip-install-9cde5r5s\peft_d2963c5c488c4da6aa4bfcf8d2db8f1d
Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft 'C:\Users\zhengshuchang\AppData\Local\Temp\pip-install-9cde5r5s\peft_d2963c5c488c4da6aa4bfcf8d2db8f1d'
WARNING: Did not find branch or tag 'df0e1fb', assuming revision or ref.
Running command git checkout -q df0e1fb
Resolved https://github.com/huggingface/peft to commit df0e1fb
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting trl@ git+https://github.com/lvwerra/trl.git#egg=trl-0.4.1
Cloning https://github.com/lvwerra/trl.git to c:\users\zhengshuchang\appdata\local\temp\pip-install-9cde5r5s\trl_ef2eb4c9fb8241c083f1a3357c8a3760
Running command git clone --filter=blob:none --quiet https://github.com/lvwerra/trl.git 'C:\Users\zhengshuchang\AppData\Local\Temp\pip-install-9cde5r5s\trl_ef2eb4c9fb8241c083f1a3357c8a3760'
Resolved https://github.com/lvwerra/trl.git to commit ed87942a47f26d15e823ca7674737be02e48cc0a
Preparing metadata (setup.py) ... done
Collecting transformers@ git+https://github.com/huggingface/transformers@c612628
Cloning https://github.com/huggingface/transformers (to revision c612628) to c:\users\zhengshuchang\appdata\local\temp\pip-install-9cde5r5s\transformers_6ac4cbbfe28a4bba8cd5cd04e6218993
Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers 'C:\Users\zhengshuchang\AppData\Local\Temp\pip-install-9cde5r5s\transformers_6ac4cbbfe28a4bba8cd5cd04e6218993'
error: unable to create file templates/adding_a_missing_tokenization_test/cookiecutter-template-{{cookiecutter.modelname}}/test_tokenization_{{cookiecutter.lowercase_modelname}}.py: Filename too long
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/huggingface/transformers 'C:\Users\zhengshuchang\AppData\Local\Temp\pip-install-9cde5r5s\transformers_6ac4cbbfe28a4bba8cd5cd04e6218993' did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

note: This error originates from a subprocess, and is likely not a problem with pip.

微信加入的二维码有问题

扫二维码，弹出广告，然后是一个领取儿童阅读课程的页面，不清楚其它是不是这种情况，还请检查一下这个二维码的地址是不是被人截获了。

decide epoch num

Hello,

Probably a trivial question:
The fine-tuning does not take a batch_size. It looks like input datasets are somehow grouped. Is there any best practice to decide a proper epoch num for finetuning in LMFlow? (e.g., how to compute the num of epoch for passing the entire dataset)?

'gbk' codec can't decode byte 0xae in position 728: illegal multibyte sequence

      readme_contents = fp.read().strip()
  UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 728: illegal multibyte sequence

在pip install -e . 这一步中

error: PermissionError: [WinError 5] 拒绝访问。: '.\\deepspeed\\ops\\csrc'

你好.安装deepspeed==0.8.3的时候总是遇到这个错误,试着降下版本用了0.3.9安装没问题但是运行run_chatbot时弹出torch._six包不存在,请问下我该如何安装deepspeed最新版的,在windows上.谢谢.

About the use of chatbot.sh

Hi, when I download the opened model and use ./scripts/run_chatbot.sh model_path, I obtain the following error:
ValueError: weight is on the meta device, we need a value to put in on 1.
I do not find the solutions about this problem, could you please help to solve the problem?

ERROR: No matching distribution found for lmflow.datasets when running [python app.py]

(base) u20@u20:~/LMFlow/service$ python app.py
Traceback (most recent call last):
File "app.py", line 4, in
from lmflow.datasets.dataset import Dataset
ModuleNotFoundError: No module named 'lmflow.datasets'

(base) u20@u20:~/LMFlow/service$ pip install lmflow.datasets
ERROR: Could not find a version that satisfies the requirement lmflow.datasets (from versions: none)
ERROR: No matching distribution found for lmflow.datasets

service/app.py could not start on Mac OS system

Here is the message after run python3.9 app.py in the service directory:
Seems the AutoModel module used "hostname -I" to determine the hostname, but this command option is not available on Mac OS system. Instead, "hostname -f" should be used.

===
Failed to use RAM optimized load. Automatically use original load instead.
[2023-04-02 18:56:51,278] [INFO] [comm.py:634:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
hostname: illegal option -- I
usage: hostname [-fs] [name-of-host]
Traceback (most recent call last):
File "/Users/clark/Source/Github/LMFlow/service/app.py", line 34, in
model = AutoModel.get_model(model_args, tune_strategy='none', ds_config=ds_config)
File "/Users/clark/Source/Github/LMFlow/src/lmflow/models/auto_model.py", line 14, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/Users/clark/Source/Github/LMFlow/src/lmflow/models/hf_decoder_model.py", line 223, in init
deepspeed.init_distributed()
File "/Users/clark/Source/Github/lm_env/lib/python3.9/site-packages/deepspeed/comm/comm.py", line 642, in init_distributed
mpi_discovery(distributed_port=distributed_port, verbose=verbose)
File "/Users/clark/Source/Github/lm_env/lib/python3.9/site-packages/deepspeed/comm/comm.py", line 672, in mpi_discovery
result = subprocess.check_output(hostname_cmd, shell=True)
File "/usr/local/Cellar/[email protected]/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/local/Cellar/[email protected]/3.9.16/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['hostname -I']' returned non-zero exit status 1.

ValueError: We need an `offload_dir` to dispatch this model according to this `device_map` > ```python

          > ```python

lora_model_path

thank you! I follow your step, it moves on, but raise new issue.

(lmflow) u20@u20:~/LMFlow/service$ python app.py
Loading checkpoint shards: 100%|██████████████████| 2/2 [00:21<00:00, 10.81s/it]
Traceback (most recent call last):
File "/home/u20/LMFlow/service/app.py", line 36, in
model = AutoModel.get_model(model_args, tune_strategy='none', ds_config=ds_config)
File "/home/u20/LMFlow/src/lmflow/models/auto_model.py", line 14, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/home/u20/LMFlow/src/lmflow/models/hf_decoder_model.py", line 192, in init
self.backend_model = PeftModel.from_pretrained(
File "/home/u20/miniconda3/envs/lmflow/lib/python3.9/site-packages/peft/peft_model.py", line 177, in from_pretrained
model = dispatch_model(model, device_map=device_map)
File "/home/u20/miniconda3/envs/lmflow/lib/python3.9/site-packages/accelerate/big_modeling.py", line 342, in dispatch_model
raise ValueError(
ValueError: We need an offload_dir to dispatch this model according to this device_map, the following submodules need to be offloaded: base_model.model.model.layers.22, base_model.model.model.layers.23, base_model.model.model.layers.24, base_model.model.model.layers.25, base_model.model.model.layers.26, base_model.model.model.layers.27, base_model.model.model.layers.28, base_model.model.model.layers.29, base_model.model.model.layers.30, base_model.model.model.layers.31, base_model.model.model.norm, base_model.model.lm_head.

after google, the answer may be this : oobabooga/text-generation-webui#383
is it because my llama7b is original model is too large and without quantize and compressed?

do you have a solution for this? thank you

Originally posted by @alexhmyang in #67 (comment)

Native Transformer+PEFT on chinese not working

Env: nvidia-docker ngc pytorch @ transformers 4.28 peft 0.30
LoRA: Hu (湖羊) LLaMA-7B
Code in use:

from transformers import LlamaForCausalLM, LlamaTokenizer 
import torch
ckpt = 'llama-7b-hf'
device = torch.device('cuda')
model = LlamaForCausalLM.from_pretrained(ckpt, device_map='auto', load_in_8bit=False)
tokenizer = LlamaTokenizer.from_pretrained(ckpt)

from peft import get_peft_config, get_peft_model, LoraConfig, TaskType
peft_config = LoraConfig.from_pretrained("llama7b-lora-380k")
model = get_peft_model(model, peft_config)

prompt = "华中科技大学在哪里？"
inputs = tokenizer(prompt, return_tensors="pt")["input_ids"].to("cuda")
with torch.no_grad():
      outputs = model.generate(input_ids=inputs, max_new_tokens=10)
      print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0])

The answer is total incorrent.

华中科技大学在哪里？/# Code/# Code/# Code/# Code/# Code

It the problem related with the default llama tokenizer?

test_hf_decoder_model.py failed

error message is as follow:
Error Traceback (most recent call last): File "/home/lidong/git/LMFlow/tests/models/test_hf_decoder_model.py", line 100, in test_inference outputs = model.inference(inputs,min_length=5, max_length=100,temperature=0.0, do_sample=False) File "/home/lidong/git/LMFlow/src/lmflow/models/hf_decoder_model.py", line 350, in inference outputs = self.ds_engine.module.generate( File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context return func(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/generation/utils.py", line 1406, in generate return self.greedy_search( File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/generation/utils.py", line 2201, in greedy_search outputs = self( File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 1075, in forward transformer_outputs = self.transformer( File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 899, in forward outputs = block( File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 389, in forward attn_outputs = self.attn( File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/gpt2/modeling_gpt2.py", line 311, in forward query, key, value = self.c_attn(hidden_states).split(self.split_size, dim=2) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl return forward_call(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/accelerate/hooks.py", line 165, in new_forward output = old_forward(*args, **kwargs) File "/home/lidong/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/pytorch_utils.py", line 114, in forward x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight) RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)`

Ran 1 test in 16.964s

FAILED (errors=1)

Process finished with exit code 1`

ValueError: Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0

GPU：2080Ti

ubantu pip install -e . 出错

Preparing metadata (setup.py) ... done
Collecting peft@ git+https://github.com/huggingface/peft@df0e1fb
Cloning https://github.com/huggingface/peft (to revision df0e1fb) to /tmp/pip-install-yacztqcm/peft_c9bba61f7ce240f98ed543e0f1a73dc7
Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft /tmp/pip-install-yacztqcm/peft_c9bba61f7ce240f98ed543e0f1a73dc7
fatal: 无法访问 'https://github.com/huggingface/peft/'：GnuTLS recv error (-110): The TLS connection was non-properly terminated.
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/huggingface/peft /tmp/pip-install-yacztqcm/peft_c9bba61f7ce240f98ed543e0f1a73dc7 did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

note: This error originates from a subprocess, and is likely not a problem with pip.

ERROR: No matching distribution found for deepspeed==0.8.3

After run :

pip install -e .

errors raised:

ERROR: Could not find a version that satisfies the requirement deepspeed==0.8.3 (from lmflow) (from versions: 0.3.1.dev1, 0.3.1.dev2, 0.3.1.dev3, 0.3.1.dev4, 0.3.1.dev5, 0.3.1.dev6, 0.3.1.dev7, 0.3.1.dev8, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.3.7, 0.3.8, 0.3.9, 0.3.10, 0.3.11, 0.3.12, 0.3.13, 0.3.14, 0.3.15, 0.3.16, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.5, 0.5.0, 0.5.1, 0.5.2, 0.5.3, 0.5.4, 0.5.5, 0.5.6, 0.5.7, 0.5.8, 0.5.9, 0.5.10, 0.6.0, 0.6.1, 0.6.3, 0.6.4, 0.6.5, 0.6.6, 0.6.7, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.7.4, 0.7.5, 0.7.6, 0.7.7, 0.8.0, 0.8.1, 0.8.2)
ERROR: No matching distribution found for deepspeed==0.8.3

how to fix??

/home/HD16/park/LMFlow/scripts/examples/finetune.py': [Errno 2] No such file or directory

(lmflow) venusai@localhost  /home/HD16/park/LMFlow/scripts   main  ./run_finetune.sh
[2023-04-03 16:22:21,802] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-03 16:22:22,391] [INFO] [runner.py:548:main] cmd = /home/venusai/anaconda3/envs/lmflow/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=11000 --enable_each_rank_log=None examples/finetune.py --model_name_or_path gpt2 --dataset_path /home/HD16/park/LMFlow/data/alpaca/train --output_dir /home/HD16/park/LMFlow/output_models/finetune --overwrite_output_dir --num_train_epochs 0.01 --learning_rate 2e-5 --block_size 512 --per_device_train_batch_size 1 --deepspeed configs/ds_config_zero3.json --bf16 --run_name finetune --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-04-03 16:22:23,014] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]}
[2023-04-03 16:22:23,014] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-04-03 16:22:23,014] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-04-03 16:22:23,014] [INFO] [launch.py:162:main] dist_world_size=1
[2023-04-03 16:22:23,014] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0
/home/venusai/anaconda3/envs/lmflow/bin/python: can't open file '/home/HD16/park/LMFlow/scripts/examples/finetune.py': [Errno 2] No such file or directory
[2023-04-03 16:22:24,023] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 26066
[2023-04-03 16:22:24,026] [ERROR] [launch.py:324:sigkill_handler] ['/home/venusai/anaconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=0', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/HD16/park/LMFlow/data/alpaca/train', '--output_dir', '/home/HD16/park/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 2

error: metadata-generation-failed

Hello LMFlow Team,

Kudos for your work!

The error appears while Setup via Git. Could you please let me know any possible way to fix it?
The code I ran is:
pip install -e .

The error is as follows:
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\qingd\LMFlow\setup.py", line 24, in
readme_contents = fp.read().strip()
File "C:\Users\qingd\anaconda3\envs\lmflow\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 1300: character maps to
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
Encountered error while generating package metadata.
See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Best!

Finetuning Chinese models

看到你们展示的都是英文的交互界面，请问是否可以用中文来训练？这里也有一个问题，如果基于LLaMa训练的话，中文instruct是否有效？期待回答。谢谢。

can't allocate memory

DefaultCPUAllocator: can't allocate memory: you tried to allocate 4009910272 bytes. Error code 12 (Cannot allocate memory)

bash ./scripts/run_finetune.sh

deepspeed ${deepspeed_args} \
  examples/finetune.py \
    --model_name_or_path decapoda-research/llama-7b-hf \
    --lora_model_path /home/dm/projects/LMFlow/output_models/llama7b-lora-380k \
    --dataset_path ${dataset_path} \
    --output_dir ${output_dir} --overwrite_output_dir \
    --local_rank=4 \
    --num_train_epochs 0.01 \
    --learning_rate 2e-5 \
    --block_size 256 \
    --use_ram_optimized_load False \
    --per_device_train_batch_size 1 \
    --deepspeed configs/ds_config_zero3.json \
    --run_name finetune \
    --validation_split_percentage 0 \
    --logging_steps 20 \
    --do_train \
    --ddp_timeout 72000 \
    --save_steps 5000 \
    --dataloader_num_workers 1 \
    | tee ${log_dir}/train.log \
    2> ${log_dir}/train.err

[2023-04-03 22:11:36,967] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-03 22:11:38,845] [INFO] [runner.py:550:main] cmd = /home/dm/.miniconda3/envs/LLM/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMSwgMiwgM119 --master_addr=127.0.0.1 --master_port=11000 --enable_each_rank_log=None examples/finetune.py --model_name_or_path decapoda-research/llama-7b-hf --lora_model_path /home/dm/projects/LMFlow/output_models/llama7b-lora-380k --dataset_path /home/dm/projects/LMFlow/data/example_dataset/train --output_dir /home/dm/projects/LMFlow/output_models/finetune --overwrite_output_dir --local_rank=4 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --num_train_epochs 0.01 --learning_rate 2e-5 --block_size 256 --use_ram_optimized_load False --per_device_train_batch_size 1 --deepspeed configs/ds_config_zero3.json --run_name finetune --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-04-03 22:11:40,811] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0, 1, 2, 3]}
[2023-04-03 22:11:40,811] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=4, node_rank=0
[2023-04-03 22:11:40,811] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2023-04-03 22:11:40,811] [INFO] [launch.py:162:main] dist_world_size=4
[2023-04-03 22:11:40,811] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0,1,2,3

===================================BUG REPORT===================================

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
===================================BUG REPORT===================================================================================================================

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

================================================================================
===================================BUG REPORT===================================

Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues===================================BUG REPORT===================================

================================================================================Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

================================================================================
/home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/dm/.miniconda3/envs/LLM did not contain libcudart.so as expected! Searching further paths...
 warn(msg)
/home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/dm/.miniconda3/envs/LLM did not contain libcudart.so as expected! Searching further paths...
 warn(msg)
/home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/dm/.miniconda3/envs/LLM did not contain libcudart.so as expected! Searching further paths...
 warn(msg)
/home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /home/dm/.miniconda3/envs/LLM did not contain libcudart.so as expected! Searching further paths...
 warn(msg)
/home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
 warn(msg)
/home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
 warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so

CUDA SETUP: Highest compute capability among GPUs detected: 6.1CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so

CUDA SETUP: Detected CUDA version 114CUDA SETUP: Highest compute capability among GPUs detected: 6.1

CUDA SETUP: Detected CUDA version 114
/home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
 warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 114
CUDA SETUP: Loading binary /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
CUDA SETUP: Loading binary /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 6.1
CUDA SETUP: Detected CUDA version 114
/home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
 warn(msg)
CUDA SETUP: Loading binary /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
CUDA SETUP: Loading binary /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
[2023-04-03 22:11:56,238] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
04/03/2023 22:11:57 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
04/03/2023 22:11:58 - WARNING - lmflow.pipeline.finetuner - Process rank: 1, device: cuda:1, n_gpu: 1distributed training: True, 16-bits training: False
04/03/2023 22:11:58 - WARNING - datasets.builder - Found cached dataset json (/home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
ModelArguments(model_name_or_path='decapoda-research/llama-7b-hf', lora_model_path='/home/dm/projects/LMFlow/output_models/llama7b-lora-380k', model_type=None, config_overrides=None, config_name=None, tokenizer_name=None, cache_dir=None, use_fast_tokenizer=True, model_revision='main', use_auth_token=False, torch_dtype=None, use_lora=False, lora_r=8, lora_alpha=32, lora_dropout=0.1, use_ram_optimized_load=False)
04/03/2023 22:11:58 - WARNING - lmflow.pipeline.finetuner - Process rank: 3, device: cuda:3, n_gpu: 1distributed training: True, 16-bits training: False
04/03/2023 22:11:58 - WARNING - lmflow.pipeline.finetuner - Process rank: 2, device: cuda:2, n_gpu: 1distributed training: True, 16-bits training: False
04/03/2023 22:11:59 - WARNING - datasets.builder - Found cached dataset json (/home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
ModelArguments(model_name_or_path='decapoda-research/llama-7b-hf', lora_model_path='/home/dm/projects/LMFlow/output_models/llama7b-lora-380k', model_type=None, config_overrides=None, config_name=None, tokenizer_name=None, cache_dir=None, use_fast_tokenizer=True, model_revision='main', use_auth_token=False, torch_dtype=None, use_lora=False, lora_r=8, lora_alpha=32, lora_dropout=0.1, use_ram_optimized_load=False)
04/03/2023 22:11:59 - WARNING - datasets.builder - Found cached dataset json (/home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
ModelArguments(model_name_or_path='decapoda-research/llama-7b-hf', lora_model_path='/home/dm/projects/LMFlow/output_models/llama7b-lora-380k', model_type=None, config_overrides=None, config_name=None, tokenizer_name=None, cache_dir=None, use_fast_tokenizer=True, model_revision='main', use_auth_token=False, torch_dtype=None, use_lora=False, lora_r=8, lora_alpha=32, lora_dropout=0.1, use_ram_optimized_load=False)
04/03/2023 22:12:04 - WARNING - datasets.builder - Found cached dataset json (/home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
ModelArguments(model_name_or_path='decapoda-research/llama-7b-hf', lora_model_path='/home/dm/projects/LMFlow/output_models/llama7b-lora-380k', model_type=None, config_overrides=None, config_name=None, tokenizer_name=None, cache_dir=None, use_fast_tokenizer=True, model_revision='main', use_auth_token=False, torch_dtype=None, use_lora=False, lora_r=8, lora_alpha=32, lora_dropout=0.1, use_ram_optimized_load=False)
[2023-04-03 22:12:40,880] [INFO] [partition_parameters.py:415:__exit__] finished initializing model with 6.74B parameters
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:36<00:00,  2.92s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:36<00:00,  2.92s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:36<00:00,  2.92s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 33/33 [01:36<00:00,  2.92s/it]
04/03/2023 22:14:21 - WARNING
..................
- datasets.fingerprint - Parameter 'function'=<function HFDecoderModel.tokenize.<locals>.tokenize_function at 0x7f8e405aeca0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
04/03/2023 22:14:21 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-1c80317fa3b1799d.arrow
04/03/2023 22:14:21 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-440a0d2293dee367.arrow
04/03/2023 22:14:21 - WARNING - datasets.fingerprint - Parameter 'function'=<function HFDecoderModel.tokenize.<locals>.tokenize_function at 0x7f63180cdca0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
04/03/2023 22:14:21 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-1c80317fa3b1799d.arrow
04/03/2023 22:14:21 - WARNING - datasets.fingerprint - Parameter 'function'=<function HFDecoderModel.tokenize.<locals>.tokenize_function at 0x7f2ee8477e50> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
04/03/2023 22:14:21 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-1c80317fa3b1799d.arrow
04/03/2023 22:14:21 - WARNING - datasets.fingerprint - Parameter 'function'=<function HFDecoderModel.tokenize.<locals>.tokenize_function at 0x7f35d1f88ca0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
04/03/2023 22:14:21 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-1c80317fa3b1799d.arrow
04/03/2023 22:14:21 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-440a0d2293dee367.arrow
04/03/2023 22:14:21 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-440a0d2293dee367.arrow
04/03/2023 22:14:21 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dm/.cache/huggingface/datasets/json/default-5ae8ba371b9f2d27/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-440a0d2293dee367.arrow
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/dm/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/dm/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/dm/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/dm/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/dm/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 1.1844513416290283 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 1.2648406028747559 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 1.2672991752624512 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 1.2600975036621094 seconds
Using /home/dm/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/dm/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/dm/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Using /home/dm/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Emitting ninja build file /home/dm/.cache/torch_extensions/py39_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.5386121273040771 seconds
Loading extension module utils...
Time to load utils op: 0.3074460029602051 seconds
Loading extension module utils...
Loading extension module utils...
Time to load utils op: 0.6129136085510254 seconds
Time to load utils op: 0.6200845241546631 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/dm/projects/LMFlow/examples/finetune.py:71 in <module>                                     │
│                                                                                                  │
│   68                                                                                             │
│   69                                                                                             │
│   70 if __name__ == '__main__':                                                                  │
│ ❱ 71 │   main()                                                                                  │
│   72                                                                                             │
│                                                                                                  │
│ /home/dm/projects/LMFlow/examples/finetune.py:67 in main                                         │
│                                                                                                  │
│   64 │   │   )                                                                                   │
│   65 │                                                                                           │
│   66 │   # Finetuning                                                                            │
│ ❱ 67 │   tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)                        │
│   68                                                                                             │
│   69                                                                                             │
│   70 if __name__ == '__main__':                                                                  │
│                                                                                                  │
│ /home/dm/projects/LMFlow/src/lmflow/pipeline/finetuner.py:232 in tune                            │
│                                                                                                  │
│   229 │   │   │   │   checkpoint = training_args.resume_from_checkpoint                          │
│   230 │   │   │   elif last_checkpoint is not None:                                              │
│   231 │   │   │   │   checkpoint = last_checkpoint                                               │
│ ❱ 232 │   │   │   train_result = trainer.train(resume_from_checkpoint=checkpoint)                │
│   233 │   │   │                                                                                  │
│   234 │   │   │   if not model_args.use_lora:                                                    │
│   235 │   │   │   │   trainer.save_model()  # Saves the tokenizer too for easy upload            │
│                                                                                                  │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/tra │
│ nsformers/trainer.py:1639 in train                                                               │
│                                                                                                  │
│   1636 │   │   inner_training_loop = find_executable_batch_size(                                 │
│   1637 │   │   │   self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size  │
│   1638 │   │   )                                                                                 │
│ ❱ 1639 │   │   return inner_training_loop(                                                       │
│   1640 │   │   │   args=args,                                                                    │
│   1641 │   │   │   resume_from_checkpoint=resume_from_checkpoint,                                │
│   1642 │   │   │   trial=trial,                                                                  │
│                                                                                                  │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/tra │
│ nsformers/trainer.py:1708 in _inner_training_loop                                                │
│                                                                                                  │
│   1705 │   │   │   or self.fsdp is not None                                                      │
│   1706 │   │   )                                                                                 │
│   1707 │   │   if args.deepspeed:                                                                │
│ ❱ 1708 │   │   │   deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(                   │
│   1709 │   │   │   │   self, num_training_steps=max_steps, resume_from_checkpoint=resume_from_c  │
│   1710 │   │   │   )                                                                             │
│   1711 │   │   │   self.model = deepspeed_engine.module                                          │
│                                                                                                  │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/transformers-4.28.0.dev0-py3.9.egg/tra │
│ nsformers/deepspeed.py:378 in deepspeed_init                                                     │
│                                                                                                  │
│   375 │   │   "lr_scheduler": lr_scheduler,                                                      │
│   376 │   }                                                                                      │
│   377 │                                                                                          │
│ ❱ 378 │   deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)          │
│   379 │                                                                                          │
│   380 │   if resume_from_checkpoint is not None:                                                 │
│   381 │   │   # it's possible that the user is trying to resume from model_path, which doesn't   │
│                                                                                                  │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/__init__.py:125 in           │
│ initialize                                                                                       │
│                                                                                                  │
│   122 │   assert model is not None, "deepspeed.initialize requires a model"                      │
│   123 │                                                                                          │
│   124 │   if not isinstance(model, PipelineModule):                                              │
│ ❱ 125 │   │   engine = DeepSpeedEngine(args=args,                                                │
│   126 │   │   │   │   │   │   │   │    model=model,                                              │
│   127 │   │   │   │   │   │   │   │    optimizer=optimizer,                                      │
│   128 │   │   │   │   │   │   │   │    model_parameters=model_parameters,                        │
│                                                                                                  │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py:340 in     │
│ __init__                                                                                         │
│                                                                                                  │
│    337 │   │   │   model_parameters = list(model_parameters)                                     │
│    338 │   │                                                                                     │
│    339 │   │   if has_optimizer:                                                                 │
│ ❱  340 │   │   │   self._configure_optimizer(optimizer, model_parameters)                        │
│    341 │   │   │   self._configure_lr_scheduler(lr_scheduler)                                    │
│    342 │   │   │   self._report_progress(0)                                                      │
│    343 │   │   elif self.zero_optimization():                                                    │
│                                                                                                  │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py:1298 in    │
│ _configure_optimizer                                                                             │
│                                                                                                  │
│   1295 │   │   optimizer_wrapper = self._do_optimizer_sanity_check(basic_optimizer)              │
│   1296 │   │                                                                                     │
│   1297 │   │   if optimizer_wrapper == ZERO_OPTIMIZATION:                                        │
│ ❱ 1298 │   │   │   self.optimizer = self._configure_zero_optimizer(basic_optimizer)              │
│   1299 │   │   elif optimizer_wrapper == AMP:                                                    │
│   1300 │   │   │   amp_params = self.amp_params()                                                │
│   1301 │   │   │   log_dist(f"Initializing AMP with these params: {amp_params}", ranks=[0])      │
│                                                                                                  │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/engine.py:1599 in    │
│ _configure_zero_optimizer                                                                        │
│                                                                                                  │
│   1596 │   │   │   │   log_dist(f'Creating {model_dtype} ZeRO stage {zero_stage} optimizer',     │
│   1597 │   │   │   │   │   │    ranks=[0])                                                       │
│   1598 │   │   │   │   from deepspeed.runtime.zero.stage3 import DeepSpeedZeroOptimizer_Stage3   │
│ ❱ 1599 │   │   │   │   optimizer = DeepSpeedZeroOptimizer_Stage3(                                │
│   1600 │   │   │   │   │   self.module,                                                          │
│   1601 │   │   │   │   │   optimizer,                                                            │
│   1602 │   │   │   │   │   timers=timers,                                                        │
│                                                                                                  │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py:312   │
│ in __init__                                                                                      │
│                                                                                                  │
│    309 │   │   │   f'Largest partitioned param numel = {largest_partitioned_param_numel}',       │
│    310 │   │   │   force=False)                                                                  │
│    311 │   │                                                                                     │
│ ❱  312 │   │   self._setup_for_real_optimizer()                                                  │
│    313 │   │   self.grad_position = {}                                                           │
│    314 │   │   self.set_grad_positions()                                                         │
│    315                                                                                           │
│                                                                                                  │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py:371   │
│ in _setup_for_real_optimizer                                                                     │
│                                                                                                  │
│    368 │   │                                                                                     │
│    369 │   │   see_memory_usage("Before initializing optimizer states", force=True)              │
│    370 │   │                                                                                     │
│ ❱  371 │   │   self.initialize_optimizer_states()                                                │
│    372 │   │   see_memory_usage("After initializing optimizer states", force=True)               │
│    373 │   │   dist.barrier()                                                                    │
│    374                                                                                           │
│                                                                                                  │
│ /home/dm/.miniconda3/envs/LLM/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py:924   │
│ in initialize_optimizer_states                                                                   │
│                                                                                                  │
│    921 │   │   │   │   self._optimizer_states_and_gradient_swap_in(i, timer_names)               │
│    922 │   │   │                                                                                 │
│    923 │   │   │   if self.offload_optimizer and not swappable_optimizer_subgroup:               │
│ ❱  924 │   │   │   │   subgroup_gradient_buffer = torch.zeros(num_elements,                      │
│    925 │   │   │   │   │   │   │   │   │   │   │   │   │      dtype=gradient_dtype,              │
│    926 │   │   │   │   │   │   │   │   │   │   │   │   │      device=self.device)                │
│    927 │   │   │   │   if self.offload_optimizer_pin_memory:                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: [enforce fail at alloc_cpu.cpp:75] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 4009910272 bytes. Error code 12 (Cannot allocate memory)
[2023-04-03 22:15:03,293] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 126329
[2023-04-03 22:15:03,739] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 126330
[2023-04-03 22:15:05,078] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 126331
[2023-04-03 22:15:06,340] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 126332
[2023-04-03 22:15:06,341] [ERROR] [launch.py:324:sigkill_handler] ['/home/dm/.miniconda3/envs/LLM/bin/python', '-u', 'examples/finetune.py', '--local_rank=3', '--model_name_or_path', 'decapoda-research/llama-7b-hf', '--lora_model_path', '/home/dm/projects/LMFlow/output_models/llama7b-lora-380k', '--dataset_path', '/home/dm/projects/LMFlow/data/example_dataset/train', '--output_dir', '/home/dm/projects/LMFlow/output_models/finetune', '--overwrite_output_dir', '--local_rank=4', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '256', '--use_ram_optimized_load', 'False', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1

how to run the download model with cmd and web service

i have finished transfer llama 7b model to llama-7b-hf (following 4.1 LLaMA Checkpoint )

now how to run the llama 7b model with web service and cmd?

I tried to change the path in app.py , but it points to the huggingface url ( which i wish to point to my local model path)

cd ./service python app.py

medical model checkpoints

multi-gpu & batch inference

Hello,

Could you consider add multi-gpu inference support to the codebase? DeepSpeed should have provided an interface to this functionality. But I met some errors when adapting the code snippet in hf_decoder_model.py.

Besides, the inference code in examples/chatbot.py can only take one input at a time. Is it possible to allow batch inference?

Thanks!

微信群进不去

微信群进不去了，请问能拉一下吗谢谢

support for cuda11.3

I notice that the requirements.txt need torch2.0 which is only supported by cuda 11.7 or higher, so does this project support torch 1.x with cuda 11.3?

Make this even more affordable

Hi, just came across this repo, great work and a wonderful vision!

"The LLaMA 33B (LoRA) performance is achieved with only ~16h finetuning on the training split of PubMedQA and MedMCQA with a single 8 * A100 server."

I did some experiments on PubMedQA and achieved 75.6% accuracy with a single 24G GPU in 3 hours. You may take a look at https://github.com/Pilot-LH/expert, it's mainly based on https://github.com/johnsmith0031/alpaca_lora_4bit

Looking forward to better and even more affordable model from the community!

KeyError: 'llama'

Traceback (most recent call last):
File "/media/data/3/temp/lcq_temp/code/LMFlow-main/examples/finetune.py", line 71, in
main()
File "/media/data/3/temp/lcq_temp/code/LMFlow-main/examples/finetune.py", line 56, in main
model = AutoModel.get_model(model_args)
File "/media/data/3/temp/lcq_temp/code/LMFlow-main/src/lmflow/models/auto_model.py", line 14, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/media/data/3/temp/lcq_temp/code/LMFlow-main/src/lmflow/models/hf_decoder_model.py", line 113, in init
config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
File "/home/temp/.conda/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 917, in from_pretrained
config_class = CONFIG_MAPPING[config_dict["model_type"]]
File "/home/temp/.conda/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 623, in getitem
raise KeyError(key)
KeyError: 'llama'

wandb login

Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.20666718482971191 seconds
Parameter Offload: Total persistent parameters: 121344 in 98 params
Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00025916099548339844 seconds
wandb: ERROR api_key not configured (no-tty). call wandb.login(key=[your_api_key])
Traceback (most recent call last):
File "/share/Train_llama/LMFlow/examples/finetune.py", line 70, in
main()
File "/share/Train_llama/LMFlow/examples/finetune.py", line 66, in main
tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
File "/share/Train_llama/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/opt/conda/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
return inner_training_loop(
File "/opt/conda/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1822, in _inner_training_loop
self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
File "/opt/conda/envs/lmflow/lib/python3.9/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
return self.call_event("on_train_begin", args, state, control)
File "/opt/conda/envs/lmflow/lib/python3.9/site-packages/transformers/trainer_callback.py", line 397, in call_event
result = getattr(callback, event)(
File "/opt/conda/envs/lmflow/lib/python3.9/site-packages/transformers/integrations.py", line 764, in on_train_begin
self.setup(args, state, model, **kwargs)
File "/opt/conda/envs/lmflow/lib/python3.9/site-packages/transformers/integrations.py", line 738, in setup
self._wandb.init(
File "/opt/conda/envs/lmflow/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1140, in init
wi.setup(kwargs)
File "/opt/conda/envs/lmflow/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 288, in setup
wandb_login._login(
File "/opt/conda/envs/lmflow/lib/python3.9/site-packages/wandb/sdk/wandb_login.py", line 298, in _login
wlogin.prompt_api_key()
File "/opt/conda/envs/lmflow/lib/python3.9/site-packages/wandb/sdk/wandb_login.py", line 228, in prompt_api_key
raise UsageError("api_key not configured (no-tty). call " + directive)
wandb.errors.UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key])

"python3 ./examples/chatbot.py" error

tangjinou@tangjinou-MS-7C37:~/ml/nlp/LMFlow$ python3 ./examples/chatbot.py
2023-04-04 05:04:55.485866: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "./examples/chatbot.py", line 11, in
from transformers import HfArgumentParser
ImportError: cannot import name 'HfArgumentParser' from 'transformers' (/home/tangjinou/.local/lib/python3.8/site-packages/transformers/init.py)

find two problems in the codebase

It seems text2text format do not supported by the tokenize function of model. In hf_decoder_model.py file, tokenize_function only encode the first column of dataset, which means the "output" column is ignored.
when I set disable_group_texts=True, I find that in the 146 line of finetuner.py file, "total_length = (total_length // block_size) * block_size", if i have a sequence smaller than block_size, it will result the total_length=0, which ignore the example.

service/app.py cannot run on system without Nvidia/CUDA libary

It seems that the app.py assumes that the web service is always running on a system with CUDA library, which is not necessary the case. Change line 31 below to detect whether CUDA is avaliable would solve the problem:

line 31:
torch.cuda.set_device(local_rank)
updated line 31:
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

But I am not sure if I make it right for system with CUDA.

error when i run run_evaluate.py

Using /root/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py:366: UserWarning:

                           !! WARNING !!

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++ 4.8.5) may be ABI-incompatible with PyTorch!
Please use a compiler that is ABI-compatible with GCC 5.0 and above.
See https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html.

See https://gist.github.com/goldsborough/d466f43e8ffc948ff92de7486c5216d6
for instructions on how to install GCC 5 or higher.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

                          !! WARNING !!

warnings.warn(ABI_INCOMPATIBILITY_WARNING.format(compiler))
Emitting ninja build file /root/.cache/torch_extensions/py39_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /root/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
FAILED: flatten_unflatten.o
c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /root/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
c++: 错误：unrecognized command line option ‘-std=c++17’
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/root/anaconda3/envs/lmflow/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/LMFlow/examples/evaluate.py", line 33, in
model = AutoModel.get_model(model_args, tune_strategy='none', ds_config=ds_config)
File "/root/LMFlow/examples/lmflow/models/auto_model.py", line 14, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/root/LMFlow/examples/lmflow/models/hf_decoder_model.py", line 224, in init
self.ds_engine = deepspeed.initialize(model=self.backend_model, config_params=ds_config)[0]
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 347, in init
self.optimizer = self._configure_bf16_optimizer(optimizer=None)
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1501, in _configure_bf16_optimizer
optimizer = BF16_Optimizer(
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/bf16_optimizer.py", line 66, in init
util_ops = UtilsBuilder().load()
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'utils'
[2023-04-03 20:03:43,762] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 81900
[2023-04-03 20:03:43,763] [ERROR] [launch.py:324:sigkill_handler] ['/root/anaconda3/envs/lmflow/bin/python', '-u', 'examples/evaluate.py', '--local_rank=0', '--answer_type', 'medmcqa', '--model_name_or_path', 'gpt2-large', '--dataset_path', 'data/MedQA-USMLE/validation', '--deepspeed', 'examples/ds_config.json'] exits with return code = 1
(lmflow) [root@localhost LMFlow]# pip list
Package Version

accelerate 0.17.1
aiohttp 3.8.4
aiosignal 1.3.1
appdirs 1.4.4
async-timeout 4.0.2
attrs 22.2.0
certifi 2022.12.7
charset-normalizer 3.1.0
click 8.1.3
cmake 3.26.0
datasets 2.10.1
deepspeed 0.8.3
dill 0.3.6
docker-pycreds 0.4.0
filelock 3.10.0
Flask 2.2.3
Flask-Cors 3.0.10
frozenlist 1.3.3
fsspec 2023.3.0
gitdb 4.0.10
GitPython 3.1.31
hjson 3.1.0
huggingface-hub 0.13.2
idna 3.4
importlib-metadata 6.1.0
itsdangerous 2.1.2
Jinja2 3.1.2
lit 15.0.7
MarkupSafe 2.1.2
mpi4py 3.1.4
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.14
networkx 3.0
ninja 1.11.1
numpy 1.24.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
packaging 23.0
pandas 1.5.3
pathtools 0.1.2
peft 0.2.0
pip 23.0.1
protobuf 4.22.1
psutil 5.9.4
py-cpuinfo 9.0.0
pyarrow 11.0.0
pydantic 1.10.6
python-dateutil 2.8.2
pytz 2022.7.1
PyYAML 6.0
regex 2022.10.31
requests 2.28.2
responses 0.18.0
sentencepiece 0.1.97
sentry-sdk 1.17.0
setproctitle 1.3.2
setuptools 65.6.3
six 1.16.0
smmap 5.0.0
sympy 1.11.1
tokenizers 0.13.2
torch 2.0.0
tqdm 4.65.0
transformers 4.27.4
triton 2.0.0
trl 0.4.1
typing_extensions 4.5.0
urllib3 1.26.15
wandb 0.14.0
Werkzeug 2.2.3
wheel 0.38.4
xxhash 3.2.0
yarl 1.8.2
zipp 3.15.0

run sh download.sh all failed on win10 and WSL

after pip install -e . "fatal: unable to access 'https://github.com/huggingface/peft/': Failed to connect to github.com Connection timed out"

After I input pip install -e . command, the system raises the error like the following:
"
Obtaining file:///home/huiguo/LMFlow
Preparing metadata (setup.py) ... done
Collecting peft@ git+https://github.com/huggingface/peft@df0e1fb
Cloning https://github.com/huggingface/peft (to revision df0e1fb) to /tmp/pip-install-dwdpzdhq/peft_7f8f6b97ae304fa699f8fb78c5fb0c5e
Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft /tmp/pip-install-dwdpzdhq/peft_7f8f6b97ae304fa699f8fb78c5fb0c5e
fatal: unable to access 'https://github.com/huggingface/peft/': Failed to connect to github.com port 443 after 129836 ms: Connection timed out
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/huggingface/peft /tmp/pip-install-dwdpzdhq/peft_7f8f6b97ae304fa699f8fb78c5fb0c5e did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

note: This error originates from a subprocess, and is likely not a problem with pip.

About deepspeed and fsdp speed differences？

I have encountered some problems which cause that deepspeed cannot be used normally.
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
May I ask if you have compared the speed difference between deepspeed and fsdp? only ~16h finetuning on (llama 30b lora) is surprising

About USMLE results

Hello have you tried to fine-tune the llama-7B with the USMLE training set. May you please share the final results you get. I try to do this and get only around 30 ACC. I am wondering whether this is correct. Thanks!

how do you train the instruc-tuned model?

Do you train the models in model zoo in whole model finetune manner or in LoRA manner?
What data do you use in training Hu, Dongshan, Hetian and Altay??

RuntimeError: CUDA error: device-side assert triggered

Traceback (most recent call last):
File "/home/jianc/LMFlow/examples/finetune.py", line 70, in
main()
File "/home/jianc/LMFlow/examples/finetune.py", line 66, in main
tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
File "/home/jianc/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
return inner_training_loop(
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1906, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 2652, in training_step
loss = self.compute_loss(model, inputs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 2684, in compute_loss
outputs = model(**inputs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
loss = self.module(*inputs, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/peft/peft_model.py", line 529, in forward
return self.base_model(
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/opt/modeling_opt.py", line 936, in forward
outputs = self.model.decoder(
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/opt/modeling_opt.py", line 642, in forward
pos_embeds = self.embed_positions(attention_mask, past_key_values_length)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
result = hook(self, args)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 348, in _pre_forward_module_hook
self.pre_sub_module_forward_function(module)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 478, in pre_sub_module_forward_function
param_coordinator.fetch_sub_module(sub_module)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 349, in fetch_sub_module
self.__all_gather_params(params_to_prefetch)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/zero/partitioned_param_coordinator.py", line 399, in __all_gather_params
handle = partitioned_params[0].all_gather_coalesced(partitioned_params)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/jianc/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 849, in all_gather_coalesced
param.ds_tensor.to(get_accelerator().current_device_name()),

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

wandb: ERROR api_key not configured (no-tty) ...

Hi,

When I run the fine-tuning script
./scripts/run_finetune.sh

Some errors occur:

wandb: ERROR api_key not configured (no-tty). call wandb.login(key=[your_api_key])
Traceback (most recent call last):
  File "/home/yan/Documents/LMFlow/examples/finetune.py", line 70, in <module>
    main()
  File "/home/yan/Documents/LMFlow/examples/finetune.py", line 66, in main
    tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
  File "/home/yan/Documents/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/yan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
    return inner_training_loop(
  File "/home/yan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1822, in _inner_training_loop
    self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
  File "/home/yan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer_callback.py", line 353, in on_train_begin
    return self.call_event("on_train_begin", args, state, control)
  File "/home/yan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer_callback.py", line 397, in call_event
    result = getattr(callback, event)(
  File "/home/yan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/integrations.py", line 764, in on_train_begin
    self.setup(args, state, model, **kwargs)
  File "/home/yan/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/integrations.py", line 738, in setup
    self._wandb.init(
  File "/home/yan/anaconda3/envs/lmflow/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1140, in init
    wi.setup(kwargs)
  File "/home/yan/anaconda3/envs/lmflow/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 288, in setup
    wandb_login._login(
  File "/home/yan/anaconda3/envs/lmflow/lib/python3.9/site-packages/wandb/sdk/wandb_login.py", line 298, in _login
    wlogin.prompt_api_key()
  File "/home/yan/anaconda3/envs/lmflow/lib/python3.9/site-packages/wandb/sdk/wandb_login.py", line 228, in prompt_api_key
    raise UsageError("api_key not configured (no-tty). call " + directive)
wandb.errors.UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key])
[2023-04-03 10:07:42,282] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 16203
[2023-04-03 10:07:42,283] [ERROR] [launch.py:324:sigkill_handler] ['/home/yan/anaconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=0', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/yan/Documents/LMFlow/data/alpaca/train', '--output_dir', '/home/yan/Documents/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1

Would you please provide some advice?

网页版好像所答非所问

http://lmflow.com/chat

RLHF

do you try add rlhf for better result?

OSError: output_models/finetune_with_lora/checkpoint-5000 does not appear to have a file named config.json.

LoRA does not support RAM optimized load currently. Automatically use original load instead.
Traceback (most recent call last):
File "/root/LMFlow/examples/evaluate.py", line 33, in
model = AutoModel.get_model(model_args, tune_strategy='none', ds_config=ds_config)
File "/root/LMFlow/examples/lmflow/models/auto_model.py", line 14, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/root/LMFlow/examples/lmflow/models/hf_decoder_model.py", line 213, in init
self.backend_model = AutoModelForCausalLM.from_pretrained(
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 441, in from_pretrained
config, kwargs = AutoConfig.from_pretrained(
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 896, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/configuration_utils.py", line 573, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/configuration_utils.py", line 628, in _get_config_dict
resolved_config_file = cached_file(
File "/root/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/utils/hub.py", line 380, in cached_file
raise EnvironmentError(
OSError: output_models/finetune_with_lora/checkpoint-5000 does not appear to have a file named config.json. Checkout 'https://huggingface.co/output_models/finetune_with_lora/checkpoint-5000/None' for available files.
[2023-04-04 14:37:35,087] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1580
[2023-04-04 14:37:35,088] [ERROR] [launch.py:324:sigkill_handler] ['/root/anaconda3/envs/lmflow/bin/python', '-u', 'examples/evaluate.py', '--local_rank=0', '--answer_type', 'text', '--model_name_or_path', 'output_models/finetune_with_lora/checkpoint-5000', '--lora_model_path', 'output_models/finetune_with_lora', '--dataset_path', 'data/alpaca/test', '--prompt_structure', 'Input: {input}', '--deepspeed', 'examples/ds_config.json'] exits with return code = 1, model path那边不是微调过后的checkpoint 的path吗

no data or dataset return

After updating to the newest version of LMFlow, I met this problem (it does not happen before).

My running code is:

Could you please help me figure it out?

Thanks

AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'

RuntimeError: Error building extension 'cpu_adam'
Exception ignored in: <function DeepSpeedCPUAdam.del at 0x7f683231b670>
Traceback (most recent call last):
File "/home/u20/miniconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/adam/cpu_adam.py", line 110, in del
AttributeError: 'DeepSpeedCPUAdam' object has no attribute 'ds_opt_adam'
[2023-04-03 12:50:15,113] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 21626
[2023-04-03 12:50:15,113] [ERROR] [launch.py:324:sigkill_handler] ['/home/u20/miniconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=0', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/u20/LMFlow/data/alpaca/train', '--output_dir', '/home/u20/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1

error when run ./scripts/run_finetune.sh
i have gpu and cuda installed,
why it raises cpu error?

./scripts/run_finetune_with_lora.sh also raise same error

Error when running un_finetune.sh

(lmflow) [63@tianlu LMFlow]$ ./scripts/run_finetune.sh --num_gpus=1
[2023-04-03 04:05:37,917] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-03 04:05:37,992] [INFO] [runner.py:550:main] cmd = /home/dw63/anaconda3/envs/lmflow/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None examples/finetune.py --model_name_or_path gpt2 --dataset_path /home/dw63/john/LMFlow/data/alpaca/train --output_dir /home/dw63/john/LMFlow/output_models/finetune --overwrite_output_dir --num_train_epochs 0.01 --learning_rate 2e-5 --block_size 512 --per_device_train_batch_size 1 --deepspeed configs/ds_config_zero3.json --bf16 --run_name finetune --validation_split_percentage 0 --logging_steps 20 --do_train --ddp_timeout 72000 --save_steps 5000 --dataloader_num_workers 1
[2023-04-03 04:05:39,299] [INFO] [launch.py:142:main] WORLD INFO DICT: {'localhost': [0]}
[2023-04-03 04:05:39,299] [INFO] [launch.py:148:main] nnodes=1, num_local_procs=1, node_rank=0
[2023-04-03 04:05:39,299] [INFO] [launch.py:161:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0]})
[2023-04-03 04:05:39,299] [INFO] [launch.py:162:main] dist_world_size=1
[2023-04-03 04:05:39,299] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-04-03 04:05:41,749] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
04/03/2023 04:05:41 - WARNING - lmflow.pipeline.finetuner - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: False
04/03/2023 04:05:42 - WARNING - datasets.builder - Found cached dataset json (/home/dw63/.cache/huggingface/datasets/json/default-41b7151d126375bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
[2023-04-03 04:05:44,808] [INFO] [partition_parameters.py:415:exit] finished initializing model with 0.16B parameters
/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py:2547: UserWarning: torch.distributed.all_gather_base is a private function and will be deprecated. Please use torch.distributed.all_gather_into_tensor instead.
warnings.warn(
04/03/2023 04:05:45 - WARNING - datasets.fingerprint - Parameter 'function'=<function HFDecoderModel.tokenize..tokenize_function at 0x7f2dd4436820> of the transform datasets.arrow_dataset.Dataset.map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
04/03/2023 04:05:45 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dw63/.cache/huggingface/datasets/json/default-41b7151d126375bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-1c80317fa3b1799d.arrow
04/03/2023 04:05:45 - WARNING - datasets.arrow_dataset - Loading cached processed dataset at /home/dw63/.cache/huggingface/datasets/json/default-41b7151d126375bd/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51/cache-bbe2d282518ba636.arrow
Installed CUDA version 11.4 does not match the version torch was compiled with 11.7 but since the APIs are compatible, accepting this combination
Using /home/dw63/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/dw63/.cache/torch_extensions/py39_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda-11.4/bin/nvcc -ccbin /home/dw63/anaconda3/envs/lmflow/bin/x86_64-conda-linux-gnu-cc -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.4/include -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.4/include -isystem /home/dw63/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS_ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_86,code=compute_86 -gencode=arch=compute_86,code=sm_86 --compiler-options '-fPIC' -O3 --use_fast_math -std=c++14 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_86,code=compute_86 -c /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/common/custom_cuda_kernel.cu -o custom_cuda_kernel.cuda.o
[2/3] /home/dw63/anaconda3/envs/lmflow/bin/x86_64-conda-linux-gnu-c++ -MMD -MF cpu_adam.o.d -DTORCH_EXTENSION_NAME=cpu_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -I/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/includes -I/usr/local/cuda-11.4/include -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /usr/local/cuda-11.4/include -isystem /home/dw63/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++14 -g -Wno-reorder -L/usr/local/cuda-11.4/lib64 -lcudart -lcublas -g -march=native -fopenmp -D__AVX512 -D__ENABLE_CUDA_ -c /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/adam/cpu_adam.cpp -o cpu_adam.o
[3/3] /home/dw63/anaconda3/envs/lmflow/bin/x86_64-conda-linux-gnu-c++ cpu_adam.o custom_cuda_kernel.cuda.o -shared -lcurand -L/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda-11.4/lib64 -lcudart -o cpu_adam.so
Loading extension module cpu_adam...
Time to load cpu_adam op: 23.398841857910156 seconds
Using /home/dw63/.cache/torch_extensions/py39_cu117 as PyTorch extensions root...
Emitting ninja build file /home/dw63/.cache/torch_extensions/py39_cu117/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /home/dw63/anaconda3/envs/lmflow/bin/x86_64-conda-linux-gnu-c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /home/dw63/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
FAILED: flatten_unflatten.o
/home/dw63/anaconda3/envs/lmflow/bin/x86_64-conda-linux-gnu-c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="_gcc" -DPYBIND11_STDLIB="_libstdcpp" -DPYBIND11_BUILD_ABI="_cxxabi1011" -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/TH -isystem /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/THC -isystem /home/dw63/anaconda3/envs/lmflow/include/python3.9 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o
In file included from /home/dw63/anaconda3/envs/lmflow/x86_64-conda-linux-gnu/include/c++/11.2.0/chrono:42,
from /home/dw63/anaconda3/envs/lmflow/x86_64-conda-linux-gnu/include/c++/11.2.0/mutex:39,
from /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/ATen/core/Generator.h:4,
from /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/ATen/CPUGeneratorImpl.h:3,
from /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/ATen/Context.h:3,
from /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/ATen/ATen.h:7,
from /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/include/torch/csrc/utils/tensor_flatten.h:3,
from /home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp:7:
/home/dw63/anaconda3/envs/lmflow/x86_64-conda-linux-gnu/include/c++/11.2.0/ctime:80:11: error: 'timespec_get' has not been declared in '::'
80 | using ::timespec_get;
| ^~~~~~~~~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1893, in _run_ninja_build
subprocess.run(
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/dw63/john/LMFlow/examples/finetune.py", line 70, in
main()
File "/home/dw63/john/LMFlow/examples/finetune.py", line 66, in main
tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
File "/home/dw63/john/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
return inner_training_loop(
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1708, in _inner_training_loop
deepspeed_engine, optimizer, lr_scheduler = deepspeed_init(
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/transformers/deepspeed.py", line 378, in deepspeed_init
deepspeed_engine, optimizer, _, lr_scheduler = deepspeed.initialize(**kwargs)
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 340, in init
self._configure_optimizer(optimizer, model_parameters)
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1298, in _configure_optimizer
self.optimizer = self._configure_zero_optimizer(basic_optimizer)
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1599, in _configure_zero_optimizer
optimizer = DeepSpeedZeroOptimizer_Stage3(
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/zero/stage3.py", line 130, in init
util_ops = UtilsBuilder().load()
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 485, in load
return self.jit_load(verbose)
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/deepspeed/ops/op_builder/builder.py", line 520, in jit_load
op_module = load(
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1624, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/dw63/anaconda3/envs/lmflow/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1909, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'utils'
[2023-04-03 04:06:23,357] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1314014
[2023-04-03 04:06:23,358] [ERROR] [launch.py:324:sigkill_handler] ['/home/dw63/anaconda3/envs/lmflow/bin/python', '-u', 'examples/finetune.py', '--local_rank=0', '--model_name_or_path', 'gpt2', '--dataset_path', '/home/dw63/john/LMFlow/data/alpaca/train', '--output_dir', '/home/dw63/john/LMFlow/output_models/finetune', '--overwrite_output_dir', '--num_train_epochs', '0.01', '--learning_rate', '2e-5', '--block_size', '512', '--per_device_train_batch_size', '1', '--deepspeed', 'configs/ds_config_zero3.json', '--bf16', '--run_name', 'finetune', '--validation_split_percentage', '0', '--logging_steps', '20', '--do_train', '--ddp_timeout', '72000', '--save_steps', '5000', '--dataloader_num_workers', '1'] exits with return code = 1

What is the difference between LMFlow and huggingface/peft libraries？

sh run_finetune_with_lora.sh报错：RuntimeError: CUDA error: device-side assert triggered

Traceback (most recent call last):
File "/mnt/amj/LMFlow/examples/finetune.py", line 69, in
main()
File "/mnt/amj/LMFlow/examples/finetune.py", line 65, in main
tuned_model = finetuner.tune(model=model, lm_dataset=lm_dataset)
File "/mnt/amj/LMFlow/src/lmflow/pipeline/finetuner.py", line 232, in tune
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1639, in train
return inner_training_loop(
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 1906, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 2652, in training_step
loss = self.compute_loss(model, inputs)
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/transformers/trainer.py", line 2684, in compute_loss
outputs = model(**inputs)
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 11, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
loss = self.module(*inputs, **kwargs)
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/transformers/models/opt/modeling_opt.py", line 936, in forward
outputs = self.model.decoder(
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/transformers/models/opt/modeling_opt.py", line 644, in forward
attention_mask = self._prepare_decoder_attention_mask(
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/transformers/models/opt/modeling_opt.py", line 538, in _prepare_decoder_attention_mask
combined_attention_mask = _make_causal_mask(
File "/mnt/amj/conda/envs/lmflow/lib/python3.9/site-packages/transformers/models/opt/modeling_opt.py", line 75, in _make_causal_mask
mask = torch.full((tgt_len, tgt_len), torch.tensor(torch.finfo(dtype).min, device=device), device=device)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
请问这是什么问题呢

deepspeed ${deepspeed_args}
examples/finetune.py
--model_name_or_path facebook/galactica-1.3b
--dataset_path ${dataset_path}
--output_dir ${output_dir} --overwrite_output_dir
--num_train_epochs 0.01
--learning_rate 1e-4
--block_size 512
--per_device_train_batch_size 1
--use_lora 1
--lora_r 8
--deepspeed configs/ds_config_zero3.json
--bf16
--run_name finetune_with_lora
--validation_split_percentage 0
--logging_steps 20
--do_train
--ddp_timeout 72000
--save_steps 5000
--report_to none
--dataloader_num_workers 1
| tee ${log_dir}/train.log
2> ${log_dir}/train.err

参数用的是默认的，GPU只有一个A100,40G显存

Message: File not found.

Hi team,

Thanks for the great work!

Just an issue when downloading the data. Everything else is OK except when downloading 144.214.54.164:5000/natural-instructions.tar.gz, which returns error messages below:

Error response
Error code: 404

Message: File not found.

Error code explanation: HTTPStatus.NOT_FOUND - Nothing matches the given URI.

Could you please help check out?

Lots of thanks!
George

NotImplementedError: Cannot copy out of meta tensor; no data!

run:

CUDA_VISIBLE_DEVICES=1 deepspeed examples/evaluate.py --answer_type text --model_name_or_path llama-hf-path/llama-7b-hf --lora_model_path output_models/llama7b-lora-380k --test_file data/alpaca/test/test_252.json --deepspeed examples/ds_config.json

error message as follow:

Traceback (most recent call last):
File "/data/zhanghanyi/LMFlow/examples/evaluate.py", line 34, in
model = AutoModel.get_model(model_args, tune_strategy='none', ds_config=ds_config)
File "/data/zhanghanyi/LMFlow/src/lmflow/models/auto_model.py", line 14, in get_model
return HFDecoderModel(model_args, *args, **kwargs)
File "/data/zhanghanyi/LMFlow/src/lmflow/models/hf_decoder_model.py", line 197, in init
self.ds_engine = deepspeed.initialize(model=self.backend_model, config_params=ds_config)[0]
File "/home/deepwisdom/anaconda3/envs/zhy_lmflow/lib/python3.9/site-packages/deepspeed/init.py", line 125, in initialize
engine = DeepSpeedEngine(args=args,
File "/home/deepwisdom/anaconda3/envs/zhy_lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 297, in init
self._configure_distributed_model(model)
File "/home/deepwisdom/anaconda3/envs/zhy_lmflow/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1146, in _configure_distributed_model
self.module.to(self.device)
File "/home/deepwisdom/anaconda3/envs/zhy_lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/home/deepwisdom/anaconda3/envs/zhy_lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/deepwisdom/anaconda3/envs/zhy_lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/home/deepwisdom/anaconda3/envs/zhy_lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 5 more times]
File "/home/deepwisdom/anaconda3/envs/zhy_lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/home/deepwisdom/anaconda3/envs/zhy_lmflow/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

Using default model(gpt2) also has error:

why should json_data should have the same data type?

Hi, why "text_only" & "text2text" can not be trained simultaneously?

optimalscale / lmflow Goto Github PK

lmflow's Introduction

LMFlow

English | 简体中文 | Español | 日本語 | 한국어 | हिंदी

Latest News

Table of Contents

Quick Start

Setup

Prepare Dataset

Finetuning (Full)

Finetuning (LISA)

Finetuning (LoRA)

Inference

Deployment

Evaluation

Supported Features

Support

License

Citation

lmflow's People

Contributors

Stargazers

Watchers

Forkers

lmflow's Issues

Recommend Projects

Recommend Topics

Recommend Org