openaccess-ai-collective / axolotl Goto Github PK

Go ahead and axolotl questions

Home Page: https://axolotl-ai-cloud.github.io/axolotl/

License: Apache License 2.0

Python 99.18% Shell 0.55% Dockerfile 0.13% Jinja 0.14% CSS 0.01%

axolotl's Introduction

Axolotl

Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.

Features:

Train various Huggingface models such as llama, pythia, falcon, mpt
Supports fullfinetune, lora, qlora, relora, and gptq
Customize configurations using a simple yaml file or CLI overwrite
Load different dataset formats, use custom formats, or bring your own tokenized datasets
Integrated with xformer, flash attention, rope scaling, and multipacking
Works with single GPU or multiple GPUs via FSDP or Deepspeed
Easily run with Docker locally or on the cloud
Log results and optionally checkpoints to wandb or mlflow
And more!

Introduction
Supported Features
Quickstart
Environment
- Docker
- Conda/Pip venv
- Cloud GPU - Latitude.sh, JarvisLabs, RunPod
- Bare Metal Cloud GPU
- Windows
- Mac
- Google Colab
- Launching on public clouds via SkyPilot
- Launching on public clouds via dstack
Dataset
Config
Advanced Topics
Common Errors
- Tokenization Mismatch b/w Training & Inference
Debugging Axolotl
Need Help?
Badge
Community Showcase
Contributing
Sponsors

Axolotl provides a unified repository for fine-tuning
a variety of AI models with ease

Go ahead and Axolotl questions!!

Axolotl supports

	fp16/fp32	lora	qlora	gptq	gptq w/flash attn	flash attn	xformers attn
llama	✅	✅	✅	✅	✅	✅	✅
Mistral	✅	✅	✅	✅	✅	✅	✅
Mixtral-MoE	✅	✅	✅	❓	❓	❓	❓
Mixtral8X22	✅	✅	✅	❓	❓	❓	❓
Pythia	✅	✅	✅	❌	❌	❌	❓
cerebras	✅	✅	✅	❌	❌	❌	❓
btlm	✅	✅	✅	❌	❌	❌	❓
mpt	✅	❌	❓	❌	❌	❌	❓
falcon	✅	✅	✅	❌	❌	❌	❓
gpt-j	✅	✅	✅	❌	❌	❓	❓
XGen	✅	❓	✅	❓	❓	❓	✅
phi	✅	✅	✅	❓	❓	❓	❓
RWKV	✅	❓	❓	❓	❓	❓	❓
Qwen	✅	✅	✅	❓	❓	❓	❓
Gemma	✅	✅	✅	❓	❓	✅	❓

✅: supported ❌: not supported ❓: untested

Quickstart ⚡

Get started with Axolotl in just a few steps! This quickstart guide will walk you through setting up and running a basic fine-tuning task.

Requirements: Python >=3.10 and Pytorch >=2.1.1.

git clone https://github.com/axolotl-ai-cloud/axolotl
cd axolotl

pip3 install packaging ninja
pip3 install -e '.[flash-attn,deepspeed]'

Usage

# preprocess datasets - optional but recommended
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess examples/openllama-3b/lora.yml

# finetune lora
accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml

# inference
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./outputs/lora-out"

# gradio
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./outputs/lora-out" --gradio

# remote yaml files - the yaml config can be hosted on a public URL
# Note: the yaml config must directly link to the **raw** yaml
accelerate launch -m axolotl.cli.train https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/openllama-3b/lora.yml

Advanced Setup

Environment

Docker

docker run --gpus '"all"' --rm -it winglian/axolotl:main-latest

Or run on the current files for development:

docker compose up -d

Tip

If you want to debug axolotl or prefer to use Docker as your development environment, see the debugging guide's section on Docker.

Docker advanced

A more powerful Docker command to run would be this:

docker run --privileged --gpus '"all"' --shm-size 10g --rm -it --name axolotl --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --mount type=bind,src="${PWD}",target=/workspace/axolotl -v ${HOME}/.cache/huggingface:/root/.cache/huggingface winglian/axolotl:main-latest

It additionally:

Prevents memory issues when running e.g. deepspeed (e.g. you could hit SIGBUS/signal 7 error) through --ipc and --ulimit args.
Persists the downloaded HF data (models etc.) and your modifications to axolotl code through --mount/-v args.
The --name argument simply makes it easier to refer to the container in vscode (Dev Containers: Attach to Running Container...) or in your terminal.
The --privileged flag gives all capabilities to the container.
The --shm-size 10g argument increases the shared memory size. Use this if you see exitcode: -7 errors using deepspeed.

More information on nvidia website

Conda/Pip venv

Install python >=3.10
Install pytorch stable https://pytorch.org/get-started/locally/

Install Axolotl along with python dependencies

pip3 install packaging
pip3 install -e '.[flash-attn,deepspeed]'

(Optional) Login to Huggingface to use gated models/datasets.
```
huggingface-cli login
```
Get the token at huggingface.co/settings/tokens

Cloud GPU

For cloud GPU providers that support docker images, use winglian/axolotl-cloud:main-latest

on Latitude.sh use this direct link
on JarvisLabs.ai use this direct link
on RunPod use this direct link

Bare Metal Cloud GPU

LambdaLabs

Click to Expand

Install python

sudo apt update
sudo apt install -y python3.10

sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1
sudo update-alternatives --config python # pick 3.10 if given option
python -V # should be 3.10

Install pip

wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py

Install Pytorch https://pytorch.org/get-started/locally/
Follow instructions on quickstart.
Run

pip3 install protobuf==3.20.3
pip3 install -U --ignore-installed requests Pillow psutil scipy

Set path

export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

GCP

Click to Expand

Use a Deeplearning linux OS with cuda and pytorch installed. Then follow instructions on quickstart.

Make sure to run the below to uninstall xla.

pip uninstall -y torch_xla[tpu]

Windows

Please use WSL or Docker!

Mac

Use the below instead of the install method in QuickStart.

pip3 install -e '.'

More info: mac.md

Google Colab

Please use this example notebook.

Launching on public clouds via SkyPilot

To launch on GPU instances (both on-demand and spot instances) on 7+ clouds (GCP, AWS, Azure, OCI, and more), you can use SkyPilot:

pip install "skypilot-nightly[gcp,aws,azure,oci,lambda,kubernetes,ibm,scp]"  # choose your clouds
sky check

Get the example YAMLs of using Axolotl to finetune mistralai/Mistral-7B-v0.1:

git clone https://github.com/skypilot-org/skypilot.git
cd skypilot/llm/axolotl

Use one command to launch:

# On-demand
HF_TOKEN=xx sky launch axolotl.yaml --env HF_TOKEN

# Managed spot (auto-recovery on preemption)
HF_TOKEN=xx BUCKET=<unique-name> sky spot launch axolotl-spot.yaml --env HF_TOKEN --env BUCKET

Launching on public clouds via dstack

To launch on GPU instance (both on-demand and spot instances) on public clouds (GCP, AWS, Azure, Lambda Labs, TensorDock, Vast.ai, and CUDO), you can use dstack.

Write a job description in YAML as below:

# dstack.yaml
type: task

image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.2

env:
  - HUGGING_FACE_HUB_TOKEN
  - WANDB_API_KEY

commands:
  - accelerate launch -m axolotl.cli.train config.yaml

ports:
  - 6006

resources:
  gpu:
    memory: 24GB..
    count: 2

then, simply run the job with dstack run command. Append --spot option if you want spot instance. dstack run command will show you the instance with cheapest price across multi cloud services:

pip install dstack
HUGGING_FACE_HUB_TOKEN=xxx WANDB_API_KEY=xxx dstack run . -f dstack.yaml # --spot

For further and fine-grained use cases, please refer to the official dstack documents and the detailed description of axolotl example on the official repository.

Dataset

Axolotl supports a variety of dataset formats. It is recommended to use a JSONL. The schema of the JSONL depends upon the task and the prompt template you wish to use. Instead of a JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.

See these docs for more information on how to use different dataset formats.

Config

See examples for quick start. It is recommended to duplicate and modify to your needs. The most important options are:

model

base_model: ./llama-7b-hf # local or huggingface repo

Note: The code will load the right architecture.

dataset

datasets:
    # huggingface repo
  - path: vicgalle/alpaca-gpt4
    type: alpaca

    # huggingface repo with specific configuration/subset
  - path: EleutherAI/pile
    name: enron_emails
    type: completion # format from earlier
    field: text # Optional[str] default: text, field to use for completion data

    # huggingface repo with multiple named configurations/subsets
  - path: bigcode/commitpackft
    name:
      - ruby
      - python
      - typescript
    type: ... # unimplemented custom format

    # fastchat conversation
    # See 'conversation' options: https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
  - path: ...
    type: sharegpt
    conversation: chatml # default: vicuna_v1.1

    # local
  - path: data.jsonl # or json
    ds_type: json # see other options below
    type: alpaca

    # dataset with splits, but no train split
  - path: knowrohit07/know_sql
    type: context_qa.load_v2
    train_on_split: validation

    # loading from s3 or gcs
    # s3 creds will be loaded from the system default and gcs only supports public access
  - path: s3://path_to_ds # Accepts folder with arrow/parquet or file path like above. Supports s3, gcs.
    ...

    # Loading Data From a Public URL
    # - The file format is `json` (which includes `jsonl`) by default. For different formats, adjust the `ds_type` option accordingly.
  - path: https://some.url.com/yourdata.jsonl # The URL should be a direct link to the file you wish to load. URLs must use HTTPS protocol, not HTTP.
    ds_type: json # this is the default, see other options below.

load_in_4bit: true
load_in_8bit: true

bf16: auto # require >=ampere, auto will detect if your GPU supports this and choose automatically.
fp16: # leave empty to use fp16 when bf16 is 'auto'. set to false if you want to fallback to fp32
tf32: true # require >=ampere

bfloat16: true # require >=ampere, use instead of bf16 when you don't want AMP (automatic mixed precision)
float16: true # use instead of fp16 when you don't want AMP

Note: Repo does not do 4-bit quantization.

lora

adapter: lora # 'qlora' or leave blank for full finetune
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj

All Config Options

See these docs for all config options.

Train

Run

accelerate launch -m axolotl.cli.train your_config.yml

Tip

You can also reference a config file that is hosted on a public URL, for example accelerate launch -m axolotl.cli.train https://yourdomain.com/your_config.yml

Preprocess dataset

You can optionally pre-tokenize dataset with the following before finetuning. This is recommended for large datasets.

Set dataset_prepared_path: to a local folder for saving and loading pre-tokenized dataset.
(Optional): Set push_dataset_to_hub: hf_user/repo to push it to Huggingface.
(Optional): Use --debug to see preprocessed examples.

python -m axolotl.cli.preprocess your_config.yml

Multi-GPU

Below are the options available in axolotl for training with multiple GPUs. Note that DeepSpeed is the recommended multi-GPU option currently because FSDP may experience loss instability.

DeepSpeed

Deepspeed is an optimization suite for multi-gpu systems allowing you to train much larger models than you might typically be able to fit into your GPU's VRAM. More information about the various optimization types for deepspeed is available at https://huggingface.co/docs/accelerate/main/en/usage_guides/deepspeed#what-is-integrated

We provide several default deepspeed JSON configurations for ZeRO stage 1, 2, and 3.

deepspeed: deepspeed_configs/zero1.json

accelerate launch -m axolotl.cli.train examples/llama-2/config.yml --deepspeed deepspeed_configs/zero1.json

FSDP

llama FSDP

fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: true
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer

FSDP + QLoRA

Axolotl supports training with FSDP and QLoRA, see these docs for more information.

Weights & Biases Logging

Make sure your WANDB_API_KEY environment variable is set (recommended) or you login to wandb with wandb login.

wandb options

wandb_mode:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

Special Tokens

It is important to have special tokens like delimiters, end-of-sequence, beginning-of-sequence in your tokenizer's vocabulary. This will help you avoid tokenization issues and help your model train better. You can do this in axolotl like this:

special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
tokens: # these are delimiters
  - "<|im_start|>"
  - "<|im_end|>"

When you include these tokens in your axolotl config, axolotl adds these tokens to the tokenizer's vocabulary.

Inference Playground

Axolotl allows you to load your model in an interactive terminal playground for quick experimentation. The config file is the same config file used for training.

Pass the appropriate flag to the inference command, depending upon what kind of model was trained:

Pretrained LORA:

python -m axolotl.cli.inference examples/your_config.yml --lora_model_dir="./lora-output-dir"

Full weights finetune:

python -m axolotl.cli.inference examples/your_config.yml --base_model="./completed-model"

Full weights finetune w/ a prompt from a text file:

cat /tmp/prompt.txt | python -m axolotl.cli.inference examples/your_config.yml \
  --base_model="./completed-model" --prompter=None --load_in_8bit=True

-- With gradio hosting

python -m axolotl.cli.inference examples/your_config.yml --gradio

Please use --sample_packing False if you have it on and receive the error similar to below:

RuntimeError: stack expects each tensor to be equal size, but got [1, 32, 1, 128] at entry 0 and [1, 32, 8, 128] at entry 1

Merge LORA to base

The following command will merge your LORA adapater with your base model. You can optionally pass the argument --lora_model_dir to specify the directory where your LORA adapter was saved, otherwhise, this will be inferred from output_dir in your axolotl config file. The merged model is saved in the sub-directory {lora_model_dir}/merged.

python3 -m axolotl.cli.merge_lora your_config.yml --lora_model_dir="./completed-model"

You may need to use the gpu_memory_limit and/or lora_on_cpu config options to avoid running out of memory. If you still run out of CUDA memory, you can try to merge in system RAM with

CUDA_VISIBLE_DEVICES="" python3 -m axolotl.cli.merge_lora ...

although this will be very slow, and using the config options above are recommended instead.

Common Errors 🧰

Tokenization Mismatch b/w Inference & Training

For many formats, Axolotl constructs prompts by concatenating token ids after tokenizing strings. The reason for concatenating token ids rather than operating on strings is to maintain precise accounting for attention masks.

If you decode a prompt constructed by axolotl, you might see spaces between tokens (or lack thereof) that you do not expect, especially around delimiters and special tokens. When you are starting out with a new format, you should always do the following:

Materialize some data using python -m axolotl.cli.preprocess your_config.yml --debug, and then decode the first few rows with your model's tokenizer.
During inference, right before you pass a tensor of token ids to your model, decode these tokens back into a string.
Make sure the inference string from #2 looks exactly like the data you fine tuned on from #1, including spaces and new lines. If they aren't the same, adjust your inference server accordingly.
As an additional troubleshooting step, you can look at the token ids between 1 and 2 to make sure they are identical.

Having misalignment between your prompts during training and inference can cause models to perform very poorly, so it is worth checking this. See this blog post for a concrete example.

Debugging Axolotl

See this debugging guide for tips on debugging Axolotl, along with an example configuration for debugging with VSCode.

Need help? 🙋

Join our Discord server where we our community members can help you.

Need dedicated support? Please contact us at ✉️[email protected] for dedicated support options.

Badge ❤🏷️

Building something cool with Axolotl? Consider adding a badge to your model card.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)

Community Showcase

Check out some of the projects and models that have been built using Axolotl! Have a model you'd like to add to our Community Showcase? Open a PR with your model.

Open Access AI Collective

PocketDoc Labs

Dan's PersonalityEngine 13b LoRA

Contributing 🤝

Please read the contributing guide

Bugs? Please check the open issues else create a new Issue.

PRs are greatly welcome!

Please run the quickstart instructions followed by the below to setup env:

pip3 install -r requirements-dev.txt -r requirements-tests.txt
pre-commit install

# test
pytest tests/

# optional: run against all files
pre-commit run --all-files

Thanks to all of our contributors to date. Help drive open source AI progress forward by contributing to Axolotl.

contributor chart by https://contrib.rocks

Sponsors 🤝❤

OpenAccess AI Collective is run by volunteer contributors such as winglian, NanoCode012, tmm1, mhenrichsen, casper-hansen, hamelsmu and many more who help us accelerate forward by fixing bugs, answering community questions and implementing new features. Axolotl needs donations from sponsors for the compute needed to run our unit & integration tests, troubleshooting community issues, and providing bounties. If you love axolotl, consider sponsoring the project via GitHub Sponsors, Ko-fi or reach out directly to [email protected].

💎 Diamond Sponsors - Contact directly

🥇 Gold Sponsors - $5000/mo

🥈 Silver Sponsors - $1000/mo

🥉 Bronze Sponsors - $500/mo

JarvisLabs.ai

axolotl's People

Contributors

Stargazers

Watchers

Forkers

ehartford practical-dreamer khmerailab jesusoctavioas nanocode012 jak3122 rooben-me rocketgod-git thytu jordiclive haorand imperial18 farishijazi fearnworks cg123 dkzdev lightningralf winglian utensil snoopycn fredi-python angainordev bratao pocketdoclabs glavin001 akj2018 worthmining leecig jjhw mindrages blipranger mhenrichsen branng0 callum17 sroecker danidayede kshetrajna12 zuodh msinha251 idoru spellcraftai wangjiaqiys stickybandit86 moohax robertalanm jphme nan-do kimiko-ai faycald ssmi153 theobjectivedad teknium1 ethanhs plurigrid yolantele madhavajay taranakiai bjoernpl eugenepentland tmm1 l3utterfly alpindale enn-nafnlaus xnliang98 ablateit jaredquekjz morganmcg1 tokenbender ittailup evdcush flotos spachava753 endlessreform teargosling clearsitedesigns philpax sw882882 acrastt dongxiaolong scottlogic-alex maximilian-winter maximegmd brandondelpozo birch-san memazouni flyx-ai thebloke prettysparklepony simulanics brandonmcclure bdashore3 slapdrone salahzoubi mdhasanali3 simsim314 mhylle jondurbin kennyfrc nacloudai gptcrash

axolotl's Issues

[Question] Should inference instruction have stripped last new line?

https://github.com/OpenAccess-AI-Collective/axolotl/blob/bbfc333a0136bfcf3f2129986253a8141c6c64d5/scripts/finetune.py#L68-L71

The inference script requires typing enter to pass the input even it's only one line. The result is that a \n is appended to that line.

>>> get_multi_line_input()
Give me an instruction (Ctrl + D to finish): 
test
'test\n'

Should this be changed to instruction=instruction.strip('\n')?

I am not sure about other prompting style, but for completion, we want the text to be continued test is a word. Instead of having a \n after the input test\n is a word.

An alternative solution would be to add strip inside of the CompletionPrompter.

[Refactor] Remove use of local variables `save_steps` and `eval_steps` as they are not modified

https://github.com/OpenAccess-AI-Collective/axolotl/blob/87dffbc451fcd129f143c64a3ff4ea9336aaa3a5/src/axolotl/utils/trainer.py#L53-L54

When I looked into the code and saw this, I thought it was saved to a local variable to be modified / compared. However, it does not seem so.

It might be better to remove these and use the original cfg counterparts to remove the preconception that the local variable is different. Of course, it's also ok to leave them as is.

[BUG] Fix attention masking when concatenating sequences

Someone should review, but I think we're doing it incorrectly. we need to set the attention mask so that the first token in each concatenated sequence has an attention mask of zero. In most cases, this is setting the mask of the bos token to zero

Issue load Llama tokenizer

Hello, I'm getting a weird issue loading tokenizer. I've checked that the line of code hasn't changed even on my latest pull. The only difference could be transformer source changed something.

https://github.com/winglian/axolotl/blob/7576d85c735e307fa1dbbcb8e0cba8b53bb1fa48/src/axolotl/utils/models.py#L138-L139

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.88it/s]
Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear8bitLt(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear8bitLt(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear8bitLt(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)'.
Traceback (most recent call last):
  File "/workspace/src/axolotl/utils/models.py", line 140, in load_model
    tokenizer = LlamaTokenizer.from_pretrained(model)

[Bug] xformer requires alpaca_lora_4bit install even if we don't use gptq

If we turn on xformer but don't need gptq (only qlora for example), it will not work as it imports from alpaca_lora_4bit.

Discord user faced this as well: https://discord.com/channels/1104757954588196865/1111279858136383509/1113300840913055764

Move https://github.com/johnsmith0031/alpaca_lora_4bit/blob/main/monkeypatch/llama_attn_hijack_xformers.py to repo

What does `shard` do?

In latest update, there is shard argument. What is it trying to do? Is it trying to load model then output lora adapter?

https://github.com/winglian/axolotl/blob/cb9a88704707b0fc3362988a7f57b606e4448ac7/scripts/finetune.py#L169-L171

Is this due to how you're saving full model now?

https://github.com/winglian/axolotl/blob/cb9a88704707b0fc3362988a7f57b606e4448ac7/scripts/finetune.py#L222

As I understand, if you want to extract lora from checkpoint, you need to load from the checkpoints first, then set the base model with those weights. If this shard meant something different, could I PR this feature of lora extraction from checkpoint?

optionally save as safetensors.

might be easier to have a separate option to load a model and re-save it as safetensors

Support Galactica as base model, is it possible?

Galactica has been finetuned here and here

But I haven't found any code published to finetune it.

Would love to see finetuning galactica supported in axolotl.

[Bug] Seed does not always load from `cfg.seed`

I think this is an easy Issue to tackle for anyone interested.

It should be set to a default value if not defined, somewhere at start hopefully (maybe when loading config).

Update below

https://github.com/OpenAccess-AI-Collective/axolotl/blob/a617f1b65eb3d986ab7844630944fe4c979158fe/src/axolotl/utils/data.py#L115

Update below

https://github.com/OpenAccess-AI-Collective/axolotl/blob/a617f1b65eb3d986ab7844630944fe4c979158fe/src/axolotl/utils/data.py#L220

Pass seed to Trainer
Pass seed to any function that has it available

[Feature] token gisting

https://arxiv.org/abs/2304.08467

[Feature] Allow pass prompter config

Proposal: Code has been written to accept any Prompter. We should allow this to be configurable using a cfg or kwarg

https://github.com/OpenAccess-AI-Collective/axolotl/blob/bbfc333a0136bfcf3f2129986253a8141c6c64d5/scripts/finetune.py#L188

which is used here

https://github.com/OpenAccess-AI-Collective/axolotl/blob/bbfc333a0136bfcf3f2129986253a8141c6c64d5/scripts/finetune.py#L59-L64

Support python 3.10 and higher

Currently axolotl requires python 3.9
Python 3.10 and 3.11 will fail due to dependency issue
Can you please update the dependencies so that axolotl will work in 3.10 and 3.11

[Bug] QLoRA port missing some steps

https://github.com/artidoro/qlora/blob/main/qlora.py#L328-L337

    for name, module in model.named_modules():
        if isinstance(module, LoraLayer):
            if args.bf16:
                module = module.to(torch.bfloat16)
        if 'norm' in name:
            module = module.to(torch.float32)
        if 'lm_head' in name or 'embed_tokens' in name:
            if hasattr(module, 'weight'):
                if args.bf16 and module.weight.dtype == torch.float32:
                    module = module.to(torch.bfloat16)

[Bug] Validation file does not exist

https://github.com/OpenAccess-AI-Collective/axolotl/blob/a617f1b65eb3d986ab7844630944fe4c979158fe/scripts/finetune.py#L17

Would crash as the file does not exist

disable checkpoint for wandb_log_model:

update all the configs / examples and change wandb_log_model: checkpoint => wandb_log_model:

this will prevent uploading obscenely large artifacts to wandb by default and using quota

add pre-commit hook with pylint, flake8 and black

mypy would be nice too, but that might be a big ask

[Feature] Deprecate `batch_size` in favor of `gradient accumulation steps`

Reason: Easier for calculation

Replace use of batch size with gradient
Replace in doc
Check duplicate config in validate_config
Replace all configs/examples

early stopping callback requires load_best_model_at_end to be True

there are a lot of factors that affect how load_best_model_at_end gets set from eval steps and save_steps. we need to figure out what the best way to handle this is.

save steps enhancement

if save_steps is a fraction, calculate the steps based on floor(save_steps * total_steps_per_epoch).

this way if someone were to say 0.5, they could get a checkpoint at half an epoch and the end of an epoch without having to manually figure it out

AttributeError: 'AlpacaPrompter' object has no attribute 'prompt_no_input'

Not sure if this is intended, but if the prompt dict contains the key "input" but the value for input is an empty string the line
input in prompt will resolve to False:

class AlpacaPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
    def parse_instruction_fields(self, prompt) -> (str, str, str):
        print(f"Is input in prompt?:  {input in prompt}")
        return (
            prompt["instruction"],
            prompt["input"] if "input" in prompt else "",
            prompt["output"],
        )

If the prompt input is an empty string, build_prompt will try to build a prompt with prompt_no_input

    def build_prompt(
        self,
        instruction: str,
        input: Union[None, str] = None,
        output: Union[None, str] = None,
    ) -> Generator[str, None, None]:
        # returns the full prompt from instruction and optional input
        # if a label (=response, =output) is provided, it's also appended.
        if input:
            res = self.prompt_input.format(instruction=instruction, input=input)
        else:
            res = self.prompt_no_input.format(instruction=instruction)
        if output:
            res = f"{res}{output}"
        yield res

but if the prompt style is 'alpaca', there is no prompt_no_input:

  def match_prompt_style(self):
      if self.prompt_style == PromptStyle.instruct.value:
          self.prompt_input = (
              self.system_prompt
              + "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
          )
          self.prompt_no_input = (
              self.system_no_input_prompt
              + "### Instruction:\n{instruction}\n\n### Response:\n"
          )
          self.response_split = "### Response:"
      if self.prompt_style == PromptStyle.chat.value:
          self.prompt_input = (
              self.system_prompt + "USER: {instruction}\n{input}\nASSISTANT:"
          )
          self.prompt_no_input = (
              self.system_no_input_prompt + "USER: {instruction}\nASSISTANT:"
          )
          self.response_split = "ASSISTANT:"

Not sure what the best solution is - Add a prompt_no_input for alpaca style prompts or rephrase the ifs so that the result is an empty "### Input: "?

I'm willing to do a PR, just tell me what solution you want to see.

Trainer() got multiple values for keyword argument 'callbacks'

when running (8xA100 80Gb)
I run into this error:

File "/root/axolotl/scripts/finetune.py", line 239, in <module> trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer) File "/root/axolotl/src/axolotl/utils/trainer.py", line 196, in setup_trainer Traceback (most recent call last): File "/root/axolotl/scripts/finetune.py", line 239, in <module> trainer = transformers.Trainer( fire.Fire(train)TypeError : File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fir e/core.py", line 141, in Fire transformers.trainer.Trainer() got multiple values for keyword argument 'callbacks' fire.Fire(train) File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 475, in _Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 691, in _CallAndUpdateTrace component, remaining_args = _CallAndUpdateTrace( File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/ core.py", line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File "/root/axolotl/scripts/finetune.py", line 198, in train component = fn(*varargs, **kwargs) File "/root/axolotl/scripts/finetune.py", line 198, in train trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer) File "/root/axolotl/src/axolotl/utils/trainer.py", line 196, in setup_trainer trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer) File "/root/axolotl/src/axolotl/utils/trainer.py", line 196, in setup_trainer trainer = transformers.Trainer( TypeError: transformers.trainer.Trainer() got multiple values for keyword argument 'c allbacks'trainer = transformers.Trainer(

[Feature] Allow passing file to inference on

Problem

It may be necessary to repeat the same questions across many experiments. It is time consuming to copy paste line by line.

Feature

Allow passing path to jsonl file or similar that can be read and ran through the model then output to a results file

custom prompt strategies improvement

https://github.com/OpenAccess-AI-Collective/axolotl/blob/main/src/axolotl/prompt_strategies/__init__.py#L10-L11

on except, try to use importlib without the relative module in case they are using their own custom module/handler

[Feature] Add Landmark attention

Per request from Discord to add this feature.

They seem to monkeypatch LlamaModel and add special mem token.

Ref: https://github.com/epfml/landmark-attention/blob/main/llama/train.py

qlora save peft on final callback

{'eval_loss': 1.2171393632888794, 'eval_runtime': 7.1067, 'eval_samples_per_second': 4.362, 'eval_steps_per_second': 0.141, 'epoch': 4.38}
{'loss': 1.0812, 'learning_rate': 3.581603349196372e-06, 'epoch': 4.5}
{'loss': 1.0813, 'learning_rate': 2.0253513192751373e-06, 'epoch': 4.62}
{'loss': 1.0691, 'learning_rate': 9.035651368646648e-07, 'epoch': 4.75}
{'loss': 1.0922, 'learning_rate': 2.2640387134577058e-07, 'epoch': 4.88}
{'loss': 1.117, 'learning_rate': 0.0, 'epoch': 5.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [2:10:50<00:00, 192.75s/it]
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96
Traceback (most recent call last):
File "/workspace/axolotl/scripts/finetune.py", line 256, in
Traceback (most recent call last):
File "/workspace/axolotl/scripts/finetune.py", line 256, in
Traceback (most recent call last):
File "/workspace/axolotl/scripts/finetune.py", line 256, in
fire.Fire(train)fire.Fire(train)Traceback (most recent call last):
Traceback (most recent call last):

trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2094, in _inner_training_loop
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1696, in train
return inner_training_loop(
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2094, in _inner_training_loop
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
return inner_training_loop(
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2094, in _inner_training_loop
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
self._load_best_model()
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model
self._load_best_model()
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model
File "/workspace/axolotl/scripts/finetune.py", line 244, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
return inner_training_loop(
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2094, in _inner_training_loop
return inner_training_loop(
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2094, in _inner_training_loop
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1696, in train
return inner_training_loop(
self._load_best_model()
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2094, in _inner_training_loop
self._load_best_model()
self._load_best_model()self._load_best_model()

File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model
self._issue_warnings_after_load(load_result)
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model
File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model
self._issue_warnings_after_load(load_result)
UnboundLocalErrorUnboundLocalError: local variable 'load_result' referenced before assignment
: local variable 'load_result' referenced before assignment
self._issue_warnings_after_load(load_result)
self._load_best_model()
UnboundLocalError : File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model

File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model

UnboundLocalError: local variable 'load_result' referenced before assignment
self._issue_warnings_after_load(load_result)

[Question] Duplicate shard config names?

I noticed two different shard code using different configs in load_tokenized_prepared_datasets and load_prepare_datasets

https://github.com/OpenAccess-AI-Collective/axolotl/blob/a617f1b65eb3d986ab7844630944fe4c979158fe/src/axolotl/utils/data.py#L114-L115

https://github.com/OpenAccess-AI-Collective/axolotl/blob/a617f1b65eb3d986ab7844630944fe4c979158fe/src/axolotl/utils/data.py#L345-L351

Not sure if these two parts should be combined and called elsewhere, but I think the config should be unified to use same name.

when hashing the dataset cache, be sure to use the tokenizer as part of the hash key

[Doc] Update wording of 4bit to GPTQ and add qlora

Problem

Previously, 4bit quant meant GPTQ. However, due to the new release of qlora, there can be some confusion on this subject (like here https://discord.com/channels/1104757954588196865/1111279858136383509/1111846381355802744).

TODO

Add new column for qlora
Rename 4bit quant
Evaluate qlora on other architectures

[Refactor] Fix duplicate `config` and `examples` folder and update previous configs

In the past, the configs were all in configs. However, as things changed, some parts have been moved to the examples folder.

Furthermore, there are some old/invalid configs within the configs folder due to recent changes.

It would be good to move all configs to examples folder within each architecture for better maintenance and update those previous configs to work.

I'm curious if anyone has any better ideas?

[Feature] update prepare_model_for_int8_training

FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.

[Feature] Add tests

We need to also think about adding some tests to ensure more stability. I do not have much experience on the matter, however, I think at the very least, we should test the following:

Functional

Test validate_config for conflicting configs

End to end for each architecture in Readme for one or two global_step:

fp16/fp32
4bit
8bit
gptq

[Test] Add test for all prompt tokenizers

Following #111

[Bug] Add `cfg.hf_use_auth_token` to set whether to attach auth token

As discussed in Discord, if a user is not authenticated to huggingface, the code would error as it expects the token.

We would like to swap to look for whether to attach using a config instead cfg.hf_use_auth_token.

Set all use_auth_token=True to load from cfg use_auth_token=cfg.hf_use_auth_token instead
https://github.com/OpenAccess-AI-Collective/axolotl/blob/87dffbc451fcd129f143c64a3ff4ea9336aaa3a5/src/axolotl/utils/data.py#L67
Add it to the docs under all yaml options

[Refactor] prepare_model_for_int8_training is deprecated and will be removed in a future version

Getting this warning on latest peft.

/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/peft/utils/other.py:76: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead

issues to fix reported from discord

Bambi#1600
I can report my observations from my attempts at int8 LoRA training via the trainer built into Oobabooga’s textgen webUI if it helps:

Myself and others are able to train LoRAs with int8 precision for the original unquantized HF llama-7b and llama-13b models
The LoRA from this train produced expected results at inference when applied to the unquantized llama models
VRAM usage during the train was observed to be evenly seemed split between cards
GPU utilization however was observed to alternate between the cards (one card was pulling 150 watts, the other pulling 300 watts then they’d swap) indicating a serialized but threaded workload vs true parallelization
Encountered but upon saving first checkpoint causing both cards to OOM. Following numerous forum threads we reverted out bitsandbytes model from 0.38.1 to 0.37.2 which resolved the issue.

ImportError: cannot import name 'Mapping' from 'collections'

 ~/g/axolotl   axolotl   main ≡  ?1  accelerate launch scripts/finetune.py configs/llama_30B_4bit.yml~~
Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 11, in <module>
    from attrdict import AttrDefault
  File "/home/eric/miniconda3/envs/axolotl/lib/python3.10/site-packages/attrdict/__init__.py", line 5, in <module>
    from attrdict.mapping import AttrMap
  File "/home/eric/miniconda3/envs/axolotl/lib/python3.10/site-packages/attrdict/mapping.py", line 4, in <module>
    from collections import Mapping
ImportError: cannot import name 'Mapping' from 'collections' (/home/eric/miniconda3/envs/axolotl/lib/python3.10/collections/__init__.py)
Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 11, in <module>
    from attrdict import AttrDefault
  File "/home/eric/miniconda3/envs/axolotl/lib/python3.10/site-packages/attrdict/__init__.py", line 5, in <module>
    from attrdict.mapping import AttrMap
  File "/home/eric/miniconda3/envs/axolotl/lib/python3.10/site-packages/attrdict/mapping.py", line 4, in <module>
    from collections import Mapping

[Feature] multi-modal training

e.g. PandaGPT for audio/video training. This is likely a large feature. lmk if anyone is interested in helping with this one.

https://github.com/yxuansu/PandaGPT/blob/main/code/model/modeling_llama.py
https://github.com/yxuansu/PandaGPT/blob/main/code/model/openllama.py#L86

load tokenizer separately and before the models

this way the datasets can be tokenized without the models for data preparation

No module named axolotl.utils.validation

After pip installing axolotl, and trying to run the provided finetuning command I get:

Traceback (most recent call last): File "/home/someone/axolotl/scripts/finetune.py", line 17, in <module> from axolotl.utils.validation import validate_config ModuleNotFoundError: No module named 'axolotl.utils.validation'

Can't find validation.py anywhere in the commits either

Unusable early_stopping_patience param

Whenever user uses early_stopping_patience it results in the following error: AssertionError: EarlyStoppingCallback requires load_best_model_at_end = True

confusing error message

I get a confusing error message. Can you please help?

My command line is:
accelerate launch scripts/finetune.py configs/llama_30B_4bit.yml

My config is:

base_model: ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors
base_model_config: ../alpaca_lora_4bit/llama-30b-4bit/
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
datasets:
  - path: ../alpaca_lora_4bit/leet10k-alpaca-merged.json
    type: alpaca
dataset_prepared_path: data/last_run_prepared
val_set_size: 0.04
adapter: lora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len: 1024
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_fan_in_fan_out: false
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
output_dir: ./lora-test
batch_size: 128
micro_batch_size: 8
num_epochs: 4
warmup_steps: 100
learning_rate: 0.00003
train_on_inputs: false
group_by_length: false
bf16: true
tf32: true
gradient_checkpointing: false
early_stopping_patience: 3
resume_from_checkpoint:
auto_resume_from_checkpoints: true
local_rank:
load_4bit: true
xformers_attention: true
flash_attention:

My error message is:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
binbin  /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/0/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/3/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/2/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_x7f2qly7/none_f9zjdlpc/attempt_0/1/error.json')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:loading model, tokenizer, and lora_config...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py:779: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 131, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 131, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 246, in <module>
    fire.Fire(train)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/eric/git/axolotl/scripts/finetune.py", line 178, in train
    model, tokenizer, lora_config = load_model(
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 136, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 445, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 922, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at '../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors' is not a valid JSON file.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 246, in <module>
    fire.Fire(train)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/eric/git/axolotl/scripts/finetune.py", line 178, in train
    model, tokenizer, lora_config = load_model(
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 136, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 445, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 922, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at '../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors' is not a valid JSON file.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2878 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2879 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2881 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 2880) of binary: /home/eric/miniconda3/envs/axolotl2/bin/python
Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/launch.py", line 914, in launch_command
    multi_gpu_launcher(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/launch.py", line 603, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/finetune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-04-24_23:18:00
  host      : mlc-win.
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 2880)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Save `adapter_bin` using callbacks if `lora`

Proposal

It would be good to also save lora during each checkpoint

Solution

We can also save the lora using callbacks. I saw code for callback, however, we can slightly modify it to not delete pytorch_model.bin, so that we can resume training.

We can check if adapter: lora then add the callback.

Happy to PR this.

Edit: Discussion at huggingface/peft#353 (comment)

run model.train() on models before training

see https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train

[Feature] Support Conditional Pretrained Datasets (using tags)

https://laion.ai/notes/cpretrain/

GPTQ vs QLoRA

GPTQ and QLoRA are mutually exclusive when it comes to the PEFT dependency. see https://github.com/winglian/alpaca_lora_4bit/blob/main/requirements.txt#L9 vs QLoRA basically needing main. It's probably worth removing the [int4] part of the install from the docker container, and simply doing a basic install. We'll also need to update the docs for those people who want to use GPTQ that they will need to pip uninstall peft and pip install .[int4]. Also, the caveat for them is they need to uninstall peft again if they want to switch back to qlora.

[Feature] Automatic Rebase

For a cleaner tree

https://github.com/marketplace/actions/automatic-rebase

add bitsandbytes build with cuda library in base docker image

from my qlora notes:

cd bitsandbytes
CUDA_VERSION=118 make cuda11x
pip uninstall bitsandbytes
python setup.py install
pip install scipy
pip uninstall transformers
pip install "transformers @ git+https://github.com/huggingface/transformers.git
pip install bert-score==0.3.13 evaluate==0.4.0 rouge-score==0.1.2 scikit-learn==1.2.2 sentencepiece==0.1.99 wandb==0.15.2

should update requirements.txt too.

[Feature] Replace `cfg.load_4bit` with `cfg.gptq`

Proposal: Change naming to reduce confusing with load_in_4bit which is used for qlora.

Breaking change: Yes

Replace all instances
Add assert in validation config to assert not cfg.load_4bit, "cfg.load_4bit has been deprecated. Please change to cfg.gptq"

RuntimeError: Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()

I get the below at the end of training. I suspect it's due to loading 8 bit and https://github.com/winglian/axolotl/blob/47ad3890bc35985b9046f403312887035e19f96f/src/axolotl/utils/trainer.py#L99

Stack trace

File "/workspace/scripts/finetune.py", line 246, in <module> 
    fire.Fire(train) 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 141, in Fire 
    component_trace = _Fire(component, args, parsed_flag_args, context, name) 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 475, in _Fire 
    component, remaining_args = _CallAndUpdateTrace( 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace 
    component = fn(*varargs, **kwargs) 
  File "/workspace/scripts/finetune.py", line 235, in train 
    trainer.train(resume_from_checkpoint=resume_from_checkpoint) 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 1664, in train 
    return inner_training_loop( 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 2054, in _inner_training_loop 
    self._load_best_model() 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 2230, in _load_best_model 
    load_result = model.load_state_dict(state_dict, False) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2027, in load_state_dict 
    load(self, state_dict) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  [Previous line repeated 4 more times] 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2009, in load 
    module._load_from_state_dict( 
  File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/nn/modules.py", line 298, in _load_from_state_dict 
    raise RuntimeError("Loading a quantized checkpoint into non-quantized Linear8bitLt is " 
RuntimeError: Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()

Info

Commit: Before dev merge winglian/axolotl@cb9a887

update references to previous repo location

winglian/axolotl -> OpenAccess-AI-Collective/axolotl