
axolotl's Introduction

Axolotl

Axolotl is a tool designed to streamline the fine-tuning of various AI models, offering support for multiple configurations and architectures.

Features:

  • Train various Huggingface models such as llama, pythia, falcon, mpt
  • Supports fullfinetune, lora, qlora, relora, and gptq
  • Customize configurations using a simple yaml file or CLI overwrite
  • Load different dataset formats, use custom formats, or bring your own tokenized datasets
  • Integrated with xformer, flash attention, rope scaling, and multipacking
  • Works with single GPU or multiple GPUs via FSDP or Deepspeed
  • Easily run with Docker locally or on the cloud
  • Log results and optionally checkpoints to wandb or mlflow
  • And more!

Axolotl provides a unified repository for fine-tuning a variety of AI models with ease.

Axolotl supports

Support is tracked per model family across fp16/fp32, lora, qlora, gptq, gptq w/ flash attn, flash attn, and xformers attn; support for each feature varies by model. The model families are:

  • llama
  • Mistral
  • Mixtral-MoE
  • Mixtral8X22
  • Pythia
  • cerebras
  • btlm
  • mpt
  • falcon
  • gpt-j
  • XGen
  • phi
  • RWKV
  • Qwen
  • Gemma

✅: supported ❌: not supported ❓: untested

Quickstart ⚡

Get started with Axolotl in just a few steps! This quickstart guide will walk you through setting up and running a basic fine-tuning task.

Requirements: Python >=3.10 and PyTorch >=2.1.1.

git clone https://github.com/axolotl-ai-cloud/axolotl
cd axolotl

pip3 install packaging ninja
pip3 install -e '.[flash-attn,deepspeed]'

Usage

# preprocess datasets - optional but recommended
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess examples/openllama-3b/lora.yml

# finetune lora
accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml

# inference
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./outputs/lora-out"

# gradio
accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./outputs/lora-out" --gradio

# remote yaml files - the yaml config can be hosted on a public URL
# Note: the yaml config must directly link to the **raw** yaml
accelerate launch -m axolotl.cli.train https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/examples/openllama-3b/lora.yml

Advanced Setup

Environment

Docker

docker run --gpus '"all"' --rm -it winglian/axolotl:main-latest

Or run on the current files for development:

docker compose up -d

Tip

If you want to debug axolotl or prefer to use Docker as your development environment, see the debugging guide's section on Docker.

Docker advanced

A more powerful Docker command to run would be this:

docker run --privileged --gpus '"all"' --shm-size 10g --rm -it --name axolotl --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --mount type=bind,src="${PWD}",target=/workspace/axolotl -v ${HOME}/.cache/huggingface:/root/.cache/huggingface winglian/axolotl:main-latest

It additionally:

  • Prevents memory issues when running e.g. deepspeed (e.g. you could hit SIGBUS/signal 7 error) through --ipc and --ulimit args.
  • Persists the downloaded HF data (models etc.) and your modifications to axolotl code through --mount/-v args.
  • The --name argument simply makes it easier to refer to the container in vscode (Dev Containers: Attach to Running Container...) or in your terminal.
  • The --privileged flag gives all capabilities to the container.
  • The --shm-size 10g argument increases the shared memory size. Use this if you see exitcode: -7 errors using deepspeed.

More information is available on the NVIDIA website.

Conda/Pip venv

  1. Install python >=3.10

  2. Install pytorch stable https://pytorch.org/get-started/locally/

  3. Install Axolotl along with python dependencies

    pip3 install packaging
    pip3 install -e '.[flash-attn,deepspeed]'
  4. (Optional) Login to Huggingface to use gated models/datasets.

    huggingface-cli login

    Get the token at huggingface.co/settings/tokens

Cloud GPU

For cloud GPU providers that support docker images, use winglian/axolotl-cloud:main-latest

Bare Metal Cloud GPU

LambdaLabs

  1. Install python

sudo apt update
sudo apt install -y python3.10

sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1
sudo update-alternatives --config python # pick 3.10 if given option
python -V # should be 3.10

  2. Install pip

wget https://bootstrap.pypa.io/get-pip.py
python get-pip.py

  3. Install PyTorch https://pytorch.org/get-started/locally/

  4. Follow instructions on quickstart.

  5. Run

pip3 install protobuf==3.20.3
pip3 install -U --ignore-installed requests Pillow psutil scipy

  6. Set path

export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

GCP

Use a Deep Learning Linux OS image with CUDA and PyTorch installed, then follow the instructions in the quickstart.

Make sure to run the below to uninstall xla.

pip uninstall -y torch_xla[tpu]

Windows

Please use WSL or Docker!

Mac

Use the below instead of the install method in QuickStart.

pip3 install -e '.'

More info: mac.md

Google Colab

Please use this example notebook.

Launching on public clouds via SkyPilot

To launch on GPU instances (both on-demand and spot instances) on 7+ clouds (GCP, AWS, Azure, OCI, and more), you can use SkyPilot:

pip install "skypilot-nightly[gcp,aws,azure,oci,lambda,kubernetes,ibm,scp]"  # choose your clouds
sky check

Get the example YAMLs of using Axolotl to finetune mistralai/Mistral-7B-v0.1:

git clone https://github.com/skypilot-org/skypilot.git
cd skypilot/llm/axolotl

Use one command to launch:

# On-demand
HF_TOKEN=xx sky launch axolotl.yaml --env HF_TOKEN

# Managed spot (auto-recovery on preemption)
HF_TOKEN=xx BUCKET=<unique-name> sky spot launch axolotl-spot.yaml --env HF_TOKEN --env BUCKET

Launching on public clouds via dstack

To launch on GPU instances (both on-demand and spot) on public clouds (GCP, AWS, Azure, Lambda Labs, TensorDock, Vast.ai, and CUDO), you can use dstack.

Write a job description in YAML as below:

# dstack.yaml
type: task

image: winglian/axolotl-cloud:main-20240429-py3.11-cu121-2.2.2

env:
  - HUGGING_FACE_HUB_TOKEN
  - WANDB_API_KEY

commands:
  - accelerate launch -m axolotl.cli.train config.yaml

ports:
  - 6006

resources:
  gpu:
    memory: 24GB..
    count: 2

Then simply run the job with the dstack run command. Append the --spot option if you want a spot instance. The dstack run command will show you the instance with the cheapest price across multiple cloud services:

pip install dstack
HUGGING_FACE_HUB_TOKEN=xxx WANDB_API_KEY=xxx dstack run . -f dstack.yaml # --spot

For further and more fine-grained use cases, please refer to the official dstack documentation and the detailed description of the axolotl example in the official repository.

Dataset

Axolotl supports a variety of dataset formats. It is recommended to use a JSONL. The schema of the JSONL depends upon the task and the prompt template you wish to use. Instead of a JSONL, you can also use a HuggingFace dataset with columns for each JSONL field.

See these docs for more information on how to use the different dataset formats.
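
For reference, an alpaca-style JSONL file has one JSON object per line with instruction, input, and output fields; the two rows below are purely illustrative:

{"instruction": "Summarize the following text.", "input": "Axolotl streamlines fine-tuning of AI models.", "output": "Axolotl makes fine-tuning easier."}
{"instruction": "What is 2 + 2?", "input": "", "output": "4"}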

Config

See the examples for a quick start. It is recommended to duplicate one and modify it to your needs. The most important options are:

  • model

    base_model: ./llama-7b-hf # local or huggingface repo

    Note: The code will load the right architecture.

  • dataset

    datasets:
        # huggingface repo
      - path: vicgalle/alpaca-gpt4
        type: alpaca
    
        # huggingface repo with specific configuration/subset
      - path: EleutherAI/pile
        name: enron_emails
        type: completion # format from earlier
        field: text # Optional[str] default: text, field to use for completion data
    
        # huggingface repo with multiple named configurations/subsets
      - path: bigcode/commitpackft
        name:
          - ruby
          - python
          - typescript
        type: ... # unimplemented custom format
    
        # fastchat conversation
        # See 'conversation' options: https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py
      - path: ...
        type: sharegpt
        conversation: chatml # default: vicuna_v1.1
    
        # local
      - path: data.jsonl # or json
        ds_type: json # see other options below
        type: alpaca
    
        # dataset with splits, but no train split
      - path: knowrohit07/know_sql
        type: context_qa.load_v2
        train_on_split: validation
    
        # loading from s3 or gcs
        # s3 creds will be loaded from the system default and gcs only supports public access
      - path: s3://path_to_ds # Accepts folder with arrow/parquet or file path like above. Supports s3, gcs.
        ...
    
        # Loading Data From a Public URL
        # - The file format is `json` (which includes `jsonl`) by default. For different formats, adjust the `ds_type` option accordingly.
      - path: https://some.url.com/yourdata.jsonl # The URL should be a direct link to the file you wish to load. URLs must use HTTPS protocol, not HTTP.
        ds_type: json # this is the default, see other options below.
  • loading

    load_in_4bit: true
    load_in_8bit: true
    
    bf16: auto # require >=ampere, auto will detect if your GPU supports this and choose automatically.
    fp16: # leave empty to use fp16 when bf16 is 'auto'. set to false if you want to fallback to fp32
    tf32: true # require >=ampere
    
    bfloat16: true # require >=ampere, use instead of bf16 when you don't want AMP (automatic mixed precision)
    float16: true # use instead of fp16 when you don't want AMP

    Note: Repo does not do 4-bit quantization.

  • lora

    adapter: lora # 'qlora' or leave blank for full finetune
    lora_r: 8
    lora_alpha: 16
    lora_dropout: 0.05
    lora_target_modules:
      - q_proj
      - v_proj
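
Putting the options above together, a minimal LoRA config might look like the sketch below. The values are illustrative, not recommendations; see the examples folder for tested configs.

base_model: openlm-research/open_llama_3b_v2
load_in_8bit: true
datasets:
  - path: vicgalle/alpaca-gpt4
    type: alpaca
dataset_prepared_path: ./last_run_prepared
val_set_size: 0.05
sequence_len: 2048
adapter: lora
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 1
learning_rate: 0.0002
optimizer: adamw_bnb_8bit
bf16: auto
output_dir: ./outputs/lora-out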

All Config Options

See these docs for all config options.

Train

Run

accelerate launch -m axolotl.cli.train your_config.yml

Tip

You can also reference a config file that is hosted on a public URL, for example accelerate launch -m axolotl.cli.train https://yourdomain.com/your_config.yml

Preprocess dataset

You can optionally pre-tokenize the dataset with the following before finetuning. This is recommended for large datasets.

  • Set dataset_prepared_path: to a local folder for saving and loading pre-tokenized dataset.
  • (Optional): Set push_dataset_to_hub: hf_user/repo to push it to Huggingface.
  • (Optional): Use --debug to see preprocessed examples.
python -m axolotl.cli.preprocess your_config.yml
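
In your config, the corresponding keys might look like this (values are illustrative):

dataset_prepared_path: ./last_run_prepared
push_dataset_to_hub: hf_user/repo # optional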

Multi-GPU

Below are the options available in axolotl for training with multiple GPUs. Note that DeepSpeed is the recommended multi-GPU option currently because FSDP may experience loss instability.

DeepSpeed

Deepspeed is an optimization suite for multi-gpu systems allowing you to train much larger models than you might typically be able to fit into your GPU's VRAM. More information about the various optimization types for deepspeed is available at https://huggingface.co/docs/accelerate/main/en/usage_guides/deepspeed#what-is-integrated

We provide several default deepspeed JSON configurations for ZeRO stage 1, 2, and 3.

deepspeed: deepspeed_configs/zero1.json

accelerate launch -m axolotl.cli.train examples/llama-2/config.yml --deepspeed deepspeed_configs/zero1.json

FSDP

  • llama FSDP
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_offload_params: true
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer

FSDP + QLoRA

Axolotl supports training with FSDP and QLoRA, see these docs for more information.

Weights & Biases Logging

Make sure your WANDB_API_KEY environment variable is set (recommended), or log in to wandb with wandb login.

  • wandb options
wandb_mode:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
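
For example, a filled-in logging block might look like this (values are illustrative):

wandb_mode: online
wandb_project: my-axolotl-experiments
wandb_entity: my-team
wandb_name: openllama-3b-lora-run-1
wandb_log_model: # leave empty to avoid uploading large checkpoint artifacts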

Special Tokens

It is important to have special tokens like delimiters, end-of-sequence, beginning-of-sequence in your tokenizer's vocabulary. This will help you avoid tokenization issues and help your model train better. You can do this in axolotl like this:

special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
tokens: # these are delimiters
  - "<|im_start|>"
  - "<|im_end|>"

When you include these tokens in your axolotl config, axolotl adds these tokens to the tokenizer's vocabulary.
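
If you want to confirm that the tokens made it into the saved tokenizer, a quick check with the transformers tokenizer is shown below; the output directory is an assumption, so point it at wherever your run saved its tokenizer.

from transformers import AutoTokenizer

# Load the tokenizer saved alongside your trained adapter/model (assumed path)
tokenizer = AutoTokenizer.from_pretrained("./outputs/lora-out")

# Special tokens and added delimiters should map to real (non-unk) ids
print(tokenizer.convert_tokens_to_ids(["<s>", "</s>", "<|im_start|>", "<|im_end|>"]))
print(tokenizer.unk_token_id)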

Inference Playground

Axolotl allows you to load your model in an interactive terminal playground for quick experimentation. The config file is the same config file used for training.

Pass the appropriate flag to the inference command, depending upon what kind of model was trained:

  • Pretrained LORA:
    python -m axolotl.cli.inference examples/your_config.yml --lora_model_dir="./lora-output-dir"
  • Full weights finetune:
    python -m axolotl.cli.inference examples/your_config.yml --base_model="./completed-model"
  • Full weights finetune w/ a prompt from a text file:
    cat /tmp/prompt.txt | python -m axolotl.cli.inference examples/your_config.yml \
      --base_model="./completed-model" --prompter=None --load_in_8bit=True

  • With gradio hosting:

python -m axolotl.cli.inference examples/your_config.yml --gradio

Please use --sample_packing False if you have it on and receive an error similar to the below:

RuntimeError: stack expects each tensor to be equal size, but got [1, 32, 1, 128] at entry 0 and [1, 32, 8, 128] at entry 1
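
For example, appended as a CLI override to the inference command shown earlier:

accelerate launch -m axolotl.cli.inference examples/openllama-3b/lora.yml \
    --lora_model_dir="./outputs/lora-out" --sample_packing False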

Merge LORA to base

The following command will merge your LORA adapter with your base model. You can optionally pass the argument --lora_model_dir to specify the directory where your LORA adapter was saved; otherwise, this will be inferred from output_dir in your axolotl config file. The merged model is saved in the sub-directory {lora_model_dir}/merged.

python3 -m axolotl.cli.merge_lora your_config.yml --lora_model_dir="./completed-model"

You may need to use the gpu_memory_limit and/or lora_on_cpu config options to avoid running out of memory. If you still run out of CUDA memory, you can try to merge in system RAM with

CUDA_VISIBLE_DEVICES="" python3 -m axolotl.cli.merge_lora ...

although this will be very slow; using the config options above is recommended instead.

Common Errors 🧰

See also the FAQ's and debugging guide.

If you encounter a 'Cuda out of memory' error, it means your GPU ran out of memory during the training process. Here's how to resolve it:

Please reduce any of the below:

  • micro_batch_size
  • eval_batch_size
  • gradient_accumulation_steps
  • sequence_len

If it does not help, try running without deepspeed and without accelerate (replace "accelerate launch" with "python") in the command.

Using adamw_bnb_8bit might also save you some memory.
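
For reference, all of these are plain config keys; an illustrative set of reduced values might look like this (the optimizer key name follows the example configs):

micro_batch_size: 1
eval_batch_size: 1
gradient_accumulation_steps: 1
sequence_len: 1024
optimizer: adamw_bnb_8bit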

failed (exitcode: -9)

This usually means your system has run out of system memory. Similarly, you should consider reducing the same settings as when you run out of VRAM. Additionally, look into upgrading your system RAM, which should be simpler than GPU upgrades.

RuntimeError: expected scalar type Float but found Half

Try setting fp16: true

NotImplementedError: No operator found for memory_efficient_attention_forward ...

Try to turn off xformers.

accelerate config missing

It's safe to ignore it.

NCCL Timeouts during training

See the NCCL guide.

Tokenization Mismatch b/w Inference & Training

For many formats, Axolotl constructs prompts by concatenating token ids after tokenizing strings. The reason for concatenating token ids rather than operating on strings is to maintain precise accounting for attention masks.

If you decode a prompt constructed by axolotl, you might see spaces between tokens (or lack thereof) that you do not expect, especially around delimiters and special tokens. When you are starting out with a new format, you should always do the following:

  1. Materialize some data using python -m axolotl.cli.preprocess your_config.yml --debug, and then decode the first few rows with your model's tokenizer.
  2. During inference, right before you pass a tensor of token ids to your model, decode these tokens back into a string.
  3. Make sure the inference string from #2 looks exactly like the data you fine tuned on from #1, including spaces and new lines. If they aren't the same, adjust your inference server accordingly.
  4. As an additional troubleshooting step, you can compare the token ids from steps 1 and 2 to make sure they are identical.

Having misalignment between your prompts during training and inference can cause models to perform very poorly, so it is worth checking this. See this blog post for a concrete example.
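
A hedged sketch of step 1, assuming the prepared data can be read back with datasets.load_from_disk (the exact folder may be a subdirectory of your dataset_prepared_path) and that the tokenizer path matches your base_model:

from datasets import load_from_disk
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_3b_v2")  # your base_model
ds = load_from_disk("./last_run_prepared")  # your dataset_prepared_path (or a subfolder of it)

# Decode the first few tokenized rows exactly as the model will see them;
# repr() makes stray spaces and newlines visible.
for row in ds.select(range(3)):
    print(repr(tokenizer.decode(row["input_ids"])))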

Debugging Axolotl

See this debugging guide for tips on debugging Axolotl, along with an example configuration for debugging with VSCode.

Need help? 🙋

Join our Discord server where our community members can help you.

Need dedicated support? Please contact us at ✉️[email protected] for dedicated support options.

Badge ❤🏷️

Building something cool with Axolotl? Consider adding a badge to your model card.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)

Built with Axolotl

Community Showcase

Check out some of the projects and models that have been built using Axolotl! Have a model you'd like to add to our Community Showcase? Open a PR with your model.

Open Access AI Collective

PocketDoc Labs

Contributing 🤝

Please read the contributing guide.

Bugs? Please check the open issues, otherwise create a new issue.

PRs are greatly welcome!

Please run the quickstart instructions followed by the below to set up the environment:

pip3 install -r requirements-dev.txt -r requirements-tests.txt
pre-commit install

# test
pytest tests/

# optional: run against all files
pre-commit run --all-files

Thanks to all of our contributors to date. Help drive open source AI progress forward by contributing to Axolotl.

contributor chart by https://contrib.rocks

Sponsors 🤝❤

OpenAccess AI Collective is run by volunteer contributors such as winglian, NanoCode012, tmm1, mhenrichsen, casper-hansen, hamelsmu and many more who help us accelerate forward by fixing bugs, answering community questions and implementing new features. Axolotl needs donations from sponsors for the compute needed to run our unit & integration tests, troubleshooting community issues, and providing bounties. If you love axolotl, consider sponsoring the project via GitHub Sponsors, Ko-fi or reach out directly to [email protected].


💎 Diamond Sponsors - Contact directly


🥇 Gold Sponsors - $5000/mo


🥈 Silver Sponsors - $1000/mo


🥉 Bronze Sponsors - $500/mo


axolotl's People

Contributors

akj2018, ali-mosavian, angainordev, brianfitzgerald, casper-hansen, cg123, chiragjn, dreamgenx, fearnworks, hamelsmu, jinwonkim93, johanwork, jphme, kallewoof, maximegmd, mhenrichsen, monk1337, nanocode012, napuh, pocketdoclabs, ricardodominguez, seungduk-yanolja, theobjectivedad, thytu, tmm1, tokestermw, utensil, viktoriussuwandi, winglian, xzuyn


axolotl's Issues

[Question] Should inference instruction have stripped last new line?

https://github.com/OpenAccess-AI-Collective/axolotl/blob/bbfc333a0136bfcf3f2129986253a8141c6c64d5/scripts/finetune.py#L68-L71

The inference script requires typing enter to pass the input even if it's only one line. The result is that a \n is appended to that line.

>>> get_multi_line_input()
Give me an instruction (Ctrl + D to finish): 
test
'test\n'

Should this be changed to instruction=instruction.strip('\n')?

I am not sure about other prompting styles, but for completion, we want the text to be continued as `test is a word`, instead of having a \n after the input, i.e. `test\n is a word`.

An alternative solution would be to add strip inside of the CompletionPrompter.

[Refactor] Remove use of local variables `save_steps` and `eval_steps` as they are not modified

https://github.com/OpenAccess-AI-Collective/axolotl/blob/87dffbc451fcd129f143c64a3ff4ea9336aaa3a5/src/axolotl/utils/trainer.py#L53-L54

When I looked into the code and saw this, I thought it was saved to a local variable to be modified / compared. However, it does not seem so.

It might be better to remove these and use the original cfg counterparts to remove the preconception that the local variable is different. Of course, it's also ok to leave them as is.

[BUG] Fix attention masking when concatenating sequences

Someone should review, but I think we're doing it incorrectly. We need to set the attention mask so that the first token in each concatenated sequence has an attention mask of zero. In most cases, this means setting the mask of the BOS token to zero; a sketch of the idea is below.
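
A rough sketch of the idea, not axolotl's actual packing/collator code (names and data layout here are hypothetical):

def pack_with_boundary_mask(sequences):
    """Concatenate already-tokenized sequences and zero the attention mask on
    the first (BOS) token of each packed sequence, as suggested above."""
    input_ids, attention_mask = [], []
    for seq in sequences:
        start = len(input_ids)
        input_ids.extend(seq["input_ids"])
        attention_mask.extend(seq["attention_mask"])
        attention_mask[start] = 0  # first token of this sub-sequence
    return {"input_ids": input_ids, "attention_mask": attention_mask}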

Issue load Llama tokenizer

Hello, I'm getting a weird issue loading the tokenizer. I've checked that the line of code hasn't changed even on my latest pull. The only difference could be that the transformers source changed something.

https://github.com/winglian/axolotl/blob/7576d85c735e307fa1dbbcb8e0cba8b53bb1fa48/src/axolotl/utils/models.py#L138-L139

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  7.88it/s]
Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: 'LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear8bitLt(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear8bitLt(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear8bitLt(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=32000, bias=False)
)'.
Traceback (most recent call last):
  File "/workspace/src/axolotl/utils/models.py", line 140, in load_model
    tokenizer = LlamaTokenizer.from_pretrained(model)

What does `shard` do?

In the latest update, there is a shard argument. What is it trying to do? Is it trying to load the model and then output the lora adapter?

https://github.com/winglian/axolotl/blob/cb9a88704707b0fc3362988a7f57b606e4448ac7/scripts/finetune.py#L169-L171

Is this due to how you're saving the full model now?

https://github.com/winglian/axolotl/blob/cb9a88704707b0fc3362988a7f57b606e4448ac7/scripts/finetune.py#L222

As I understand it, if you want to extract a lora from a checkpoint, you need to load from the checkpoint first, then set the base model with those weights. If shard means something different, could I PR this feature of lora extraction from a checkpoint?

[Bug] Seed does not always load from `cfg.seed`

I think this is an easy Issue to tackle for anyone interested.

The seed should be set to a default value if not defined, ideally somewhere at startup (maybe when loading the config); a minimal sketch follows the list below.

  • Update below

https://github.com/OpenAccess-AI-Collective/axolotl/blob/a617f1b65eb3d986ab7844630944fe4c979158fe/src/axolotl/utils/data.py#L115

  • Update below

https://github.com/OpenAccess-AI-Collective/axolotl/blob/a617f1b65eb3d986ab7844630944fe4c979158fe/src/axolotl/utils/data.py#L220

  • Pass seed to Trainer
  • Pass seed to any function that has it available
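
A minimal sketch of the idea; the default value and exact placement are assumptions, not a decision made in the project:

import transformers

DEFAULT_SEED = 42  # assumed fallback

def apply_seed(cfg):
    # Fill in a default once, right after the config is loaded, so every
    # downstream consumer (dataset shuffles, Trainer, etc.) sees the same value.
    if cfg.seed is None:
        cfg.seed = DEFAULT_SEED
    transformers.set_seed(cfg.seed)
    return cfg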

Support python 3.10 and higher

Currently axolotl requires Python 3.9.
Python 3.10 and 3.11 will fail due to dependency issues.
Can you please update the dependencies so that axolotl will work on 3.10 and 3.11?

disable checkpoint for wandb_log_model:

update all the configs / examples and change wandb_log_model: checkpoint => wandb_log_model:

This will prevent uploading obscenely large artifacts to wandb by default and eating into quota.

save steps enhancement

If save_steps is a fraction, calculate the steps based on floor(save_steps * total_steps_per_epoch).

This way, if someone were to set 0.5, they could get a checkpoint at half an epoch and at the end of an epoch without having to manually figure it out. A sketch of the calculation is below.
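
A sketch of the calculation being requested; the function and variable names are illustrative:

import math

def resolve_save_steps(save_steps, total_steps_per_epoch):
    """If save_steps is a fraction, treat it as a fraction of an epoch."""
    if 0 < save_steps < 1:
        return max(1, math.floor(save_steps * total_steps_per_epoch))
    return int(save_steps)

# e.g. save_steps=0.5 with 200 steps per epoch -> a checkpoint every 100 steps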

AttributeError: 'AlpacaPrompter' object has no attribute 'prompt_no_input'

Not sure if this is intended, but if the prompt dict contains the key "input" while the value for input is an empty string, the line `input in prompt` will resolve to False:

class AlpacaPromptTokenizingStrategy(InstructionPromptTokenizingStrategy):
    def parse_instruction_fields(self, prompt) -> (str, str, str):
        print(f"Is input in prompt?:  {input in prompt}")
        return (
            prompt["instruction"],
            prompt["input"] if "input" in prompt else "",
            prompt["output"],
        )

If the prompt input is an empty string, build_prompt will try to build a prompt with prompt_no_input

    def build_prompt(
        self,
        instruction: str,
        input: Union[None, str] = None,
        output: Union[None, str] = None,
    ) -> Generator[str, None, None]:
        # returns the full prompt from instruction and optional input
        # if a label (=response, =output) is provided, it's also appended.
        if input:
            res = self.prompt_input.format(instruction=instruction, input=input)
        else:
            res = self.prompt_no_input.format(instruction=instruction)
        if output:
            res = f"{res}{output}"
        yield res

but if the prompt style is 'alpaca', there is no prompt_no_input:

  def match_prompt_style(self):
      if self.prompt_style == PromptStyle.instruct.value:
          self.prompt_input = (
              self.system_prompt
              + "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
          )
          self.prompt_no_input = (
              self.system_no_input_prompt
              + "### Instruction:\n{instruction}\n\n### Response:\n"
          )
          self.response_split = "### Response:"
      if self.prompt_style == PromptStyle.chat.value:
          self.prompt_input = (
              self.system_prompt + "USER: {instruction}\n{input}\nASSISTANT:"
          )
          self.prompt_no_input = (
              self.system_no_input_prompt + "USER: {instruction}\nASSISTANT:"
          )
          self.response_split = "ASSISTANT:"

Not sure what the best solution is: add a prompt_no_input for alpaca-style prompts, or rephrase the ifs so that the result is an empty "### Input: "?

I'm willing to do a PR, just tell me what solution you want to see.

Trainer() got multiple values for keyword argument 'callbacks'

When running on 8xA100 80GB, I run into this error:

Traceback (most recent call last):
  File "/root/axolotl/scripts/finetune.py", line 239, in <module>
    fire.Fire(train)
  File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/.local/share/virtualenvs/axolotl-9mRV-5br/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/root/axolotl/scripts/finetune.py", line 198, in train
    trainer = setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer)
  File "/root/axolotl/src/axolotl/utils/trainer.py", line 196, in setup_trainer
    trainer = transformers.Trainer(
TypeError: transformers.trainer.Trainer() got multiple values for keyword argument 'callbacks'

(the same traceback is raised by every process; the output above was interleaved in the original log)

[Feature] Allow passing file to inference on

Problem

It may be necessary to repeat the same questions across many experiments. It is time-consuming to copy-paste them line by line.

Feature

Allow passing a path to a jsonl file (or similar) that can be read and run through the model, with the outputs written to a results file.

qlora save peft on final callback

{'eval_loss': 1.2171393632888794, 'eval_runtime': 7.1067, 'eval_samples_per_second': 4.362, 'eval_steps_per_second': 0.141, 'epoch': 4.38}
{'loss': 1.0812, 'learning_rate': 3.581603349196372e-06, 'epoch': 4.5}
{'loss': 1.0813, 'learning_rate': 2.0253513192751373e-06, 'epoch': 4.62}
{'loss': 1.0691, 'learning_rate': 9.035651368646648e-07, 'epoch': 4.75}
{'loss': 1.0922, 'learning_rate': 2.2640387134577058e-07, 'epoch': 4.88}
{'loss': 1.117, 'learning_rate': 0.0, 'epoch': 5.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [2:10:50<00:00, 192.75s/it]
The intermediate checkpoints of PEFT may not be saved correctly, using TrainerCallback to save adapter_model.bin in corresponding folders, here are some examples huggingface/peft#96
(the warning above is printed once per GPU process)
Traceback (most recent call last):
  File "/workspace/axolotl/scripts/finetune.py", line 256, in <module>
    fire.Fire(train)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/scripts/finetune.py", line 244, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1696, in train
    return inner_training_loop(
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2094, in _inner_training_loop
    self._load_best_model()
  File "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2291, in _load_best_model
    self._issue_warnings_after_load(load_result)
UnboundLocalError: local variable 'load_result' referenced before assignment

(the same traceback is raised by every rank; the output above was interleaved in the original log)

[Question] Duplicate shard config names?

I noticed two different pieces of shard code using different configs in load_tokenized_prepared_datasets and load_prepare_datasets:

https://github.com/OpenAccess-AI-Collective/axolotl/blob/a617f1b65eb3d986ab7844630944fe4c979158fe/src/axolotl/utils/data.py#L114-L115

https://github.com/OpenAccess-AI-Collective/axolotl/blob/a617f1b65eb3d986ab7844630944fe4c979158fe/src/axolotl/utils/data.py#L345-L351

Not sure if these two parts should be combined and called elsewhere, but I think the config should be unified to use the same name.

[Refactor] Fix duplicate `config` and `examples` folder and update previous configs

In the past, the configs were all in configs. However, as things changed, some parts have been moved to the examples folder.

Furthermore, there are some old/invalid configs within the configs folder due to recent changes.

It would be good to move all configs into the examples folder, organized by architecture, for better maintenance, and to update those previous configs so they work.

I'm curious if anyone has any better ideas?

[Feature] Add tests

We also need to think about adding some tests to ensure more stability. I do not have much experience in the matter; however, I think at the very least we should test the following:

Functional

  • Test validate_config for conflicting configs

End-to-end for each architecture in the README for one or two global steps:

  • fp16/fp32
  • 4bit
  • 8bit
  • gptq

[Bug] Add `cfg.hf_use_auth_token` to set whether to attach auth token

As discussed in Discord, if a user is not authenticated to huggingface, the code would error as it expects the token.

We would like to instead look at a config option, cfg.hf_use_auth_token, to decide whether to attach the token; a sketch is below.
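
A sketch of what that could look like at a load site; the call site and kwarg plumbing here are assumptions, not the repo's actual code:

from transformers import AutoTokenizer

def load_tokenizer(cfg):
    kwargs = {}
    if cfg.hf_use_auth_token:
        # only attach the auth token when the user opted in via config
        kwargs["use_auth_token"] = True
    return AutoTokenizer.from_pretrained(cfg.base_model, **kwargs)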

issues to fix reported from discord

Bambi#1600
I can report my observations from my attempts at int8 LoRA training via the trainer built into Oobabooga’s textgen webUI if it helps:

  1. Myself and others are able to train LoRAs with int8 precision for the original unquantized HF llama-7b and llama-13b models
  2. The LoRA from this train produced expected results at inference when applied to the unquantized llama models
  3. VRAM usage during the train was observed to be evenly split between the cards
  4. GPU utilization however was observed to alternate between the cards (one card was pulling 150 watts, the other pulling 300 watts then they’d swap) indicating a serialized but threaded workload vs true parallelization
  5. Encountered a bug upon saving the first checkpoint that caused both cards to OOM. Following numerous forum threads, we reverted our bitsandbytes version from 0.38.1 to 0.37.2, which resolved the issue.

ImportError: cannot import name 'Mapping' from 'collections'

accelerate launch scripts/finetune.py configs/llama_30B_4bit.yml
Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 11, in <module>
    from attrdict import AttrDefault
  File "/home/eric/miniconda3/envs/axolotl/lib/python3.10/site-packages/attrdict/__init__.py", line 5, in <module>
    from attrdict.mapping import AttrMap
  File "/home/eric/miniconda3/envs/axolotl/lib/python3.10/site-packages/attrdict/mapping.py", line 4, in <module>
    from collections import Mapping
ImportError: cannot import name 'Mapping' from 'collections' (/home/eric/miniconda3/envs/axolotl/lib/python3.10/collections/__init__.py)
(the same traceback is printed once per process)

No module named axolotl.utils.validation

After pip installing axolotl and trying to run the provided finetuning command, I get:

Traceback (most recent call last):
  File "/home/someone/axolotl/scripts/finetune.py", line 17, in <module>
    from axolotl.utils.validation import validate_config
ModuleNotFoundError: No module named 'axolotl.utils.validation'

Can't find validation.py anywhere in the commits either

Unusable early_stopping_patience param

Whenever a user sets early_stopping_patience, it results in the following error: AssertionError: EarlyStoppingCallback requires load_best_model_at_end = True

confusing error message

I get a confusing error message. Can you please help?

My command line is:
accelerate launch scripts/finetune.py configs/llama_30B_4bit.yml

My config is:

base_model: ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors
base_model_config: ../alpaca_lora_4bit/llama-30b-4bit/
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
datasets:
  - path: ../alpaca_lora_4bit/leet10k-alpaca-merged.json
    type: alpaca
dataset_prepared_path: data/last_run_prepared
val_set_size: 0.04
adapter: lora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len: 1024
lora_r: 16
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
lora_fan_in_fan_out: false
wandb_project:
wandb_watch:
wandb_run_id:
wandb_log_model: checkpoint
output_dir: ./lora-test
batch_size: 128
micro_batch_size: 8
num_epochs: 4
warmup_steps: 100
learning_rate: 0.00003
train_on_inputs: false
group_by_length: false
bf16: true
tf32: true
gradient_checkpointing: false
early_stopping_patience: 3
resume_from_checkpoint:
auto_resume_from_checkpoints: true
local_rank:
load_4bit: true
xformers_attention: true
flash_attention:

My error message is:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /home/eric/miniconda3/envs/axolotl2 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/lib/wsl/lib did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda121.so...

(the bitsandbytes bug-report banner and CUDA setup warnings above are printed once per GPU process)

INFO:root:loading model, tokenizer, and lora_config...
INFO:root:patching with xformers attention
Replaced attention with xformers_attention
Loading Model ...
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

(the TypedStorage deprecation warnings, the ERROR lines, and the traceback above are repeated once per GPU process; the original log is truncated here)
  with safe_open(checkpoint_file, framework="pt") as f:
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  storage = cls(wrap_storage=untyped_storage)
WARNING:accelerate.utils.modeling:The safetensors archive passed at ../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py:99: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  with safe_open(filename, framework="pt", device=device) as f:
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 131, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
ERROR:root:Exception raised attempting to load model, retrying with AutoModelForCausalLM
ERROR:root:Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 946, in load_checkpoint_in_model
    set_module_tensor_to_device(model, param_name, param_device, value=param, dtype=dtype)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 131, in set_module_tensor_to_device
    raise ValueError(f"{module} does not have a parameter or a buffer named {tensor_name}.")
ValueError: Autograd4bitQuantLinear() does not have a parameter or a buffer named qzeros.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 246, in <module>
    fire.Fire(train)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/eric/git/axolotl/scripts/finetune.py", line 178, in train
    model, tokenizer, lora_config = load_model(
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 136, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 445, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 922, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at '../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors' is not a valid JSON file.
Traceback (most recent call last):
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 93, in load_model
    model, tokenizer = load_llama_model_4bit_low_ram(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/alpaca_lora_4bit/autograd_4bit.py", line 249, in load_llama_model_4bit_low_ram
    model = accelerate.load_checkpoint_and_dispatch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/big_modeling.py", line 479, in load_checkpoint_and_dispatch
    load_checkpoint_in_model(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 924, in load_checkpoint_in_model
    checkpoint = load_state_dict(checkpoint_file, device_map=device_map)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/utils/modeling.py", line 804, in load_state_dict
    return safe_load_file(checkpoint_file, device=devices[0])
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/safetensors/torch.py", line 101, in load_file
    result[k] = f.get_tensor(k)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 659, in _get_config_dict
    config_dict = cls._dict_from_json_file(resolved_config_file)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 750, in _dict_from_json_file
    text = reader.read()
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 0: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eric/git/axolotl/scripts/finetune.py", line 246, in <module>
    fire.Fire(train)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/eric/git/axolotl/scripts/finetune.py", line 178, in train
    model, tokenizer, lora_config = load_model(
  File "/home/eric/git/axolotl/src/axolotl/utils/models.py", line 136, in load_model
    model = AutoModelForCausalLM.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 445, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/models/auto/configuration_auto.py", line 922, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 574, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/transformers/configuration_utils.py", line 662, in _get_config_dict
    raise EnvironmentError(
OSError: It looks like the config file at '../alpaca_lora_4bit/llama-30b-4bit-128g.safetensors' is not a valid JSON file.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2878 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2879 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 2881 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 2 (pid: 2880) of binary: /home/eric/miniconda3/envs/axolotl2/bin/python
Traceback (most recent call last):
  File "/home/eric/miniconda3/envs/axolotl2/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/launch.py", line 914, in launch_command
    multi_gpu_launcher(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/accelerate/commands/launch.py", line 603, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/eric/miniconda3/envs/axolotl2/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts/finetune.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-04-24_23:18:00
  host      : mlc-win.
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 2880)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Save `adapter_bin` using callbacks if `lora`

Proposal

It would be good to also save the LoRA adapter at each checkpoint.

Solution

We can save the LoRA adapter using a callback. There is already callback code we can reuse; it only needs a small change so that it does not delete pytorch_model.bin, which lets us resume training.

We can check whether `adapter: lora` is set and, if so, register the callback (see the sketch below).

Happy to PR this.

Edit: Discussion at huggingface/peft#353 (comment)
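
A minimal sketch of what such a callback could look like, assuming the Trainer is handed a PEFT-wrapped model; the class name SavePeftAdapterCallback and the adapter/ subdirectory are assumptions for illustration, not existing axolotl code:

import os

from transformers import TrainerCallback


class SavePeftAdapterCallback(TrainerCallback):
    def on_save(self, args, state, control, **kwargs):
        # The Trainer passes the (PEFT-wrapped) model to callbacks via kwargs.
        model = kwargs["model"]
        checkpoint_dir = os.path.join(args.output_dir, f"checkpoint-{state.global_step}")
        # PeftModel.save_pretrained writes only the adapter weights
        # (adapter_model.bin / adapter_config.json) and leaves pytorch_model.bin alone.
        model.save_pretrained(os.path.join(checkpoint_dir, "adapter"))
        return control

Wiring it up would then be a conditional registration, e.g. if cfg.adapter == "lora": trainer.add_callback(SavePeftAdapterCallback()).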

GPTQ vs QLoRA

GPTQ and QLoRA are mutually exclusive when it comes to the PEFT dependency: see https://github.com/winglian/alpaca_lora_4bit/blob/main/requirements.txt#L9, whereas QLoRA basically needs PEFT from main. It's probably worth removing the [int4] part of the install from the Docker container and simply doing a basic install. We'll also need to update the docs so that people who want to use GPTQ know they need to pip uninstall peft and then pip install .[int4]. The caveat is that they need to uninstall peft again if they want to switch back to QLoRA.

add bitsandbytes build with cuda library in base docker image

from my qlora notes:

cd bitsandbytes
CUDA_VERSION=118 make cuda11x
pip uninstall bitsandbytes
python setup.py install
pip install scipy
pip uninstall transformers
pip install "transformers @ git+https://github.com/huggingface/transformers.git
pip install bert-score==0.3.13 evaluate==0.4.0 rouge-score==0.1.2 scikit-learn==1.2.2 sentencepiece==0.1.99 wandb==0.15.2

We should update requirements.txt too.

[Feature] Replace `cfg.load_4bit` with `cfg.gptq`

Proposal: Change the naming to reduce confusion with load_in_4bit, which is used for QLoRA.

Breaking change: Yes

  • Replace all instances of cfg.load_4bit with cfg.gptq
  • Add an assertion to the config validation: assert not cfg.load_4bit, "cfg.load_4bit has been deprecated. Please change to cfg.gptq" (see the sketch below)
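
A minimal sketch of that check, assuming a validate_config(cfg) helper along the lines of axolotl's config validation (the function name and the cfg object shape here are assumptions):

def validate_config(cfg):
    # cfg.load_4bit was the old GPTQ flag; fail loudly so configs migrate to cfg.gptq.
    assert not cfg.load_4bit, (
        "cfg.load_4bit has been deprecated. Please change to cfg.gptq"
    )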

RuntimeError: Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()

I get the error below at the end of training. I suspect it's due to loading in 8-bit combined with https://github.com/winglian/axolotl/blob/47ad3890bc35985b9046f403312887035e19f96f/src/axolotl/utils/trainer.py#L99

Stack trace

File "/workspace/scripts/finetune.py", line 246, in <module> 
    fire.Fire(train) 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 141, in Fire 
    component_trace = _Fire(component, args, parsed_flag_args, context, name) 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 475, in _Fire 
    component, remaining_args = _CallAndUpdateTrace( 
  File "/usr/local/lib/python3.9/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace 
    component = fn(*varargs, **kwargs) 
  File "/workspace/scripts/finetune.py", line 235, in train 
    trainer.train(resume_from_checkpoint=resume_from_checkpoint) 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 1664, in train 
    return inner_training_loop( 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 2054, in _inner_training_loop 
    self._load_best_model() 
  File "/usr/local/lib/python3.9/dist-packages/transformers/trainer.py", line 2230, in _load_best_model 
    load_result = model.load_state_dict(state_dict, False) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2027, in load_state_dict 
    load(self, state_dict) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2015, in load 
    load(child, child_state_dict, child_prefix) 
  [Previous line repeated 4 more times] 
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 2009, in load 
    module._load_from_state_dict( 
  File "/usr/local/lib/python3.9/dist-packages/bitsandbytes/nn/modules.py", line 298, in _load_from_state_dict 
    raise RuntimeError("Loading a quantized checkpoint into non-quantized Linear8bitLt is " 
RuntimeError: Loading a quantized checkpoint into non-quantized Linear8bitLt is not supported. Please call module.cuda() before module.load_state_dict()

Info

Commit: Before dev merge winglian/axolotl@cb9a887
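
One workaround to test, sketched under the assumption that the failure comes from the best-model reload (not a confirmed fix): disable load_best_model_at_end when the base model is loaded in 8-bit, so the Trainer never calls load_state_dict on the quantized Linear8bitLt modules.

from transformers import TrainingArguments

load_in_8bit = True  # hypothetical flag mirroring the 8-bit load in the config

training_args = TrainingArguments(
    output_dir="./outputs/lora-out",
    # Trainer._load_best_model() calls model.load_state_dict(), which
    # bitsandbytes' Linear8bitLt rejects for quantized checkpoints, so only
    # enable the reload when the model is not quantized to 8-bit.
    load_best_model_at_end=not load_in_8bit,
)

Alternatively, saving and reloading only the LoRA adapter (as in the callback sketch above) sidesteps restoring the quantized base weights entirely.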
