Giter Club home page Giter Club logo

qa-lora-transfer's Introduction

QA-LoRA

QA-LoRA has been accepted by ICLR 2024!

This repository provides the official PyTorch implementation of QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models.

QA-LoRA is easily implemented with a few lines of code, and it equips the original LoRA with two-fold abilities: (i) during fine-tuning, the LLM's weights are quantized (e.g., into INT4) to reduce time and memory usage; (ii) after fine-tuning, the LLM and auxiliary weights are naturally integrated into a quantized model without loss of accuracy.

Todo list

Fix the conflict with the newest Auto-gptq version.

Installation

conda create -n qalora python=3.8
conda activate qalora
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
git clone -b v0.3.0 https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ
pip install .
cd ..
pip install bitsandbytes
pip install -r requirements.txt
pip install protobuf==3.20.*

Change the peft_utils.py in your own auto-gptq path(python path/auto_gptq/utils/peft_utils.py) with the new one. For the users of GPTQLORA, you only need to change the peft_utils.py file.

Quantization

We use GPTQ for quantization. bits=4, group-size=32, act-order=False If you change the group-size, you need to change the group_size in peft_utils.py and merge.py accordingly.

Training

python qalora.py --model_path <path>

The file structure of the model checkpoint is as follows:

config.json             llama7b-4bit-32g.bin  special_tokens_map.json  tokenizer_config.json
generation_config.json  quantize_config.json      tokenizer.model

Merge

Note that our trained LoRA modules can be perfectly merged into the quantized model. We offer a simple merged script in this repo.

Notice

About the implementations

There are two kinds of implementations of the dimention reduction(x from D_in to D_in//L). Both are mathematical equivalent.

The first one(this repo)

Adopt avgpooling operation. But the weights of adapters will be divided by D_in//L during merge(refer to merge.py).

adapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x)))) * scale).type_as(result)
model[tmp_key+'.qzeros'] -= (lora['base_model.model.'+tmp_key+'.lora_B.weight'] @ lora['base_model.model.'+tmp_key+'.lora_A.weight']).t() * scale / group_size / model[tmp_key+'.scales']

The second one

Utilize sum operation. The adapters do not need to be divided during merge)

adapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x) * group_size))) * scale).type_as(result)
model[tmp_key+'.qzeros'] -= (lora['base_model.model.'+tmp_key+'.lora_B.weight'] @ lora['base_model.model.'+tmp_key+'.lora_A.weight']).t() * scale / model[tmp_key+'.scales']

About the quantization

Some GPTQ implementation such as GPTQ-for-llama further compress the zeros into qzeros. You need to decode the qzeros first and restore fp16 format zeros.

Acknowledgements

Our code is based on QLoRA, GPTQLORA, Auto-GPTQ

qa-lora-transfer's People

Contributors

swiftieh avatar eltociear avatar sparverius avatar trapper4888 avatar xxw11 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.