minLoRA

A minimal but versatile PyTorch re-implementation of LoRA. In only ~100 lines of code, minLoRA supports the following features:

Features

  • Functional, no need to modify the model definition
  • Works everywhere, as long as your model uses torch.nn.Module
  • PyTorch native, uses PyTorch's torch.nn.utils.parametrize to do all the heavy lifting (see the sketch after this list)
  • Easily extendable, you can add your own LoRA parametrization
  • Supports training, inference, and inference with multiple LoRA models
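
To illustrate the parametrize-based approach, here is a rough sketch of the idea. This is a simplified stand-in, not minLoRA's actual implementation; the class name, rank, and alpha values are made up for illustration:

import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LoRASketch(nn.Module):
    # Simplified LoRA parametrization: the layer reports its weight as
    # W + scaling * (B @ A); the original W is kept untouched under
    # module.parametrizations.weight.original.
    def __init__(self, fan_out, fan_in, rank=4, alpha=1.0):
        super().__init__()
        self.lora_A = nn.Parameter(torch.randn(rank, fan_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(fan_out, rank))
        self.scaling = alpha / rank

    def forward(self, W):
        # parametrize passes in the original weight; we return the adapted one
        return W + self.scaling * (self.lora_B @ self.lora_A)

layer = nn.Linear(5, 3)
parametrize.register_parametrization(layer, "weight", LoRASketch(3, 5))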

Demo

  • demo.ipynb shows the basic usage of the library
  • advanced_usage.ipynb shows how you can add LoRA to other layers such as embedding, and how to tie weights

Examples

Library Installation

If you want to import minlora into your project:

git clone https://github.com/cccntu/minLoRA.git
cd minLoRA
pip install -e .

Usage

import torch
from minlora import (
    add_lora, apply_to_lora, disable_lora, enable_lora, get_lora_params,
    get_lora_state_dict, load_multiple_lora, merge_lora, name_is_lora,
    remove_lora, select_lora,
)

Training a model with minLoRA

model = torch.nn.Linear(in_features=5, out_features=3)
# Step 1: Add LoRA to the model
add_lora(model)

# Step 2: Collect the LoRA parameters and pass them to the optimizer

parameters = [
    {"params": list(get_lora_params(model))},
]
optimizer = torch.optim.AdamW(parameters, lr=1e-3)

# Step 3: Train the model
# ...

# Step 4: Export the LoRA parameters
lora_state_dict = get_lora_state_dict(model)
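
Step 3 above is elided. Continuing from the snippet, a minimal fine-tuning loop might look like the following sketch; the input x, target y, and loop length are made-up toy values, not part of minLoRA:

x = torch.randn(16, 5)  # toy inputs matching Linear(in_features=5, ...)
y = torch.randn(16, 3)  # toy targets matching out_features=3
for step in range(100):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
# Only the LoRA parameters are updated, because only they were handed to the
# optimizer; the base weights may still receive gradients unless you freeze them.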

Loading and running inference with minLoRA

# Step 1: Add LoRA to your model
add_lora(model)

# Step 2: Load the LoRA parameters
_ = model.load_state_dict(lora_state_dict, strict=False)

# Step 3: Merge the LoRA parameters into the model
merge_lora(model)
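
After merge_lora, the LoRA update is folded into the base weights, so inference is just an ordinary forward pass. A minimal sketch, assuming the Linear(in_features=5, out_features=3) model from the training example above:

x = torch.randn(1, 5)  # hypothetical input for the Linear(5, 3) example
with torch.no_grad():
    y = model(x)       # plain forward pass; no LoRA-specific code needed after merging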

Running inference with multiple LoRA models

# to avoid re-adding LoRA to the model when re-running the cell, remove LoRA first
remove_lora(model)
# Step 1: Add LoRA to your model
add_lora(model)

# Step 2: Load the LoRA parameters

# load three sets of LoRA parameters
lora_state_dicts = [lora_state_dict_0, lora_state_dict_1, lora_state_dict_2]

load_multiple_lora(model, lora_state_dicts)


# Step 3: Select which LoRA to use at inference time
Y0 = select_lora(model, 0)(x)
Y1 = select_lora(model, 1)(x)
Y2 = select_lora(model, 2)(x)

TODO

  • A notebook to show how to configure LoRA parameters
  • Real training & inference examples

minlora's Issues

Specify Layers By Name, not Type?

Is it possible to specify layers in the lora_config by name rather than type?

For instance, suppose I only wanted to apply LoRA to layers named qkv rather than to all nn.Linear layers. How would I do that?
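
One possible approach, not a built-in minLoRA feature: since add_lora works on any nn.Module, you could walk named_modules() and apply it only to submodules whose qualified name matches. The helper name and the "qkv" fragment below are hypothetical:

from minlora import add_lora

def add_lora_by_name(model, name_fragment="qkv"):
    # Apply LoRA only to submodules whose qualified name contains name_fragment.
    for name, module in model.named_modules():
        if name_fragment in name:
            add_lora(module)  # parametrizes only this submodule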

Understanding the forward operation

I noticed that you multiply lora_A and lora_B together, and then sum the result with the input.

I think the result of multiplying lora_A and lora_B has to be added to the original weights, or am I wrong?

Could you also explain the scaling factor?

Thanks.
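
A side note on the mechanics (my reading of the parametrize-based design, not a maintainer reply): the tensor passed into the LoRA parametrization is the original weight handed over by torch.nn.utils.parametrize, not the layer input, so B @ A really is added to the original weights; the scaling factor plays the role of the LoRA paper's alpha / r. A quick numerical sketch of the equivalence, with hypothetical shapes:

import torch

W = torch.randn(3, 5)   # original weight (out_features x in_features)
A = torch.randn(4, 5)   # lora_A, rank 4
B = torch.randn(3, 4)   # lora_B
x = torch.randn(5)      # layer input
scaling = 1.0 / 4       # plays the role of alpha / r in the LoRA paper

# Adding scaling * (B @ A) to the weight is equivalent to adding
# scaling * (B @ A) @ x to the layer output.
print(torch.allclose((W + scaling * (B @ A)) @ x,
                     W @ x + scaling * (B @ A) @ x))  # True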

Freeze manually

Hi, thank you for your great work.

I want to use your library for my experiment.

I understand that get_lora_params() collects the parameters to pass to the optimizer, but if the base model's parameters still require gradients, won't the model still compute gradients for them?

Would freezing the model be enough to use minlora without get_lora_params?

Also, when merging LoRA into the model in order to add another LoRA module, do I have to set requires_grad=False on lora_A and lora_B before merging?

Thank you.
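
A minimal sketch of freezing everything except the LoRA parameters, using the name_is_lora helper from the import list above; this assumes name_is_lora takes a fully qualified parameter name and is a reader-level sketch, not an official recipe:

from minlora import name_is_lora

# Freeze every parameter that is not a LoRA parameter, so the base model
# no longer accumulates gradients during training.
for name, param in model.named_parameters():
    param.requires_grad = name_is_lora(name)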

FSDP

Can this be used with FSDP? I haven't seen any examples of using torch.nn.utils.parametrize with FSDP.

Doesn't work with DataParallel

Minimal example

import torch
import timm
from torch import nn
from minlora import add_lora, get_lora_params, get_lora_state_dict


model_timm = timm.create_model("vit_large_patch14_clip_336.openai", pretrained=True, num_classes=0, global_pool='avg')
add_lora(model_timm)
model_timm = nn.DataParallel(model_timm, device_ids=[0,1]).cuda()

with torch.no_grad():
    asdf = model_timm(torch.randn(2, 3, 336, 336).cuda())

This raises:

  File "/home/anaconda3/envs/face/lib/python3.8/site-packages/minlora/model.py", line 39, in lora_forward
    return X + torch.mm(*self.swap((self.lora_B, self.dropout_fn(self.lora_A)))).view(X.shape) * self.scaling
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!
