
[ICML2024] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Home Page: https://arxiv.org/abs/2402.09353

License: Other

Languages: Python 88.57%, Jupyter Notebook 8.76%, Shell 2.10%, JavaScript 0.28%, HTML 0.22%, CSS 0.05%, Makefile 0.01%

Topics: commonsense-reasoning, deep-learning, deep-neural-networks, instruction-tuning, large-language-models, large-vision-language-models, lora, parameter-efficient-fine-tuning, parameter-efficient-tuning, vision-and-language

dora's Introduction

[ICML2024] DoRA: Weight-Decomposed Low-Rank Adaptation

The official PyTorch implementation of [ICML2024] DoRA: Weight-Decomposed Low-Rank Adaptation.


Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Min-Hung Chen

[Paper] [Website] [BibTeX]

DoRA decomposes the pre-trained weight into two components, magnitude and direction, for fine-tuning, specifically employing LoRA for the directional updates to efficiently minimize the number of trainable parameters. By employing DoRA, we enhance both the learning capacity and training stability of LoRA while avoiding any additional inference overhead. DoRA consistently outperforms LoRA on fine-tuning LLaMA, LLaVA, and VL-BART on various downstream tasks, such as commonsense reasoning, visual instruction tuning, and image/video-text understanding.
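
For intuition, here is a minimal, hedged sketch of the merged weight implied by this decomposition; the variable names (W0, B, A, m) are illustrative and not taken from the repository's code.

import torch

def dora_merged_weight(W0: torch.Tensor, B: torch.Tensor, A: torch.Tensor,
                       m: torch.Tensor) -> torch.Tensor:
    # W0: frozen pre-trained weight (out_features x in_features)
    # B, A: low-rank LoRA factors, so B @ A has the same shape as W0
    # m: learned per-column magnitude vector (1 x in_features)
    direction = W0 + B @ A                                  # LoRA updates the direction
    column_norm = direction.norm(p=2, dim=0, keepdim=True)  # per-column L2 norm
    return m * (direction / column_norm)                    # rescale by learned magnitude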

For business inquiries, please visit our website and submit the form: NVIDIA Research Licensing.

💥 News 💥

  • [05.02.2024] 🔥🔥 DoRA is accepted to ICML 2024!! See you in Vienna!!
  • [04.27.2024] 🔥🔥 We have added the source code and the DoRA weights for finetuning LLaMA2-7B and LLaMA3-8B on commonsense reasoning tasks!
  • [04.22.2024] 🔥🔥 Check out an awesome blog post on FSDP/QDoRA from Answer.ai, which shows that QDoRA significantly outperforms QLoRA and even edges out full finetuning!
  • [04.18.2024] 🔥🔥 We have released the source code and the DoRA weights for reproducing the results in our paper!
  • [03.20.2024] 🔥🔥 DoRA is now fully supported by the HuggingFace PEFT package and supports Linear, Conv1d, and Conv2d layers, as well as linear layers quantized with bitsandbytes (see the sketch after this list)!
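
Since the last news item mentions bitsandbytes-quantized linear layers, here is a minimal, hedged sketch of that quantized-base-plus-DoRA setup; the checkpoint name and hyperparameters are illustrative and not prescribed by this repository.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit NF4 quantization for the frozen base model (bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Illustrative checkpoint; any causal LM supported by PEFT should work.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb_config
)

# DoRA adapter on top of the quantized linear layers.
dora_config = LoraConfig(use_dora=True, r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_peft_model(base, dora_config)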

Useful Links

Quick Start and some tricks regarding finetuning with DoRA

HuggingFace PEFT

DoRA is now supported by the Huggingface PEFT package. You can install the PEFT package using

pip install git+https://github.com/huggingface/peft.git -q

After PEFT is installed, you can simply set the use_dora argument of LoraConfig() to True to apply DoRA.

An example could be as follows:

from peft import LoraConfig

# Initialize DoRA configuration
config = LoraConfig(
    use_dora=True,
    # ... other LoRA arguments (r, lora_alpha, target_modules, etc.)
)

Please refer to the official documentation for more details.
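
For a fuller picture, here is a hedged end-to-end sketch of wrapping a causal LM with a DoRA adapter via get_peft_model; the checkpoint name, rank, and target modules are illustrative assumptions, not values prescribed by this repository.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative checkpoint; any causal LM supported by PEFT should work.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

dora_config = LoraConfig(
    use_dora=True,              # enable DoRA's magnitude/direction decomposition
    r=16,                       # rank of the low-rank (directional) update
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, dora_config)
model.print_trainable_parameters()  # only the DoRA parameters are trainable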

HuggingFace Diffusers

You can also experiment with DoRA for finetuning diffusion models. See huggingface/diffusers. Another good tutorial is this Colab notebook from Linoy Tsaban.

In general, DoRA finetuning of diffusion models is still experimental and is likely to require different hyperparameter values than LoRA to perform best; a minimal configuration sketch follows the list below.

Specifically, people have noticed two differences to take into account in your training:

  1. LoRA seems to converge faster than DoRA (so a set of hyperparameters that leads to overfitting when training a LoRA may work well for a DoRA).
  2. DoRA quality is superior to LoRA, especially at lower ranks: the quality gap between rank-8 DoRA and rank-8 LoRA appears larger than at ranks 32 or 64, for example.
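
As a concrete starting point, here is a hedged sketch of enabling DoRA on a diffusers UNet through the PEFT integration; the checkpoint name, rank, and target modules are illustrative assumptions.

from diffusers import UNet2DConditionModel
from peft import LoraConfig

# Illustrative SDXL UNet; any diffusers model with the PEFT integration should work.
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet"
)

unet_dora_config = LoraConfig(
    use_dora=True,
    r=8,
    lora_alpha=8,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"],  # attention projections
)
unet.add_adapter(unet_dora_config)  # attach trainable DoRA layers to the UNet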

Some DoRA vs. LoRA diffusion finetuning results

  • Example from Linoy Tsaban (images generated by DoRA are on the left and LoRA on the right):

DoRA hyperparameters settings

Note

💡 While fine-tuning with DoRA using the same configuration as LoRA can already achieve better results most of the time, reaching optimal performance relative to LoRA still requires adjusting the hyperparameters.

We suggest starting with a slightly lower learning rate than that of LoRA, and users may also experiment with varying LoRA dropout ratios.

Users may also start with half the rank of the LoRA configuration, which oftentimes already results in accuracy comparable or even superior to that of LoRA.
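
As an illustration of those suggestions, here is a hedged sketch of deriving a first DoRA run from an existing LoRA setup; the baseline numbers are hypothetical.

from peft import LoraConfig

# Hypothetical LoRA baseline: r=32, lr=2e-4, dropout=0.05.
lora_rank, lora_lr, lora_dropout = 32, 2e-4, 0.05

dora_config = LoraConfig(
    use_dora=True,
    r=lora_rank // 2,            # start with half the LoRA rank
    lora_alpha=32,
    lora_dropout=lora_dropout,   # experiment with different dropout ratios
)
dora_learning_rate = 1e-4        # slightly lower than the hypothetical LoRA learning rate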

Reproducing the results in the paper

This repo contains four directories:

./commonsense_reasoning contains the code to finetune LLaMA-7B/13B using DoRA on the commonsense reasoning tasks. This directory is adapted from LLM-Adapter.

./instruction_tuning contains the code to finetune LLaMA-7B and LLaMA2-7B using DoRA and DVoRA (DoRA+VeRA) on the cleaned Alpaca instruction tuning dataset. This directory is adapted from VeRA.

./image_video_text_understanding contains the code to finetune VL-BART using DoRA for the image/video-text understanding tasks. This directory is adapted from VL-Adapter.

./visual_instruction_tuning contains the code to finetune LLaVA-1.5-7B on the visual instruction tuning tasks with DoRA. This directory is adapted from LLaVA.

DoRA vs LoRA on the commonsense reasoning tasks

| Model | r | BoolQ | PIQA | SIQA | HellaSwag | WinoGrande | ARC-e | ARC-c | OBQA | Average |
|---|---|---|---|---|---|---|---|---|---|---|
| LLaMA-7B-LoRA | 32 | 67.5 | 80.8 | 78.2 | 83.4 | 80.4 | 78.0 | 62.6 | 79.1 | 76.3 |
| LLaMA-7B-DoRA (ours) | 16 | 70.0 | 82.6 | 79.7 | 83.2 | 80.6 | 80.6 | 65.4 | 77.6 | 77.5 |
| LLaMA-7B-DoRA (ours) | 32 | 69.7 | 83.4 | 78.6 | 87.2 | 81.0 | 81.9 | 66.2 | 79.2 | 78.4 |
| LLaMA2-7B-LoRA | 32 | 69.8 | 79.9 | 79.5 | 83.6 | 82.6 | 79.8 | 64.7 | 81.0 | 77.6 |
| LLaMA2-7B-DoRA (ours) | 16 | 72.0 | 83.1 | 79.9 | 89.1 | 83.0 | 84.5 | 71.0 | 81.2 | 80.5 |
| LLaMA2-7B-DoRA (ours) | 32 | 71.8 | 83.7 | 76.0 | 89.1 | 82.6 | 83.7 | 68.2 | 82.4 | 79.7 |
| LLaMA3-8B-LoRA | 32 | 70.8 | 85.2 | 79.9 | 91.7 | 84.3 | 84.2 | 71.2 | 79.0 | 80.8 |
| LLaMA3-8B-DoRA (ours) | 16 | 74.5 | 88.8 | 80.3 | 95.5 | 84.7 | 90.1 | 79.1 | 87.2 | 85.0 |
| LLaMA3-8B-DoRA (ours) | 32 | 74.6 | 89.3 | 79.9 | 95.5 | 85.6 | 90.5 | 80.4 | 85.8 | 85.2 |

Star History

Star History Chart

Contact

Shih-Yang Liu: [email protected] or [email protected]

Citation

If you find DoRA useful, please consider giving a star and citation:

@article{liu2024dora,
  title={DoRA: Weight-Decomposed Low-Rank Adaptation},
  author={Liu, Shih-Yang and Wang, Chien-Yi and Yin, Hongxu and Molchanov, Pavlo and Wang, Yu-Chiang Frank and Cheng, Kwang-Ting and Chen, Min-Hung},
  journal={arXiv preprint arXiv:2402.09353},
  year={2024}
}

Licenses

Copyright © 2024, NVIDIA Corporation. All rights reserved.

This work is made available under the NVIDIA Source Code License-NC. Click here to view a copy of this license.

dora's People

Contributors

cmhungsteve, nbasyl


dora's Issues

OOM when using torch.linalg.norm

Hi,

It seems torch.linalg.norm consumes more GPU memory in the forward pass, and I encounter an OOM error when using DoRA. My implementation of the forward pass is:

# Apply the scaled low-rank update to the frozen weight.
adapted = self.weight + torch.matmul(self.lora_b, self.lora_a) / self.rank
# Per-column L2 norm of the adapted weight (where the extra memory shows up).
column_norm = adapted.norm(p=2, dim=0, keepdim=True)

# Normalize to unit column norm, then rescale by the learned magnitude.
norm_adapted = adapted / column_norm
calc_weights = self.magnitude * norm_adapted
return F.linear(x, calc_weights, self.bias)
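
For context, the DoRA paper reduces training overhead by treating the column norm as a constant during backpropagation, i.e. detaching it from the gradient graph. A hedged sketch of that variant of the forward pass above (intended as a module method, reusing the same hypothetical attribute names) could look like this:

import torch
import torch.nn.functional as F

def dora_forward(self, x):
    # Low-rank directional update added to the frozen weight.
    adapted = self.weight + torch.matmul(self.lora_b, self.lora_a) / self.rank
    # Detach the norm so no gradient graph is kept for it, reducing memory.
    column_norm = adapted.norm(p=2, dim=0, keepdim=True).detach()
    calc_weights = self.magnitude * (adapted / column_norm)
    return F.linear(x, calc_weights, self.bias)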

RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())]

Dear author, thank you for your outstanding contribution! We want to reproduce the results of fine-tuning LLaMA3-8B with DoRA.

We first downloaded the pre-trained LLaMA3-8B weights from Hugging Face, trying both the weights provided by HF and the original Meta release, and replaced the base_model parameter with the downloaded weight path. However, we always get this error.

The error shows that loading the tokenizer failed, which seems to be caused by a failure to parse the tokenizer.model file. We sincerely look forward to your reply!
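
As a quick sanity check in this situation, one might verify that the downloaded checkpoint's tokenizer files load on their own; this is a hedged sketch, and the path is a placeholder.

from transformers import AutoTokenizer

# Placeholder path to the locally downloaded LLaMA3-8B checkpoint.
local_path = "/path/to/llama3-8b"

# If the tokenizer files are corrupt or incomplete, this call should
# reproduce the failure independently of the finetuning code.
tokenizer = AutoTokenizer.from_pretrained(local_path)
print(tokenizer)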


About the training seed

Hi,

Thanks for your great work.

May I ask whether you tried using different random seeds when fine-tuning on the commonsense tasks?

I find that the test accuracy is not very stable if the random seed is not fixed.
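
For reproducibility experiments like this, a common (hedged) approach is to fix all relevant seeds up front; transformers provides a helper for this.

from transformers import set_seed

# Fix Python, NumPy, and PyTorch (CPU/CUDA) seeds in one call so that
# repeated finetuning runs are comparable.
set_seed(42)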
