Comments (6)

rasbt commented on May 27, 2024

Thanks for bringing that up! If I understand correctly, reset_parameters() will not set the weights to 0 but will reinitialize them. So I think this should be okay, but I may be overlooking something (please correct me if I'm wrong or am missing the point).

Only LoRA matrix B is initialized to zero. LoRA matrix A should be initialized with small random weights, for example:

```python
std_dev = 1 / torch.sqrt(torch.tensor(rank).float())  # scale shrinks as the rank grows
self.A = nn.Parameter(torch.randn(in_dim, rank) * std_dev)
```
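A torch-free sketch of why this pairing is a sensible starting point. It assumes A has shape (in_dim, rank), as in the snippet above, and that B has shape (rank, out_dim); the B shape is an assumption. The LoRA update is the product A @ B, so a zero B makes the update exactly zero at initialization and the adapted layer starts out identical to the pretrained one. Because A is nonzero, B still receives nonzero gradients once training begins. The helpers `init_lora_factors` and `matmul` are hypothetical names used only for this illustration.

```python
import math
import random


def init_lora_factors(in_dim, out_dim, rank, seed=0):
    """Hypothetical helper: rank-scaled Gaussian A, all-zero B."""
    rng = random.Random(seed)
    std_dev = 1 / math.sqrt(rank)  # same 1/sqrt(rank) scaling as the snippet above
    # A: shape (in_dim, rank), small random entries
    A = [[rng.gauss(0.0, std_dev) for _ in range(rank)] for _ in range(in_dim)]
    # B: shape (rank, out_dim), exactly zero (assumed shape)
    B = [[0.0] * out_dim for _ in range(rank)]
    return A, B


def matmul(X, Y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


A, B = init_lora_factors(in_dim=6, out_dim=4, rank=2)
delta_W = matmul(A, B)  # the LoRA weight update, shape (in_dim, out_dim)
print(all(v == 0.0 for row in delta_W for v in row))  # prints True
```

Only the product is zero. A itself is random, which is what distinguishes this setup from the all-zeros situation discussed in this thread.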

However, I just noticed that we use Kaiming He (uniform) initialization with a=sqrt(5) for the LoRA A matrix:

```python
nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
```
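For comparison, the scale implied by this call can be worked out by hand. PyTorch's `kaiming_uniform_` (defaults `mode="fan_in"`, `nonlinearity="leaky_relu"`) samples from U(-bound, bound) with gain = sqrt(2 / (1 + a^2)) and bound = sqrt(3) * gain / sqrt(fan_in). With a = sqrt(5) the gain is sqrt(1/3), so bound = 1 / sqrt(fan_in). Assuming lora_A is stored with shape (r, in_features), PyTorch takes fan_in from the second dimension, which is in_features. Under that assumption the scale depends on the layer width rather than on the rank. The sketch below recomputes this with the standard library only. The values `in_features = 4096` and `rank = 8` are hypothetical.

```python
import math


def kaiming_uniform_bound(fan_in: int, a: float = math.sqrt(5)) -> float:
    """Bound b of U(-b, b) drawn by kaiming_uniform_ (mode='fan_in', leaky_relu)."""
    gain = math.sqrt(2.0 / (1 + a ** 2))  # leaky_relu gain for negative slope a
    std = gain / math.sqrt(fan_in)
    return math.sqrt(3.0) * std  # a uniform on [-b, b] has std b / sqrt(3)


def uniform_std(bound: float) -> float:
    """Standard deviation of U(-bound, bound)."""
    return bound / math.sqrt(3.0)


# Hypothetical sizes, for illustration only
in_features, rank = 4096, 8

bound = kaiming_uniform_bound(in_features)
print(f"kaiming_uniform_(a=sqrt(5)): bound={bound:.5f}, std={uniform_std(bound):.5f}")
print(f"rank-scaled Gaussian:        std={1 / math.sqrt(rank):.5f}")
```

`nn.Linear.reset_parameters()` makes the same `kaiming_uniform_(..., a=math.sqrt(5))` call on its weight, so the two schemes share this 1/sqrt(fan_in) bound. The printed numbers only show the difference in scale for these hypothetical sizes. They say nothing about which scheme trains better.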

@awaelchli @carmocca Maybe we should investigate at some point whether the original rank-based initialization scaling is actually better?

from litgpt.

LautaroEst commented on May 27, 2024

Hi @rasbt! I think you are correct. reset_parameters() does not set the weights to 0. It initializes the lora_A matrix with Kaiming He initialization and the lora_B matrix with zeros. My point is that reset_parameters() is called inside the __init__() method, and when the GPT module is constructed under the fabric.init_module() context manager, reset_parameters() does not actually serve its purpose (both matrices end up initialized with zeros). To initialize the matrices correctly, I had to call reset_parameters() again after __init__() had run (outside the context manager).
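A minimal sketch of the workaround described above. A hypothetical `LoRALinear` (not litgpt's class of the same name) calls `reset_parameters()` in `__init__`. It is constructed in a context where that call has no real effect, and `reset_parameters()` is called again afterwards. PyTorch's meta device stands in for the deferred-initialization context here. This is an analogy only, not a description of how Fabric implements `init_module()`. After `to_empty()` the weights hold uninitialized memory, which is why the second call is needed.

```python
import math

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Hypothetical LoRA-only layer with lora_A: (r, in), lora_B: (out, r)."""

    def __init__(self, in_features: int, out_features: int, r: int):
        super().__init__()
        self.lora_A = nn.Parameter(torch.empty(r, in_features))
        self.lora_B = nn.Parameter(torch.empty(out_features, r))
        self.reset_parameters()  # no real data is written while tensors live on "meta"

    def reset_parameters(self) -> None:
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        nn.init.zeros_(self.lora_B)


# Build the module without materializing weights (stand-in for deferred init).
with torch.device("meta"):
    layer = LoRALinear(in_features=16, out_features=8, r=4)

# Allocate real storage; the values are uninitialized at this point.
layer = layer.to_empty(device="cpu")

# Re-run the initialization outside the context manager.
for module in layer.modules():
    if isinstance(module, LoRALinear):
        module.reset_parameters()
```

Looping over `modules()` lets the same fix reach LoRA layers nested inside a larger model, such as a GPT module.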

awaelchli commented on May 27, 2024

@LautaroEst Which Fabric strategy are you using?

LautaroEst commented on May 27, 2024

@awaelchli I'm using just one gpu, so I'm initializing the fabric object with

```python
import lightning as L

fabric = L.Fabric(accelerator="gpu", strategy="auto", devices=1, num_nodes=1, precision="bf16-true")
```

carmocca commented on May 27, 2024

@rasbt We follow the same initialization as Microsoft's (https://github.com/microsoft/LoRA/blob/main/loralib/layers.py#L266-L271), which itself matches what you propose (https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/linear.py#L106-L109).

carmocca commented on May 27, 2024

> I'm using just one gpu, so I'm initializing the fabric object with

In this case, empty_init=False is used (https://github.com/Lightning-AI/litgpt/blob/main/litgpt/finetune/lora.py#L170), so initialization should happen normally.
