345308394 commented on September 23, 2024

I ran into the same problem. Have you solved it, or can we discuss it?

The function that computes the eigenvalues seems to return only one eigenvalue for the whole model.

from pyhessian.

Phuoc-Hoan-Le commented on September 23, 2024

We can discuss it.

345308394 commented on September 23, 2024

To compute the top eigenvalue of the Hessian (the second derivative of the loss with respect to the weights), PyHessian first collects the parameters and their first derivatives: get_params_grad(model) returns all the weights and their corresponding gradients. My approach is to change this function so that it returns the weights and gradients of each block separately, and then compute the top eigenvalue of each block's Hessian.
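A minimal sketch of this per-block idea, written against plain PyTorch autograd rather than PyHessian's own helpers (so it is an illustration, not the patch described above). The hypothetical function `per_block_top_eigenvalues` takes the loss and a list of `(name, tensor)` blocks, computes the gradients once with `create_graph=True`, and runs power iteration separately on each block. Because the Hessian-vector product is taken only with respect to that block, each result approximates the top eigenvalue of the diagonal Hessian block H_ii, using a Rayleigh-quotient update similar to the one PyHessian's `eigenvalues()` applies to the whole model.

```python
import torch


def per_block_top_eigenvalues(loss, named_params, max_iter=100, tol=1e-6, seed=0):
    """Top eigenvalue of each diagonal Hessian block H_ii, by power iteration.

    Hypothetical helper, not part of PyHessian. `named_params` is a list of
    (name, tensor) pairs and each tensor is treated as one block.
    """
    params = [p for _, p in named_params]
    # First derivatives, kept in the graph so they can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    gen = torch.Generator().manual_seed(seed)
    results = {}
    for (name, p), g in zip(named_params, grads):
        # Random unit-norm starting vector with the same shape as the block.
        v = torch.randn(p.shape, generator=gen).to(p.device, p.dtype)
        v = v / v.norm()
        eig = None
        for _ in range(max_iter):
            # H_ii v: differentiate <g_i, v> with respect to block i only.
            hv = torch.autograd.grad(g, p, grad_outputs=v, retain_graph=True)[0]
            new_eig = torch.sum(hv * v).item()  # Rayleigh quotient v^T H_ii v
            v = hv / (hv.norm() + 1e-6)
            converged = eig is not None and abs(new_eig - eig) / (abs(eig) + 1e-6) < tol
            eig = new_eig
            if converged:
                break
        results[name] = eig
    return results
```

For a real network, `named_params` could be `[(n, p) for n, p in model.named_parameters() if p.requires_grad]`. Note that power iteration converges to the eigenvalue of largest magnitude, which can be negative for a non-convex loss.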

Phuoc-Hoan-Le commented on September 23, 2024

Wow, that sounds like a complicated solution.

Here is how I solved it. If you look at the eigenvalues() function, the final eigenvalue is a single value because group_product() returns the sum over the whole list of blocks rather than a list. This made me think that the value for each weight matrix is already computed and then simply summed to give the eigenvalue for the whole model. Also, eigenvalues() returns the eigenvectors as a list of lists: each element of the outer list is the n-th eigenvector, stored as one tensor per weight matrix. So the first element of the outer list is the 1st eigenvector split across all weight matrices, and so on.

Note that when you modify group_product(), normalization() is affected too, so you need to change it as well or introduce a new function. I didn't have to change get_params_grad(model).
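A sketch of the change described above. `group_product` mirrors the behaviour the comment describes (one scalar summed over all blocks); `group_product_per_block` and `normalization_per_block` are hypothetical replacements, not PyHessian functions. One caveat worth checking before relying on this: if Hv still comes from the full-model Hessian-vector product, block i of Hv equals the sum over j of H_ij v_j, so the per-block values include cross-block terms and are not, in general, the eigenvalues of each layer's own Hessian block. Restricting the product to a single block (differentiating that block's gradient with respect to that block's parameters only) avoids this.

```python
import torch


def group_product(xs, ys):
    # Summed inner product over all blocks: a single scalar for the whole model.
    return sum(torch.sum(x * y) for x, y in zip(xs, ys))


def group_product_per_block(xs, ys):
    # Hypothetical variant: keep one inner product per block instead of summing.
    return [torch.sum(x * y) for x, y in zip(xs, ys)]


def normalization_per_block(vs, eps=1e-6):
    # Hypothetical variant: scale each block to unit norm on its own, because a
    # single global norm no longer matches per-block Rayleigh quotients.
    return [v / (torch.sqrt(torch.sum(v * v)) + eps) for v in vs]
```

With these in place, the k-th eigenvector returned by `eigenvalues()` is still a list with one tensor per weight matrix, so block i of eigenvector k is `top_eigenvector[k][i]`.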

345308394 commented on September 23, 2024

Can we compare our calculations?

huanmei9 commented on September 23, 2024

I have the same question. Have you solved it? @345308394 @CharlesLeeeee

xchuwenbo commented on September 23, 2024

Hi! I also find this question important!

I just want to see the layer-wise eigenvalues of a specific model.

katayoun-cadence commented on September 23, 2024

@345308394 @CharlesLee, did your code solve this issue? Could we compare calculations?

BiaoFangAIA commented on September 23, 2024

I know how to calculate the Hessian trace of each layer:

```python
import numpy as np
import torch
from pyhessian.utils import group_product, hessian_vector_product


def get_trace(self, maxIter=100, tol=1e-3):
    """
    Estimate the Hessian trace of each layer with Hutchinson's method.
    Intended to be added as a method of PyHessian's hessian class.
    """
    device = self.device
    traces_vhv = []  # one entry per layer
    seed = 1
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # NOTE: named_modules() yields modules (starting with the model itself),
    # not parameters, so these names are not aligned with self.params, and
    # zip() stops at the shorter of the three sequences.
    for (i_grad, i_param, (module_name, _)) in zip(self.gradsH, self.params, self.model.named_modules()):
        if len(i_grad.shape) <= 1:
            continue  # 1-D tensors such as biases are skipped
        trace_vhv = []
        trace = 0.
        trace_pair = {"layer_name": " ", "trace": 0}
        self.model.zero_grad()
        for i in range(maxIter):
            # Draw a fresh Rademacher (+1/-1) vector on every iteration;
            # reusing one vector would just repeat the same sample.
            v = [torch.randint_like(i_param, high=2, device=device)]
            for v_i in v:
                v_i[v_i == 0] = -1
            hv = hessian_vector_product(i_grad, i_param, v)
            trace_vhv_cur = group_product(hv, v).cpu().item()  # v^T H v
            trace_vhv.append(trace_vhv_cur)
            # Stop once the running mean changes by less than tol (relative).
            if abs(np.mean(trace_vhv) - trace) / (abs(trace) + 1e-6) < tol:
                trace_pair["layer_name"] = module_name
                trace_pair["trace"] = np.mean(trace_vhv)
                traces_vhv.append(trace_pair)
                break
            else:
                trace = np.mean(trace_vhv)
    return traces_vhv
```

EdenBelouadah commented on September 23, 2024

@BiaoFangAIA

Did you solve the issue?

The proposed solution does not work for me: my ViT model contains 75 layers, each with a weight and a bias, but your code returns a list of only 52 traces.

Thanks
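A likely explanation for the mismatch, based on the posted code: it pairs `self.params` with `self.model.named_modules()`, which yields modules (beginning with the root model) rather than parameters, so the names do not line up and `zip()` truncates to the shortest sequence; on top of that, `len(i_grad.shape) > 1` drops every bias. Below is a sketch of a per-parameter Hutchinson trace that takes names from `model.named_parameters()` instead, so weights and biases each get one entry. `layerwise_hessian_trace` is a hypothetical standalone helper in plain PyTorch, not a PyHessian function.

```python
import torch


def layerwise_hessian_trace(model, loss, max_iter=100, tol=1e-3, seed=1):
    """Hutchinson estimate of tr(H_ii) for every trainable parameter tensor.

    Hypothetical helper, not part of PyHessian. Names come from
    model.named_parameters(), so every weight and bias gets its own entry.
    """
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, [p for _, p in named], create_graph=True)
    gen = torch.Generator().manual_seed(seed)
    traces = {}
    for (name, p), g in zip(named, grads):
        if not g.requires_grad:
            traces[name] = 0.0  # gradient does not depend on the parameters: H_ii is zero
            continue
        samples, estimate = [], 0.0
        for _ in range(max_iter):
            # Fresh Rademacher (+1/-1) probe vector on every iteration.
            v = torch.randint(0, 2, p.shape, generator=gen).to(p.device, p.dtype) * 2 - 1
            hv = torch.autograd.grad(g, p, grad_outputs=v, retain_graph=True, allow_unused=True)[0]
            if hv is None:  # g does not depend on p: H_ii is zero
                hv = torch.zeros_like(p)
            samples.append(torch.sum(hv * v).item())  # one sample of v^T H_ii v
            new_estimate = sum(samples) / len(samples)
            done = len(samples) > 1 and abs(new_estimate - estimate) / (abs(estimate) + 1e-6) < tol
            estimate = new_estimate
            if done:
                break
        traces[name] = estimate
    return traces
```

Usage: compute `loss` on a batch with the model in the desired mode, then call `layerwise_hessian_trace(model, loss)`; the returned dict has one key per entry of `model.named_parameters()`.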
