How do we compute the eigenvalues of the Hessian for each weight matrix or each module in a model, rather than for the whole model? (pyhessian, open issue)

amirgholami commented on September 23, 2024

How do we compute the eigenvalues of the Hessian for each weight matrix, or for each module, in a model, rather than computing the eigenvalues of the whole model?

Comments (10)

345308394 commented on September 23, 2024

I ran into the same problem. Have you solved it, or can we discuss it? The function that computes the eigenvalues seems to return only a single eigenvalue for the whole model.

Phuoc-Hoan-Le commented on September 23, 2024

We can discuss it.

345308394 commented on September 23, 2024

To compute the largest eigenvalue of the Hessian with respect to the weights, you first need the parameters and their gradients (the first partial derivatives of the loss). The function get_params_grad(model) returns all the weights together with their gradients. My approach is therefore to change this function so that it returns the weights and gradients of each block separately, and then to compute the largest Hessian eigenvalue for each block.
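A minimal sketch of that per-block idea, written against plain PyTorch rather than pyhessian's own code. The function name blockwise_top_eigenvalues and the toy model are assumptions made for illustration. It runs power iteration on each parameter tensor's own diagonal Hessian block, so interactions between different tensors are ignored and the results are not eigenvalues of the full Hessian.

```python
import torch
import torch.nn as nn

def blockwise_top_eigenvalues(model, loss, n_iter=50):
    """Hypothetical helper: power iteration on each parameter tensor's
    own diagonal block of the Hessian of `loss`."""
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    params = [p for _, p in named]
    # create_graph=True keeps the graph so the gradients can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True, allow_unused=True)
    eigs = {}
    for (name, p), g in zip(named, grads):
        if g is None or not g.requires_grad:
            eigs[name] = 0.0  # the gradient does not depend on p: zero block
            continue
        v = torch.randn_like(p)
        v = v / v.norm()
        eig = 0.0
        for _ in range(n_iter):
            # Hessian-vector product restricted to this block: d(g . v) / dp
            hv = torch.autograd.grad(g, p, grad_outputs=v,
                                     retain_graph=True, allow_unused=True)[0]
            if hv is None or hv.norm().item() == 0.0:
                eig = 0.0
                break
            eig = torch.sum(hv * v).item()  # Rayleigh quotient, since ||v|| = 1
            v = hv / hv.norm()
        eigs[name] = eig
    return eigs

# Toy check: loss = 0.5 * w1^2 + 2 * w2^2, so the Hessian block is diag(1, 4).
torch.manual_seed(0)
model = nn.Linear(2, 1, bias=False)
x = torch.tensor([[1.0, 0.0], [0.0, 2.0]])
y = torch.zeros(2, 1)
loss = nn.functional.mse_loss(model(x), y)
eigs = blockwise_top_eigenvalues(model, loss)
print(eigs)
```

Power iteration converges to the eigenvalue of largest magnitude, so for a block with a dominant negative eigenvalue the reported value would be negative.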

Phuoc-Hoan-Le commented on September 23, 2024

Wow, that sounds like a complicated solution.

The way I solved it: if you look at the eigenvalues() function, the final eigenvalue is a single value because group_product() returns the sum over the whole list rather than the list itself. This makes me think they had already calculated an eigenvalue for each weight matrix, but chose to sum these up to get the eigenvalue for the whole model. Also, eigenvalues() returns the eigenvectors as a list of lists of tensors: each element of the outer list holds the n-th eigenvector split across the weight matrices, so the first element of the outer list is the 1st eigenvector for each weight matrix, and so on.

Note that if you modify group_product(), normalization() is affected as well, so you have to change it too, or introduce a new function instead. I didn't have to change get_params_grad(model).
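A rough sketch of what such per-tensor variants could look like. The names group_product_per_tensor and normalize_per_tensor are hypothetical and are not part of pyhessian. They keep one inner product and one norm per tensor instead of reducing over the whole list.

```python
import torch

def group_product_per_tensor(xs, ys):
    """Hypothetical variant: one inner product per tensor pair, no final sum."""
    return [torch.sum(x * y) for x, y in zip(xs, ys)]

def normalize_per_tensor(vs, eps=1e-6):
    """Hypothetical variant: scale each tensor to unit norm on its own."""
    return [v / (torch.sum(v * v).sqrt() + eps) for v in vs]

xs = [torch.tensor([1.0, 2.0]), torch.tensor([[3.0]])]
products = [p.item() for p in group_product_per_tensor(xs, xs)]
norms = [v.norm().item() for v in normalize_per_tensor(xs)]
print(products, norms)
```

One caveat worth checking before relying on this: if the Hessian-vector product is still computed jointly over all parameters, each tensor's slice of the result also contains cross-layer terms, so the per-tensor values are not necessarily eigenvalues of the individual diagonal blocks.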

345308394 commented on September 23, 2024

Can we exchange our calculations?

huanmei9 commented on September 23, 2024

I ran into the same question. Have you solved it? @345308394 @CharlesLeeeee

xchuwenbo commented on September 23, 2024

Hi! I also find this question important!

I just want to see the layer-wise eigenvalues of a specific model.

katayoun-cadence commented on September 23, 2024

@345308394 @CharlesLee did your code solve this issue? Can we exchange the calculation?

BiaoFangAIA commented on September 23, 2024

I know how to calculate the Hessian trace of each layer (written as a method of a class that holds self.model, self.params and self.gradsH):
# Requires numpy as np, torch, and the hessian_vector_product and group_product helpers.
def get_trace(self, maxIter=100, tol=1e-3):
    """Estimate the Hessian trace of each weight tensor with Hutchinson's method."""
    device = self.device
    traces_vhv = []  # one {"layer_name", "trace"} dict per weight tensor
    seed = 1
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # Pair each tensor with its own parameter name. named_modules() also yields
    # containers and parameter-free modules, so zipping with it mislabels layers.
    # Assumption: self.params follows the order of model.parameters().
    names = [n for n, p in self.model.named_parameters() if p.requires_grad]
    for i_grad, i_param, param_name in zip(self.gradsH, self.params, names):
        if len(i_grad.shape) <= 1:
            continue  # skip biases and other 1-D parameters, as in the original
        trace_vhv = []
        trace = 0.
        self.model.zero_grad()
        for i in range(maxIter):
            # Draw a fresh Rademacher (+1/-1) probe vector on every iteration;
            # reusing a single vector would repeat the same estimate.
            v = [torch.randint_like(i_param, high=2, device=device)]
            for v_i in v:
                v_i[v_i == 0] = -1
            hv = hessian_vector_product(i_grad, i_param, v)
            trace_vhv.append(group_product(hv, v).cpu().item())
            if abs(np.mean(trace_vhv) - trace) / (abs(trace) + 1e-6) < tol:
                break
            trace = np.mean(trace_vhv)
        # Record the running mean even if maxIter was reached without converging.
        traces_vhv.append({"layer_name": param_name, "trace": np.mean(trace_vhv)})
    return traces_vhv

EdenBelouadah commented on September 23, 2024

@BiaoFangAIA

Did you solve the issue?

The proposed solution does not work for me: my ViT model contains 75 layers, each with a weight and a bias, but your code returns a list with only 52 traces.

Thanks
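One way to see where such a mismatch can come from is to compare the counts directly. The sketch below uses a small toy model, not the ViT from this thread, and only shows that named_modules(), named_parameters() and the subset of parameters with more than one dimension are three different lists. Zipping lists of different meaning or length can drop or mislabel entries.

```python
import torch.nn as nn

# Toy stand-in model, used only to compare the three counts.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

module_names = [name for name, _ in model.named_modules()]
param_names = [name for name, _ in model.named_parameters()]
weight_names = [name for name, p in model.named_parameters() if p.dim() > 1]

print(len(module_names), module_names)  # includes the container and the ReLU
print(len(param_names), param_names)    # weights and biases
print(len(weight_names), weight_names)  # only tensors with more than one dimension
```

Running the same three counts on the actual model should show which filter accounts for the number of traces returned.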
