Comments (10)
I ran into the same problem. Have you solved it, or could we discuss it?
It seems that the function which calculates the eigenvalues returns only a single eigenvalue for the whole model.
from pyhessian.
We can discuss it.
from pyhessian.
To calculate the maximum eigenvalue of the second derivative with respect to the weights, you first need the parameters and the first partial derivatives of the loss with respect to them. The function get_params_grad(model) returns all the weights and their corresponding first partial derivatives. My approach was therefore to modify this function so that it returns the weights and corresponding first partial derivatives of each block separately, and then to compute the maximum eigenvalue of each block's second derivative.
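A minimal sketch of this per-block idea in plain PyTorch (the function name per_block_top_eigenvalue and the toy model are illustrative, not PyHessian's API): power iteration on each parameter tensor's diagonal Hessian block, using a Hessian-vector product restricted to that block.

```python
# Illustrative sketch (assumed names, not PyHessian's API): estimate the top
# eigenvalue of each parameter tensor's diagonal Hessian block separately.
import torch
import torch.nn as nn

def per_block_top_eigenvalue(model, loss, iters=50):
    results = {}
    params = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    # First derivatives w.r.t. every parameter, with the graph kept so we can
    # differentiate a second time.
    grads = torch.autograd.grad(loss, [p for _, p in params], create_graph=True)
    for (name, p), g in zip(params, grads):
        v = torch.randn_like(p)
        v /= v.norm()
        eig = 0.0
        for _ in range(iters):
            # Block-restricted Hessian-vector product: d(g . v)/dp = H_pp v.
            hv = torch.autograd.grad(g, p, grad_outputs=v, retain_graph=True)[0]
            eig = torch.dot(hv.flatten(), v.flatten()).item()  # Rayleigh quotient
            if hv.norm() == 0:
                break
            v = hv / hv.norm()
        results[name] = eig
    return results

model = nn.Linear(3, 2)
x, y = torch.randn(8, 3), torch.randn(8, 2)
loss = nn.MSELoss()(model(x), y)
eigs = per_block_top_eigenvalue(model, loss)
print(eigs)
```

For MSE with a linear model, the bias block of the Hessian is exactly the identity, so its top eigenvalue comes out as 1; the weight block's top eigenvalue depends on the data.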
from pyhessian.
Wow, that sounds like a complicated solution.
The way I solved it: if you look at the eigenvalues() function, you will see that the final eigenvalue is a single value because group_product() returns the sum over the whole list rather than a list. This suggests they already compute an eigenvalue contribution for each weight matrix, but then sum these up to get one eigenvalue for the whole model. Also, in eigenvalues(), the eigenvectors are returned as a list of lists of vectors, where each element of the outer list holds the n-th eigenvector for each weight matrix; so the first element of the outer list holds the 1st eigenvector for each weight matrix, and so on.
Note that when you modify the group_product() function, normalization() is affected as well, so you have to change it too (or introduce a new function) to make it work. I did not have to change get_params_grad(model).
from pyhessian.
Can we exchange our calculations?
from pyhessian.
I also encountered the same question. Have you solved it? @345308394 @CharlesLeeeee
from pyhessian.
Hi! I also find this question important!
I just want to see the layer-wise eigenvalues of a specific model.
from pyhessian.
@345308394 @CharlesLee, did your code solve this issue? Can we exchange calculations?
from pyhessian.
I know how to calculate each layer's Hessian trace:

def get_trace(self, maxIter=100, tol=1e-3):
    """
    Estimate the Hessian trace of each layer with Hutchinson's method:
    E[v^T H v] over random Rademacher vectors v approximates tr(H).
    """
    device = self.device
    traces_vhv = []  # per-layer {"layer_name", "trace"} results for all layers
    seed = 1
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # NOTE: named_parameters() pairs one name with each parameter tensor;
    # named_modules() would yield the root module first and misalign names.
    for i_grad, i_param, (param_name, _) in zip(
            self.gradsH, self.params, self.model.named_parameters()):
        # NOTE: 1-D parameters (biases, norm scales) are skipped, so the
        # returned list may be shorter than the number of parameters.
        if len(i_grad.shape) <= 1:
            continue
        trace_vhv = []
        trace = 0.
        self.model.zero_grad()
        for i in range(maxIter):
            # Draw a fresh Rademacher vector (entries +1/-1) each iteration.
            v = [torch.randint_like(i_param, high=2, device=device)]
            for v_i in v:
                v_i[v_i == 0] = -1
            hv = hessian_vector_product([i_grad], [i_param], v)
            trace_vhv.append(group_product(hv, v).cpu().item())
            # Stop once the running mean of v^T H v stabilizes.
            if abs(np.mean(trace_vhv) - trace) / (abs(trace) + 1e-6) < tol:
                break
            trace = np.mean(trace_vhv)
        traces_vhv.append({"layer_name": param_name,
                           "trace": np.mean(trace_vhv)})
    return traces_vhv
from pyhessian.
Did you solve the issue? The proposed solution does not work for me: my ViT model contains 75 layers, each with a weight and a bias, but your code returns a list of only 52 traces.
Thanks.
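The mismatch described above is consistent with the `len(i_grad.shape) > 1` check in the posted code: 1-D parameters such as biases and LayerNorm scales are skipped, so fewer traces than parameter tensors are returned. A small self-contained check in plain PyTorch (illustrative toy model, not the ViT in question) shows the effect of that filter:

```python
# Count how many parameter tensors survive the ndim > 1 filter used in the
# posted get_trace(); 1-D tensors (biases, LayerNorm weights) are dropped.
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.LayerNorm(4), nn.Linear(4, 2))
all_params = [p for p in model.parameters() if p.requires_grad]
kept = [p for p in all_params if len(p.shape) > 1]  # same filter as above
print(len(all_params), len(kept))  # 6 parameter tensors, only 2 kept
```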
from pyhessian.
Related Issues (18)
- Hi, a question of inconsistency of the dynamics of tr(FIM) and tr(H).
- RuntimeError: element 0 of variables does not require grad and does not have a grad_fn HOT 2
- Where is the hessian trace calculation w.r.t activations? HOT 1
- Computing sum of square of diagonal entries of Hessian
- edits for PINN/custom loss functions
- How to calculate average Hessian trace?
- torch.eig deprecated please use torch.linalg.eig for line 261 of hessian.py HOT 2
- Potential bugs
- No module named 'pyhessian.client'
- deprecation of torch.eig method
- Question about backward() with create_graph=True warning
- A question on the computation of Hessian-vector product HOT 2
- Large variance of results HOT 1
- computational (time) cost HOT 3
- RuntimeError: derivative for grid_sampler_2d_backward is not implemented HOT 1
- Unexpected `shape` issue in Hessian-Vector computation HOT 4
- Hi! Can Pyhessian Perform stage-wise Hessian trace?