amirgholami / pyhessian
PyHessian is a PyTorch library for second-order based analysis and training of neural networks
License: MIT License
How can I use the code when I have a custom loss function rather than a torch.nn loss?
I'm specifically trying to reproduce the results in https://arxiv.org/abs/2109.01050, and cannot get the PyHessian class to work.
Also, in the PINN case, the network does not have data=(inputs, targets). Can I use the same code/class, or do I need further edits?
I'm thinking of changing the loss at line 92 of PyHessian/pyhessian/hessian.py (commit 1a42737), and I'm wondering whether that is the only change I need.
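For a loss that is not a torch.nn criterion and has no (inputs, targets) pair, the generic recipe is to build the gradient with create_graph=True and then differentiate it again against a vector. The following is a minimal sketch in plain PyTorch, not PyHessian's API; `model`, `x`, and `custom_loss` are hypothetical stand-ins for a PINN-style residual loss.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a PINN: a small network u(x) and collocation points x
model = torch.nn.Sequential(
    torch.nn.Linear(1, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)
x = torch.linspace(0.0, 1.0, 16).reshape(-1, 1).requires_grad_(True)


def custom_loss(model, x):
    """Mean squared residual of the toy ODE u'(x) - u(x) = 0 (illustrative only)."""
    u = model(x)
    (du_dx,) = torch.autograd.grad(u.sum(), x, create_graph=True)
    return ((du_dx - u) ** 2).mean()


params = [p for p in model.parameters() if p.requires_grad]
loss = custom_loss(model, x)

# First-order gradients, kept in the graph so they can be differentiated again
grads = torch.autograd.grad(loss, params, create_graph=True)

# Random direction with the same shapes as the parameters
v = [torch.randn_like(p) for p in params]

# Hessian-vector product: derivative of (grads . v) with respect to the parameters
Hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
```

Every entry of `grads` and `v` is a tensor here, which is what torch.autograd.grad expects for `outputs` and `grad_outputs`.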
PS:
Since I'm only interested in the eigenvectors, I also tried to extract the parameters and gradients directly, i.e., the code at
Line 61 in 1a42737
and at
PyHessian/pyhessian/hessian.py
Line 111 in 1a42737
However, it throws the error below. I would appreciate any insights.
`Traceback (most recent call last):
File "/tmp/ipykernel_709492/339026100.py", line 1, in <module>
hv = torch.autograd.grad(gradsH,
File "/home/xxx/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 229, in grad
grad_outputs_ = _make_grads(outputs, grad_outputs_)
File "/home/xxx/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 33, in _make_grads
if not out.shape == grad.shape:
AttributeError: 'float' object has no attribute 'shape'`
Because it seems that the function which calculates the eigenvalues only returns one eigenvalue for the whole model.
In hessian.py, when I tried to run the following code,
Hv = hessian_vector_product(gradsH, params, v)
an error occurs.
Is there any way to resolve this issue?
Hi!
Thank you for making the source code of your work available. I tried to use the library for an application involving a 3D network architecture, and ran into the following issue:
********** Commencing Hessian Computation **********
Traceback (most recent call last):
File "hessian_analysis.py", line 181, in <module>
hessianObj.analyze(model_checkpoint_filepath)
File "/media/ee/DATA/Repositories/PyHessian/hessian_analysis.py", line 70, in analyze
top_eigenvalues, top_eigenvectors = hessian_comp.eigenvalues(top_n=self.top_n)
File "/media/ee/DATA/Repositories/PyHessian/pyhessian/hessian.py", line 167, in eigenvalues
Hv = hessian_vector_product(self.gradsH, self.params, v)
File "/media/ee/DATA/Repositories/PyHessian/pyhessian/utils.py", line 88, in hessian_vector_product
retain_graph=True)
File "/home/ee/anaconda3/envs/torch13/lib/python3.6/site-packages/torch/autograd/__init__.py", line 197, in grad
grad_outputs_ = _make_grads(outputs, grad_outputs_)
File "/home/ee/anaconda3/envs/torch13/lib/python3.6/site-packages/torch/autograd/__init__.py", line 32, in _make_grads
if not out.shape == grad.shape:
AttributeError: 'float' object has no attribute 'shape'
Interestingly, the issue does not occur at the first call to back-propagation via loss.backward(), but rather at the call to torch.autograd.grad().
I believe that the float object in question is the 0. manually inserted when param.grad is None in the following routine:
Lines 61 to 72 in c2e49d2
If I am right, it is even more mind-boggling that a float is able to pass the data-type check in PyTorch (I mistakenly mixed up the outputs and inputs arguments of torch.autograd.grad). Kindly advise on what I can do here.
P.S. hessian_analysis.py is a wrapper I wrote around the library for my use case. I verified the wrapper by running a 2-layer neural network for a regression task.
I tried this code with ResNet34 and ran it multiple times. Because of limited GPU RAM, I have to use a mini-batch size of 32 with a Hessian batch size of 128. However, the top eigenvalue and the trace vary a lot between runs. For example, over 10 runs, the top eigenvalue ranges from 159 to 1587, and the trace ranges from 1284 to 5054.
I thought this might be due to the small batch size or too few iterations, so I changed the Hessian batch size to 512 and the maximum number of iterations to 1024. However, the results over 10 runs are roughly the same.
May I know whether this agrees with your results, and whether you have any thoughts on the potential cause of this issue?
Thanks for sharing this very interesting package!
I'm trying to use it on some very simple objective functions such as \frac{1}{N}\sum_{i=1}^N\log(x_i^T \theta + \epsilon), but the time cost seems to be high. The dimension of the variable \theta is about 100, and the number of samples N is about 1e6. To get the top 50 eigenvalues, it took about 45 seconds on a GPU. Could you please give some comments on whether such timing is as expected? Thanks!
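When the variable has only about 100 dimensions, the full Hessian is small enough to form explicitly, which may be cheaper than iterating Hessian-vector products for 50 eigenvalues. The sketch below uses plain PyTorch rather than PyHessian, on hypothetical data `X` with much smaller N and d than in the question. Strictly positive features are an assumption made here to keep the argument of the logarithm positive.

```python
import torch

torch.manual_seed(0)

N, d, eps = 1000, 5, 1e-3                      # small stand-ins for N ~ 1e6, d ~ 100
X = torch.rand(N, d, dtype=torch.float64)      # hypothetical positive features
theta0 = torch.rand(d, dtype=torch.float64)


def objective(theta):
    """(1/N) * sum_i log(x_i^T theta + eps)."""
    return torch.log(X @ theta + eps).mean()


# Dense (d, d) Hessian of the objective at theta0
H = torch.autograd.functional.hessian(objective, theta0)

# All eigenvalues of the symmetric matrix H, in ascending order
eigvals = torch.linalg.eigvalsh(H)
```

For this objective the Hessian is the negative of an average of outer products, so every eigenvalue is non-positive, and the largest-magnitude ones sit at the start of `eigvals`.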
Line 71 of utils.py
grads.append(0 if param.grad is None else param.grad + 0.)
should be rewritten as:
grads.append(param-param if param.grad is None else param.grad + 0.)
The current implementation may cause bugs when there are unused layers in the model. Specifically, when a layer has requires_grad set to True but does not participate in the forward or backward pass, its grad is set to the float 0. This triggers an error when torch.autograd checks the shape of the grads. Details can be seen in this discussion: #8
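The proposed replacement can be checked on a toy model with one unused layer. This is a sketch of the idea rather than the library's code, and `Net` is a hypothetical module. Note that `param - param` stays connected to the autograd graph, whereas a plain constant tensor would not, and torch.autograd.grad requires every output to have a grad_fn.

```python
import torch

torch.manual_seed(0)


class Net(torch.nn.Module):
    """Hypothetical model: one layer is used, the other never is."""

    def __init__(self):
        super().__init__()
        self.used = torch.nn.Linear(3, 1)
        self.unused = torch.nn.Linear(3, 1)  # requires_grad=True, never called

    def forward(self, x):
        return self.used(x)


model = Net()
x = torch.randn(4, 3)
loss = (model(x) ** 2).mean()
loss.backward(create_graph=True)

params, grads = [], []
for param in model.parameters():
    if not param.requires_grad:
        continue
    params.append(param)
    # A zero tensor of the right shape that is still part of the graph
    grads.append(param - param if param.grad is None else param.grad + 0.)

v = [torch.randn_like(p) for p in params]
Hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
```

With the float 0 in place of `param - param`, the last call is where the shape check would fail. With the change, the entries of `Hv` belonging to the unused layer are zero tensors.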
Thanks for making PyHessian public. I am trying to find the eigenvalues for a neural net that I'm implementing. I set requires_grad = True for the weight variables for which I want to calculate the eigenvalues. I am getting the following error:
RuntimeError: derivative for grid_sampler_2d_backward is not implemented
I was able to calculate first-order gradients easily, but I am unable to calculate Hv, which is computed at:
hv = torch.autograd.grad(gradsH,
params,
grad_outputs=v,
only_inputs=True,
retain_graph=True)
Could you let me know what the problem could be?
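The error says that the operator has no second derivative implemented, so an exact double backward through it is not available. One possible workaround, swapped in here and not part of PyHessian, is a central finite difference of first-order gradients: Hv ≈ (g(θ + εv) − g(θ − εv)) / (2ε). The sketch below uses a hypothetical quadratic loss, and `fd_hvp` and `eps` are illustrative names. Whether the approximation is accurate enough for a given model is an assumption to verify.

```python
import torch

torch.manual_seed(0)


def gradient(loss_fn, params):
    """First-order gradient only; no double backward is required."""
    return torch.autograd.grad(loss_fn(), params)


def fd_hvp(loss_fn, params, v, eps=1e-3):
    """Central-difference estimate of the Hessian-vector product."""
    with torch.no_grad():
        for p, d in zip(params, v):
            p.add_(eps * d)              # theta + eps * v
    g_plus = gradient(loss_fn, params)
    with torch.no_grad():
        for p, d in zip(params, v):
            p.sub_(2 * eps * d)          # theta - eps * v
    g_minus = gradient(loss_fn, params)
    with torch.no_grad():
        for p, d in zip(params, v):
            p.add_(eps * d)              # restore the original parameters
    return [(gp - gm) / (2 * eps) for gp, gm in zip(g_plus, g_minus)]


# Hypothetical toy loss 0.5 * theta^T A theta, whose Hessian is exactly A
M = torch.randn(4, 4, dtype=torch.float64)
A = M + M.T
theta = torch.randn(4, dtype=torch.float64, requires_grad=True)
theta_before = theta.detach().clone()
v = [torch.randn(4, dtype=torch.float64)]

Hv = fd_hvp(lambda: 0.5 * theta @ A @ theta, [theta], v)
```

The step size trades truncation error against round-off, so it usually needs tuning, and double precision helps.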
Hi, thanks for your awesome work!
I noticed that in the paper PYHESSIAN: Neural Networks Through the Lens of the Hessian, tr(H) keeps increasing during training.
In the paper Hessian-based Analysis of Large Batch Training and Robustness to Adversaries, the dominant eigenvalue of the Hessian w.r.t. the weights can decrease during small-batch training.
And in the paper CRITICAL LEARNING PERIODS IN DEEP NETWORKS, the trace of the FIM first increases and then decreases.
Is there a relationship between these results? Are they inconsistent with each other?
Sorry to bother you! I found that this project calculates the Hessian trace, but not the average Hessian trace. How should I calculate the average Hessian trace? I would really appreciate it if you have time to reply!
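Assuming "average Hessian trace" means the trace divided by the number of parameters (an assumption, since the question does not define it), a Hutchinson-style estimate can simply be normalised afterwards. The sketch below is plain PyTorch on a hypothetical toy loss, and `hutchinson_trace` is an illustrative helper, not a PyHessian function.

```python
import torch

torch.manual_seed(0)


def hutchinson_trace(grads, params, n_samples=50):
    """Estimate tr(H) as the mean of v^T H v over Rademacher vectors v."""
    estimates = []
    for _ in range(n_samples):
        # Entries drawn uniformly from {-1, +1}
        v = [torch.randint_like(p, high=2) * 2 - 1 for p in params]
        Hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        estimates.append(sum((h * d).sum() for h, d in zip(Hv, v)).item())
    return sum(estimates) / len(estimates)


# Hypothetical toy loss with a diagonal Hessian: diag(1, 2, 3) and diag(4, 5)
w = torch.randn(3, dtype=torch.float64, requires_grad=True)
b = torch.randn(2, dtype=torch.float64, requires_grad=True)
params = [w, b]
scale_w = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float64)
scale_b = torch.tensor([4.0, 5.0], dtype=torch.float64)
loss = 0.5 * (scale_w * w ** 2).sum() + 0.5 * (scale_b * b ** 2).sum()

grads = torch.autograd.grad(loss, params, create_graph=True)
trace = hutchinson_trace(grads, params)

num_params = sum(p.numel() for p in params)
average_trace = trace / num_params
```

The toy Hessian is diagonal, so the estimate is exact here. For a real network it is a noisy estimate that improves with more samples.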
Hi!!
First and foremost I would like to say that PyHessian is an incredible package and I am really thankful to the team for open-sourcing it!
I am a PhD student currently studying how the loss landscape behaves with change in neural network depth. For my study, I need to compute the sum of the square of the diagonal entries of the Hessian rather than just the sum (trace). Is there a way to do this with the package?
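One way to approach this outside the package is a Hutchinson-style diagonal estimator, diag(H) ≈ E[v ⊙ Hv] with Rademacher v, followed by squaring and summing. This is a sketch of a swapped-in technique on a hypothetical quadratic loss; `hutchinson_diagonal` is an illustrative name, not a PyHessian function. Squaring a noisy estimate biases the result upwards, so many samples are needed.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy loss 0.5 * theta^T A theta, whose Hessian is exactly A
eye = torch.eye(4, dtype=torch.float64)
A = torch.diag(torch.tensor([1.0, 2.0, 3.0, 4.0], dtype=torch.float64))
A = A + 0.1 * (torch.ones(4, 4, dtype=torch.float64) - eye)
theta = torch.randn(4, dtype=torch.float64, requires_grad=True)
params = [theta]

loss = 0.5 * theta @ A @ theta
grads = torch.autograd.grad(loss, params, create_graph=True)


def hutchinson_diagonal(grads, params, n_samples=2000):
    """Estimate diag(H) as the mean of v * (H v) over Rademacher vectors v."""
    diag = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        v = [torch.randint_like(p, high=2) * 2 - 1 for p in params]
        Hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        diag = [d + h * vi for d, h, vi in zip(diag, Hv, v)]
    return [d / n_samples for d in diag]


diag_est = hutchinson_diagonal(grads, params)

# Sum of squared diagonal entries (the true value for this toy A is 30)
sum_sq_diag = sum((d ** 2).sum() for d in diag_est).item()
```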
from pyhessian.client import HessianProxy
ModuleNotFoundError: No module named 'pyhessian.client'
Hi, it seems that the torch.eig method is deprecated.
I have proposed a solution to the problem, so I will try to update the code.
Regards
Piotr
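For reference, the torch.linalg namespace offers the replacements. The sketch below shows both on a hypothetical small matrix `T`. Whether the matrices passed to torch.eig in the library are symmetric is an assumption; if they are, torch.linalg.eigh is the natural choice because it returns real, sorted eigenvalues.

```python
import torch

# Hypothetical small symmetric matrix
T = torch.tensor(
    [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]],
    dtype=torch.float64,
)

# Symmetric case: real eigenvalues in ascending order, eigenvectors as columns
eigvals, eigvecs = torch.linalg.eigh(T)

# General case: complex-valued eigenvalues and eigenvectors, in no particular order
eigvals_c, eigvecs_c = torch.linalg.eig(T)
```

Code that previously read the real parts from the old two-column output would take `eigvals_c.real` in the general case.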
In the function dataloader_hv_product() of the class hessian(), lines 86-87 read:
'''
THv = [torch.randn(p.size()).to(device) for p in self.params
] # accumulate result
'''
I am wondering why it uses random initialization instead of zero initialization. (In the actual computation, with a large number of samples, the contribution of this initialization is approximately zero.)
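For comparison, a zero-initialised accumulator can be sketched as follows. This is not the library's code: `hvp_over_batches`, the model, and the batches are hypothetical, and the full-data product is taken to be the batch-size-weighted average of per-batch products.

```python
import torch

torch.manual_seed(0)

model = torch.nn.Linear(3, 1)
criterion = torch.nn.MSELoss()
# Hypothetical data: four batches of eight samples each
data = [(torch.randn(8, 3), torch.randn(8, 1)) for _ in range(4)]
params = [p for p in model.parameters() if p.requires_grad]
v = [torch.randn_like(p) for p in params]


def hvp_over_batches(batches):
    """Batch-size-weighted average of per-batch Hessian-vector products."""
    THv = [torch.zeros_like(p) for p in params]  # accumulators start at zero
    n_total = 0
    for inputs, targets in batches:
        loss = criterion(model(inputs), targets)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        Hv = torch.autograd.grad(grads, params, grad_outputs=v)
        batch_size = inputs.size(0)
        THv = [acc + h * batch_size for acc, h in zip(THv, Hv)]
        n_total += batch_size
    return [acc / n_total for acc in THv]


Hv_avg = hvp_over_batches(data)

# Reference: a single pass over all samples concatenated into one batch
X = torch.cat([inputs for inputs, _ in data])
Y = torch.cat([targets for _, targets in data])
full_grads = torch.autograd.grad(criterion(model(X), Y), params, create_graph=True)
Hv_full = torch.autograd.grad(full_grads, params, grad_outputs=v)
```

With zero initialisation the accumulated result matches the single-pass reference up to floating-point error, whereas a random start would leave a residual that shrinks only as the total sample count grows.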