amirgholami / pyhessian

PyHessian is a PyTorch library for second-order based analysis and training of neural networks

License: MIT License

Python 28.72% Jupyter Notebook 71.28%

hessian hessian-free pytorch-library second-order second-order-optimization

pyhessian's Issues

edits for PINN/custom loss functions

How can I use the code when I have a custom loss function rather than one of the built-in torch.nn losses?
I'm specifically trying to reproduce the results in https://arxiv.org/abs/2109.01050 , and cannot get the PyHessian class to work.

Also, in the PINN case, the network does not have data=(inputs, targets). Can I use the same code/class, or do I need further edits?

I'm thinking of changing the loss in line 92

PyHessian/pyhessian/hessian.py

Line 92 in 1a42737

loss = self.criterion(outputs, targets.to(device))

with my custom loss (from the network).

I'm wondering whether that is the only change I need.

PS:
Also, since I'm only interested in the eigenvectors, I tried to extract the parameters and gradients, i.e.,

PyHessian/pyhessian/utils.py

Line 61 in 1a42737

def get_params_grad(model):

and pass it to

PyHessian/pyhessian/hessian.py

Line 111 in 1a42737

def eigenvalues(self, maxIter=100, tol=1e-3, top_n=1):

However, it throws the error below. I would appreciate any insights.

Traceback (most recent call last):
  File "/tmp/ipykernel_709492/339026100.py", line 1, in <module>
    hv = torch.autograd.grad(gradsH,
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 229, in grad
    grad_outputs_ = _make_grads(outputs, grad_outputs_)
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 33, in _make_grads
    if not out.shape == grad.shape:
AttributeError: 'float' object has no attribute 'shape'
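
For the custom-loss question above, one option that avoids editing hessian.py is to run the power iteration directly on a loss closure. The sketch below is not PyHessian's API: `hvp`, `top_eigenpair`, and `pinn_loss` are hypothetical names, and the toy residual u' + u = 0 stands in for whatever PINN loss the network actually defines. It assumes only that the loss can be recomputed as a scalar tensor from the parameters.

```python
import torch

def hvp(loss_fn, params, vec):
    """Hessian-vector product of loss_fn() w.r.t. params via double backward."""
    loss = loss_fn()
    # create_graph=True keeps the gradients differentiable.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return torch.autograd.grad(grads, params, grad_outputs=vec)

def top_eigenpair(loss_fn, params, iters=100, tol=1e-6):
    """Power iteration on the Hessian; returns (eigenvalue, eigenvector list)."""
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((x * x).sum() for x in v))
    v = [x / norm for x in v]
    eig = None
    for _ in range(iters):
        Hv = hvp(loss_fn, params, v)
        new_eig = sum((h * x).sum() for h, x in zip(Hv, v)).item()  # Rayleigh quotient
        norm = torch.sqrt(sum((h * h).sum() for h in Hv))
        v = [h / (norm + 1e-12) for h in Hv]
        converged = eig is not None and abs(new_eig - eig) <= tol * (abs(eig) + 1e-12)
        eig = new_eig
        if converged:
            break
    return eig, v

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))
x = torch.linspace(0.0, 1.0, 16).reshape(-1, 1).clone().requires_grad_(True)

def pinn_loss():
    """Toy physics loss (hypothetical): mean squared residual of u' + u = 0."""
    u = net(x)
    du_dx = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    return ((du_dx + u) ** 2).mean()

params = [p for p in net.parameters() if p.requires_grad]
eigenvalue, eigenvector = top_eigenpair(pinn_loss, params)
```

Because the loss is a closure, no (inputs, targets) pair is needed. Note that power iteration converges to the eigenvalue of largest magnitude, which may be negative.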

How can we compute the eigenvalues of the Hessian for each weight matrix or each module in a model, rather than for the whole model?

It seems that the function that calculates the eigenvalues returns only one eigenvalue for the whole model.
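
One way to approach the per-module question is to restrict the Hessian-vector product to the parameters of a single module, which gives the top eigenvalue of that module's diagonal Hessian block. This is a minimal sketch in plain PyTorch, not a PyHessian feature; `block_top_eigenvalue` is a hypothetical helper and the two-layer model is only an illustration.

```python
import torch

def block_top_eigenvalue(loss, block_params, iters=100):
    """Power iteration on the Hessian block d^2 loss / d(block_params)^2."""
    grads = torch.autograd.grad(loss, block_params, create_graph=True)
    v = [torch.randn_like(p) for p in block_params]
    eig = 0.0
    for _ in range(iters):
        norm = torch.sqrt(sum((x * x).sum() for x in v))
        v = [x / (norm + 1e-12) for x in v]
        Hv = torch.autograd.grad(grads, block_params, grad_outputs=v, retain_graph=True)
        eig = sum((h * x).sum() for h, x in zip(Hv, v)).item()  # Rayleigh quotient
        v = list(Hv)
    return eig

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))
inputs, targets = torch.randn(32, 4), torch.randn(32, 1)

per_module = {}
for name, module in model.named_children():
    block = [p for p in module.parameters() if p.requires_grad]
    if not block:
        continue  # e.g. the Tanh layer has no parameters
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    per_module[name] = block_top_eigenvalue(loss, block)
```

The values in `per_module` are eigenvalues of the diagonal blocks only. They ignore cross-module curvature, so they are generally not eigenvalues of the full-model Hessian.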

RuntimeError: element 0 of variables does not require grad and does not have a grad_fn

In hessian.py, when I try to run the following code,

    Hv = hessian_vector_product(gradsH, params, v)

the error above is raised.
Does anyone have an idea how to resolve this issue?
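
A common cause of this error is that the first-order gradients were computed without `create_graph=True`, so they carry no `grad_fn` and cannot be differentiated again. Whether that is the cause in this particular report is an assumption. The sketch below reproduces both the failing and the working case with plain `torch.autograd.grad`, using a hypothetical one-layer model.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
x, y = torch.randn(8, 3), torch.randn(8, 1)
params = [p for p in model.parameters() if p.requires_grad]
v = [torch.ones_like(p) for p in params]

# Without create_graph the gradients are plain tensors with no grad_fn,
# so differentiating them a second time fails.
loss = torch.nn.functional.mse_loss(model(x), y)
flat_grads = torch.autograd.grad(loss, params)
failed, message = False, ""
try:
    torch.autograd.grad(flat_grads, params, grad_outputs=v)
except RuntimeError as err:
    failed, message = True, str(err)

# With create_graph=True the gradients stay attached to the graph,
# and the Hessian-vector product can be computed.
loss = torch.nn.functional.mse_loss(model(x), y)
graph_grads = torch.autograd.grad(loss, params, create_graph=True)
Hv = torch.autograd.grad(graph_grads, params, grad_outputs=v)
```

If the gradients come from `loss.backward()`, the equivalent change is `loss.backward(create_graph=True)`. Computing the loss inside `torch.no_grad()` produces the same symptom.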

Unexpected `shape` issue in Hessian-Vector computation

Hi!

Thank you for making the source code of your work available. I tried to use the library for an application involving a 3D network architecture, and ran into the following issue:

********** Commencing Hessian Computation **********
Traceback (most recent call last):
  File "hessian_analysis.py", line 181, in <module>
    hessianObj.analyze(model_checkpoint_filepath)
  File "/media/ee/DATA/Repositories/PyHessian/hessian_analysis.py", line 70, in analyze
    top_eigenvalues, top_eigenvectors  = hessian_comp.eigenvalues(top_n=self.top_n)
  File "/media/ee/DATA/Repositories/PyHessian/pyhessian/hessian.py", line 167, in eigenvalues
    Hv = hessian_vector_product(self.gradsH, self.params, v)
  File "/media/ee/DATA/Repositories/PyHessian/pyhessian/utils.py", line 88, in hessian_vector_product
    retain_graph=True)
  File "/home/ee/anaconda3/envs/torch13/lib/python3.6/site-packages/torch/autograd/__init__.py", line 197, in grad
    grad_outputs_ = _make_grads(outputs, grad_outputs_)
  File "/home/ee/anaconda3/envs/torch13/lib/python3.6/site-packages/torch/autograd/__init__.py", line 32, in _make_grads
    if not out.shape == grad.shape:
AttributeError: 'float' object has no attribute 'shape'

Interestingly, the issue does not occur at the first back-propagation call, loss.backward(), but at the later call to torch.autograd.grad().

I believe that the float object in question is the 0. manually inserted when param.grad is None in the following routine:

PyHessian/pyhessian/utils.py

Lines 61 to 72 in c2e49d2

def get_params_grad(model):
    """
    get model parameters and corresponding gradients
    """
    params = []
    grads = []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        params.append(param)
        grads.append(0. if param.grad is None else param.grad + 0.)
    return params, grads

~~If I am right, it is even more mind-boggling that a type float is able to pass the check for data-type in PyTorch~~ (I mistakenly mixed up the outputs and inputs arguments of torch.autograd.grad). Kindly advise on what I can do here.

P.S. hessian_analysis.py is a wrapper I wrote around the library, for my use-case. I verified the wrapper by running a 2-layer neural network for a regression task.

Large variance of results

I tried this code with ResNet34 and ran it multiple times. Because my GPU RAM is limited, I have to use a mini-batch size of 32 with a Hessian batch size of 128. However, the top eigenvalue and the trace vary a lot. For example, over 10 runs the top eigenvalue ranges from 159 to 1587, and the trace ranges from 1284 to 5054.
I thought this might be due to the small batch size or too few iterations, so I changed the Hessian batch size to 512 and the maximum number of iterations to 1024. However, the spread over 10 runs is roughly the same.

May I know whether this agrees with your results, and whether you have any thoughts on the potential cause of this issue?

computational (time) cost

Thanks for sharing this very interesting package!
I'm trying to use it on some very simple objective functions such as \frac{1}{N}\sum_{i=1}^N\log(x_i^T \theta + \epsilon), but the time cost seems to be high. The dimension of the variable \theta is about 100, and the number of samples N is about 1e6. To get the top 50 eigenvalues, it took about 45 seconds on a GPU. Could you please give some comments on whether such timing is as expected? Thanks!
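
With only about 100 parameters, the full Hessian is a 100 x 100 matrix, so it may be cheaper to form it explicitly and take its eigenvalues in one dense call than to run many power iterations over the data. The sketch below does this with `torch.autograd.functional.hessian` and `torch.linalg.eigvalsh`. The sizes are small stand-ins for the ones in the question, and the positive random features are an assumption made only to keep the logarithm defined.

```python
import torch

def objective(theta, x, eps=1e-3):
    """(1/N) * sum_i log(x_i^T theta + eps), as in the question above."""
    return torch.log(x @ theta + eps).mean()

torch.manual_seed(0)
n_samples, dim = 1000, 10      # stand-ins for N ~ 1e6 and dim ~ 100
x = torch.rand(n_samples, dim, dtype=torch.float64)   # positive features
theta = torch.rand(dim, dtype=torch.float64) + 0.1

# Dense (dim x dim) Hessian of the scalar objective with respect to theta.
H = torch.autograd.functional.hessian(lambda t: objective(t, x), theta)

# All eigenvalues at once, in ascending order (H is symmetric).
eigenvalues = torch.linalg.eigvalsh(H)
top_k = eigenvalues.abs().topk(5).values   # five largest in magnitude
```

For this objective the Hessian is a negative weighted sum of the outer products x_i x_i^T, so every eigenvalue is non-positive, which is why the sketch ranks them by magnitude.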

Potential bugs

Line 71 of utils.py,
grads.append(0. if param.grad is None else param.grad + 0.)
should be rewritten as:
grads.append(param-param if param.grad is None else param.grad + 0.)

The current implementation can fail when the model contains unused layers. Specifically, when a layer has requires_grad set to True but does not participate in the forward or backward pass, its grad is None and is replaced by a float zero. This triggers an error when torch.autograd checks the shape of the grads. Details can be found in this discussion: #8
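
The proposed change can be sketched as a standalone helper. `get_params_grad_safe` and the `WithUnusedLayer` model are hypothetical names used only for this illustration, and this is not the library's current code. The point of `param - param` is that it is a zero tensor of the right shape that is still attached to the autograd graph, so it can be passed as an output to `torch.autograd.grad`.

```python
import torch

def get_params_grad_safe(model):
    """Variant of get_params_grad that never returns a Python float."""
    params, grads = [], []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        params.append(param)
        # param - param: zeros with the parameter's shape and a grad_fn,
        # used when the parameter received no gradient.
        grads.append(param - param if param.grad is None else param.grad + 0.)
    return params, grads

class WithUnusedLayer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.used = torch.nn.Linear(3, 1)
        self.unused = torch.nn.Linear(3, 1)   # never called in forward

    def forward(self, x):
        return self.used(x)

torch.manual_seed(0)
model = WithUnusedLayer()
x, y = torch.randn(8, 3), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward(create_graph=True)   # unused parameters keep grad = None

params, grads = get_params_grad_safe(model)
v = [torch.ones_like(p) for p in params]
Hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
```

The Hessian-vector product entries for the unused layer come out as zeros, which matches the fact that the loss does not depend on those parameters.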

Where is the hessian trace calculation w.r.t activations?

RuntimeError: derivative for grid_sampler_2d_backward is not implemented

Thanks for making PyHessian public. I am trying to find the eigenvalues for a neural network that I'm implementing. I set requires_grad = True for the weight variables for which I want to calculate the eigenvalues. I am getting the following error:

RuntimeError: derivative for grid_sampler_2d_backward is not implemented

I was able to calculate the first-order gradients easily, but I am unable to calculate Hv, which is computed at:

hv = torch.autograd.grad(gradsH,
                         params,
                         grad_outputs=v,
                         only_inputs=True,
                         retain_graph=True)

Could you let me know what the problem could be?

torch.eig deprecated please use torch.linalg.eig for line 261 of hessian.py

Hi, a question about the inconsistency between the dynamics of tr(FIM) and tr(H).

Hi, thanks for your awesome work!

I noticed that in the paper "PyHessian: Neural Networks Through the Lens of the Hessian", tr(H) keeps increasing during training.

In the paper "Hessian-based Analysis of Large Batch Training and Robustness to Adversaries", the dominant eigenvalue of the Hessian w.r.t. the weights can decrease during small-batch training.

And in the paper "Critical Learning Periods in Deep Networks", the trace of the FIM first increases and then decreases.

Is there a relationship between these results? Are they inconsistent with one another?

How to calculate average Hessian trace?

Sorry to bother you! I found that this project calculates the Hessian trace, but not the average Hessian trace. How should I calculate the average Hessian trace? I would really appreciate a reply if you have time!
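
Assuming "average Hessian trace" means the trace divided by the number of parameters, it can be obtained from any trace estimate by one division. The sketch below uses a Hutchinson estimator written in plain PyTorch. `hutchinson_trace` is a hypothetical helper rather than the library's function, and the small model is only an illustration.

```python
import torch

def hutchinson_trace(loss, params, n_samples=50):
    """Estimate tr(H) as the mean of v^T H v over Rademacher probes v."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    estimates = []
    for _ in range(n_samples):
        # Random entries in {-1, +1}.
        v = [torch.randint_like(p, high=2) * 2 - 1 for p in params]
        Hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        estimates.append(sum((h * x).sum() for h, x in zip(Hv, v)).item())
    return sum(estimates) / len(estimates)

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))
inputs, targets = torch.randn(32, 4), torch.randn(32, 1)
params = [p for p in model.parameters() if p.requires_grad]

loss = torch.nn.functional.mse_loss(model(inputs), targets)
trace = hutchinson_trace(loss, params)

# Average trace: total trace divided by the number of parameters.
n_params = sum(p.numel() for p in params)
average_trace = trace / n_params
```

To get a per-layer average instead, the same division can be applied with `params` restricted to one layer's parameters.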

Hi! Can PyHessian compute a stage-wise Hessian trace?

Computing sum of square of diagonal entries of Hessian

Hi!!

First and foremost I would like to say that PyHessian is an incredible package and I am really thankful to the team for open-sourcing it!

I am a PhD student currently studying how the loss landscape behaves with change in neural network depth. For my study, I need to compute the sum of the square of the diagonal entries of the Hessian rather than just the sum (trace). Is there a way to do this with the package?
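
One possible approach, sketched here in plain PyTorch rather than with a PyHessian function, is to estimate the Hessian diagonal with the same random probes used for the trace, since the expectation of v * (Hv) over Rademacher vectors v equals diag(H). The helper name `hessian_diag_estimate` and the small model are assumptions made for the illustration.

```python
import torch

def hessian_diag_estimate(loss, params, n_samples=100):
    """Estimate diag(H) as the mean of v * (H v) over Rademacher probes v."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    diag = [torch.zeros_like(p) for p in params]
    for _ in range(n_samples):
        # Random entries in {-1, +1}.
        v = [torch.randint_like(p, high=2) * 2 - 1 for p in params]
        Hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        diag = [d + h * x / n_samples for d, h, x in zip(diag, Hv, v)]
    return diag

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1))
inputs, targets = torch.randn(32, 4), torch.randn(32, 1)
params = [p for p in model.parameters() if p.requires_grad]

loss = torch.nn.functional.mse_loss(model(inputs), targets)
diag = hessian_diag_estimate(loss, params)

# Sum of squared diagonal entries, accumulated over all parameter tensors.
sum_sq_diag = sum((d.detach() ** 2).sum() for d in diag).item()
```

Squaring a noisy estimate adds its variance to the result, so `sum_sq_diag` tends to overestimate the true value when `n_samples` is small. For models small enough to form the Hessian explicitly, the exact diagonal is a safer reference.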

No module named 'pyhessian.client'

from pyhessian.client import HessianProxy

ModuleNotFoundError: No module named 'pyhessian.client'

deprecation of torch.eig method

Hi, it seems that the torch.eig method is deprecated.
I have proposed a solution to the problem and will try to update the code.

Regards
Piotr
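
As a rough guide to the migration, the older `torch.eig` returned eigenvalues as an n x 2 real tensor (real and imaginary columns), whereas `torch.linalg.eig` returns complex tensors. The sketch below assumes the matrix being decomposed is real and symmetric, which has not been checked against line 261 of hessian.py. Under that assumption `torch.linalg.eigh` is an alternative that returns real outputs directly.

```python
import torch

# A small real symmetric tridiagonal matrix, used only as an example input.
T = torch.tensor([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])

# torch.linalg.eig returns complex tensors even for real input.
eigenvalues_c, eigenvectors_c = torch.linalg.eig(T)
eigenvalues = eigenvalues_c.real      # plays the role of the old a_[:, 0]
eigenvectors = eigenvectors_c.real

# For a symmetric matrix, eigh gives real outputs with eigenvalues
# sorted in ascending order.
eigenvalues_h, eigenvectors_h = torch.linalg.eigh(T)
```

The two routines may order the eigenvalues differently, so any code that indexes eigenvectors by position should pair them with the eigenvalues from the same call.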

A question on the computation of Hessian-vector product

In the function dataloader_hv_product() of the class hessian(), lines 86-87 read:

THv = [torch.randn(p.size()).to(device) for p in self.params]  # accumulate result

I am wondering why the accumulator uses random initialization instead of zero initialization. (In actual computation with a large number of data points, the contribution of this initialization is close to zero.)
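
The effect of the initial value can be illustrated with a toy accumulator. This sketch assumes, as the question implies, that the result is a batch-size-weighted sum divided by the total number of samples. `weighted_mean` is a hypothetical stand-in, not the library's code, and random vectors take the place of per-batch Hessian-vector products.

```python
import torch

def weighted_mean(batches, init):
    """Batch-size-weighted mean of per-batch vectors, accumulated from init."""
    total, count = init.clone(), 0
    for vec, n in batches:
        total = total + vec * n
        count += n
    return total / count

torch.manual_seed(0)
batches = [(torch.randn(4), 32) for _ in range(10)]   # (per-batch vector, batch size)
n_total = sum(n for _, n in batches)
exact = sum(vec * n for vec, n in batches) / n_total

noise = torch.randn(4)
from_zero = weighted_mean(batches, torch.zeros(4))    # matches the exact mean
from_random = weighted_mean(batches, noise)           # carries a leftover term

# The leftover term is the initial value divided by the total sample count.
residual = from_random - from_zero
```

Under this assumption the random start contributes `noise / n_total`, which shrinks with more data but is zero only with a zero-initialized accumulator.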
