pyhessian's People

Contributors

amirgholami, dsm-72, leiweimu, trsvchn, yaozhewei, zhenlinluo

pyhessian's Issues

edits for PINN/custom loss functions

How can I use the code when I have a custom loss function rather than a built-in torch.nn loss? I'm specifically trying to reproduce the results in https://arxiv.org/abs/2109.01050 and cannot get the PyHessian class to work.

Also, in the PINN case the network does not have a data=(inputs, targets) pair. Can I use the same code/class, or do I need further edits?

I'm thinking of replacing the loss in line 92,

loss = self.criterion(outputs, targets.to(device))

with my custom loss (computed from the network), and I'm wondering if that's the only change I need.


PS:
Also, since I'm only interested in the eigenvectors, I tried to extract the parameters and gradients myself, i.e., via

def get_params_grad(model):

and pass them to

def eigenvalues(self, maxIter=100, tol=1e-3, top_n=1):

but it's throwing an error at me; I would appreciate any insights:

Traceback (most recent call last):
  File "/tmp/ipykernel_709492/339026100.py", line 1, in <module>
    hv = torch.autograd.grad(gradsH,
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 229, in grad
    grad_outputs_ = _make_grads(outputs, grad_outputs_)
  File "/home/xxx/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 33, in _make_grads
    if not out.shape == grad.shape:
AttributeError: 'float' object has no attribute 'shape'
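
For what it's worth, a minimal sketch of the PINN-style adaptation described above, using a toy model and a placeholder custom loss (both purely illustrative, not from the paper): the key points are back-propagating with create_graph=True so that second derivatives remain available, and collecting gradient tensors (never a plain float 0.) before calling torch.autograd.grad, which is exactly where the traceback above originates.

import torch
import torch.nn as nn

# toy stand-ins: a small network and a custom (non-torch.nn) loss
model = nn.Sequential(nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1))
x = torch.randn(64, 2)

u = model(x)
loss = (u ** 2).mean()            # placeholder for a PINN residual loss
loss.backward(create_graph=True)  # keep the graph for Hessian-vector products

params, gradsH = [], []
for p in model.parameters():
    if not p.requires_grad:
        continue
    params.append(p)
    # a zero tensor attached to the graph, never the float 0.
    gradsH.append(p - p if p.grad is None else p.grad + 0.)

v = [torch.randn_like(p) for p in params]  # probe vector for one Hv product
Hv = torch.autograd.grad(gradsH, params, grad_outputs=v, retain_graph=True)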

Unexpected `shape` issue in Hessian-Vector computation

Hi!

Thank you for making the source code of your work available. I tried to use the library for an application involving a 3D network architecture and ran into the following issue:

********** Commencing Hessian Computation **********
Traceback (most recent call last):
  File "hessian_analysis.py", line 181, in <module>
    hessianObj.analyze(model_checkpoint_filepath)
  File "/media/ee/DATA/Repositories/PyHessian/hessian_analysis.py", line 70, in analyze
    top_eigenvalues, top_eigenvectors  = hessian_comp.eigenvalues(top_n=self.top_n)
  File "/media/ee/DATA/Repositories/PyHessian/pyhessian/hessian.py", line 167, in eigenvalues
    Hv = hessian_vector_product(self.gradsH, self.params, v)
  File "/media/ee/DATA/Repositories/PyHessian/pyhessian/utils.py", line 88, in hessian_vector_product
    retain_graph=True)
  File "/home/ee/anaconda3/envs/torch13/lib/python3.6/site-packages/torch/autograd/__init__.py", line 197, in grad
    grad_outputs_ = _make_grads(outputs, grad_outputs_)
  File "/home/ee/anaconda3/envs/torch13/lib/python3.6/site-packages/torch/autograd/__init__.py", line 32, in _make_grads
    if not out.shape == grad.shape:
AttributeError: 'float' object has no attribute 'shape'

Interestingly, the issue does not occur at the first call to back-propagation via loss.backward(), but rather at the call to torch.autograd.grad().

I believe that the float object in question is the 0. manually inserted when param.grad is None in the following routine:

def get_params_grad(model):
    """
    get model parameters and corresponding gradients
    """
    params = []
    grads = []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        params.append(param)
        grads.append(0. if param.grad is None else param.grad + 0.)
    return params, grads

If I am right, it is all the more puzzling that a plain float is able to pass PyTorch's data-type check (I had mistakenly mixed up the outputs and inputs arguments of torch.autograd.grad). Kindly advise on what I can do here.

P.S. hessian_analysis.py is a wrapper I wrote around the library, for my use-case. I verified the wrapper by running a 2-layer neural network for a regression task.
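
For anyone hitting the same trace, a minimal repro of the failure mode (the variable names are illustrative): torch.autograd.grad inspects out.shape on every output inside _make_grads, so a plain float 0. in the outputs list fails before any differentiation happens, whereas a zero tensor that is still attached to the graph passes the check and simply contributes zero to the product.

import torch

w = torch.randn(3, requires_grad=True)  # a parameter whose grad would be None

# float zero, as appended by get_params_grad: no .shape -> AttributeError
# torch.autograd.grad([0.], [w], grad_outputs=[torch.ones(3)])

# zero tensor attached to the graph: passes the shape check
zero = w - w
hv = torch.autograd.grad(zero, w, grad_outputs=torch.ones(3))
print(hv[0])  # tensor([0., 0., 0.])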

Large variance of results

I tried this code with ResNet-34 and ran it multiple times. Because of my GPU RAM limit, I have to use a mini-batch size of 32 while using a Hessian batch size of 128. However, the top eigenvalue and the trace vary a lot: in 10 runs, the top eigenvalue ranged from 159 to 1587, and the trace from 1284 to 5054.
I thought it might be due to the small batch size or too few iterations, so I increased the Hessian batch size to 512 and the maximum number of iterations to 1024. However, the results were roughly the same across 10 runs.

May I know whether this agrees with your results, and whether you have any thoughts on the potential cause of this issue?
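
One way to quantify (and reduce) the sampling noise is to average the Hutchinson estimates over several independent Hessian batches. A self-contained sketch assuming the published pyhessian package and a toy model (replace the synthetic data with your own batches):

import numpy as np
import torch
import torch.nn as nn
from pyhessian import hessian  # assumes the published pyhessian package

# toy model and data, purely for illustration
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
criterion = nn.CrossEntropyLoss()

trace_estimates = []
for _ in range(10):  # several independent Hessian batches
    inputs = torch.randn(128, 10)
    targets = torch.randint(0, 2, (128,))
    model.zero_grad()  # clear any gradients accumulated by a previous estimate
    comp = hessian(model, criterion, data=(inputs, targets), cuda=False)
    trace_estimates.append(np.mean(comp.trace()))  # Hutchinson average per batch

print(f"trace: {np.mean(trace_estimates):.1f} +/- {np.std(trace_estimates):.1f}")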

computational (time) cost

Thanks for sharing this very interesting package!
I'm trying to use it on some very simple objective functions such as \frac{1}{N}\sum_{i=1}^N \log(x_i^T \theta + \epsilon), but the time cost seems to be high. The dimension of the variable \theta is about 100, and the number of samples N is about 10^6. Getting the top 50 eigenvalues took about 45 seconds on a GPU. Could you please comment on whether such timing is expected? Thanks!
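
As a rough back-of-the-envelope (an estimate under stated assumptions, not a measured profile): eigenvalues(top_n=50) runs power iteration with deflation, so the total work is roughly top_n x iterations x (one Hessian-vector product), and each Hv product costs an extra backward pass over all N samples. With top_n = 50 and the default maxIter = 100, that is up to 5,000 Hv products, i.e., about 9 ms per pass over the 10^6 x 100 data for a 45-second total; fewer iterations run when the tolerance is hit early, which raises the per-product share accordingly.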

Potential bugs

Line 71 of utils.py,

grads.append(0 if param.grad is None else param.grad + 0.)

should be rewritten as:

grads.append(param - param if param.grad is None else param.grad + 0.)

The current implementation may cause bugs when there are unused layers in the model. Specifically, when a layer has requires_grad set to True but does not participate in the forward or backward pass, its grad is set to the float zero, which triggers an error when torch.autograd checks the shapes of the grads. Details can be seen in this discussion: #8
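
Applied to the helper, the proposed fix would read as follows (a sketch of the suggested change, not the shipped code). param - param yields a zero tensor of the right shape that is still attached to the autograd graph, so both the shape check in _make_grads and the subsequent torch.autograd.grad call succeed even when a layer never receives a gradient:

def get_params_grad(model):
    """
    Get model parameters and corresponding gradients.
    """
    params = []
    grads = []
    for param in model.parameters():
        if not param.requires_grad:
            continue
        params.append(param)
        # param - param: a zero tensor of matching shape, attached to the graph
        grads.append(param - param if param.grad is None else param.grad + 0.)
    return params, grads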

RuntimeError: derivative for grid_sampler_2d_backward is not implemented

Thanks for making PyHessian public. I am trying to find the eigenvalues for a neural net that I'm implementing. I set requires_grad = True for the weight variables whose eigenvalues I want to calculate, and I am getting the following error:

RuntimeError: derivative for grid_sampler_2d_backward is not implemented

I was able to calculate first-order gradients easily, but I am unable to calculate Hv, which is computed at:

hv = torch.autograd.grad(gradsH,
                         params,
                         grad_outputs=v,
                         only_inputs=True,
                         retain_graph=True)

Could you let me know what the problem could be?

Hi, a question about the inconsistency of the dynamics of tr(FIM) and tr(H).

Hi, thanks for your awesome work!

I noticed that in the paper PyHessian: Neural Networks Through the Lens of the Hessian, tr(H) keeps increasing during training.

And in the paper Hessian-based Analysis of Large Batch Training and Robustness to Adversaries, the dominant eigenvalue of the Hessian w.r.t. the weights can decrease during small-batch training.

And in the paper Critical Learning Periods in Deep Networks, the trace of the FIM first increases and then decreases.

Is there some relationship between these observations? Are they inconsistent with one another?

How to calculate average Hessian trace?

Sorry to bother you! I found that this project calculates the Hessian trace, but not the average Hessian trace. How should I calculate the average Hessian trace? I would greatly appreciate it if you have time to reply!
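
If "average Hessian trace" means tr(H) divided by the number of parameters (the mean diagonal entry), a self-contained sketch assuming the published pyhessian package and a toy model:

import numpy as np
import torch
import torch.nn as nn
from pyhessian import hessian  # assumes the published pyhessian package

model = nn.Linear(10, 2)  # toy model, purely for illustration
criterion = nn.CrossEntropyLoss()
inputs, targets = torch.randn(64, 10), torch.randint(0, 2, (64,))

comp = hessian(model, criterion, data=(inputs, targets), cuda=False)
trace = np.mean(comp.trace())  # Hutchinson estimate of tr(H)
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print("average Hessian trace:", trace / n_params)  # mean diagonal entry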

Computing sum of square of diagonal entries of Hessian

Hi!!

First and foremost I would like to say that PyHessian is an incredible package and I am really thankful to the team for open-sourcing it!

I am a PhD student currently studying how the loss landscape behaves as neural network depth changes. For my study, I need to compute the sum of the squares of the diagonal entries of the Hessian, rather than just their sum (the trace). Is there a way to do this with the package?
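
PyHessian does not expose this directly, but the diagonal itself can be estimated with the Bekas-style estimator diag(H) ≈ E_v[v * Hv] (elementwise product, Rademacher v), after which the sum of squares follows. A self-contained sketch with a toy model (note that squaring the estimate adds an upward bias from the estimator's variance, which shrinks as the number of probes grows):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = nn.MSELoss()(model(x), y)

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, create_graph=True)

n_probes = 100
diag_est = [torch.zeros_like(p) for p in params]
for _ in range(n_probes):
    v = [torch.randint_like(p, high=2) * 2 - 1 for p in params]  # Rademacher
    Hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    for d, vi, hvi in zip(diag_est, v, Hv):
        d += vi * hvi / n_probes  # E[v * Hv] converges to diag(H)

sum_sq_diag = sum((d ** 2).sum() for d in diag_est).item()
print("sum of squared diagonal entries:", sum_sq_diag)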

deprecation of torch.eig method

Hi, it seems that the torch.eig method is deprecated.
I have proposed a solution to the problem, so I will try to update the code.

Regards
Piotr
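
For reference, the usual migration (not necessarily the exact patch proposed above) is from torch.eig to torch.linalg.eig; if the call site is a symmetric matrix, such as the tridiagonal matrix built by a Lanczos-style routine, torch.linalg.eigh is the drop-in replacement:

import torch

T = torch.tensor([[2.0, 1.0], [1.0, 3.0]])  # e.g., a symmetric tridiagonal block

# old (removed): eigenvalues, eigenvectors = torch.eig(T, eigenvectors=True)
eigenvalues, eigenvectors = torch.linalg.eigh(T)  # symmetric/Hermitian input
# general (possibly non-symmetric) matrices return complex tensors:
# eigenvalues, eigenvectors = torch.linalg.eig(T)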

A question on the computation of Hessian-vector product

In the function dataloader_hv_product() under the hessian class, lines 86-87 read

THv = [torch.randn(p.size()).to(device) for p in self.params]  # accumulate result

I am wondering why it uses random initialization instead of zero initialization. (Although in actual computation, with a large number of data batches, the effect of this initialization is approximately negligible.)
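
For comparison, a zero initialization, which is what an accumulator would ordinarily start from (a sketch of the alternative, not the library's current code):

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # toy model, purely for illustration
device = "cpu"
# zero-initialized accumulator, in place of torch.randn(p.size())
THv = [torch.zeros(p.size(), device=device) for p in model.parameters()]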
