Comments (7)
Hi @thomasahle,
I'm not sure I fully understand your question. Individual gradients don't exist in a network with a BN layer in train mode, because each individual loss depends on all samples in the mini-batch. The purpose of the warning you're seeing is exactly to point out this caveat. Are you sure the batch_l2 you're getting for the other parameters is what you want? It would not correspond to individual gradient l2 norms, as individual gradients don't exist with BN.
Best,
Felix
from backpack.
Hi Felix, I'm indeed interested in individual gradient l2 norms. If I can't get them on all my parameters, that's fine. But it would be nice if the normal non-batched grad were still computed for my parameters, without having to run backprop again without BackPACK. Right now the non-batched grad is only computed whenever batch_l2 is computed.
Hi Thomas,

> Right now the non-batched grad is only computed whenever batch_l2 is computed.

That seems odd to me because BackPACK does not intervene in PyTorch's gradient computation. Are you sure that these parameters have requires_grad = True?

> I'm indeed interested in individual gradient l2 norms. If I can't get them on all my parameters, that's fine.

For the loss of a neural network with batch normalization, individual gradients, and hence their l2 norms, don't exist. BackPACK only detects this when it encounters a batch norm module, so the result in batch_l2 for the layers before it is not a per-sample gradient l2 norm. Maybe this post I wrote is helpful to understand this in more detail.
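To make the coupling concrete, here is a minimal plain-PyTorch sketch (names are illustrative, not from BackPACK): with a BatchNorm layer in train mode, the gradient of one sample's loss changes when its batch-mates change, so a "per-sample gradient" is not well-defined.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# train mode by default, so BN uses batch statistics
net = nn.Sequential(nn.Linear(3, 3), nn.BatchNorm1d(3))
x0, x1, x2 = torch.randn(3, 3).split(1)

def grad_of_sample0(batch):
    # gradient of the loss of sample 0 only, w.r.t. the linear weight
    net.zero_grad()
    loss = (net(batch)[0] ** 2).sum()
    loss.backward()
    return net[0].weight.grad.clone()

g_a = grad_of_sample0(torch.cat([x0, x1]))  # sample 0 batched with x1
g_b = grad_of_sample0(torch.cat([x0, x2]))  # sample 0 batched with x2

# the "individual gradient" of sample 0 depends on its batch-mates,
# because BN normalizes with statistics computed over the whole batch
print(torch.allclose(g_a, g_b))  # False
```

With the BN layer removed (or put in eval mode), the two gradients would coincide and the per-sample quantity would be well-defined.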
> That seems odd to me because BackPACK does not intervene in PyTorch's gradient computation. Are you sure that these parameters have requires_grad = True?

Yes, it is only the parameters that BatchL2Grad does not support that don't get grad. The other parameters get both batch_l2 and grad.

I was thinking this might be a matter of how the exception is handled? That is, when the NotImplementedError is thrown, computation somehow gets aborted and grad isn't computed as it otherwise would have been.
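This hypothesis can be checked in plain PyTorch, without BackPACK. The sketch below uses a hypothetical Abort function (purely illustrative) that raises mid-backward, standing in for BackPACK hitting an unsupported layer: parameters processed before the failure keep their .grad, parameters behind it never get one.

```python
import torch
import torch.nn as nn

class Abort(torch.autograd.Function):
    """Identity in the forward pass; raises during backward,
    mimicking an extension that fails at an unsupported layer."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        raise NotImplementedError("unsupported layer reached")

lin1, lin2 = nn.Linear(4, 4), nn.Linear(4, 1)
out = lin2(Abort.apply(lin1(torch.randn(2, 4))))
try:
    out.sum().backward()
except (NotImplementedError, RuntimeError):  # the engine may re-raise or wrap it
    pass

print(lin2.weight.grad is not None)  # True: processed before the failure
print(lin1.weight.grad is None)      # True: backward aborted before reaching it
```

This reproduces the pattern in the thread: the layer after the failing module (here lin2, in the example the linear layer) gets grad, the layer before it (here lin1, in the example the conv layer) does not.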
Here is example code of what I mean:

```python
import torch
import torch.nn as nn
import backpack
from backpack.extensions import BatchL2Grad

channels = 5
data = torch.randn(100, channels, 10, 10)
labels = torch.randn(100, 5)

model = nn.Sequential()
model.add_module('conv', nn.Conv2d(channels, 5, kernel_size=3, stride=1, padding=1, bias=False))
model.add_module('batch norm', nn.BatchNorm2d(5))
model.add_module('flat', nn.Flatten(1))
model.add_module('linear', nn.Linear(500, 5))
model = backpack.extend(model)

y = model(data)
loss = torch.sum((y - labels) ** 2)

with backpack.backpack(BatchL2Grad()):
    try:
        loss.backward()
    except NotImplementedError:
        pass

for name, param in model.named_parameters():
    if not hasattr(param, 'batch_l2'):
        print(f'Param {name} has no batch_l2')
    if not hasattr(param, 'grad') or param.grad is None:
        print(f'Param {name} has no grad')
```
This outputs:

```
Param conv.weight has no batch_l2
Param conv.weight has no grad
Param batch norm.weight has no batch_l2
Param batch norm.bias has no batch_l2
```
In other words, the linear layer gets both batch_l2 and grad, which is great. The conv layer doesn't get batch_l2, since it is below the batch norm, which makes sense from what you wrote. However, I don't see why conv couldn't get grad, just like it would without BatchL2Grad.
Hi Thomas,
I think the problem is that the error message is not strong enough. It should be
"Encountered BatchNorm module in training mode. Quantity to compute is undefined."
The batch_l2 fields of the parameters at the end of the network (after the batch norm) are getting filled in because we only realise that the computation is meaningless when we hit the batch norm layer, going backwards. It is not the l2 norm of the individual gradients, even for those parameters. If your work involves gradients of individual samples, you should avoid batch norm.

If you specifically want to look at the quantity that would be obtained by applying the same code used to compute individual gradients, but in a batch norm network, you can install from source (pip install -e backpack once you've extracted the source code) and remove the exception.
I guess you are right about batch_l2 not being defined.

I could fall back and just compute the grad with normal PyTorch in this case, but it still seems like BatchL2Grad might as well do it, even if batch_l2 makes no sense, simply because BatchL2Grad normally computes grad, and grad is defined.
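For reference, a sketch of that fallback in plain PyTorch (same architecture as the example above, with the aborted BackPACK pass elided): zero the partially filled gradients, then redo an ordinary forward/backward outside any BackPACK context, so every parameter gets .grad.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
channels = 5
data = torch.randn(8, channels, 10, 10)
labels = torch.randn(8, 5)

# same architecture as the example above
model = nn.Sequential(
    nn.Conv2d(channels, 5, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(5),
    nn.Flatten(1),
    nn.Linear(500, 5),
)

# (1) here the BackPACK/BatchL2Grad backward would run and abort at the
#     batch norm, leaving the conv and batch norm grads unset

# (2) fallback: zero everything so the partially computed grads do not
#     accumulate twice, then a plain forward + backward fills in .grad
#     for every parameter
model.zero_grad()
loss = ((model(data) - labels) ** 2).sum()
loss.backward()

missing = [n for n, p in model.named_parameters() if p.grad is None]
print(missing)  # [] -- every parameter now has a plain gradient
```

The cost is one extra forward/backward pass, which is presumably what BatchL2Grad could avoid if it kept computing grad past the unsupported layer.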