aleximmer / laplace
Laplace approximations for Deep Learning.
Home Page: https://aleximmer.github.io/Laplace
License: MIT License
See low-rank branch.
From #69:
We could consider adding more informative error messages when running out of memory during Hessian allocation / computation. E.g., if initialising the Hessian runs out of memory, we could raise an error saying something like
"Your model is too big for using FullLaplace. It has X parameters, so the Hessian would be Y TB large, while your CPU/GPU only has Z GB of memory available. To use FullLaplace on your machine, your model can have at most ~V parameters. Instead, consider using a more memory-efficient Laplace variant, such as W."
sublaplace.py and the corresponding classes (only FullSubnetLaplace possible, afaik).
Hi,
To make it work, I had to add the following:
import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
Check the following link: Error #15: initializing libiomp5md.dll
We should add some tests to catch potential issues with larger models. When computing Hessians it's very easy to run out of memory on consumer hardware, so some tests that check that we don't do any unnecessary allocation of memory might be useful.
Some ideas:
Are there actually good ways to test these things? I.e. do we know which hardware the tests will be run on (probably on some CPU) and can we artificially limit the memory available (e.g. to emulate behaviour for a standard GPU with, say, 16GB of memory, if the CPU comes with more RAM than that)?
Somewhat relatedly, we could consider adding more informative error messages when running out of memory during Hessian allocation / computation. E.g., if initialising the Hessian runs out of memory, we could raise an error saying something like
"Your model is too big for using FullLaplace. It has X parameters, so the Hessian would be Y TB large, while your CPU/GPU only has Z GB of memory available. To use FullLaplace on your machine, your model can have at most ~V parameters. Instead, consider using a more memory-efficient Laplace variant, such as W."
Parts of the methods or classes implemented in the library are proposed in different papers. Instead of having a single reference list in the readme, we could therefore add references into the docstrings.
Kazuki's asdfghjkl implements block-diagonal versions of GGN and EF which could readily be used to construct an alternative posterior approximation. This would require a new LA class and backend integration of asdfghjkl for block-diagonal.
Users might want to avoid computing the Hessian approximation every time they run their code or reuse the same Laplace approximation in different files (e.g. #42). The best interface would probably be .save(filepath) and .load(filepath) methods.
__all__ determines what's imported when using from module import * (https://stackoverflow.com/questions/44834/can-someone-explain-all-in-python), which would be good to use consistently throughout the library.
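For example, at the top of laplace/__init__.py (names illustrative):

__all__ = ['Laplace', 'FullLaplace', 'KronLaplace', 'DiagLaplace']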
This can be useful for some applications, for example applying LAs to DeepMind's neural_testbed. Discussed here. Code for neural_testbed.
A straightforward way to improve the code quality is to enable runtime support for type hints via typing.
Can Laplace support regression models that output a mean and a variance? Thanks for the package!
Current version can be found here. For example, Kazuki Osawa mentioned that the data_average parameter now defaults to True but we require False for a proper Hessian approximation.
Currently it's not really clear how the final softmax is dealt with in the classification case, which might lead to confusion / unintentional misuse of the library.
There are two things to clarify:
1. The model passed to Laplace shouldn't apply a softmax (either via an nn.Softmax() layer in the model or an F.softmax() call in the overwritten forward pass) but return the logits instead. This could probably most easily be fixed by clarifying it in the documentation/readme and additionally raising a warning during fit() if the model outputs on the training set lie in [0, 1] and sum to 1 (a sketch of such a check appears below).
2. The Laplace model applies the softmax internally when making predictions and, therefore, the user shouldn't apply another softmax on top. Here we can probably only improve the documentation.
Probably subclass from torch criteria and keep module parameters for specific library functions.
Additionally, we could subclass from torch distributions for log probabilities and implement the predictive etc.
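A minimal sketch of the probability-output warning suggested in point 1 of the softmax issue above (hypothetical helper, not library code):

import warnings
import torch

def warn_if_probabilities(outputs, atol=1e-4):
    # Heuristically detect that a softmax was already applied to the model outputs.
    sums = outputs.sum(-1)
    in_unit_interval = bool((outputs >= 0).all()) and bool((outputs <= 1).all())
    rows_sum_to_one = torch.allclose(sums, torch.ones_like(sums), atol=atol)
    if in_unit_interval and rows_sum_to_one:
        warnings.warn('Model outputs look like probabilities; Laplace expects logits, '
                      'so the final softmax should be removed from the model.')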
Check if or which approximations support models with learnable BatchNorm parameters and add a clarifying comment to the docs.
Hello,
Thanks for the amazing library.
I want to first fit the Last Layer Laplace with the training data, then store the essentials (e.g. the Hessian and the mu), and later use the _glm_predictive_distribution() function with another OOD dataset in a separate Python file, without having access to the training data. Could you please help me understand whether this decoupling would be possible with the current code? If possible, can you please point me to the variables which I would need to store?
The method proposed by Kwon et al. should be implemented for the MC predictives.
Hi! Thank you for developing this module! I experienced an error when trying the Laplace module on Google Colab. It says "cannot import name 'Laplace' from 'laplace' (/usr/local/lib/python3.7/dist-packages/laplace/__init__.py)". I kindly seek your assistance with this issue. Thank you!
My code:
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from laplace import Laplace
# LeNet5, get_dataset, train_once and epoch come from my own training setup

model = LeNet5(num_classes=10).cuda()
trainset, testset, _, _ = get_dataset('mnist')
train_loader = DataLoader(trainset, 128, True)
# test_loader = DataLoader(testset, 2000, False)
optimizer = torch.optim.SGD(model.parameters(), 0.1, 0.9, weight_decay=5e-4)
criterion = torch.nn.CrossEntropyLoss()
pbar = tqdm(range(epoch), total=epoch)
for _ in pbar:
    acc, _ = train_once(model, train_loader, optimizer, criterion)
    pbar.set_postfix_str(f'Acc: {acc:.2f}%')
la = Laplace(model, 'classification', 'all', 'kron')
la.fit(train_loader)
Haven't totally understood the math behind it...
This has to be treated differently for regression and classification but is very similar to the predictive(..) currently implemented. Basically, laplace.thompson_sample(x, n_samples=1) should return n_samples samples from the posterior over functions f to perform Thompson sampling in active learning/bandits/BO. For regression, this is simply sampling from the Gaussian distribution on f, while it's unclear what would be desired for classification.
This function is currently implemented as predictive_samples() but is not necessarily correct for the classification case.
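For the regression case, a rough sketch of what thompson_sample could return, assuming a predictive that yields a per-input mean f_mu and variance f_var (names illustrative, not the current API):

import torch

def thompson_sample(f_mu, f_var, n_samples=1):
    # Draw function samples from the Gaussian posterior over f.
    return torch.distributions.Normal(f_mu, f_var.clamp_min(1e-12).sqrt()).sample((n_samples,))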
Support DataParallel for the predictions and Hessian computation (with Kazuki's backend).
Hi! Is there any way that we can implement a custom likelihood instead of 'regression' and 'classification', and a custom data_loader? I'm trying to use laplace for a PINN, so the negative log-likelihood (loss) and the data_loader are slightly different.
My PINN network has two inputs. I faced this issue:
`/usr/local/lib/python3.7/dist-packages/laplace/baselaplace.py in fit(self, train_loader)
120 self.model.eval()
121
--> 122 X, _ = next(iter(train_loader))
123 with torch.no_grad():
124 self.n_outputs = self.model(X[:1].to(self._device)).shape[-1]
ValueError: too many values to unpack (expected 2)`
What would you advise in this case? Thanks!
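One possible workaround (a sketch, not an official recipe): pack the two inputs into a single tensor so the loader yields plain (X, y) pairs, and undo the packing inside a thin wrapper module. Here pinn_model, x1_data, x2_data and y_data are placeholders for your own objects:

import torch
from torch import nn
from torch.utils.data import TensorDataset, DataLoader

class PackedPINN(nn.Module):
    def __init__(self, pinn):
        super().__init__()
        self.pinn = pinn

    def forward(self, X):
        x1, x2 = X[:, :1], X[:, 1:]  # adjust the split to your two inputs
        return self.pinn(x1, x2)

X = torch.cat([x1_data, x2_data], dim=1)
train_loader = DataLoader(TensorDataset(X, y_data), batch_size=128, shuffle=True)
# la = Laplace(PackedPINN(pinn_model), 'regression'); la.fit(train_loader)

Note this only addresses the unpacking error; a custom likelihood would still need support in the library.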
Currently implemented on the level of BaseLaplace but should be moved to a specific setter of FullLaplace where it is actually only required. This can be achieved with a setter decorator @BaseLaplace.prior_precision.setter.
Currently, these do not support BatchNorm due to the backends, but this should not fail silently when all-weights Laplace is used on networks with BatchNorm.
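A sketch of the kind of guard this could use (hypothetical helper, not existing library code):

import warnings
from torch import nn

def warn_if_batchnorm(model):
    bn_types = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)
    if any(isinstance(m, bn_types) for m in model.modules()):
        warnings.warn('The model contains BatchNorm layers whose parameters the chosen '
                      'backend may not support; results could be silently wrong.')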
Do all computations in kernel space which allows for different approximations.
Interesting for settings with little data and low output dimensionality.
It would be nice to (have the option to) print a progress bar when fitting the Hessian (e.g. via tqdm). For small problems this doesn't matter as it's instant anyways, but for larger problems one can wait a considerable time for the fitting and doesn't really know how long it'll take.
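A minimal sketch of the idea (illustrative only; fit_batch is a placeholder for whatever accumulates one batch's curvature contribution):

from tqdm import tqdm

def fit_with_progress(fit_batch, train_loader, show_progress=True):
    loader = tqdm(train_loader, desc='Fitting Hessian') if show_progress else train_loader
    for X, y in loader:
        fit_batch(X, y)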
Either change the name to inv_temperature or implement it as an actual temperature.
Currently, increased temperature leads to more concentrated posteriors, so it's reversed.
We call it temperature but it's actually 1/temperature.
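For reference, under the common convention the tempered posterior is p(θ | D)^(1/T) ∝ [p(D | θ) p(θ)]^(1/T), so a larger temperature T gives a flatter, less concentrated posterior; a parameter that instead multiplies the log-posterior (equivalently, the posterior precision) behaves as 1/T.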
Hi,
Are you going to add more examples here?
I will try the same ones myself (originally tested with BNN, PyMC3/BNN or Julia/Turing/BNN):
Use Laplace's face and signature from Wikipedia as the logo.
What do you think about another option for tuning the prior precision that uses (gradient-based) optimization methods to minimise the NLL on a validation set? I think this might nicely complement the existing options (i.e. MLL optimization and CV using validation data).
This is e.g. how the temperature parameter in temperature scaling is typically optimized; see an example implementation using BFGS from scipy here.
Even easier would be to use the same optimization approach as for the MLL (i.e. Adam from PyTorch).
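A rough sketch of the gradient-based variant, assuming the predictive probabilities are differentiable with respect to the prior precision (illustrative only; la is a fitted Laplace object and val_loader a validation loader):

import torch

log_prior_prec = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([log_prior_prec], lr=1e-1)

for _ in range(100):
    optimizer.zero_grad()
    la.prior_precision = log_prior_prec.exp()
    nll = 0.0
    for X, y in val_loader:
        probs = la(X)  # predictive class probabilities
        nll = nll - torch.log(probs[torch.arange(len(y)), y] + 1e-12).sum()
    nll.backward()
    optimizer.step()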
To add new subclasses of Laplace or new backends, one is sometimes required to implement methods in the subclass that are not actually needed. Following the alternative convention of raising an error only when such a method is called allows for more flexible subclassing.
Make it clearer which method we take from where, similar to how sklearn does it. For example, here in their docstrings.
Hi guys,
I have been playing around with this library (it's really good!).
This is just a small thing, but when losses are calculated in eig_lowrank in asdl.py, I think the data and the model are not on the same device because a train_loader is passed to the eig_lowrank function. A simple change - .to(device) - would make it work.
Keep up the good work! :)
Something like loading a CIFAR-10 pretrained model and showing how calibration improves.
Naming: laplace-torch or ideally laplace.
Would allow implementing priors other than Gaussian, where the attribute .delta or .prior_prec simply returns the second derivative wrt. the NN parameters and can be passed into the Laplace class. For example, Gaussian and Student-t are straightforward to implement.
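For instance, the required per-parameter curvature can be computed with autograd; a small sketch for a Student-t log-prior (function names are illustrative, not library API):

import torch

def student_t_log_prior(theta, nu=3.0):
    # Log density up to an additive constant.
    return -0.5 * (nu + 1.0) * torch.log1p(theta ** 2 / nu)

def prior_curvature(theta, nu=3.0):
    # Element-wise negative second derivative of the log-prior.
    theta = theta.detach().requires_grad_(True)
    (grad,) = torch.autograd.grad(student_t_log_prior(theta, nu).sum(), theta, create_graph=True)
    (hess_diag,) = torch.autograd.grad(grad.sum(), theta)
    return -hess_diag

theta = torch.randn(10)
print(prior_curvature(theta))  # could play the role of a per-parameter .prior_prec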
Simple example showing continual learning with the Laplace approximation on toy data.
It looks like I am getting an error when I pass in a model that is a subclass of nn.Module.
I am using the following model:
class FeedForward(nn.Module):
    def __init__(self, in_dim, hiddens, out_dim, dropout=0.0):
        super(FeedForward, self).__init__()
        dims = [in_dim] + hiddens + [out_dim]
        layers = []
        for i in range(len(hiddens)):
            start = dims[i]
            end = dims[i + 1]
            p = dropout if i < len(dims) - 2 else 0.0
            layer = nn.Linear(start, end)
            if p != 0:
                layers.append(nn.Sequential(layer, nn.ReLU(), nn.Dropout(p=p)))
            else:
                layers.append(nn.Sequential(layer, nn.ReLU()))
        layers.append(nn.Linear(hiddens[-1], out_dim))
        self.layers = nn.Sequential(*layers)
Then I train it:
model = FeedForward(1, [100, 100], 1, dropout=0.0)
lr = 1e-3
optim = torch.optim.Adam([{'params': model.parameters(), 'weight_decay': 1e-4}],
lr=lr)
...
Then I get the following error:
la = Laplace(model, 'regression')
la.fit(train_dl)
Truncated Traceback (Use C-c C-$ to view full TB):
/anaconda3/envs/pytorch_hunter/lib/python3.9/site-packages/backpack/extensions/backprop_extension.py in __get_module_extension(self, module)
97 if self._fail_mode is FAIL_ERROR:
98 # PyTorch converts this Error into a RuntimeError for torch<1.7.0
---> 99 raise NotImplementedError(
100 f"Extension saving to {self.savefield} "
101 "does not have an extension for "
NotImplementedError: Extension saving to kflr does not have an extension for Module <class 'funcprior.models.FeedForward'>
There is no reason to prevent the .fit() method from being called repeatedly, for example after changing hyperparameters or on a different data set. Currently, this raises a ValueError here. Maybe raise a warning instead, or simply reset the state to enable safe iterative fits.
Hi
Love the work you're doing.
A Question: are the algorithms in this repo agnostic to the downstream task or the type of input?
For example, if I have an object recognition model for LIDAR data or classification of Audio inputs, can I still use the package?
Hi, I have a question about how you avoid negative determinants of the Hessian for the log-determinant.
Hessian matrices are not always positive semi-definite, so generally they have both positive and negative eigenvalues. In other words, determinants can sometimes be negative, i.e., the logdet of the Hessian cannot be obtained. When I ran some simple experiments with your code, I didn't encounter such an issue, but I'd like to know how you avoid this problem.
Thank you!
Create a diagram which describes the inheritance structure of all subclasses of laplace.BaseLaplace.
Useful for: Users who want to implement custom predictive approximations.
Issue: Currently, the predictive approximation is tightly coupled with the Laplace class. So, if a user wanted to implement a new predictive approximation, they would have to dig deep into this class and might break something, not to mention that it can be confusing.
Proposal:
class FunctionPredictive:
    def __init__(self, ...):
        ...

    def __call__(self, x):
        '''Return 2 arrays for means and vars'''
        raise NotImplementedError()


class LinearizedPredictive(FunctionPredictive):
    def __init__(self, laplace_net, ...):
        self.laplace_net = laplace_net
        ...

    def __call__(self, x):
        J = compute_jacobian(self.laplace_net, x)
        return self.laplace_net.map_prediction(x), J.T @ self.laplace_net.covmat @ J


class LinkPredictive:
    def __init__(self, ...):
        ...

    def __call__(self, f_mean, f_var):
        raise NotImplementedError()


class ProbitPredictive(LinkPredictive):
    def __init__(self, ...):
        ...

    def __call__(self, f_mean, f_var):
        return torch.sigmoid(f_mean / torch.sqrt(1 + math.pi / 8 * f_var))


linearized_pred = LinearizedPredictive(...)
probit_pred = ProbitPredictive(...)  # Set it to `None` if one does regression

laplace_net = Laplace(..., function_predictive=linearized_pred, link_predictive=probit_pred)
laplace_net.fit(train_loader)
laplace_net(x)  # Prediction using the specified predictives