
nngeometry's Introduction

NNGeometry


NNGeometry allows you to:

  • compute Gauss-Newton or Fisher Information Matrices (FIM), as well as any matrix that is written as the covariance of gradients w.r.t. parameters, using efficient approximations such as low-rank matrices, KFAC, EKFAC, diagonal and so on.
  • compute finite-width Neural Tangent Kernels (Gram matrices), even for multiple output functions.
  • compute per-example Jacobians of the loss w.r.t. network parameters, or of any function such as the network's output.
  • easily and efficiently compute linear algebra operations involving these matrices regardless of their approximation.
  • compute implicit operations on these matrices, which do not require explicitly storing large matrices that would not fit in memory.

It offers a high-level abstraction over the parameter and function spaces described by neural networks. As a simple example, a parameter space vector PVector actually contains the weight matrices, bias vectors, and convolution kernels of the whole neural network (a set of tensors). Using NNGeometry's API, performing a step in parameter space (e.g. an update of your favorite optimization algorithm) is abstracted as a Python addition: w_next = w_previous + epsilon * delta_w.
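A rough sketch of what this looks like in code (not taken verbatim from the docs; it assumes PVector can be imported from nngeometry.object and exposes from_model and get_flat_representation, as in the documentation):

import torch.nn as nn
from nngeometry.object import PVector

model = nn.Sequential(nn.Linear(784, 100), nn.ReLU(), nn.Linear(100, 10))

# w_previous wraps all of the model's parameter tensors as one abstract vector
w_previous = PVector.from_model(model)
# stand-in for a real update direction (here simply the parameters themselves)
delta_w = PVector.from_model(model)
epsilon = 0.01

w_next = w_previous + epsilon * delta_w  # one abstract step in parameter space
print(w_next.get_flat_representation().shape)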

Example

In the Elastic Weight Consolidation continual learning technique, you want to compute $\left(\mathbf{w}-\mathbf{w}_{A}\right)^{\top}F\left(\mathbf{w}-\mathbf{w}_{A}\right)$. It can be achieved with a diagonal approximation for the FIM using:

from nngeometry.metrics import FIM
from nngeometry.object import PMatDiag

F = FIM(model=model,
        loader=loader,
        representation=PMatDiag,
        n_output=10)

# w and w_a are parameter space vectors (PVector objects)
regularizer = F.vTMv(w - w_a)

The first statement instantiates a diagonal matrix and populates it with the diagonal coefficients of the FIM of the model model, computed using the examples from the dataloader loader.

If the diagonal approximation is not accurate enough, you can instead choose a KFAC approximation by simply changing PMatDiag to PMatKFAC, as in the snippet below. Note that very different operations are involved internally, depending on the chosen representation (e.g. KFAC, EKFAC, ...).
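For instance, only the representation argument changes:

from nngeometry.object import PMatKFAC

F = FIM(model=model,
        loader=loader,
        representation=PMatKFAC,
        n_output=10)

regularizer = F.vTMv(w - w_a)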

Documentation

You can visit the documentation at https://nngeometry.readthedocs.io.

More usage examples are available in the repository https://github.com/tfjgeorge/nngeometry-examples.

Feature requests, bugs, contributions, or any kind of request

Many of you are now using NNGeometry in your work: do not hesitate to drop me a line ([email protected]) about your project so that I get a better understanding of your use cases and of the current limitations of the library.

We welcome any feature request or bug report in the issue tracker.

We also welcome contributions, please submit your PRs!

Citation

If you use NNGeometry in a published project, please cite our work using the following BibTeX entry:

@software{george_nngeometry,
  author       = {Thomas George},
  title        = {{NNGeometry: Easy and Fast Fisher Information 
                   Matrices and Neural Tangent Kernels in PyTorch}},
  month        = feb,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {v0.3},
  doi          = {10.5281/zenodo.4532597},
  url          = {https://doi.org/10.5281/zenodo.4532597}
}

License

This project is distributed under the MIT license (see the LICENSE file). It also includes code licensed under the BSD 3-clause license, as it borrows some code from https://github.com/owkin/grad-cnns.


nngeometry's Issues

Simple Model: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph.

I'm trying to use the latest git release of NNGeometry's FIM to find the Fisher metric of my trivial model. As a simple example I create a model which has a single Linear layer, a single training sample, and solves the matrix equation Ax=b, where A is a 3x3 matrix, whilst x, b are 3x1 col. vectors.

Here's my code (it's not meant for anything functional -- it's just to see how these things work):

import torch
import torch.nn as nn
import torch.optim as optim

class Net(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(Net, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim, bias=False)

    def forward(self, x):
        out = self.linear(x)
        return out

model = nn.Linear(9, 3, bias=False)

# Define the training data
A = nn.Parameter(torch.tensor([[1., 2., 3.],
                               [4., 5., 6.],
                               [7., 8., 9.]]))

b = nn.Parameter(torch.tensor([[52.],
                              [124.],
                              [196.]]))

# Define the model and the optimizer
# model = Net(input_dim=9, output_dim=3)
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Train the model
for epoch in range(2000):
    optimizer.zero_grad()
    y_pred = model(A.view(9))
    print(A@y_pred)
    loss = nn.MSELoss(reduction='sum')(A@y_pred.view(3,1), b)
    loss.backward()
    optimizer.step()

Now I create a dataloader with a single batch containing the single training sample:

from torch.utils.data import DataLoader, Dataset

class TrivialDataset(Dataset):
    def __init__(self):
        self.data = torch.arange(1, 10, dtype=torch.float32).view(1,1,9)
    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

batch_size = 1
dataset = TrivialDataset()
loader = DataLoader(dataset, batch_size=batch_size)

Attempting to compute the Fisher metric gives a runtime error due to the differentiated tensors not being used:

from nngeometry.metrics import FIM
from nngeometry.object import PMatDense, PMatBlockDiag

# check dimensions
print(model)
fisher_metric = FIM(model, loader, n_output=3, variant='regression', representation=PMatDense, device='cpu')

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

I'm at an utter loss as to why this is happening. Is this a bug in NNGeometry (unlikely) or am I doing something extremely stupid (increasingly likely)? Thanks!

Make differentiating inputs optional

Hello,

I am calculating the EK-FAC representation on the linear layers of a model with an embedding layer. The inputs are integers which are lookup values and therefore should not be differentiated through. Within the calculation of the kfac blocks in the Jacobian class the inputs are labelled as requiring grad, which gives me RuntimeError: only Tensors of floating point and complex dtype can require gradients. As far as I can tell, this is just so that the hooks can be triggered on all the layers, so a simple backward call would be enough.

Here is a simple code to replicate the error

from torch.nn import Embedding, Linear

from nngeometry.metrics import FIM
from nngeometry.object import PMatEKFAC
import torch as th
from nngeometry.layercollection import LayerCollection


class EmbeddingModel(th.nn.Module):

    def __init__(
        self,
        n_input: int,
        n_output: int,
    ):
        super().__init__()
        self.embedding = Embedding(n_input, 10)
        self.fc1 = Linear(10, n_output)

    def forward(self, x):
        return self.fc1(self.embedding(x))

if __name__ == "__main__":
    model = EmbeddingModel(10, 1)
    active_layers = LayerCollection()
    active_layers.add_layer_from_model(model, model.fc1)
    dataset = th.utils.data.TensorDataset(th.randint(1, 1000, (100,)), th.randint(0, 1, (100,)))
    loader = th.utils.data.DataLoader(dataset, batch_size=10)

    F_ekfac = FIM(model, loader, PMatEKFAC, 1, variant='classif_logits', layer_collection=active_layers)

Would it be possible to remove the inputs.requires_grad = True or make it optional?
Thanks a lot for your help and for your very instructive library.

compute_correlation() intent

Hi @tfjgeorge, thanks for the library, a very useful framework. I have a quick question regarding the compute_correlation() call in the notebook: why is that being done before visualizing the FIM? I'm not sure I understood the intention. Thank you!

Error with FIM for resnet18

Code:

import numpy as np
import torch
from torchvision import models
from torch.utils.data import DataLoader
from continuum.datasets import InMemoryDataset
from nngeometry.metrics import FIM
from nngeometry.object import PMatDiag

random_x_data = np.random.randint(0, 255, size=(20, 264, 264, 3))
random_y_data = np.arange(20)
data = InMemoryDataset(random_x_data, random_y_data).to_taskset()

model = models.resnet18(pretrained=True).cuda()

fisher_loader = DataLoader(data, batch_size=1, shuffle=True, num_workers=6)

fim = FIM(model=model.eval(),
         loader=fisher_loader,
         representation=PMatDiag,
         n_output=10,
         variant='classif_logits',
         device='cuda')

Error: (screenshot not reproduced)

How to compute FIM with nn.DataParallel(model)?

Hey Thomas –

Thank you for creating this terrific package! I am wondering what should be the correct way to compute FIM when we use multiple GPUs with nn.DataParallel() to load the network.

Specifically, I encountered a KeyError when I tried to run with 3 GPUs and wrap my network with nn.DataParallel(). Below is a simplified sample of my code:

import torch
import torch.nn as nn
from nngeometry.layercollection import LayerCollection
from nngeometry.metrics import FIM
from nngeometry.object import PMatKFAC

# Create model instance
class MNISTLeNet(nn.Module):
    def __init__(self):
        super(MNISTLeNet, self).__init__()
        self.cnn_model = nn.Sequential( nn.Conv2d(1,6,5), nn.ReLU(), nn.AvgPool2d(2, stride=2),    
            nn.Conv2d(6, 16, 5),  nn.ReLU(), nn.AvgPool2d(2, stride=2) )

        self.fc_model = nn.Sequential(nn.Linear(256, 120), nn.ReLU(), nn.Linear(120, 84),
            nn.ReLU(), nn.Linear(84, 10))

    def forward(self,x):
        x = self.cnn_model(x)
        x = x.view(x.size(0), - 1)
        x = self.fc_model(x)
        return x

# Parallelize the model
model = MNISTLeNet()
model = torch.nn.DataParallel(model).to(device)

# Calculate only linear and Conv2d layers
layer_collection = LayerCollection()
for layer in model.modules():
    if type(layer) in (nn.Linear, nn.Conv2d):
        layer_collection.add_layer_from_model(model, layer)

# Get the Fisher Information Matrix
F_kfac = FIM(layer_collection=layer_collection,
             model=model,
             loader=test_loader,
             representation=PMatKFAC,
             n_output=10,
             variant='classif_logits',
             device='cuda')

And I got the following error message:

  File "/home/-/github/project/helper/utils.py", line 358, in get_fisher
    F_kfac = FIM(layer_collection=layer_collection,
  File "/home/-/.local/lib/python3.8/site-packages/nngeometry/metrics.py", line 169, in FIM
    return representation(generator=generator, examples=loader)
  File "/home/-/.local/lib/python3.8/site-packages/nngeometry/object/pspace.py", line 436, in __init__
    self.data = generator.get_kfac_blocks(examples)
  File "/home/-/.local/lib/python3.8/site-packages/nngeometry/generator/jacobian/__init__.py", line 249, in get_kfac_blocks
    torch.autograd.grad(output[self.i_output], [inputs],
  File "/home/-/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 275, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/-/.local/lib/python3.8/site-packages/nngeometry/generator/jacobian/__init__.py", line 629, in <lambda>
    o.register_hook(lambda g_o: hook_gy(mod, g_o))
  File "/home/-/.local/lib/python3.8/site-packages/nngeometry/generator/jacobian/__init__.py", line 682, in _hook_compute_kfac_blocks
    layer_id = self.m_to_l[mod]
KeyError: Linear(in_features=84, out_features=10, bias=True)

The error seems not to be raised when using only one GPU.

Would you have any idea how to efficiently solve the issue and compute FIM with multiple GPUs? Thank you so much! : )

`LayerCollection.from_model` does not get all available layers

import torch.nn as nn
from nngeometry.layercollection import LayerCollection

layers = [nn.Flatten(), nn.Linear(28 * 28, 100), nn.ReLU()] + \
          [nn.Linear(100, 100), nn.ReLU()] * 10 + \
          [nn.Linear(100, 10)]
model = nn.Sequential(*layers)
lc = LayerCollection.from_model(model)
lc.layers.items()

Output (only has 3 layers in lc):

odict_items([('1.Linear(in_features=784, out_features=100, bias=True)', <nngeometry.layercollection.LinearLayer object at 0x7e6357f401f0>), ('3.Linear(in_features=100, out_features=100, bias=True)', <nngeometry.layercollection.LinearLayer object at 0x7e6357f43820>), ('23.Linear(in_features=100, out_features=10, bias=True)', <nngeometry.layercollection.LinearLayer object at 0x7e6357f42c80>)])

Extracting Eigenvalues of Fisher using KFAC Representation

I am trying to get the eigenspectrum of the Fisher of my neural network using the compute_eigendecomposition() and get_eigendecomposition() methods in the KFAC implementation, but I am having trouble interpreting the returned dictionary.

If I just want to get the sorted eigenvalues of my Fisher as a flat tensor, what is the best way to go about this using NNGeometry? Would getting the eigenvalues of the dense_tensor be sufficient? Also, the torch.symeig function used in the eigendecomposition calculation seems to be deprecated; torch suggests using torch.linalg.eigh instead.

Pytorch Warning -- Non-Full Backward Hook when the Forward Contains Multiple Autograd Nodes

I get the following user warning from PyTorch when I try to instantiate a FIM object using a simple 2 layer network for regression:

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py:974: UserWarning: Using a non-full backward hook when the forward contains multiple autograd Nodes is deprecated and will be removed in future versions. This hook will be missing some grad_input. Please use register_full_backward_hook to get the documented behavior. warnings.warn("Using a non-full backward hook when the forward contains multiple autograd Nodes "

Here is the model:

    def __init__(self, params):
        super().__init__()
        self.model_ = nn.Sequential(
            nn.Linear(params['input_size'], params['hidden_size'], bias = False),
            nn.ReLU(),  # when modelling non-linearities
            #nn.Dropout(params['dropout_p']),
            nn.Linear(params['hidden_size'], params['output_size'], bias = False)
        )
        self.optim_ = torch.optim.Adam(
            self.model_.parameters(), 
            lr=params['lr']
        )
    def forward(self, X):
        return self.model_(X)

Here is the instantiation of the FIM object:

F = FIM(model=model, loader=trainloader, representation=PMatKFAC, variant = 'regression', n_output = 1, device= 'cpu')

I'm not sure what the warning is referring to, but since it is saying that a deprecated feature of PyTorch is being used, I think it is worth looking into?

Scaling of parameter space representations

Many thanks for this interesting library!

Comparing with analytical expressions, I think the provided dense representation of Fisher information matrix is calculated as the expectation over the data points in the train loader. Are the other representations, e.g. KFAC and EKFAC, on the same scale? Or, is there a constant scaling, e.g. by the batch size, that we should be aware of?

Error with float64 tensors

Hello and thanks for your work.
EKFAC seems to have issues with models that work in double precision. Here is a code to reproduce it:

from nngeometry.metrics import FIM
from nngeometry.object import PMatEKFAC
import torch as th

dtype = th.float64

class SimpleModel(th.nn.Module):

    def __init__(
        self,
        n_input: int,
        n_output: int,
    ):
        super().__init__()
        self.fc1 = th.nn.Linear(n_input, n_output, bias=True, dtype=dtype)

    def forward(self, x):
        return th.nn.Softmax(dim=-1)(self.fc1(x))
    


if __name__ == "__main__":
    model = SimpleModel(10, 3)
    dataset = th.utils.data.TensorDataset(th.randn(100, 10, dtype=dtype), th.randint(0, 3, (100,), dtype=th.long))
    loader = th.utils.data.DataLoader(dataset, batch_size=10)
    F_ekfac = FIM(model, loader, PMatEKFAC, 3, variant='classif_logits')
    F_ekfac.update_diag(loader)

I get "RuntimeError: expected scalar type Double but found Float"

Support of ParameterList

Hello,

I'm trying to use FIM with PMatDiag for my model that has some nn.ParameterList, but:

Traceback (most recent call last):
  File "run.py", line 381, in <module>
    main(train_scenario, test_scenario, args)
  File "run.py", line 191, in main
    device='cuda'
  File "/local/douillard/conda_env_continualexp/lib/python3.7/site-packages/nngeometry/metrics.py", line 131, in FIM
    layer_collection = LayerCollection.from_model(model)
  File "/local/douillard/conda_env_continualexp/lib/python3.7/site-packages/nngeometry/layercollection.py", line 46, in from_model
    raise Exception('I do not know what to do with layer ' + str(mod))
Exception: I do not know what to do with layer ParameterList(
    (0): Parameter containing: [torch.FloatTensor of size 50x5]
    (1): Parameter containing: [torch.FloatTensor of size 50x5]
)

Could you support it please?

Thank you already for your work and codebase! ❤️

cc @TLESORT

compute KFAC matrix on big network

Hi, have you tried to compute the KFAC matrix on a somewhat large network such as resnet18? I tried to replace the network with resnet18 in your example Continual_learning_EWC.ipynb; however, it seems that the KFAC matrix is too big to be computed.
This is the error:

File "/nngeometry/nngeometry/metrics.py", line 171, in FIM
    return representation(generator=generator, examples=loader)
File "/nngeometry/nngeometry/object/pspace.py", line 439, in __init__
    self.data = generator.get_kfac_blocks(examples)
File "/nngeometry/nngeometry/generator/jacobian/__init__.py", line 247, in get_kfac_blocks
    output = self.function(*d).view(bs, self.n_output).sum(dim=0)
RuntimeError: shape '[50, 30]' is invalid for input of size 50000

what's the meaning of implementing the hook_compute_diag function?

I am confused about the details of _hook_compute_diag: why should the gradient be multiplied by x, the input of the layer, before backpropagation?
Other implementations of the Fisher matrix look like this:

for n, p in self.model.named_parameters():
    precision_matrices[n].data += p.grad.data ** 2 / len(self.dataset)

If anyone can explain this, thank you in advance.
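For context, here is a minimal self-contained check of why the layer input shows up in a per-layer Fisher computation: for a bias-free linear layer y = W x and a scalar loss, the gradient w.r.t. W is the outer product of the output gradient and the input x, so squaring per-example gradients necessarily involves x. This sketch uses purely illustrative names and is not taken from the library:

import torch

# For y = W x and a scalar loss L, dL/dW equals outer(dL/dy, x),
# so the diagonal of the Fisher involves the layer input x.
W = torch.randn(3, 5, requires_grad=True)
x = torch.randn(5)
y = W @ x
loss = (y ** 2).sum()
loss.backward()

g_y = (2 * y).detach()  # dL/dy for this particular loss
assert torch.allclose(W.grad, torch.outer(g_y, x))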

Information to include in the FIM() loader argument

Hi,
I'm getting started with your code and I would like to adapt it to a supervised regression ML model. I was wondering what information should be included in the loader argument of FIM(model, loader, representation, n_output, variant='classif_logits', device='cpu', function=None, layer_collection=None). Is it only the new inputs to the model, the outputs, or both?

In your example, you use nngeometry to do continual learning on the MNIST dataset, which is a classification problem. I would like to know if your code only accepts ClassIncremental() datasets or can be adapted to regression problems. If yes, how should it be implemented?

Thank you.
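Not an authoritative answer, but for context, a minimal sketch of a regression setup, assuming the variant='regression' flag and a loader whose batches start with the model inputs (the model, data sizes and n_output=1 are placeholder choices):

import torch
from torch.utils.data import DataLoader, TensorDataset
from nngeometry.metrics import FIM
from nngeometry.object import PMatDiag

# Placeholder regression model: 10 input features -> 1 output
net = torch.nn.Linear(10, 1)

# The FIM is built from the model's own outputs, so only the inputs
# (the first element of each batch) are consumed; targets may be present
# in the dataset but are not needed.
X = torch.randn(100, 10)
y = torch.randn(100, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=10)

F = FIM(model=net,
        loader=loader,
        representation=PMatDiag,
        n_output=1,
        variant='regression',
        device='cpu')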

Monte Carlo Sampling Question

Looking at the implementation of the Monte Carlo sampling for computing the Fisher Information, I see that the sampled probabilities are divided by the square root of the number of trials. Is there a specific reason for this? Why not simply divide by the number of trials?

Support for ConvTranspose2d layers?

Hello!

Will there ever be support for ConvTranspose2d layers? Or is it an easy corner case of Conv2d layers that could be implemented?

I tried to calculate the exact FIM for this autoencoder:

class GradConCAE(nn.Module):
    def __init__(self, in_channel=3):
        super(GradConCAE, self).__init__()

        self.down = nn.Sequential(
            nn.Conv2d(in_channel, 32, 4, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(32, 32, 4, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv2d(64, 64, 4, stride=2, padding=2),
            nn.ReLU()
        )

        self.up = nn.Sequential(
            nn.ConvTranspose2d(64, 64, 4, stride=2, padding=2),  # output 4x4
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # output 8x8
            nn.ReLU(),
            nn.ConvTranspose2d(32, 32, 4, stride=2, padding=2),  # output 14x14
            nn.ReLU(),
            nn.ConvTranspose2d(32, in_channel, 4, stride=2, padding=1),  # output 28x28
            nn.Sigmoid()
        )

        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        z = self.down(x)
        return self.up(z)

Implementing BatchNorm for KFAC

Hello,

I am trying to use BatchNormalization in my network trained on CIFAR. The network has about 50,000 parameters and I want to use the KFAC representation in order to speed up computations. However, it looks like BatchNorm2D is unimplemented for KFAC. Would it be possible to add this implementation?

An example for gram matrix computation?

Hi,

Thanks for this great resource! The existing example is about FIM computation. I am wondering if you could also provide an example of computing the Gram matrix of the empirical NTK of a finite-width network (perhaps with a dummy dataloader)? This would be highly helpful.

Since this is a request, please feel free to close this issue ;-)

Cheers,
Tianlin

Error: `I do not know what to do with layer Embedding(50304, 512)`

First of all, great library; I've always been looking for ways to get Jacobians and Fisher Information Matrices for my PyTorch models.
While the library works fine with my vision models based on simple convolutional networks, I find it harder to use with Huggingface pretrained models.
To be clear, I believe the embedding layers are the culprit here.

I devised a dataloader that takes a list of strings as input and yields batches as dictionaries with "input_ids" and "attention_mask" keys and integer torch.Tensor values.

import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, GPTNeoXForCausalLM
from transformers.tokenization_utils import BatchEncoding

torch_model = GPTNeoXForCausalLM.from_pretrained(
    pretrained_model_name_or_path="EleutherAI/pythia-70m-deduped",
    revision="step1000",
    cache_dir=cache_dir,
)

class FIMDataLoader(Dataset):
    def __init__(self, text_list, tokenizer, max_length=128):
        self.text_list = text_list
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.text_list)

    def __getitem__(self, idx):
        text = self.text_list[idx]
        encoding = self.tokenizer(
            text,
            max_length=self.max_length,
            truncation=True,
            padding="max_length",
            # max_length=self.max_length,
            return_tensors="pt",
        )
        input_ids = encoding["input_ids"].squeeze()
        attention_mask = encoding["attention_mask"].squeeze()
        return input_ids, attention_mask


def collate_fn(batch):
    input_ids, attention_mask = zip(*batch)

    return BatchEncoding(
        {
            "input_ids": torch.stack(input_ids),
            "attention_mask": torch.stack(attention_mask),
        }
    )


def create_dataloader(text_list, tokenizer, batch_size, max_length, shuffle=False):
    dataset = FIMDataLoader(text_list, tokenizer, max_length)
    dataloader = DataLoader(
        dataset, batch_size=batch_size, shuffle=shuffle, collate_fn=collate_fn
    )
    return dataloader

Then I instantiate the dataloader:

texts_list = ["The cat is on the table", "Alice and Bob are friends"]
# tokenizer is not defined in the snippet above; presumably something like:
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m-deduped")
dataloader = create_dataloader(
    texts_list, tokenizer, batch_size=1, max_length=128, shuffle=False
)

For a model with a total of 70M parameters, keeping the entire Fisher matrix in memory is prohibitive, so I have chosen the diagonal, with storage proportional to the number of parameters, by using the PMatDiag representation you kindly provide in your library.

I thought this would give me the diagonal of the Fisher information matrix, right?
However, an error appears that seems related to LayerCollection creation.

from nngeometry.metrics import FIM
from nngeometry.object import PMatDiag

FIM(
    model=torch_model,
    loader=dataloader,
    representation=PMatDiag,
    n_output=1,
    device="cpu",
)

but I get the following error:

Exception                                 Traceback (most recent call last)
Cell In[93], line 6
      3 from nngeometry.metrics import FIM
      4 from nngeometry.object import PMatDiag
----> 6 F_ekfac = FIM(
      7     model=torch_model,
      8     loader=dataloader,
      9     representation=PMatDiag,
     10     n_output=1,
     11     device="cpu",
     12 )

File ~/opt/miniconda3/envs/pythia/lib/python3.10/site-packages/nngeometry/metrics.py:147, in FIM(model, loader, representation, n_output, variant, device, function, layer_collection)
    144         return model(d[0].to(device))
    146 if layer_collection is None:
--> 147     layer_collection = LayerCollection.from_model(model)
    149 if variant == 'classif_logits':
    151     def function_fim(*d):

File ~/opt/miniconda3/envs/pythia/lib/python3.10/site-packages/nngeometry/layercollection.py:50, in LayerCollection.from_model(model, ignore_unsupported_layers)
     48     elif not ignore_unsupported_layers:
     49         if len(list(mod.children())) == 0 and len(list(mod.parameters())) > 0:
---> 50             raise Exception('I do not know what to do with layer ' + str(mod))
     52 return lc

Exception: I do not know what to do with layer Embedding(50304, 512)

It looks like the reason I get this error has to do with the embedding layers (there are two embedding layers: one converting token ids from the vocabulary space (size 50304) to the latent space (size 512), and another at the end doing the reverse).

What should I do to have the FIM diagonal of all model parameters?
Many thanks, and again, great package.

RuntimeError: Shape is invalid for input of size

I'm trying to use the latest git release of NNGeometry's FIM to find the Fisher metric of my trivial model. As a stupidly basic example which recreates my problem, I create a model which has a single Linear layer, a single training sample, and solves the matrix equation Ax=b, where A is a 3x3 matrix, whilst x, b are 3x1 col. vectors.

Here's my code (it's not meant for anything functional, it's just to replicate my problem):

import torch
import torch.nn as nn
import torch.optim as optim

class Net(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(Net, self).__init__()

        self.linear = nn.Linear(input_dim, output_dim, bias=False)

    def forward(self, x):
        out = self.linear(x)
        return out

# Define the training data
A = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.],
                  [7., 8., 9.]])

b = torch.tensor([[52.],
                  [124.],
                  [196.]])
# Define the model and the optimizer
model = Net(input_dim=9, output_dim=3)
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Train the model
for epoch in range(2000):
    optimizer.zero_grad()
    y_pred = model(A.view(9))
    print(A@y_pred)
    loss = nn.MSELoss(reduction='sum')(A@y_pred.view((3,1)), b)
    loss.backward()
    optimizer.step()

# Evaluate the model
with torch.no_grad():
    y_pred = model(A.reshape(9))
    print("Solution:\n", y_pred)

Now I create a simple dataloader with that single training sample in (just as a proof of concept):

from torch.utils.data import DataLoader, Dataset

class TrivialDataset(Dataset):
    def __init__(self):
        self.data = torch.tensor([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]]).reshape(1,9)
    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

# Create the Dataloader
batch_size = 1
dataset = TrivialDataset()
loader = DataLoader(dataset, batch_size=batch_size)

Now if I try to find the FIM:

from nngeometry.metrics import FIM
from nngeometry.object import PMatDense

fisher_metric = FIM(model, loader, n_output=1, variant='regression', representation=PMatDense, device='cpu')

There's a runtime error:

File ~/miniconda3/envs/torch/lib/python3.10/site-packages/nngeometry/generator/jacobian/__init__.py:77, in Jacobian.get_covariance_matrix(self, examples)
     75 inputs.requires_grad = True
     76 bs = inputs.size(0)
---> 77 output = self.function(*d).view(bs, self.n_output) \
     78     .sum(dim=0)
     79 for i in range(self.n_output):
     80     self.grads.zero_()

RuntimeError: shape '[9, 1]' is invalid for input of size 3

I think this comes about because FIM is trying to reshape the output based on the input size. Is this correct?

Thanks

What is the use of classif_logits variant?

I read the code carefully, and I find the details of the implementation of the classif_logits variant confusing. From my understanding, log_softmax is meant to compute the gradient faster, with the probabilities recovered through exp. But what does it mean to return (log_probs * probs**.5), which looks like a derivative?
Here are the relevant code pieces:

def function_fim(*d):
    log_probs = torch.log_softmax(function(*d), dim=1)
    probs = torch.exp(log_probs).detach()
    return (log_probs * probs**.5)

Furthermore, there are 'classif_logits' and 'regression' variants; what about an output that combines regression and classification? As far as I can tell, each output should be handled with the corresponding mode?
I would appreciate it if anyone could help me, thank you in advance.

shapes pbs (x2)

Code:

import numpy as np
from torch.utils.data import DataLoader
from continuum.datasets import InMemoryDataset
from nngeometry.layers import WeightNorm1d
from nngeometry.metrics import FIM
from nngeometry.object import PMatDiag

classifier = WeightNorm1d(in_features=512, out_features=20)  # btw, also fails with nn.Linear
random_x_data = np.random.randint(0, 255, size=(20, 512))
random_y_data = np.arange(20)
data = InMemoryDataset(random_x_data, random_y_data).to_taskset()
fisher_loader = DataLoader(data, batch_size=128, shuffle=True, num_workers=6)
fim = FIM(model=classifier,
         loader=fisher_loader,
         representation=PMatDiag,
         n_output=20,
         variant='classif_logits',
         device='cpu')

Error: (screenshot not reproduced)

If I solve the view problem by modifying the WeightNorm1d class, I get another error (screenshot not reproduced).

(I just modified the forward function of WeightNorm1d as follows:)

def forward(self, input: Tensor) -> Tensor:
    input = input.view(-1, self.in_features)
    norm2 = (self.weight**2).sum(dim=1, keepdim=True) + self.eps
    return F.linear(input,
                    self.weight / torch.sqrt(norm2))

compute FIM of partial parameters

First, thanks for the amazing work!
I want to compute the FIM of a subset of the parameters, which means only part of the whole parameter set requires gradients. Is that possible?
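Not an official answer, but a minimal sketch of one way this might look, assuming the layer_collection argument of FIM restricts the computation to the chosen layers (model, loader and model.fc1 are placeholder names):

from nngeometry.metrics import FIM
from nngeometry.object import PMatDiag
from nngeometry.layercollection import LayerCollection

# Collect only the layers whose parameters should enter the FIM
lc = LayerCollection()
lc.add_layer_from_model(model, model.fc1)

F = FIM(model=model,
        loader=loader,
        representation=PMatDiag,
        n_output=10,
        layer_collection=lc)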
