normalizing-flows's People

Contributors

davindicode, kazewong, lukasryll, mattcleigh, timothygebhard, vincentberenz, vincentstimper

normalizing-flows's Issues

Remove lambdas

In the MADE class, the preprocessing attribute is created as a lambda. Please remove this, or it will be impossible to use torch.save().
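
For illustration, a minimal sketch of the failure mode and the usual fix (the class and attribute names below are hypothetical, not the library's actual MADE code): replacing the lambda attribute with a small nn.Module, or a named function defined at module level, makes the model picklable with torch.save.

import torch
from torch import nn

class Preprocess(nn.Module):
    """Named, picklable replacement for a lambda-valued attribute."""
    def forward(self, x):
        return 2 * x - 1  # example transform; the exact preprocessing is irrelevant here

class MadeLike(nn.Module):
    def __init__(self):
        super().__init__()
        # self.preprocessing = lambda x: 2 * x - 1  # lambdas cannot be pickled -> torch.save fails
        self.preprocessing = Preprocess()            # picklable alternative

model = MadeLike()
torch.save(model, "made_like.pt")  # works once the lambda attribute is gone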

Issue while computing the `flow.forward` function

Hi,

I was wondering why there was a minus here

log_det -= log_d
when computing the forward log-Jacobian? Shouldn't it be a plus? I made my own test case (see below), and the Jacobian core test
assert_close(log_det, -log_det_)
fails for me.

# Libraries
import torch
import numpy as np
import normflows as nf
from tqdm import trange

# Train a flow
def train_flow(flow, base, target, batch_size=1024, lr=5e-4, n_iter=4096):
	# Optimizer
	optimizer = torch.optim.Adam(flow.parameters(), lr=lr)
	# Train the flow
	r = trange(n_iter, unit='step', desc='loss')
	for i in r:
	    # Reset the optimizer
	    optimizer.zero_grad()
	    # Sample the target
	    target_samples = target.sample((batch_size,))
	    # Estimate the loss
	    z, log_jac = flow.inverse_and_log_det(target_samples)
	    flow_loss = -(base.log_prob(z) + log_jac).mean()
	    # Backward pass
	    flow_loss.backward()
	    optimizer.step()
	    # Debug
	    r.set_description('loss = {}'.format(round(flow_loss.item(), 4)), refresh=True)

# Make the device
device = torch.device('cuda')

# Make the arguments
dim = 2

# Make the base
base = torch.distributions.MultivariateNormal(
    loc=torch.zeros((dim,), device=device),
    covariance_matrix=torch.eye(dim, device=device)
)
locs = torch.ones((2, dim), device=device)
locs[1] *= -1
target = torch.distributions.MixtureSameFamily(
	mixture_distribution=torch.distributions.Categorical(torch.ones((2,), device=device)),
	component_distribution=torch.distributions.MultivariateNormal(
	    loc=locs,
	    covariance_matrix=torch.stack([0.3 * torch.eye(dim, device=device)])
	)
)

# Make the flow
base_ = nf.distributions.DiagGaussian(dim, trainable=False)
flows = []
b = torch.Tensor([1 if i % 2 == 0 else 0 for i in range(2)])
for i in range(2):
    s = nf.nets.MLP([2, 2 * 2, 2], init_zeros=True)
    t = nf.nets.MLP([2, 2 * 2, 2], init_zeros=True)
    if i % 2 == 0:
        flows += [nf.flows.MaskedAffineFlow(b, t, s)]
    else:
        flows += [nf.flows.MaskedAffineFlow(1 - b, t, s)]
    flows += [nf.flows.ActNorm(2)]
flow = nf.NormalizingFlow(base, flows)
flow = flow.to(device)

# Launch training
train_flow(flow, base, target, batch_size=2048, lr=1e-3)

# Test bijectivity
z = base.sample(sample_shape=(2048,))
with torch.no_grad():
	x, log_jac_x = flow.forward_and_log_det(z)
	z_z, log_jac_z_z = flow.inverse_and_log_det(x)
with torch.no_grad():
	print('norm(z - z_z) = ', torch.linalg.norm((z - z_z).view((-1, 2)), dim=-1).mean())
	print('mse(log_jac_z_z + log_jac_x) = ', torch.mean(torch.square(log_jac_z_z + log_jac_x)))
	print('mse(log_jac_z_z - log_jac_x) = ', torch.mean(torch.square(log_jac_z_z - log_jac_x)))

# Display the push-forward
import matplotlib.pyplot as plt
res = 64
x_min, x_max, y_min, y_max = -3, 3, -3, 3
x = np.linspace(x_min, x_max, res)
y = np.linspace(x_min, x_max, res)
X, Y = np.meshgrid(x, y)
Z = np.stack([X,Y], axis=-1).reshape((-1,2))
Z = torch.from_numpy(Z).float().to(device)
Z_z, Z_z_log_jac = flow.inverse_and_log_det(Z)
log_prob_flow_forward = base.log_prob(Z_z) + Z_z_log_jac
log_prob_flow_forward = log_prob_flow_forward.reshape((res, res)).detach().cpu()
plt.contourf(X, Y, torch.exp(log_prob_flow_forward), 20, cmap='GnBu')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Push-forward')
plt.show()

This code returns

norm(z - z_z) =  tensor(2.3041e-07, device='cuda:0')
mse(log_jac_z_z + log_jac_x) =  tensor(1.3097, device='cuda:0')
mse(log_jac_z_z - log_jac_x) =  tensor(6.4206e-15, device='cuda:0')

Conditional Flows implementation / documentation

Hello, it's not clear to me whether this package implements a conditional flow $p(x \mid u)$ where $u$ is continuous (i.e. $u \in \mathbb{R}^n$), as opposed to a discrete class label.

If so, could you please point me to an example if you have one?
If it is implemented, I would also suggest some tweaks to the documentation to make that clearer.

Thanks for the nice package!
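
For reference, a minimal sketch of a flow conditioned on a continuous context u, pieced together from the conditional usage that appears in other issues on this page (the layer choices, sizes, and context dimensionality below are illustrative assumptions, not an official example):

import torch
import normflows as nf

dim, context_dim = 2, 3
base = nf.distributions.base.DiagGaussian(dim, trainable=False)

flows = []
for _ in range(4):
    # coupling layer that also receives the continuous context
    flows += [nf.flows.CoupledRationalQuadraticSpline(
        num_input_channels=dim, num_context_channels=context_dim,
        num_blocks=2, num_hidden_channels=64)]
    flows += [nf.flows.Permute(dim, mode='swap')]

model = nf.ConditionalNormalizingFlow(q0=base, flows=flows)

x = torch.randn(128, dim)           # samples assumed drawn from p(x | u)
u = torch.randn(128, context_dim)   # continuous conditioning variable u
loss = model.forward_kld(x, u)      # maximum-likelihood loss given the context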

What do you mean by "Augmented Normalizing Flow based on Real NVP"?

Nice work!
But I am a bit confused about your implementation in the example "augmented_flow.ipynb".
I wonder what you mean by "Augmented Normalizing Flow" and by "Set augmented target"?

# Set augmented target
target = nf.distributions.TwoIndependent(nf.distributions.TwoMoons(), 
                                         nf.distributions.DiagGaussian(2))

Is there a reference with further explanation of why this works?
Thanks!

Putting examples in the documentation

Raising issue as part of JOSS review openjournals/joss-reviews#5361

The examples in the repo are currently hosted under the https://github.com/VincentStimper/normalizing-flows/tree/master/examples folder. I think it would be nice to add a page to the documentation site (https://vincentstimper.github.io/normalizing-flows/) that displays these IPython notebooks, just like the API page.

Note that this issue is a nice-to-have that I don't think is necessary for the JOSS review, so it shouldn't block the review process.

No community contribution guideline

Raising issue as part of JOSS review openjournals/joss-reviews#5361

Currently, there is no community contribution guideline that I can find. Per the JOSS review checklist:

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

There should be a standard template for this; it would be good to have an explicit contribution guide in the root directory.

How the inverse was calculated

Hi,
I do not understand how the inverse of the normalizing flow is computed. Could you explain how you compute the inverse function? Thank you.
Link:
https://github.com/VincentStimper/normalizing-flows/blob/ce86fe79d7cdc34f0362f487194d0cebb1999edc/normflows/flows/planar.py#LL66C3-L66C3

def inverse(self, z):
    if self.act != "leaky_relu":
        raise NotImplementedError("This flow has no algebraic inverse.")
    lin = torch.sum(self.w * z, list(range(1, self.w.dim()))) + self.b
    a = (lin < 0) * (
        self.h.negative_slope - 1.0
    ) + 1.0  # absorb leakyReLU slope into u
    inner = torch.sum(self.w * self.u)
    u = self.u + (torch.log(1 + torch.exp(inner)) - 1 - inner) \
        * self.w / torch.sum(self.w ** 2)
    dims = [-1] + (u.dim() - 1) * [1]
    u = a.reshape(*dims) * u
    inner_ = torch.sum(self.w * u, list(range(1, self.w.dim())))
    z_ = z - u * (lin / (1 + inner_)).reshape(*dims)
    log_det = -torch.log(torch.abs(1 + inner_))
    return z_, log_det

NICE demo?

Thanks for your awesome repo.
Could you provide a NICE demo?

Issue about ConditionalNormalizingFlow

Thanks for this amazing and high-quality repo. But I have run into an issue with ConditionalNormalizingFlow. I checked this repo; however, it seems that only MaskedAffineAutoregressive supports the context. How can the other classes be made to support ConditionalNormalizingFlow?

Inconsistency between log_q and log_p in Encoder and NormalizingFlowVAE

In the NormalizingFlowVAE class in core.py, this line says that the encoder outputs log_q:

z, log_q = self.q0(x, num_samples=num_samples)

Suppose that, as in this example, the encoder Gaussian (q0) is parameterized by an MLP. Looking at the distributions.encoder.py source code, the forward method of the NNDiagGaussian class says that it outputs log_p:

return z, log_p

Is this an inconsistency or not?

Seeking Advice on Designing an Invertible Neural Network for Fission

First and foremost, I would like to express my sincere gratitude and respect for your work on this repository. The progress and innovations shared here have been immensely insightful and valuable to the community.

I am currently exploring the concept of fission in invertible neural networks, where a single latent representation 'x' can be decomposed into two distinct components 'y' and 'z'. My objective is to parameterize 'z' with a tractable distribution while ensuring that the combination of 'y' and 'z' can be accurately recombined to reconstruct 'x' using the reverse of the model.

Given your expertise in this field, I would greatly appreciate any guidance or suggestions you could provide on the following aspects:

  1. Design Strategies: What are the best practices or strategies in designing such an invertible network that can effectively decompose and recombine representations?
  2. Parameterization of 'z': How can 'z' be parameterized with a tractable distribution, and what are the implications of different distribution choices?
  3. Ensuring Reversibility: What are the key considerations to ensure that the network remains reversible and accurate in the reconstruction phase?

Any insights, references, or examples you could share would be extremely helpful.

Thank you for your time and for the impactful contributions you've made to the field.

Best regards

Reverse using z

Hi, thanks for the code!
I am learning a lot from this code.

But besides sampling from the distribution,
I wonder how to implement the reverse pass using a given latent z (the reverse flow).
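
As a sketch (based on the forward_and_log_det / inverse_and_log_det calls used in the sign-convention issue above), the "reverse" direction just means pushing a latent z of your choice through the flow instead of sampling it internally; the tiny model below is only a placeholder to keep the snippet self-contained:

import torch
import normflows as nf

dim = 2
base = nf.distributions.base.DiagGaussian(dim, trainable=False)
flows = [nf.flows.AffineCouplingBlock(nf.nets.MLP([dim // 2, 32, dim], init_zeros=True)),
         nf.flows.Permute(dim, mode='swap')]
model = nf.NormalizingFlow(base, flows)

z = torch.randn(16, dim)                        # latent codes you supply yourself
with torch.no_grad():
    x, log_det = model.forward_and_log_det(z)   # latent -> data space ("reverse" of encoding)
    z_back, _ = model.inverse_and_log_det(x)    # data -> latent, recovers z up to numerics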

one-dimensional coupling flows do not work

For the following example:

import numpy as np
import torch as T

import normflows as nf

base_dist = nf.distributions.base.DiagGaussian(1, trainable=False)

flows = []
for _ in range(16):
    flows += [nf.flows.CoupledRationalQuadraticSpline(
            num_input_channels=1, num_context_channels=1,
            num_blocks=5, num_hidden_channels=256, reverse_mask=True)]
    flows += [nf.flows.Permute(1)]

network = nf.ConditionalNormalizingFlow(q0=base_dist, flows=flows)
network.train()
network = network.to('cuda')

optim = T.optim.Adam(network.parameters())
for i in range(2):
    x = T.randn(10_000, 1).to('cuda')
    c = T.randn(10_000, 1).to('cuda')

    loss = network.forward_kld(x, c)
    print('loss:', loss)

    optim.zero_grad()
    loss.backward()
    optim.step()

    print(any([T.any(T.isnan(p)) for p in network.parameters()]))

I encounter the following issues:

  1. I get a NaN-loss and NaN network parameters after the first optimizer step.
  2. When I remove reverse_mask=True, I get a shape error.
  3. I get a shape error with or without reverse_mask=True when I use no context dimension.

I only get issue 1 when I perform the computation on the GPU, not on the CPU, presumably because of precision issues?
It is clear why issue 2 happens, since in that case our single dimension gets mapped to the identity split instead of the transformed split.

What I would expect instead is that the one-dimensional case works out of the box, specifically without having to set reverse_mask=True.

To be fair, one could easily argue that one shouldn't use coupling flows for 1d and should instead use an autoregressive flow, which doesn't really change anything (except that it then works without problems). Following this thought, I would simply suggest throwing an error if one attempts to initialize a coupling flow with only one channel.
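
A sketch of the suggested guard (hypothetical, not in the library); the argument name follows num_input_channels from the snippet above, and the check would be called in a coupling flow's constructor:

def check_coupling_channels(num_input_channels):
    """Hypothetical guard against single-channel coupling flows."""
    if num_input_channels < 2:
        raise ValueError(
            "Coupling flows need at least 2 input channels; "
            "for 1-d data, use an autoregressive flow instead."
        )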

Comments on JOSS paper

Raising issue as part of JOSS review openjournals/joss-reviews#5361

I will put comments related to the JOSS paper itself in this thread. Overall the draft is concise and clear about the package and its statement of need, though there are a couple of things I would like the authors to clarify (I am putting them in on a rolling basis, as I still have some deadlines to catch!):

  1. In the statement of need, the authors claim that other packages such as distrax do not support autoregressive flows. While distrax does not explicitly say it offers an autoregressive flow, it does have lower/upper triangular matrix bijectors (https://github.com/deepmind/distrax/tree/master/distrax/_src/bijectors), which I believe correspond to an autoregressive flow. Considering that the design choice of distrax is to provide just the bijectors and distributions in JAX, I would not phrase it as such features not existing in their code, even though they are not extensively marketed. I suggest rewording the first part of the statement of need to soften the tone.

Flow++

Hello!

This is an excellent repository that I am using for my Ph.D. Thank you for such a fantastic contribution!
I wonder if you have plans to implement Flow++ (https://arxiv.org/abs/1902.00275).

By the way, for likelihood estimation, is another method (in this repo) more suitable than Flow++? What is your opinion? Thanks!

Tweaks for full testing

  1. Make certain dimensions variable to accommodate CIFAR-10 and CIFAR-100
  2. Implement CUDA support

forward_kld with weighted training samples

Love this library, thanks so much @VincentStimper for your efforts in putting this together.

I have been working lately with weighted samples, and I figured out a simple change to the loss to allow for this. There are a couple of ways it could be implemented, but one suggestion is to change this method to something like the following:

def forward_kld(self, x, weights=None):  # adds optional weights argument
    """Estimates forward KL divergence, see [arXiv 1912.02762](https://arxiv.org/abs/1912.02762)

    Args:
      x: Batch sampled from target distribution
      weights: Optional per-sample weights applied to the batch

    Returns:
      Estimate of forward KL divergence averaged over batch
    """
    log_q = torch.zeros(len(x), device=x.device)
    z = x
    for i in range(len(self.flows) - 1, -1, -1):
        z, log_det = self.flows[i].inverse(z)
        log_q += log_det
    log_q += self.q0.log_prob(z)

    if weights is not None:  # apply the weights if they are passed as an argument
        log_q = log_q * weights

    return -torch.mean(log_q)

I've implemented this on a fork and it works well. Happy to provide the math if needed. It's a simple modification based on importance sampling.

Any thoughts? Otherwise I may just open up a PR for these simple changes.
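
For illustration, the same reweighting can be written against the current public API by scaling each sample's log-density before averaging; the toy flow below is just a placeholder, and this mirrors (rather than uses) the proposed weights argument:

import torch
import normflows as nf

base = nf.distributions.base.DiagGaussian(2)
flows = [nf.flows.AffineCouplingBlock(nf.nets.MLP([1, 32, 2], init_zeros=True))]
model = nf.NormalizingFlow(base, flows)

x = torch.randn(256, 2)                    # batch from the target distribution
w = torch.rand(256)                        # per-sample importance weights
loss = -torch.mean(w * model.log_prob(x))  # weighted analogue of forward_kld(x)
loss.backward()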

Normalizing Flow vs Normalizing Flow VAE behavior

I can't help but wonder why the NormalizingFlow class uses the flows' inverse method when computing forward_kld, whereas the NormalizingFlowVAE uses the flows' forward method.

Because of this, when trying to fit MNIST with NormalizingFlow and passing a batch of, say, (64, 784) images during training, I get the following error:

     34 for i in range(len(self.flows) - 1, -1, -1):
     35     z, log_det = self.flows[i].inverse(z)
---> 36     log_q += log_det
     37 log_q += self.q0.log_prob(z)
     38 return -torch.mean(log_q)

RuntimeError: output with shape [64] doesn't match the broadcast shape [1, 64]

Any help/suggestion?

Negative KL divergence

Hello, first of all, thanks for the great package! I am noticing that for the NF given in the README, I obtain negative KL values when training, but this does not happen with other flows. I have realized that if I set the scale_map parameter to "sigmoid", the problem goes away. Here is an example where I try to fit a normal distribution:

import normflows as nf
import torch

def make_flow(scale_map="sigmoid"):
    base = nf.distributions.base.DiagGaussian(4)
    num_layers = 5
    flows = []
    for i in range(num_layers):
        param_map = nf.nets.MLP([2, 64, 2], init_zeros=True)
        flows.append(nf.flows.AffineCouplingBlock(param_map, scale_map=scale_map))
        flows.append(nf.flows.Permute(4, mode='shuffle'))
    model = nf.NormalizingFlow(base, flows)
    return model

def train_flow(flow, target):
    optimizer = torch.optim.Adam(flow.parameters(), lr=5e-3)
    best_loss = torch.inf
    n_epochs_without_improvement = 0
    while True:
        optimizer.zero_grad()
        flow_samples, flow_lps = flow.sample(10000)
        target_lps = target.log_prob(flow_samples)
        loss = (flow_lps - target_lps).mean()
        if loss < best_loss:
            best_loss = loss.item()
            print(best_loss)
            n_epochs_without_improvement = 0
        else:
            n_epochs_without_improvement += 1
        if n_epochs_without_improvement > 50:
            break
        loss.backward()
        optimizer.step()
    return best_loss

target = torch.distributions.MultivariateNormal(torch.tensor([-3, -2., -1., 0.]), torch.eye(4))

flow_sigmoid = make_flow(scale_map = "sigmoid")
best_loss_sigmoid = train_flow(flow_sigmoid, target)
# This returns a value close to 0, as you would expect.

flow_exp = make_flow(scale_map = "exp")
best_loss_exp = train_flow(flow_exp, target)
# This returns a very negative value.

My first thought was numerical error, but I think the value is too negative for that to be the case. Any help?

Thank you very much!

exp and sigmoid may cause inf.

Can we clamp z2 before it is returned in AffineCoupling?

For example:

import torch
from torch import nn

from normflows.flows import Flow  # assumed export of the library's base Flow class
# zero_log_det_like_z is the same helper used by the library's AffineCoupling
# (its import is left implicit in this snippet)

class AffineCouplingStable(Flow):
    """
    Affine Coupling layer as introduced in the RealNVP paper, see arXiv: 1605.08803
    """

    def __init__(self, param_map, scale=True, scale_map="exp"):
        """Constructor

        Args:
          param_map: Maps features to shift and scale parameter (if applicable)
          scale: Flag whether scale shall be applied
          scale_map: Map to be applied to the scale parameter, can be 'exp' as in RealNVP or 'sigmoid' as in Glow, 'sigmoid_inv' uses multiplicative sigmoid scale when sampling from the model
        """
        super().__init__()
        self.add_module("param_map", param_map)
        self.scale = scale
        self.scale_map = scale_map

    def forward(self, z):
        """
        z is a list of z1 and z2; ```z = [z1, z2]```
        z1 is left constant and affine map is applied to z2 with parameters depending
        on z1

        Args:
          z
        """
        z1, z2 = z
        param = self.param_map(z1)
        if self.scale:
            shift = param[:, 0::2, ...]
            scale_ = param[:, 1::2, ...]
            if self.scale_map == "exp":
                z2 = z2 * torch.exp(scale_) + shift
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_det = torch.sum(scale_, dim=list(range(1, shift.dim())))
            elif self.scale_map == "sigmoid":
                scale_inv = 1 + torch.exp(-(scale_ + 2)) # 1 / sigmoid(scale_ + 2)
                z2 = z2 * scale_inv + shift
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_scale = nn.functional.logsigmoid(scale_ + 2)
                log_det = -torch.sum(log_scale, dim=list(range(1, shift.dim())))
            elif self.scale_map == "sigmoid_inv":
                scale = torch.sigmoid(scale_ + 2)
                z2 = z2 * scale + shift
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_scale = nn.functional.logsigmoid(scale_ + 2)
                log_det = torch.sum(log_scale, dim=list(range(1, shift.dim())))
            else:
                raise NotImplementedError("This scale map is not implemented.")
        else:
            z2 = z2 + param
            log_det = zero_log_det_like_z(z2)
        return [z1, z2], log_det

    def inverse(self, z):
        z1, z2 = z
        param = self.param_map(z1)
        if self.scale:
            shift = param[:, 0::2, ...]
            scale_ = param[:, 1::2, ...]
            if self.scale_map == "exp":
                z2 = (z2 - shift) * torch.exp(-scale_)
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_det = -torch.sum(scale_, dim=list(range(1, shift.dim())))
            elif self.scale_map == "sigmoid":
                scale = torch.sigmoid(scale_ + 2)
                z2 = (z2 - shift) * scale
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_scale = nn.functional.logsigmoid(scale_ + 2)
                log_det = torch.sum(log_scale, dim=list(range(1, shift.dim())))
            elif self.scale_map == "sigmoid_inv":
                scale_inv = 1 + torch.exp(-(scale_ + 2)) # 1 / sigmoid(scale_ + 2)
                z2 = (z2 - shift) * scale_inv
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_scale = nn.functional.logsigmoid(scale_ + 2)
                log_det = -torch.sum(log_scale, dim=list(range(1, shift.dim())))
            else:
                raise NotImplementedError("This scale map is not implemented.")
        else:
            z2 = z2 - param
            log_det = zero_log_det_like_z(z2)
        return [z1, z2], log_det
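
For context, a quick standalone check of the clamped variant might look like this (the class is the one defined in this issue, not part of the library; the MLP sizes are illustrative):

import torch
import normflows as nf

# 2 identity dims in, interleaved shift/scale for 2 transformed dims out
param_map = nf.nets.MLP([2, 64, 4], init_zeros=True)
coupling = AffineCouplingStable(param_map, scale=True, scale_map="exp")

z1, z2 = torch.randn(8, 2), torch.randn(8, 2)
(z1_out, z2_out), log_det = coupling.forward([z1, z2])
(z1_rec, z2_rec), _ = coupling.inverse([z1_out, z2_out])
print(torch.allclose(z2_rec, z2, atol=1e-5))  # inverse recovers z2 (clamp rarely binds at these scales)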

Sampling from flow raises deprecation warning

Running the following minimal example:

import normflows as nf
import torch

torch.manual_seed(42)

flow = nf.NormalizingFlow(
    nf.distributions.DiagGaussian(1, trainable=False),
    [
        nf.flows.AutoregressiveRationalQuadraticSpline(1, 1, 1),
        nf.flows.LULinearPermute(1)
    ]
)

with torch.no_grad():
    samples_flow, _ = flow.sample(4)

print(samples_flow)

raises a UserWarning about an upcoming deprecation:

/Users/timothy/Desktop/normalizing-flows/normflows/flows/mixing.py:437: UserWarning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangular and will be removed in a future PyTorch release.
torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs.
X = torch.triangular_solve(B, A).solution
should be replaced with
X = torch.linalg.solve_triangular(A, B). (Triggered internally at  /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/BatchLinearAlgebra.cpp:2189.)
  outputs, _ = torch.triangular_solve(

I will submit a PR shortly that fixes the issue 🙂

Cannot have an odd latent_size (works with 2, 4, etc., but not 3 or 5): shape problem

Hello, here is some code that just builds a RealNVP flow with a Gaussian base and runs forward on a random set of points (no training).
It works fine with latent_size=2 or latent_size=4, but not if you set latent_size=3 or latent_size=5.

Could you help me find the problem?

#%% # Import required packages
import torch
import normflows as nf
import numpy as np

#%% Real NVP model, with Gaussian base distribution
latent_size=5

# Define Gaussian base distribution
base = nf.distributions.base.DiagGaussian(latent_size)

# Define list of flows
num_layers = 8
flows = []
for i in range(num_layers):
    # Neural network with two hidden layers having 64 units each
    # Last layer is initialized by zeros making training more stable
    param_map = nf.nets.MLP([latent_size//2, 64, 64, latent_size], init_zeros=True)
    # Add flow layer
    flows.append(nf.flows.AffineCouplingBlock(param_map))
    # Swap dimensions
    flows.append(nf.flows.Permute(latent_size, mode='swap'))


# Construct flow model
model = nf.NormalizingFlow(base, flows)
model = model.to('cpu')

# %%
X=np.random.rand(10,latent_size)
X_tensor = torch.tensor(X, dtype=torch.float32)
z = model.forward(X_tensor)

The error happens in the last line; in fact, just running the first flow of the loop in forward does the same:
model.flows[0](X_tensor)

With latent_size=3, I have:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x2 and 1x64)

With latent_size=5, I have:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x3 and 2x64)

For this last one, if I print model.flows[0], I get the architecture below, so I feel that the problem lies between the Split, which has to separate the variables into two sets, and the first Linear (2x64), and has to do with the integer division latent_size//2 in the code.

Is the problem that, with an odd number of dimensions, we cannot split the variables into two sets of the same size? How can I deal with that? (A possible workaround is sketched after the printed architecture below.)

AffineCouplingBlock(
  (flows): ModuleList(
    (0): Split()
    (1): AffineCoupling(
      (param_map): MLP(
        (net): Sequential(
          (0): Linear(in_features=2, out_features=64, bias=True)
          (1): LeakyReLU(negative_slope=0.0)
          (2): Linear(in_features=64, out_features=64, bias=True)
          (3): LeakyReLU(negative_slope=0.0)
          (4): Linear(in_features=64, out_features=5, bias=True)
        )
      )
    )
    (2): Merge()
  )
)
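
Not an official answer, but one possible workaround for odd dimensionalities is to avoid the Split-based coupling block and instead use MaskedAffineFlow, which keeps the full dimensionality and applies an alternating binary mask; the sketch below just reuses the MaskedAffineFlow/MLP/ActNorm pattern shown in other issues on this page:

import torch
import normflows as nf

latent_size = 5
b = torch.Tensor([1 if i % 2 == 0 else 0 for i in range(latent_size)])

flows = []
for i in range(8):
    s = nf.nets.MLP([latent_size, 2 * latent_size, latent_size], init_zeros=True)
    t = nf.nets.MLP([latent_size, 2 * latent_size, latent_size], init_zeros=True)
    mask = b if i % 2 == 0 else 1 - b          # alternate which coordinates get transformed
    flows += [nf.flows.MaskedAffineFlow(mask, t, s)]
    flows += [nf.flows.ActNorm(latent_size)]

base = nf.distributions.base.DiagGaussian(latent_size)
model = nf.NormalizingFlow(base, flows)

x = torch.rand(10, latent_size)
z, log_det = model.inverse_and_log_det(x)      # runs fine with an odd latent_size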

Implementation for spheres

Hey,

thanks for your framework.
Are you currently planning to implement flows on spheres supporting forward and backward KLD as well?

Thanks and all the best

bad log_prob values from AutoregressiveRationalQuadraticSpline

Hello, first I want to say that this is a really cool library, great job!

I was experimenting with normalizing flows using this library, starting with the example notebooks that are provided. I made a couple of modifications and began seeing log likelihood values that did not seem correct.

I defined my flow model like so:

def splineFlowModelConstructor(base_dist, latent_size, hidden_units_per_flow_block = 256, hidden_layers_per_flow_block = 2, num_layers_in_model = 32):
  # Define list of flows
  flows = []
  for i in range(num_layers_in_model):
      flows += [nf.flows.AutoregressiveRationalQuadraticSpline(latent_size, hidden_layers_per_flow_block, hidden_units_per_flow_block)]
      flows += [nf.flows.LULinearPermute(latent_size)]

  # Construct flow model
  model = nf.NormalizingFlow(base_dist, flows)
  return model

And my base and target distributions:

N = 8
base = nf.distributions.base.GaussianMixture(1, N, loc=torch.zeros(1, N), scale=1.0 * torch.ones(1, N))
target = nf.distributions.base.GaussianMixture(4, N, loc=40.0 * torch.rand(4, N))

Looking at the first 4 of 8 dimensions, it looks like the NF model fits the data quite well [plot omitted].

The same holds in the inverse direction, from the target distribution to the base distribution [plot omitted].

The problem is that when I run model.log_prob(sample), I get really large values:

model.eval()
model.cpu()

sample = target.sample(100).float()
sample -= mean   # mean and stdev were computed earlier (not shown in this issue)
sample /= stdev

probs = model.log_prob( sample )

print(probs)
print(torch.exp(probs)[0].item())

Output:

tensor([ 5.4699,  6.4249,  7.8194,  6.3206,  7.5243,  5.1152,  5.9383,  5.5442,
         6.2432,  5.4947,  6.8944, -0.6838,  4.9015,  3.7826,  2.4365,  2.1861,
         4.3309,  6.7556,  6.9447,  5.0362,  7.6152,  7.5655,  6.9386,  6.1994,
         3.9858,  5.7908,  5.1715,  8.4266,  7.3475,  6.3807,  6.9416,  5.8353,
         4.7075,  5.3876,  7.7150,  8.0491,  4.9048,  6.5957,  6.0490,  7.3908,
         4.8446,  6.9842,  3.0278,  5.0358,  7.4219,  8.0341,  6.4992,  2.5209,
         0.4988,  3.3599,  3.3678,  6.3835,  3.1170,  4.1072,  6.1465,  5.8682,
        -0.0433,  4.1949,  8.7846, -1.4585,  4.4729,  7.9508,  0.6114,  4.1969,
         5.7270,  6.0191,  5.7380,  5.4258,  3.0765,  1.0877,  5.6288,  7.1016,
         3.5127,  3.6730,  1.7773,  6.5651,  5.0568,  2.3242,  4.6668,  4.2033,
         6.3310,  4.6139,  6.9517,  7.2128,  4.2161,  8.1131,  5.9864,  5.3085,
         3.9264,  4.5639,  5.2759,  8.6771,  4.5639,  4.9707,  6.1224,  6.1103,
         3.8134,  3.1536,  4.1925,  0.8635], grad_fn=<AddBackward0>)
237.43670654296875

Am I correct that those are supposed to be negative log_prob values?

Best,

Negative KL divergence

Hi!

I am using your package and get a negative KL divergence when training. I am not sure why. I saw #29, but I suspect that solution is not applicable in my case.

Here is how the model is made:

b = torch.Tensor([1 if i % 2 == 0 else 0 for i in range(latent_dim)])

s = nf.nets.MLP([latent_dim, 2 * latent_dim, latent_dim], init_zeros=True)
t = nf.nets.MLP([latent_dim, 2 * latent_dim, latent_dim], init_zeros=True)
flows = [nf.flows.MaskedAffineFlow(b, t, s)]
flows += [nf.flows.ActNorm(self.latent_dim)]

base = nf.distributions.base.DiagGaussian(latent_dim)

# Construct flow model
self.nfm = nf.NormalizingFlow(base, flows)

And here is the training loop:

optimizer = torch.optim.Adam(self.nfm.parameters(), lr=lr, weight_decay=weight_decay)
loss_list = []

for epoch in range(n_epochs):
    print(f"Start epoch number {epoch + 1}")

    batch_cum_loss = 0
    n_batches = len(nf_train_loader)

    for batch_idx, (inputs, labels) in enumerate(nf_train_loader):
        batch_size = inputs.shape[0]

        inputs_cls = inputs.to(self.device)
        labels_cls = labels.to(self.device)
        optimizer.zero_grad()

        with torch.no_grad():
            outputs, _, latent = self.net(inputs_cls)

        # Compute loss
        loss = self.nfm.forward_kld(latent[-1])

Where latent[-1] is an intermediate output of a given network (before the classifier).

The loss that comes out is negative, whereas if I use the sklearn method mutual_info_score I get a positive number:

q = torch.normal(mean=0, std=1, size=(batch_size, latent_dim))
res = mutual_info_score(latent[-1].view(-1,), q.view(-1,))
res = kl_div(latent[-1], q)
res = kl_loss(latent[-1], q)

As evident in the loss graph [plot omitted], the loss values are also not stable in being negative, although if one overlooks the sign, the graph does look like a normal training curve.

I would appreciate any help!

Categorical Input Features?

Hi,

I was wondering if your package can support a dataset with both continuous and categorical input features.

Many thanks!

Conditional Coupling Layers

It seems that some layers, like the affine coupling block, do not support conditioning. Would it be possible to extend this, as suggested in e.g. https://arxiv.org/pdf/1912.00042? It wouldn't be much effort: just make the affine coupling conditional, and the rest can stay.
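
A rough sketch of what that could look like (not the library's API): an affine coupling whose parameter network also sees a context vector, in the spirit of arXiv:1912.00042. The param_map is assumed to take dim(z1) + dim(context) inputs and return concatenated shift and log-scale; the Flow import path is also an assumption.

import torch
from normflows.flows import Flow  # assumed export of the base Flow class

class ConditionalAffineCoupling(Flow):
    """Hypothetical coupling layer conditioned on a continuous context."""
    def __init__(self, param_map):
        super().__init__()
        self.add_module("param_map", param_map)

    def forward(self, z, context):
        z1, z2 = z
        shift, log_scale = self.param_map(torch.cat([z1, context], dim=-1)).chunk(2, dim=-1)
        z2 = z2 * torch.exp(log_scale) + shift
        return [z1, z2], log_scale.sum(dim=-1)

    def inverse(self, z, context):
        z1, z2 = z
        shift, log_scale = self.param_map(torch.cat([z1, context], dim=-1)).chunk(2, dim=-1)
        z2 = (z2 - shift) * torch.exp(-log_scale)
        return [z1, z2], -log_scale.sum(dim=-1)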

NormalizingFlow class in core.py does not provide context in forward_kld

Thank you for a repo that makes it easy to work with a normalizing flow of one's choice!

I would like to implement a normalizing flow that optimizes multiple target distributions at once, depending on the context I provide to it. Yet, currently, as far as I can tell, no context can be provided to the .forward_kld method of the NormalizingFlow class.

Would be great if that's added!

Cheers,

Yves
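
Related sketch for the request above (not the library's implementation): the forward_kld shown in the weighted-samples issue earlier could be adapted to thread a context through each flow's inverse and the base distribution, assuming those accept a context argument as the conditional classes do.

import torch

# Hypothetical context-aware variant of NormalizingFlow.forward_kld (sketch only)
def forward_kld(self, x, context=None):
    log_q = torch.zeros(len(x), device=x.device)
    z = x
    for i in range(len(self.flows) - 1, -1, -1):
        z, log_det = self.flows[i].inverse(z, context=context)  # assumes flows take a context kwarg
        log_q += log_det
    log_q += self.q0.log_prob(z, context)                       # assumes a context-aware base
    return -torch.mean(log_q)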

Replication of comparable glow with papers

Hi, thanks for developing this package. I find it very neat and flexible and would like to use it for my research. I noticed that in the paper "Resampling Base Distributions of Normalizing Flows", the bits per dimension (bpd) of your Glow model reaches 3.2-3.3, which is comparable to the original paper.

I was wondering if it is possible to share your training scripts and the details used to train Glow on CIFAR-10 to achieve the above bpd. The current example notebook is too sketchy and only reaches a bpd of 3.8. Thanks very much!
