vincentstimper / normalizing-flows
PyTorch implementation of normalizing flow models
Home Page: https://joss.theoj.org/papers/10.21105/joss.05361
License: MIT License
Would be great to see an example for the conditional class. Will try to create one.
https://vincentstimper.github.io/normalizing-flows/references/#normflows.flows does not have an entry for y in the forward_kld function, even though it is an input argument for forward_kld. I was wondering what the description of y should be in this method? In log_prob(x, y), it specifies that y is the class. Is this the case? So given, say, a dataset of (X, y), we feed in batches of X and class labels y, and want to minimize the KL divergence between the decoded \hat{X} and the input X over each class-label distribution?
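For anyone else landing here, a minimal sketch of how I understand the intended usage (ClassCondFlow and ClassCondDiagGaussian names are from the API docs, but the exact constructor signatures are my assumption):

import torch
import normflows as nf

q0 = nf.distributions.ClassCondDiagGaussian(4, 10)  # (shape, num_classes); signature assumed
flows = []
for _ in range(4):
    param_map = nf.nets.MLP([2, 64, 4], init_zeros=True)
    flows.append(nf.flows.AffineCouplingBlock(param_map))
    flows.append(nf.flows.Permute(4, mode='shuffle'))
model = nf.ClassCondFlow(q0, flows)

x = torch.randn(32, 4)            # batch sampled from the target
y = torch.randint(0, 10, (32,))   # integer class label per sample
loss = model.forward_kld(x, y)    # forward KL conditioned on the labels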
Thanks for your awesome repo.
Could you provide a NICE demo?
Raising issue as part of JOSS review openjournals/joss-reviews#5361
The examples in the repo are currently hosted under the https://github.com/VincentStimper/normalizing-flows/tree/master/examples folder. I think it would be nice to add a page to the documentation site, https://vincentstimper.github.io/normalizing-flows/, that displays these IPython notebooks, just like the API page.
Note that this is a nice-to-have that I don't think is necessary for the JOSS review, so it shouldn't block the review process.
Hello, first of all, thanks for the great package! I am noticing that for the NF given in the README, I obtain negative KL values when training, but this does not happen with other flows. I have realized that if I set the scale_map parameter to sigmoid, the problem is gone. Here is an example where I try to fit a Normal distribution:
import normflows as nf
import torch

def make_flow(scale_map="sigmoid"):
    base = nf.distributions.base.DiagGaussian(4)
    num_layers = 5
    flows = []
    for i in range(num_layers):
        # Map the 2 conditioning dims to a shift and a scale for the other 2 dims
        param_map = nf.nets.MLP([2, 64, 4], init_zeros=True)
        flows.append(nf.flows.AffineCouplingBlock(param_map, scale_map=scale_map))
        flows.append(nf.flows.Permute(4, mode='shuffle'))
    model = nf.NormalizingFlow(base, flows)
    return model

def train_flow(flow, target):
    optimizer = torch.optim.Adam(flow.parameters(), lr=5e-3)
    best_loss = torch.inf
    n_epochs_without_improvement = 0  # initialized once, outside the loop
    while True:
        optimizer.zero_grad()
        flow_samples, flow_lps = flow.sample(10000)
        target_lps = target.log_prob(flow_samples)
        loss = (flow_lps - target_lps).mean()
        if loss < best_loss:
            best_loss = loss.item()
            print(best_loss)
            n_epochs_without_improvement = 0
        else:
            n_epochs_without_improvement += 1
            if n_epochs_without_improvement > 50:
                break
        loss.backward()
        optimizer.step()
    return best_loss

target = torch.distributions.MultivariateNormal(torch.tensor([-3., -2., -1., 0.]), torch.eye(4))

flow_sigmoid = make_flow(scale_map="sigmoid")
best_loss_sigmoid = train_flow(flow_sigmoid, target)
# This returns a value close to 0, as you would expect.

flow_exp = make_flow(scale_map="exp")
best_loss_exp = train_flow(flow_exp, target)
# This returns a very negative value.
My first thought was numerical error, but I think the value is too negative for that to be the case. Any help?
Thank you very much!
Raising issue as part of JOSS review openjournals/joss-reviews#5361
Currently, there is no community contribution guideline that I can find. Per the JOSS review checklist:
Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
There are standard templates for this; it would be good to have an explicit contribution guide in the root directory.
Raising issue as part of JOSS review openjournals/joss-reviews#5361
Could you please show how one could implement a multi-scale architecture for normalizing flows? If it is not possible, please add this functionality.
Please consider adding augmented normalizing flows (https://arxiv.org/pdf/2002.09741.pdf, https://arxiv.org/pdf/2002.07101.pdf)
I can't help but wonder why the NormalizingFlow class uses the flows' inverse method when computing forward_kld, whereas the NormalizingFlowVAE uses the flows' forward method. As a result, when trying to fit MNIST with NormalizingFlow and passing a batch of, say, (64, 784) images during training, I get the following error:
 34     for i in range(len(self.flows) - 1, -1, -1):
 35         z, log_det = self.flows[i].inverse(z)
---> 36         log_q += log_det
 37     log_q += self.q0.log_prob(z)
 38     return -torch.mean(log_q)
RuntimeError: output with shape [64] doesn't match the broadcast shape [1, 64]
Any help/suggestion?
Hi,
I was wondering if your package can support a dataset with both continuous and categorical input features.
Many thanks!
Love this library, thanks so much @VincentStimper for your efforts in putting this together.
I have been working lately with weighted samples, and I figured out a simple change to the loss to allow for this. There are a couple of ways it could be implemented, but one suggestion is to change this method to something like the following:
def forward_kld(self, x, weights=None):  # adds optional weights argument
    """Estimates forward KL divergence, see [arXiv 1912.02762](https://arxiv.org/abs/1912.02762)

    Args:
        x: Batch sampled from target distribution
        weights: Optional importance weights, one per sample in the batch

    Returns:
        Estimate of forward KL divergence averaged over batch
    """
    log_q = torch.zeros(len(x), device=x.device)
    z = x
    for i in range(len(self.flows) - 1, -1, -1):
        z, log_det = self.flows[i].inverse(z)
        log_q += log_det
    log_q += self.q0.log_prob(z)
    if weights is not None:  # Apply weights if passed as an arg
        log_q = log_q * weights
    return -torch.mean(log_q)
I've implemented this on a fork and it works well. Happy to provide the math if needed. It's a simple modification based on importance sampling.
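For illustration, usage on my fork looks like this (model built as in the README; I normalize the importance weights to mean 1 so the loss scale is unchanged):

x = torch.randn(256, 2)                       # batch of (weighted) target samples
weights = torch.rand(256)                     # non-negative importance weights
weights = weights / weights.mean()            # normalize to mean 1 to keep the loss scale
loss = model.forward_kld(x, weights=weights)  # weighted forward KL estimate
loss.backward()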
Any thoughts? Otherwise I may just open up a PR for these simple changes.
The log_prob implemented for the original Glow is MultiscaleFlow's, which does not currently support log_q += self.q0[i].log_prob(z_, context=z).
Nice work!
But I am a bit confused about your implementation in the example "augmented_flow.ipynb". I wonder what you mean by "Augmented Normalizing Flow" and by "Set augmented target"?
# Set augmented target
target = nf.distributions.TwoIndependent(nf.distributions.TwoMoons(),
                                         nf.distributions.DiagGaussian(2))
Is there some reference for further explanation about why it works?
Thanks!
Hello, it's not clear to me whether this package implements a conditional flow. If so, could you please point me to an example if you have one? If it is implemented, I would also suggest some tweaks to the documentation to make that clearer.
Thanks for the nice package!
Hello, first I want to say that this is a really cool library, great job!
I was experimenting with normalizing flows using this library, starting with the example notebooks that are provided. I made a couple of modifications and began seeing log likelihood values that did not seem correct.
I defined my flow model like so:
def splineFlowModelConstructor(base_dist, latent_size, hidden_units_per_flow_block=256,
                               hidden_layers_per_flow_block=2, num_layers_in_model=32):
    # Define list of flows
    flows = []
    for i in range(num_layers_in_model):
        flows += [nf.flows.AutoregressiveRationalQuadraticSpline(
            latent_size, hidden_layers_per_flow_block, hidden_units_per_flow_block)]
        flows += [nf.flows.LULinearPermute(latent_size)]
    # Construct flow model
    model = nf.NormalizingFlow(base_dist, flows)
    return model
And my base and target distributions:
N = 8
base = nf.distributions.base.GaussianMixture(1, N, loc=torch.zeros(1, N), scale=1.0 * torch.ones(1, N))
target = nf.distributions.base.GaussianMixture(4, N, loc=40.0 * torch.rand(4, N))
Looking at the first 4 of 8 dimensions, the NF model fits the data quite well (figure omitted), and likewise in the inverse direction, from the target distribution to the base distribution (figure omitted). The problem is that when I run model.log_prob(sample) I get really large values:
model.eval()
model.cpu()
sample = target.sample(100).float()
sample -= mean   # mean and stdev computed from the training data (not shown)
sample /= stdev
probs = model.log_prob(sample)
print(probs)
print(torch.exp(probs)[0].item())
Output:
tensor([ 5.4699, 6.4249, 7.8194, 6.3206, 7.5243, 5.1152, 5.9383, 5.5442,
6.2432, 5.4947, 6.8944, -0.6838, 4.9015, 3.7826, 2.4365, 2.1861,
4.3309, 6.7556, 6.9447, 5.0362, 7.6152, 7.5655, 6.9386, 6.1994,
3.9858, 5.7908, 5.1715, 8.4266, 7.3475, 6.3807, 6.9416, 5.8353,
4.7075, 5.3876, 7.7150, 8.0491, 4.9048, 6.5957, 6.0490, 7.3908,
4.8446, 6.9842, 3.0278, 5.0358, 7.4219, 8.0341, 6.4992, 2.5209,
0.4988, 3.3599, 3.3678, 6.3835, 3.1170, 4.1072, 6.1465, 5.8682,
-0.0433, 4.1949, 8.7846, -1.4585, 4.4729, 7.9508, 0.6114, 4.1969,
5.7270, 6.0191, 5.7380, 5.4258, 3.0765, 1.0877, 5.6288, 7.1016,
3.5127, 3.6730, 1.7773, 6.5651, 5.0568, 2.3242, 4.6668, 4.2033,
6.3310, 4.6139, 6.9517, 7.2128, 4.2161, 8.1131, 5.9864, 5.3085,
3.9264, 4.5639, 5.2759, 8.6771, 4.5639, 4.9707, 6.1224, 6.1103,
3.8134, 3.1536, 4.1925, 0.8635], grad_fn=<AddBackward0>)
237.43670654296875
Am I correct that those are supposed to be negative log_prob values?
Best,
In the NormalizingFlowVAE class in core.py, this line says that the encoder outputs log_q:
z, log_q = self.q0(x, num_samples=num_samples)
Suppose that, as in this example, the encoder Gaussian (q0) is parameterized by an MLP. Looking at the distributions.encoder.py source code, the forward method of the NNDiagGaussian class says that it outputs log_p:
return z, log_p
Inconsistency or not?
Dear normflows,
I'm having a hard time understanding something, and if you have a moment to help, that would be amazing. I was looking over the real_nvp_colab example, and I wanted to make a few changes. First, I would like to use more than 2 inputs (with a different data set), and I can't quite figure out the changes necessary to make that work. My data set, labeled 'x' in the training section of the notebook, has e.g. torch.Size([6544, 3]), i.e. ndim=3. I then change nf.flows.Permute(2, mode='swap') to nf.flows.Permute(3, mode='swap') and nf.distributions.base.DiagGaussian(2) to nf.distributions.base.DiagGaussian(3), but I can't quite figure out how to change nf.nets.MLP([1, 64, 64, 2], init_zeros=True) for the ndim=3 case; see the sketch below. Or is a more dramatic change to the code required for ndim=3? I get a lot of PyTorch/normflows errors associated with the matrix sizes.
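A sketch of the sizing rule that seems to be needed (inferred from how the coupling block chunks its input, not from the docs):

ndim = 3
# torch.chunk gives the coupling block ceil(ndim/2) conditioning dims and
# floor(ndim/2) transformed dims; the MLP maps the former to one shift and
# one scale for each of the latter (build a fresh MLP per layer as in the notebook)
param_map = nf.nets.MLP([(ndim + 1) // 2, 64, 64, 2 * (ndim // 2)], init_zeros=True)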
Thank you,
Andrew
Raising issue as part of JOSS review openjournals/joss-reviews#5361
I will put comments related to the JOSS paper itself in this thread. Overall, the draft is concise and clear about the package and its statement of need, though there are a couple of things I would like the authors to clarify (I am adding them on a rolling basis; I still have some deadlines to catch!):
Hi!
I am using your package and get negative KLDiv when training. I am not sure as to why. I saw #29 but I suspect that solution is not applicable in my case.
Here is how the model is made:
b = torch.Tensor([1 if i % 2 == 0 else 0 for i in range(latent_dim)])
s = nf.nets.MLP([latent_dim, 2 * latent_dim, latent_dim], init_zeros=True)
t = nf.nets.MLP([latent_dim, 2 * latent_dim, latent_dim], init_zeros=True)
flows = [nf.flows.MaskedAffineFlow(b, t, s)]
flows += [nf.flows.ActNorm(self.latent_dim)]
base = nf.distributions.base.DiagGaussian(latent_dim)
# Construct flow model
self.nfm = nf.NormalizingFlow(base, flows)
And here is the training loop:
optimizer = torch.optim.Adam(self.nfm.parameters(), lr=lr, weight_decay=weight_decay)
loss_list = []
for epoch in range(n_epochs):
    print(f"Start epoch number {epoch + 1}")
    batch_cum_loss = 0
    n_batches = len(nf_train_loader)
    for batch_idx, (inputs, labels) in enumerate(nf_train_loader):
        batch_size = inputs.shape[0]
        inputs_cls = inputs.to(self.device)
        labels_cls = labels.to(self.device)
        optimizer.zero_grad()
        with torch.no_grad():
            outputs, _, latent = self.net(inputs_cls)
        # Compute loss
        loss = self.nfm.forward_kld(latent[-1])
where latent[-1] is an intermediate output of a given network (before the classifier). The loss that comes out is negative, whereas if I use sklearn's mutual_info_score method, I get a positive number:
q = torch.normal(mean=0, std=1, size=(batch_size, latent_dim))
# Various estimates tried for comparison (kl_div and kl_loss defined elsewhere)
res = mutual_info_score(latent[-1].view(-1,), q.view(-1,))
res = kl_div(latent[-1], q)
res = kl_loss(latent[-1], q)
As is evident in the loss graph (not shown), the negative loss values are also not stable, although if one ignores the sign, the curve does look like a normal training curve.
I would appreciate any help!
Hi, thanks for the code!
I am learning a lot from this code. But besides sampling from the distribution, I wonder how to implement the reverse mapping from a latent z (the reverse flow).
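In case it helps others, a minimal sketch of how I read the API (assuming model is an nf.NormalizingFlow with latent dimensionality latent_size; method names as used elsewhere in this issue tracker):

import torch

z = torch.randn(64, latent_size)            # latent points of your choice
x, log_det = model.forward_and_log_det(z)   # latent z -> data x, the "reverse" (generative) pass
z_rec, _ = model.inverse_and_log_det(x)     # maps x back to the latent z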
Hello, here is some code that just builds a RealNVP flow with a Gaussian base and runs forward on a random set of points (no training). It works fine with latent_size=2 or latent_size=4, but not if you put latent_size=3 or latent_size=5. Could you help me find the problem?
#%% Import required packages
import torch
import normflows as nf
import numpy as np

#%% Real NVP model, with Gaussian base distribution
latent_size = 5

# Define Gaussian base distribution
base = nf.distributions.base.DiagGaussian(latent_size)

# Define list of flows
num_layers = 8
flows = []
for i in range(num_layers):
    # Neural network with two hidden layers having 64 units each
    # Last layer is initialized by zeros making training more stable
    param_map = nf.nets.MLP([latent_size // 2, 64, 64, latent_size], init_zeros=True)
    # Add flow layer
    flows.append(nf.flows.AffineCouplingBlock(param_map))
    # Swap dimensions
    flows.append(nf.flows.Permute(latent_size, mode='swap'))

# Construct flow model
model = nf.NormalizingFlow(base, flows)
model = model.to('cpu')

# %%
X = np.random.rand(10, latent_size)
X_tensor = torch.tensor(X, dtype=torch.float32)
z = model.forward(X_tensor)
The error happens in the last line; in fact, running just the first flow of the loop does the same:
model.flows[0](X_tensor)
With latent_size=3, I get:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x2 and 1x64)
With latent_size=5, I get:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (10x3 and 2x64)
For this last one, if I print model.flows[0], I get the architecture below, so I feel like the problem lies between the Split, which has to separate the variables into two sets, and the first Linear 2x64, and in the code it has to do with the integer division latent_size//2. Is the problem that when latent_size is odd, we cannot split it into two sets of variables of the same size? How can I deal with that? (See the sketch after the printout below.)
AffineCouplingBlock(
  (flows): ModuleList(
    (0): Split()
    (1): AffineCoupling(
      (param_map): MLP(
        (net): Sequential(
          (0): Linear(in_features=2, out_features=64, bias=True)
          (1): LeakyReLU(negative_slope=0.0)
          (2): Linear(in_features=64, out_features=64, bias=True)
          (3): LeakyReLU(negative_slope=0.0)
          (4): Linear(in_features=64, out_features=5, bias=True)
        )
      )
    )
    (2): Merge()
  )
)
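A sketch of the workaround that follows from the printout above (sizes inferred from the Split/Merge behaviour, not an official recipe): make the MLP input match the larger chunk and emit a shift and a scale for each dim of the smaller chunk.

d_cond = (latent_size + 1) // 2   # dims left unchanged, fed to the param map
d_trans = latent_size // 2        # dims that get shifted and scaled
param_map = nf.nets.MLP([d_cond, 64, 64, 2 * d_trans], init_zeros=True)
flows.append(nf.flows.AffineCouplingBlock(param_map))
flows.append(nf.flows.Permute(latent_size, mode='swap'))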
Raising issue as part of JOSS review openjournals/joss-reviews#5361
Please consider adding example usage (Colab notebook) for image data e.g. CIFAR10 or MNIST.
Hi,
I was wondering why there is a minus here:
normalizing-flows/normflows/core.py, line 54 in 0466e7f
and here:
normalizing-flows/normflows/core_test.py, line 58 in 0466e7f
# Libraries
import torch
import numpy as np
import normflows as nf
from tqdm import trange

# Train a flow
def train_flow(flow, base, target, batch_size=1024, lr=5e-4, n_iter=4096):
    # Optimizer
    optimizer = torch.optim.Adam(flow.parameters(), lr=lr)
    # Train the flow
    r = trange(n_iter, unit='step', desc='loss')
    for i in r:
        # Reset the optimizer
        optimizer.zero_grad()
        # Sample the target
        target_samples = target.sample((batch_size,))
        # Estimate the loss
        z, log_jac = flow.inverse_and_log_det(target_samples)
        flow_loss = -(base.log_prob(z) + log_jac).mean()
        # Backward pass
        flow_loss.backward()
        optimizer.step()
        # Debug
        r.set_description('loss = {}'.format(round(flow_loss.item(), 4)), refresh=True)

# Make the device
device = torch.device('cuda')

# Make the arguments
dim = 2

# Make the base
base = torch.distributions.MultivariateNormal(
    loc=torch.zeros((dim,), device=device),
    covariance_matrix=torch.eye(dim, device=device)
)
locs = torch.ones((2, dim), device=device)
locs[1] *= -1
target = torch.distributions.MixtureSameFamily(
    mixture_distribution=torch.distributions.Categorical(torch.ones((2,), device=device)),
    component_distribution=torch.distributions.MultivariateNormal(
        loc=locs,
        covariance_matrix=torch.stack([0.3 * torch.eye(dim, device=device)])
    )
)

# Make the flow
base_ = nf.distributions.DiagGaussian(dim, trainable=False)
flows = []
b = torch.Tensor([1 if i % 2 == 0 else 0 for i in range(2)])
for i in range(2):
    s = nf.nets.MLP([2, 2 * 2, 2], init_zeros=True)
    t = nf.nets.MLP([2, 2 * 2, 2], init_zeros=True)
    if i % 2 == 0:
        flows += [nf.flows.MaskedAffineFlow(b, t, s)]
    else:
        flows += [nf.flows.MaskedAffineFlow(1 - b, t, s)]
    flows += [nf.flows.ActNorm(2)]
flow = nf.NormalizingFlow(base_, flows)  # use the normflows base (base_), not the torch.distributions one
flow = flow.to(device)

# Launch training
train_flow(flow, base, target, batch_size=2048, lr=1e-3)

# Test bijectivity
z = base.sample(sample_shape=(2048,))
with torch.no_grad():
    x, log_jac_x = flow.forward_and_log_det(z)
    z_z, log_jac_z_z = flow.inverse_and_log_det(x)
with torch.no_grad():
    print('norm(z - z_z) = ', torch.linalg.norm((z - z_z).view((-1, 2)), dim=-1).mean())
    print('mse(log_jac_z_z + log_jac_x) = ', torch.mean(torch.square(log_jac_z_z + log_jac_x)))
    print('mse(log_jac_z_z - log_jac_x) = ', torch.mean(torch.square(log_jac_z_z - log_jac_x)))

# Display the push-forward
import matplotlib.pyplot as plt
res = 64
x_min, x_max, y_min, y_max = -3, 3, -3, 3
x = np.linspace(x_min, x_max, res)
y = np.linspace(y_min, y_max, res)
X, Y = np.meshgrid(x, y)
Z = np.stack([X, Y], axis=-1).reshape((-1, 2))
Z = torch.from_numpy(Z).float().to(device)
Z_z, Z_z_log_jac = flow.inverse_and_log_det(Z)
log_prob_flow_forward = base.log_prob(Z_z) + Z_z_log_jac
log_prob_flow_forward = log_prob_flow_forward.reshape((res, res)).detach().cpu()
plt.contourf(X, Y, torch.exp(log_prob_flow_forward), 20, cmap='GnBu')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Push-forward')
plt.show()
This code returns
norm(z - z_z) = tensor(2.3041e-07, device='cuda:0')
mse(log_jac_z_z + log_jac_x) = tensor(1.3097, device='cuda:0')
mse(log_jac_z_z - log_jac_x) = tensor(6.4206e-15, device='cuda:0')
Hi,
I do not understand how to compute the inverse normalizing flow. Could you explain how you compute the inverse function? Thank you.
Link:
https://github.com/VincentStimper/normalizing-flows/blob/ce86fe79d7cdc34f0362f487194d0cebb1999edc/normflows/flows/planar.py#LL66C3-L66C3
def inverse(self, z):
    if self.act != "leaky_relu":
        raise NotImplementedError("This flow has no algebraic inverse.")
    lin = torch.sum(self.w * z, list(range(1, self.w.dim()))) + self.b
    a = (lin < 0) * (
        self.h.negative_slope - 1.0
    ) + 1.0  # absorb leakyReLU slope into u
    inner = torch.sum(self.w * self.u)
    u = self.u + (torch.log(1 + torch.exp(inner)) - 1 - inner) \
        * self.w / torch.sum(self.w ** 2)
    dims = [-1] + (u.dim() - 1) * [1]
    u = a.reshape(*dims) * u
    inner_ = torch.sum(self.w * u, list(range(1, self.w.dim())))
    z_ = z - u * (lin / (1 + inner_)).reshape(*dims)
    log_det = -torch.log(torch.abs(1 + inner_))
    return z_, log_det
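For what it's worth, here is my reading of the code above as a derivation sketch (my own notation, not from the docs): with leaky ReLU the flow is piecewise affine, so the inverse is available in closed form. Writing \hat{u} for u after the invertibility reparametrization in the code, a for the active slope picked from the sign of w^\top z + b, and u_a = a \hat{u}:

f(z) = z + \hat{u} \, h(w^\top z + b) = z + u_a (w^\top z + b)
w^\top f(z) + b = (1 + w^\top u_a)\,(w^\top z + b)
z = f(z) - u_a \, \frac{w^\top f(z) + b}{1 + w^\top u_a}
\log \left| \det J_{f^{-1}} \right| = -\log \left| 1 + w^\top u_a \right|

which correspond to lin, inner_, z_, and log_det in the code.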
For the following example:
import numpy as np
import torch as T
import normflows as nf

base_dist = nf.distributions.base.DiagGaussian(1, trainable=False)
flows = []
for _ in range(16):
    flows += [nf.flows.CoupledRationalQuadraticSpline(
        num_input_channels=1, num_context_channels=1,
        num_blocks=5, num_hidden_channels=256, reverse_mask=True)]
    flows += [nf.flows.Permute(1)]
network = nf.ConditionalNormalizingFlow(q0=base_dist, flows=flows)
network.train()
network = network.to('cuda')
optim = T.optim.Adam(network.parameters())
for i in range(2):
    x = T.randn(10_000, 1).to('cuda')
    c = T.randn(10_000, 1).to('cuda')
    loss = network.forward_kld(x, c)
    print('loss:', loss)
    optim.zero_grad()
    loss.backward()
    optim.step()
print(any([T.any(T.isnan(p)) for p in network.parameters()]))
I encounter the following issues:
I specifically only get issue 1 when I perform the computation on the GPU, not on the CPU, presumably because of precision issues?
It is clear why issue 2 happens, since in that case our one dimension gets mapped to the identity split instead of the transform split.
What I would expect instead is that the one-dimensional case works out of the box, specifically without having to set reverse_mask=True.
To be fair, one could easily argue that one shouldn't use coupling flows in 1d and should instead use an autoregressive flow, which doesn't really change anything (except that it then works without problems); see the sketch below. Following this thought, I would simply suggest throwing an error if one attempts to initialize a coupling flow with only one channel.
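To make the suggested alternative concrete, a 1-d conditional autoregressive stack might look like this (MaskedAffineAutoregressive is the one layer I know accepts a context, per another issue here; the exact keyword names are my assumption):

# Autoregressive instead of coupling: the single channel is transformed
# directly, so nothing is routed to an identity split
flows = []
for _ in range(16):
    flows += [nf.flows.MaskedAffineAutoregressive(
        features=1, hidden_features=256, context_features=1)]
network = nf.ConditionalNormalizingFlow(q0=base_dist, flows=flows)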
Related to this issue, we should add forward and inverse functions that return the output from the latents and vice versa, together with the log determinant.
Hi, thanks for developing this package. I find it very neat and flexible and would like to use it for my research. I noticed that in the paper "Resampling Base Distributions of Normalizing Flows", the bpd of your Glow can reach 3.2~3.3, which is comparable to the original paper. I was wondering if it is possible to share the training scripts and details needed to train Glow on CIFAR-10 to achieve the above bpd. The current example notebook is too sketchy and only results in a bpd of 3.8. Thanks very much!
First and foremost, I would like to express my sincere gratitude and respect for your work on this repository. The progress and innovations shared here have been immensely insightful and valuable to the community.
I am currently exploring the concept of fission in invertible neural networks, where a single representation 'x' can be decomposed into two distinct components 'y' and 'z'. My objective is to parameterize 'z' with a tractable distribution while ensuring that 'y' and 'z' can be accurately recombined to reconstruct 'x' using the reverse pass of the model.
Given your expertise in this field, I would greatly appreciate any guidance or suggestions you could provide on the following aspects:
Any insights, references, or examples you could share would be extremely helpful.
Thank you for your time and for the impactful contributions you've made to the field.
Best regards
Can we apply a clamp before z2 is returned in AffineCoupling? For example:
import torch
from torch import nn

# Flow base class and the zero_log_det_like_z helper as used in normflows'
# coupling module (import paths assumed)
from normflows.flows.base import Flow
from normflows.flows.affine.coupling import zero_log_det_like_z


class AffineCouplingStable(Flow):
    """
    Affine Coupling layer as introduced in the RealNVP paper, see arXiv: 1605.08803
    """

    def __init__(self, param_map, scale=True, scale_map="exp"):
        """Constructor

        Args:
            param_map: Maps features to shift and scale parameter (if applicable)
            scale: Flag whether scale shall be applied
            scale_map: Map to be applied to the scale parameter, can be 'exp' as in RealNVP or 'sigmoid' as in Glow, 'sigmoid_inv' uses multiplicative sigmoid scale when sampling from the model
        """
        super().__init__()
        self.add_module("param_map", param_map)
        self.scale = scale
        self.scale_map = scale_map

    def forward(self, z):
        """
        z is a list of z1 and z2; ```z = [z1, z2]```
        z1 is left constant and affine map is applied to z2 with parameters depending
        on z1

        Args:
            z
        """
        z1, z2 = z
        param = self.param_map(z1)
        if self.scale:
            shift = param[:, 0::2, ...]
            scale_ = param[:, 1::2, ...]
            if self.scale_map == "exp":
                z2 = z2 * torch.exp(scale_) + shift
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_det = torch.sum(scale_, dim=list(range(1, shift.dim())))
            elif self.scale_map == "sigmoid":
                scale_inv = 1 + torch.exp(-(scale_ + 2))  # 1 / sigmoid(scale_ + 2)
                z2 = z2 * scale_inv + shift
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_scale = nn.functional.logsigmoid(scale_ + 2)
                log_det = -torch.sum(log_scale, dim=list(range(1, shift.dim())))
            elif self.scale_map == "sigmoid_inv":
                scale = torch.sigmoid(scale_ + 2)
                z2 = z2 * scale + shift
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_scale = nn.functional.logsigmoid(scale_ + 2)
                log_det = torch.sum(log_scale, dim=list(range(1, shift.dim())))
            else:
                raise NotImplementedError("This scale map is not implemented.")
        else:
            z2 = z2 + param
            log_det = zero_log_det_like_z(z2)
        return [z1, z2], log_det

    def inverse(self, z):
        z1, z2 = z
        param = self.param_map(z1)
        if self.scale:
            shift = param[:, 0::2, ...]
            scale_ = param[:, 1::2, ...]
            if self.scale_map == "exp":
                z2 = (z2 - shift) * torch.exp(-scale_)
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_det = -torch.sum(scale_, dim=list(range(1, shift.dim())))
            elif self.scale_map == "sigmoid":
                scale = torch.sigmoid(scale_ + 2)
                z2 = (z2 - shift) * scale
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_scale = nn.functional.logsigmoid(scale_ + 2)
                log_det = torch.sum(log_scale, dim=list(range(1, shift.dim())))
            elif self.scale_map == "sigmoid_inv":
                scale_inv = 1 + torch.exp(-(scale_ + 2))  # 1 / sigmoid(scale_ + 2)
                z2 = (z2 - shift) * scale_inv
                z2 = z2.clamp(min=-1e6, max=1e6)
                log_scale = nn.functional.logsigmoid(scale_ + 2)
                log_det = -torch.sum(log_scale, dim=list(range(1, shift.dim())))
            else:
                raise NotImplementedError("This scale map is not implemented.")
        else:
            z2 = z2 - param
            log_det = zero_log_det_like_z(z2)
        return [z1, z2], log_det
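Hypothetical drop-in usage, mirroring how AffineCouplingBlock composes Split, AffineCoupling, and Merge (composition assumed from the block printout in another issue here):

import torch
import normflows as nf

param_map = nf.nets.MLP([2, 64, 4], init_zeros=True)  # 2 conditioning dims -> shift+scale for 2 dims
coupling = AffineCouplingStable(param_map, scale_map="sigmoid")
z1, z2 = torch.randn(32, 2), torch.randn(32, 2)       # the two halves a Split layer would produce
(z1, z2), log_det = coupling([z1, z2])                # clamped forward pass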
It seems that some layers do not support conditioning, like the affine coupling block. Would it be possible to extend this, as suggested in e.g. https://arxiv.org/pdf/1912.00042? It wouldn't be much effort: one just needs to make the affine coupling conditional, and the rest can stay.
Hello!
This is an excellent repository that I am using for my Ph.D. Thank you for such a fantastic contribution!
I wonder if you have plans to implement Flow++ (https://arxiv.org/abs/1902.00275). By the way, for likelihood estimation, is another method (in this repo) more suitable than Flow++? What is your opinion? Thanks.
Thank you for a repo that's easy to handle with a normalizing flow of one's choice!
I would like to implement a normalizing flow that optimizes multiple target distributions at once, depending on the context I provide to it. Yet currently, as far as I know, no context can be provided in the .forward_kld method of the NormalizingFlow class.
Would be great if that's added!
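For what it's worth, the ConditionalNormalizingFlow class used in another issue here already accepts a context in forward_kld; a minimal sketch:

import torch
import normflows as nf

# base_dist and flows built with context support, e.g. as in the
# CoupledRationalQuadraticSpline example from another issue here
model = nf.ConditionalNormalizingFlow(q0=base_dist, flows=flows)
x = torch.randn(512, 1)        # samples from the target
context = torch.randn(512, 1)  # per-sample context selecting the target
loss = model.forward_kld(x, context)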
Cheers,
Yves
In the MADE class, the preprocessing attribute is created as a lambda. Please remove this, or it will be impossible to torch.save() the model.
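For illustration, the usual fix as a sketch (preprocessing stands for whatever the lambda computes): torch.save pickles the module, and pickle can serialize named module-level functions by reference, but not lambdas.

import torch

def preprocessing(x):   # named, module-level replacement for the lambda
    return x            # ... whatever the original lambda computed

made.preprocessing = preprocessing   # 'made' being a MADE instance
torch.save(made, "made.pt")          # pickle can now serialize it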
Does normflows support parallel computing on multi-GPU?
Thanks for this amazing, high-quality repo, but I've run into an issue with ConditionalNormalizingFlow. I checked this repo; however, it seems like only MaskedAffineAutoregressive supports the context. How can the other classes be made to support ConditionalNormalizingFlow?
@VincentStimper
Awesome repo. Thanks
I want to work on NICE, but didn't find a NICE example here. Could you provide one?
Hey,
thanks for your framework.
Are you currently planning to implement flows on spheres supporting forward and backward KLD as well?
Thanks and all the best
Hi, I'd suggest adding some more tutorials, for example for ClassCondFlow, as while trying to write one on my own, I keep encountering errors.
Running the following minimal example:
import normflows as nf
import torch

torch.manual_seed(42)

flow = nf.NormalizingFlow(
    nf.distributions.DiagGaussian(1, trainable=False),
    [
        nf.flows.AutoregressiveRationalQuadraticSpline(1, 1, 1),
        nf.flows.LULinearPermute(1)
    ]
)
with torch.no_grad():
    samples_flow, _ = flow.sample(4)
print(samples_flow)
raises a UserWarning about an upcoming deprecation:
/Users/timothy/Desktop/normalizing-flows/normflows/flows/mixing.py:437: UserWarning: torch.triangular_solve is deprecated in favor of torch.linalg.solve_triangular and will be removed in a future PyTorch release.
torch.linalg.solve_triangular has its arguments reversed and does not return a copy of one of the inputs.
X = torch.triangular_solve(B, A).solution
should be replaced with
X = torch.linalg.solve_triangular(A, B). (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/BatchLinearAlgebra.cpp:2189.)
outputs, _ = torch.triangular_solve(
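The replacement pattern, as a generic sketch (not the exact mixing.py code; note that solve_triangular requires the upper= flag to be passed explicitly):

import torch

A = torch.triu(torch.rand(3, 3)) + torch.eye(3)      # triangular system matrix
B = torch.rand(3, 2)
# Old, deprecated: X = torch.triangular_solve(B, A).solution
X = torch.linalg.solve_triangular(A, B, upper=True)  # upper= must now be explicit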
I will submit a PR shortly that fixes the issue.