
bayem's Issues

Visualization of marginalized posteriors

In the visualization, pair plots are created that show the marginal distributions on the diagonal and the bivariate marginal distributions on the off-diagonal elements. The plot distinguishes between the case of two normal variables (related to model parameters) and the other cases, which are assumed to be uncorrelated.

  1. It would be nice to prescribe quantiles for the isolines to be plotted. For the bivariate normal the level is e.g. alpha / sqrt((2*pi)**2 * det(cov)), with alpha the quantile (see e.g. this link); substituting this into the bivariate pdf yields the isolines. In visualize_vb_marginal_matrix this might be added as follows, with dist being the bivariate marginal normal posterior of the two variables:
levels = np.atleast_1d(alpha) / np.sqrt((2 * np.pi) ** 2 * np.linalg.det(dist.cov))
axes[i, j].contour(xj, xi, pdf, colors=[color], linewidths=lw, levels=levels)
  2. For the case with two correlated noise terms, the assumption of simply multiplying the two univariate gamma pdfs seems incorrect, since they are actually correlated.
  3. For the case of a normally distributed model parameter and a gamma distributed noise precision, the plot seems incorrect: the contours are currently ellipses, but since the gamma distribution is not symmetric, I would expect a non-symmetric joint pdf.

Proper VB scaling

To deal with numerically large parameters, the current VB implementation scales the provided prior by its mean and infers these scaled parameters. The main benefit is that the entries of the Jacobian and the precision are much closer to one.

This is not done for numerically small parameters, as there has to be some kind of eps to avoid division by zero (for zero-mean parameters). Most of my problems are then solved with eps=1.e-20, but the proper way would be some kind of scaling of the precision. For normal distributions, there is the zero-mean unit-variance transformation. How would that work for an MVN?

I read something here where an eigenvalue decomposition of the precision/covariance is used, COV = Q . EVs . Q.T, and the scaling is performed with Q. However, in our case COV is diagonal, so Q is the identity and there will be no scaling...?
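
As an illustration of the direction I have in mind (only a sketch; the function names whiten/unwhiten are made up), the usual whitening transformation for an MVN could look like this:

import numpy as np

def whiten(x, mean, cov):
    # eigendecomposition COV = Q . diag(evs) . Q.T
    evs, Q = np.linalg.eigh(cov)
    # rotate with Q and rescale each eigen-direction to unit variance
    return np.diag(1.0 / np.sqrt(evs)) @ Q.T @ (x - mean)

def unwhiten(z, mean, cov):
    evs, Q = np.linalg.eigh(cov)
    return mean + Q @ np.diag(np.sqrt(evs)) @ z

(For a diagonal COV, Q would indeed be the identity, but the square roots of the eigenvalues would still rescale each parameter by its standard deviation.)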

Towards `bayem`

Some time ago, we decided to pack this VB algorithm into a PyPI package named bayem.

For the usage, I have the following API in mind, strongly inspired by scipy.optimize:

import bayem

n0 = bayem.Gamma(...)
p0 = bayem.MVN(...)

result = bayem.solve(fun=model, param0=p0, noise0=n0, jac=None) # uses central differences Jacobian
# or 
result = bayem.solve(fun=model, param0=p0, noise0=n0, jac=model.jacobian, tolerance=..., callback=...)

bayem.pair_plot(result, show=True)
bayem.trace_plot(result, labels=...)

Also, I'd like to split the code (at least) into

  • distributions.py contains MVN and Gamma
  • vb.py contains the actual algorithm
  • visualization.py contains some plots

and then automatically load those when importing bayem, so that you use bayem.Gamma (see above) instead of bayem.distributions.Gamma.
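
A minimal sketch of that re-export (assuming the file split above and that solve lives in vb.py):

# bayem/__init__.py
from .distributions import MVN, Gamma
from .vb import solve
from .visualization import pair_plot, trace_plot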

  • Is the name OK?
  • API OK? bayem.vba instead of solve?
  • Split into files OK?
  • I'd like to slowly step back from the project and would like to also mention another author (or multiple) here. Who is interested? @ajafarihub @aradermacher @joergfunger ?

Update single latent variable

In the optimization procedure, I have to set initial values. Currently I have only a single value which makes it easy to do something like

    start_vector = np.array([0.7])
    result = least_squares(all_experiments_model_error, start_vector)

For the more complex case with multiple experiments, I can set the local variables when creating the multi-model error, e.g. within the user-defined derived class of MultiModelError:

   single_model_error = LinearModelError(some_prm)
   parameter = single_model_error.get_parameter_dict()
   parameter.define("b", init_value_b)
   self.add(single_model_error, parameter)

For parameters that are global (shared by different models), it would be nice to set the global latent variables jointly for all parameters (and outside of the definition of the multi-model error):

    multi_model_error.add_by_name('b')
    multi_model_error.update('b', init_value_b)

I wanted to implement something similar, but I was not sure how exactly the interface should look, because on the latent_parameter level I only have either the (potentially different) (name, key) pairs for each individual index, or the index itself - both seem unintuitive to me if I want to set a named variable. I could also get the index in the global vector by

latent_parameters.index_of('b', one_model_that_has_b)

Any better idea on how this should be done?

Length of vector valued parameters

For simplicity, we want to support vector valued parameters to have, for example, a whole displacement field u as a latent variable. That would simply work as

p = bayes.parameters.ParameterList()
p.define("u", [0,0,0,0,0])

If you define this parameter as latent, the LatentParameters must know its length -- 5 in this case -- to assign it to a region in the global latent parameter vector.

There is, however, no check that the length of the parameter stays constant. So there is no error on

p["u"] = 42.
p["u"] = [0,0,0]

Also, we support initializing empty parameters like

p.define("u2")

This is convenient if you want to indicate that this parameter must be latent. In this case, however, LatentParameters has no chance to know its length and (currently) assumes length 1. So this behavior must be clearly documented, or the length must be provided explicitly:

p.define("u2") # raises "No length given!"
# or
p.define("u2") # logs: "Parmameter 'u2' defined as scalar."
p.define("u2", length=6) # logs: "Parmameter 'u2' defined as vector of length 6."

Remove vb.MVN and vb.Gamma

Those two distributions are tailored to our VB algorithm, but the associated values -- mean and precision matrix from vb.MVN and shape and scale from vb.Gamma -- can also be extracted from scipy.stats.multivariate_normal and scipy.stats.gamma.

IMO the only question is how to transfer the feature of vb.MVN.named_mean("some_prm_name") there...
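
One option (just a sketch; the class name is made up) would be a thin wrapper that keeps the parameter names next to a frozen scipy.stats distribution:

import numpy as np
from scipy.stats import multivariate_normal

class NamedMVN:
    """Wraps scipy.stats.multivariate_normal and remembers the parameter names."""

    def __init__(self, mean, cov, names):
        self.dist = multivariate_normal(mean=mean, cov=cov)
        self.names = list(names)

    def named_mean(self, name):
        return self.dist.mean[self.names.index(name)]

mvn = NamedMVN(mean=[1.0, 2.0], cov=np.eye(2), names=["E", "nu"])
assert mvn.named_mean("nu") == 2.0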

Check ARD implementation

Can we think of an analytic solution including ARD to validate

  • the modifications to the update equations
  • the final free energy?

Remove sphinx from requirements

The current requirements contain

sphinx
sphinx_rtd_theme

which are not needed to install the package, e.g. in another project or a CI pipeline.

To build the documentation locally, we could have a requirements_website.txt or so that needs to be installed once by a user. But @joergfunger, how does the readthedocs integration work? I did not find the place that tells readthedocs what to build. I assume, by default, they look for a conf.py and index.rst and run sphinx themselves. Is that correct? In that case, the requirements_website.txt plan would work.

Jacobian for global variables

It seems that the Jacobian is not correctly evaluated when multiple individual parameters of a model error (e.g. E0 and E1) are mapped to a single global parameter (assuming E0 and E1 are to be identified jointly). In particular, in the inference problem

stacked_jac[:, self.latent[global_name].global_index_range()] = -J

stacked_jac has the dimension [out, 1], whereas J has the dimension [out, 2].
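
If both local parameters map to the same global parameter (so dE0/dE = dE1/dE = 1), the chain rule suggests that the global column should be the sum of the local columns. A sketch of that fix as a standalone helper (untested, names are mine):

import numpy as np

def insert_local_jacobian(stacked_jac, J_local, global_indices):
    """Accumulate local columns that share one global parameter by summing them."""
    stacked_jac[:, global_indices] = -J_local.sum(axis=1, keepdims=True)
    return stacked_jac

# toy check: two local parameters (E0, E1) mapped to one global parameter
stacked = insert_local_jacobian(np.zeros((4, 1)), J_local=np.ones((4, 2)), global_indices=[0])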

Failing visualization test

The current master CI fails because of a visualization test that compares a VB generated plot with a (manually checked) reference image. Strange:

  • The CI ran successfully on the pre-merged branch some days ago
  • Locally, I cannot reproduce the error

Maybe some version update of dependent libraries...

Sub-optimal control flow

When starting vba with the correct parameters, I expect it to evaluate the model twice to compute two free energies, which will be close enough for the algorithm to stop.

In our current implementation, we need at least three evaluations, as we compute no free energy with the result of the first model evaluation.

evaluate vs. __call__

The inference problem currently only has a __call__ function that returns a dictionary (per experiment) of vectors (the concatenation of all sensor outputs). I think it would be nice to have an interface that propagates the complete model error to the output, i.e. a function that returns a dictionary (experiment) of dictionaries (sensors) - which I could implement. I am just asking myself whether we still need the __call__ function (in which case I would just change __call__) and then have another helper function at the top level of the inference problem (so that we do not implement evaluate and __call__ in parallel in all classes like ModelError and ForwardModel). I think in these classes a single __call__ returning the dict-of-dicts version would be sufficient, and then we provide an evaluate_as_vector function in the InferenceProblem.
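
A sketch of such a top-level helper, assuming __call__ returns the dict-of-dicts version:

import numpy as np

def evaluate_as_vector(model_error_output):
    """Flatten {experiment: {sensor: values}} into {experiment: concatenated vector}."""
    return {
        exp_key: np.concatenate([np.atleast_1d(values) for values in sensors.values()])
        for exp_key, sensors in model_error_output.items()
    }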

Fancy readme

... including:

  • how to install
  • "hello world" example
  • some plots
  • ...?

Parameters as vector

In some applications, the parameter vector consists of individual components that are themselves vectors, e.g. when interpreting the displacement vector of a FEM computation as an unknown. Any idea on how we are going to handle this?

Jacobian of individual models

We currently compute the Jacobian using finite differences, i.e. by perturbing each global parameter, computing the global residual and thereby obtaining a complete, dense matrix.
For problems where we have more than one experiment and latent parameters that are specific to one experiment (e.g. offsets, displacements as latents, etc.), this is a large overhead (since, e.g., if experiment 1 does not depend on d2, we know a priori that this derivative is zero). As a consequence, I think it could be advantageous to provide an ordered list of only the local latent variables (for a single experiment), use it to compute the "local" Jacobian (in this order), and then assemble everything, e.g. in a sparse fashion, as sketched below. An alternative to the sparse global sensitivity matrix would be to return each individual matrix, similar to the evaluate feature currently proposed, and the user is then expected to sum up the individual components.
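
A possible sparse assembly of the local Jacobians into the global sensitivity matrix (only a sketch; the index bookkeeping is simplified):

import numpy as np
import scipy.sparse as sp

def assemble_global_jacobian(local_jacobians, local_to_global, n_global):
    """
    local_jacobians: list of dense [n_out_i, n_local_i] arrays, one per experiment
    local_to_global: list of index arrays mapping local columns to global columns
    """
    blocks = []
    for J_local, global_cols in zip(local_jacobians, local_to_global):
        n_out, n_local = J_local.shape
        rows = np.repeat(np.arange(n_out), n_local)
        cols = np.tile(np.asarray(global_cols), n_out)
        blocks.append(sp.csr_matrix((J_local.ravel(), (rows, cols)), shape=(n_out, n_global)))
    return sp.vstack(blocks)

# toy usage: two experiments, three global parameters
J = assemble_global_jacobian(
    [np.ones((4, 2)), np.ones((3, 1))], [np.array([0, 1]), np.array([2])], n_global=3
)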

VB_postprocess_per_iteration

For studying purposes in particular, it would be nice to have the possibility of doing some kind of post-processing of intermediate iterations in the VB via some callable(s) given by the user. For example, one might like to monitor:

  • the condition number of the Lambda matrix
  • the step size in the LM algorithm
  • the last update of the parameters and noises

One idea would be to use a list of callables, each being called after every iterative update. This could possibly be done separately after the update of the parameters and after the update of the noises.
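
A minimal sketch of such a hook (the keys of the info dict are only examples):

import numpy as np

def run_iterations(update_step, callbacks=(), n_iter=10):
    """Call every user-provided callback with an info dict after each iteration."""
    for i in range(n_iter):
        info = update_step(i)  # e.g. {"iteration": i, "Lambda": ..., "free_energy": ...}
        for callback in callbacks:
            callback(info)

def monitor_condition_number(info):
    print(info["iteration"], np.linalg.cond(info["Lambda"]))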

Noise model

In recent discussions and even implementations the idea of a noise model came up. Maybe even with separate parameters similar to the model_error.

Main purpose:

The multiple model errors of an inference problem currently return the individual model error contributions as a nested dictionary like {experiment_key : {sensor : vector_of_numbers}}. We now need a way to further adapt this output:

  • VB: define N "noise groups" - e.g. N lists of (experiment_key, sensor) tuples - that end up as N individual vectors.
  • sampling: define the individual terms of the log-likelihood function, also based on some data structure of (exp_key, sensor) pairs

The main interface would be like

class NoiseModel:
    def define_parameters(self):
        pass

    def vector_contribution(self, raw_model_error_output, prm):
        pass

    def loglike_contribution(self, raw_model_error_output, prm):
        pass

Benefits:

  • The importance of the noise is elevated, somewhat to the same level as the model error. This is purely psychological though ;)
  • Provide reasonable, well defined default values like
class SingleNoiseForEverything:
    def define_parameters(self):
        p = ParameterList()
        p.define("sigma")
        return p

    def vector_contribution(self, raw_model_error_output, prm):
        # just concatenate _all_ outputs
        vector_terms = []
        for exp_me in raw_model_error_output.values():
            for sensor_me in exp_me.values():
                vector_terms.append(sensor_me)
        return np.concatenate(vector_terms)

    def loglike_contribution(self, raw_model_error_output, prm):
        error = self.vector_contribution(raw_model_error_output, prm)
        return -0.5 * (
            len(error) * np.log(2.0 * np.pi * prm["sigma"] ** 2)
            + np.sum(np.square(error / prm["sigma"]))
        )
  • Also allow for more complex noise patterns for individual sensors or even (exp_key, sensor) pairs. Side effect: this lets the user conveniently control which contributions of the many model error outputs they want to include in the inference problem.
class NoiseBySensor:
    def __init__(self, sensor):
        # ... later only returns the model_error contributions of "sensor"
  • Dealing with more complex noise models:
class SensorOffsetModel:  # parameters: sigma, offset
  • Dealing with correlations. Even though we have not completely figured that out, I think something like
class FixedCorrelatedNoiseModel:
    def __init__(self, sensor, exp_key, cov_matrix): 

or even

class CorrelatedNoiseModel:
    def __init__(self, sensor, exp_key, distance):
        # distance (time or space) between correlated measurements
        # parameters would be: sigma, correlation_length

could work.

  • The whole "the model error must return a dict {sensor: values}" is practical for complex cases, but pointless for models like "ax+b". A ModelHasSingleOutputNoise could deal with that case - mainly in test files - where it skips the loop over sensors in its evaluation.

User experience:

problem = InferenceProblem()
key_me1 = problem.add_model_error(me1, me1.define_parameters())
key_me2 = problem.add_model_error(me2, me2.define_parameters())

# similar pattern for noise!
noise_model = SingleNoiseForEverything()
key_noise = problem.add_noise_model(noise_model, noise_model.define_parameters())

# assign the shared parameter "A" to the latent parameter "latentA"
problem.latent("latentA", "A", key_me1)
problem.latent("latentA", "A", key_me2)

# similar API for the noise
problem.latent("latentSigma", "sigma", key_noise)

# the following is yet to be discussed and (please) not part of this issue/discussion:
vb_prior = VBPrior(problem.latent)
vb_prior.set("latentA", Normal(...))
vb_prior.set("latentSigma", Gamma(...)) # well actually it should be "precision" for VB...

with pm.Model():
    pm_priors[problem.latent["latentA"].idx] = pm.Normal("latentA", ...)
    ...

Linearity check

Problem

This picture compares a sampled posterior (blue) with the VB posterior (red) and I assume the sampled results to be correct. E_beam and E_conn are MVN model parameters, Lambda is the Gamma noise precision.

  • The individual parameters are not normally distributed.
  • By definition, the MVN posterior cannot capture the curved covariance.

This boils down to the fact that for a nonlinear model the prior and the posterior are no longer conjugate (right?!). But even though our model is indeed nonlinear, the linearization around a narrow posterior distribution may still be quite accurate. So I would like to provide some (optional) linearity checks that warn the user if the linearization assumption is bad at the posterior mean. This matters if you want to evaluate the model at the tails of the posterior distribution, as the posterior MVN may significantly differ from the true posterior (see figure).

Goal

  • Define and quantify good and bad in the paragraph above.
  • Very optional: derive suggestions for parameter transformations like: The linearization of 'E_beam' has a linearity of 0.2 (bad!), the parametrization '1/E_beam' would have a linearity of 0.96 (good).

Possible solutions

The basis could be a comparison of our model error function M at some points with the posterior-linearized M_lin. For a scalar model error with one parameter theta, we could take a set of points (e.g. theta_i = mean + i * sd for i in [-4, -3, ..., 4]) and compare M(theta_i) [as a data set] and M_lin(theta_i) [as our prediction] using R².

  • How to deal with multiple parameters?
    • Doing it for each parameter individually could be sufficient.
  • The model error is vector valued. How to deal with that?
    • We could compute R² for each entry individually. If each entry has a meaning (e.g. sensor 4 at time step 17), that could indeed be interesting to see. Or we return some mean(R²).
    • Is there something like R² for higher-dimensional data? Maybe this?

Bonus: The possible transformations could be explored by a least-squares fit of M(theta_i) to a set of predefined nonlinear models (log(theta), exp(theta), 1/theta, ...).
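
A sketch of such a check for the scalar, one-parameter case (the function and argument names are made up; M is the model error, mean and sd the posterior mean and standard deviation, jac the derivative at the mean):

import numpy as np

def linearity_r2(M, mean, sd, jac, n_sigma=4):
    """Compare M with its linearization around the posterior mean via R²."""
    thetas = mean + sd * np.arange(-n_sigma, n_sigma + 1)
    y = np.array([M(t) for t in thetas])       # M(theta_i) as the "data set"
    y_lin = M(mean) + jac * (thetas - mean)    # M_lin(theta_i) as the "prediction"
    ss_res = np.sum((y - y_lin) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# toy usage with a mildly nonlinear model error and its derivative at the mean
r2 = linearity_r2(M=lambda t: 2 * t**2 - 2, mean=1.0, sd=0.05, jac=4.0)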

Discussion

Would you find that useful?

Any ideas on how to define the linearity?

Prior information as `scipy.stats`

Defining priors is currently only supported for the VariationalBayesProblem and is restricted to uncorrelated normal distributions for the parameters and gamma distributions for the noise models. This works OK-ish for automatically converting a VariationalBayesProblem into a sampling problem. Users that do not care about Chappell's algorithm are likely puzzled by this.

I propose:

  • Provide prior distributions as scipy.stats distributions (norm, gamma, ...) directly to the InferenceProblem.
  • Provide a SolverInterface that is implemented by, e.g., a VBSolver or a PyMC3Solver (see the sketch below). Those are more like example implementations with a fixed, limited set of options to control the underlying solvers. More experienced users will likely write/interface their own way of handling the InferenceProblem.
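
A sketch of such an interface (the method name and signature are just a suggestion):

from abc import ABC, abstractmethod

class SolverInterface(ABC):
    @abstractmethod
    def solve(self, inference_problem, **options):
        """Return a solver-specific result for the given InferenceProblem."""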

Separate hyperparameters

Hyperparameters are parameters that are not a direct input to the model error computation, but are rather used outside of it (right?!). One example would be the parameters of the noise distribution. For the current VB implementation, they are a separate input to the method; for sampling, they are used in the transformation of the model error to a likelihood function.

Having those, conceptually, in two separate objects allows us to define the problem independently of the inference method used. In some "simple" cases -- in my experience the majority, though -- a small wrapper around the problem definition or the inference methods can lead to a drop-in replacement of the methods. So without any effort, you can start your investigations with scipy.optimize, validate with sampling and switch to VB for performance or larger problem sizes.

Docstrings

numpydoc is a common docstring style used in many scientific python packages. We should adapt our code to that.
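
For reference, a numpydoc-style docstring looks roughly like this (purely illustrative):

def free_energy(prior, posterior):
    """Compute the variational free energy.

    Parameters
    ----------
    prior : MVN
        Prior distribution of the model parameters.
    posterior : MVN
        Current posterior approximation.

    Returns
    -------
    float
        Free energy of the current approximation.
    """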

"impure" ModelErrorInterface.__call__()

In computer programming, a pure function is (roughly) a function that has no side effects and, given the same arguments, always returns the same value. The benefit is that such functions are easy to reason about, do not require much knowledge of their context and are easy to test. An example would be the obvious
error = model_error(parameter_list)

In our implementation, however, we do

model_error.parameter_list[...] = something...
error = model_error()
  • You need to know that every model error has a member parameter_list
  • Another function, like latent.update(...) here, may actually change that list without you expecting/noticing it.

Very unintuitive.

I honestly do not know why I chose the impure version. One reason could have been the distinction between the (implementation-wise) completely different (!!!) methods ModelErrorInterface.__call__ and VariationalBayesInterface.__call__.

Anyways, I'll try to revert that to the pure version.

Variational Bayes algorithm with given noise

The current implementation always infers the noise. The option to not do that and instead provide a deterministic noise sigma (as is common in likelihood functions) is missing.

IMO this could be implemented by just skipping update equations 21-22 (or 30-31 for multiple noises). Then, in the remaining update equations, the noise only appears as "scale x shape", which corresponds to the mean of this distribution and could be provided instead.
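
A sketch of that idea (the attribute names follow vb.Gamma, the flag is made up):

def noise_precision(gamma, fixed_precision=None):
    """Precision used in the remaining update equations.

    If a deterministic precision is given, the noise update equations
    (21-22 / 30-31) are skipped and this value replaces "scale x shape".
    """
    if fixed_precision is not None:
        return fixed_precision
    return gamma.scale * gamma.shape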

Discussion:

  • Are my thoughts correct?
  • The free energy terms include the noise shape and noise scale separately. How to handle that?

levels are not always computed correctly

There is a bug (indentation) in visualize that leads to a problem when levels are not given (and are thus computed) and both variables are model parameters. It can be fixed by changing the indentation.

Multi-Multi-ModelError

We currently have the option to combine multiple parameters to joint parameter lists and, accordingly, multiple model errors to a MultiModelError.

Do we need another recursive layer around this to allow arbitrary combinations? If so, how?

ME1(P1) \
ME2(P2)  | => MultiME1([P1, P2, P3]) \
ME3(P3) /                             | ==> MultiME3([P1, P2, P3, P4, P5])
                                     /
ME4(P4) \ => MultiME2([P4, P5])
ME5(P5) /              ||
                       \/
# OR        /  MultiME4([P6, P4, P5])
ME6(P6)   /

Prescribed correlation (in noise model)

I think the VB can be improved by adding a known non-uniform covariance matrix to the noise model, according to eq. (5) of this document.

The idea is to have a known covariance that can be scaled by the unknown \Phi. An example application is modeling the noise in the force residuals of a structure, which should be somehow proportional to the stiffness matrix. So the stiffness matrix can serve as a basis matrix for the covariance of the noise, which is then scaled by the identified \Phi.

At the same time, the terms cancelled out in eq. (34) can be dropped from the VB implementation as well.
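
In other words (my reading of the proposal, with C_0 the known basis covariance, e.g. derived from the stiffness matrix, and \Phi the inferred scaling precision), the noise model would become

e \sim \mathcal{N}(0, \Phi^{-1} C_0) \quad \text{instead of} \quad e \sim \mathcal{N}(0, \Phi^{-1} I).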

Mix of shape and scale in bayes.vb.Gamma

@aradermacher and @joergfunger figured out a bug in bayes.vb.Gamma where we (most likely I 😄 ) mixed up the shape (should be c) and the scale (should be s) parameter.

Should we test that or just fix it?

Edit: I may have thought that scale has a c in it and accidentally connected them this way. Should we maybe adopt the k, theta notation from Wikipedia?
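
A small regression test could be based on scipy (a sketch, assuming bayes.vb.Gamma exposes shape c, scale s and a mean):

import numpy as np
from scipy.stats import gamma

# in the k (shape), theta (scale) parametrization: mean = k * theta
c, s = 3.0, 2.0
assert np.isclose(gamma(a=c, scale=s).mean(), c * s)
# a test could then check that bayes.vb.Gamma(shape=c, scale=s) reports the same mean c * s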

Integrating jacobians from Black box forward solver with grad based UQ schemes

Hello,
My first issue here, yay! I was thinking about ideas to inform the codebase about the availability of Jacobians from the forward solver. Some kind of flag? Also, the size of the Jacobian is directly proportional to the grid size. Could that pose a problem later on?

I am trying to build this interface in bayes --> calibration. It would be nice to have a very basic FEniCS based test forward solve. Maybe a simple FE based load-displacement simulation would also do, one that returns the solver output and the Jacobian for given input parameters (maybe Young's modulus E).

I envision treating this thread as an evolving log to get this implemented.

initialization of global parameter vector

How would the init function for the parameter vector look? Currently, I do

latent_length = sum(latent_par.N for latent_par in inference_problem.latent.values())
start_vector = np.empty(latent_length)
start_vector[inference_problem.latent['b'].global_index_range()] = 0.7

Should we have a set function?

inference_problem.latent['b'].set_value(0.7)

Vector valued parameters

A relevant use case for vector valued parameters would be an FE forward model where all displacements or stresses are latent variables; it is just easier to define the whole field, with possibly millions of entries.

The library somehow supports vector valued parameters, but there is no example. Also, actually going to 1e6 parameters in the current implementation causes issues regarding:

  • dense Jacobian with 1e6 x 1e6 entries
  • dense precision/covariance matrices with 1e6 x 1e6 entries
  • matrix multiplications, e.g. Jacobian.T @ Jacobian, that are tricky even if the Jacobian is sparse
  • any matrix inversions

So do we want to support that in a general fashion in our InferenceProblem, or would that rather be a special case only relevant for a specifically tailored VB algorithm? I vote for the latter.

Analytical example

In order to test our methods (and maybe also to investigate the problem of the free energy being maximal in the first iteration), I think it might be advantageous to have an analytical test case, i.e. not prescribing noise on the data and then identifying, but rather prescribing a linear model with a few data points and then deriving the analytical solution (the exact posterior). Using VB, we should then be able to exactly reproduce the result (even though it might not yield the exact mean), and we can even test the covariance.
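
For a linear model with known noise precision, the exact posterior is available in closed form (standard Bayesian linear regression) and could serve as the reference. A sketch, with X the design matrix, y the data, m0/L0 the prior mean/precision and phi the noise precision:

import numpy as np

def analytic_posterior(X, y, m0, L0, phi):
    """Exact Gaussian posterior for y = X @ theta + noise with noise precision phi."""
    L = L0 + phi * X.T @ X                            # posterior precision
    m = np.linalg.solve(L, L0 @ m0 + phi * X.T @ y)   # posterior mean
    return m, L

# toy usage: linear model m * x + c evaluated at a few points, no noise added
x = np.linspace(0.0, 1.0, 5)
X = np.column_stack([x, np.ones_like(x)])
mean, precision = analytic_posterior(X, y=7.0 * x + 10.0, m0=np.zeros(2), L0=np.eye(2), phi=100.0)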

Inconsistent assignments to "k" and "J"

Here is a simple example to illustrate the issue:

Suppose a user has defined a model error completely independently of VB, as follows:

me(x) = 2*x² - 2  # here, the "data" is 2 and the forward model is g(x) = 2*x²
me.jacobian(x) = 4*x

If the user wants to use the VB for this model error, the correct k and J (according to the paper) must be:

k = data - g(x) = 2 - 2*x²  = - me(x)
J = 4*x = me.jacobian(x)

And these are not what we have now in the implementation:

k, J = model_error(param0.mean), model_error.jacobian(param0.mean)

So I think that either model_error(param0.mean) or model_error.jacobian(param0.mean) must receive a minus sign in front of it.
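
A tiny check of one consistent choice for the example above (a sketch; here the minus sign is put in front of the model error):

def me(x):
    return 2 * x**2 - 2       # model error: g(x) - data with g(x) = 2 * x**2, data = 2

def me_jacobian(x):
    return 4 * x

x = 1.5
k = -me(x)                    # k = data - g(x)
J = me_jacobian(x)            # J = dg/dx
assert k == 2 - 2 * x**2 and J == 4 * x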

VB `state` variable

Our implementation follows the paper notation of having separate variables s, c, m, L and s0, c0, m0, L0 for the prior, plus things like L_inv for efficiency. This makes it ugly (debatable, I know) to represent the rough structure of the algorithm

while keep_running(s, c, m, ...):
  update_parameters(s, c, m, ...)
  update_noise(s, c, m, ...)
  update_prior(s, c, m,...) # e.g. for ARD
  evaluate_model_and_jacobian(m)
  f = free_energy(s, c, m, ...)

because each method needs almost all parameters. Introducing a VBState may clear that up.
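
A sketch of what such a state object could look like (a dataclass is only one option; the field names follow the paper notation above):

from dataclasses import dataclass
import numpy as np

@dataclass
class VBState:
    m: np.ndarray          # parameter mean
    L: np.ndarray          # parameter precision
    s: float               # noise scale
    c: float               # noise shape
    f: float = -np.inf     # current free energy

# the loop above would then read e.g.
#   while keep_running(state):
#       update_parameters(state)
#       ...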

Disadvantage: the structure and notation of the original paper may be harder to follow. But maybe slight improvements of the algorithm justify writing our own (small) paper that we can refer to.

Noise group parameters as normal latent parameter

VB has a clear distinction between model parameters and noise parameters. Also, due to the special restrictions of VB, the noise parameters (currently each noise model has a precision) are actually ignored and the noise is inferred anyway.

Especially regarding sampling algorithms, it would make much more sense to have the precision of the noise model as a latent parameter that gets its prior similar to the model parameters. Maybe:

problem = InferenceProblem()
...
noise_model_force = UncorrelatedSensorNoise(force_sensors)
problem.add_noise_model(noise_model_force)
problem.latent["noise_force"].add(noise_model_force.parameter_list, "precision")
problem.set_prior("noise_force", scipy.stats.gamma(...))

As stated above, however, VB needs a clear distinction between model parameters and noise parameters, which is not given right now.

According to #23 , the noise_model_force.parameter_list will somehow vanish and the line could read

problem.define_latent("noise_force",  noise_model_force, "precision")

and we could check, if noise_model_force is of type bayes.noise.NoiseModelInterface.

Motivational example

This motivation is certainly correct, but could be improved by an example to follow along. As @aklawonn wants to tackle this, his current wood compression tests (using dummy data) could be this example. Another idea of mine would be a concrete bridge where one model is the full bridge (1D beam...) with displacement/inclination sensors, another one is a compression test from multiple drilled, cylindrical core samples. Maybe, @aklawonn, you can update us on the general idea here in this issue.

Anyways, we should illustrate:

  • the advantages of named parameters
  • that not every parameter must be latent
  • how one (expensive) forward model can return several outputs in a structured way
  • that we can combine multiple model errors (aka experiments?)
  • that parameters of multiple experiments can be inferred simultaneously
  • the complexity of the example, e.g. by writing down the loglike for the problem (10+ terms! 😄 ) and sketching the laborious and error-prone process of changing things in there
  • interface to several solvers (VB algorithm, taralli, pymc3,...)

What else?

IMO the challenge will be to introduce the features above in a reasonable order. I have some kind of multi-step example in mind that increases in complexity: starting with a simple loglike example that is familiar to the reader, then introducing named parameters, then combining two examples, and so on.

VB_handle_diverging_ME

During the VB iterations, it is likely to get a diverging execution of the model error (ME) for some reason (e.g. improper values of the most recently updated parameters). In such a case, it would be nice to at least get a final VB output, which would be the best-fitted parameters corresponding to the highest value of the free energy up to the last iteration. Such an output is most likely not a reliable fit, but might be of interest, for example for studying purposes.

"chaining"

Out of curiosity, I quickly implemented a test case here that generates n_data (currently 2) datasets (linear model m * xs + c) and compares

  • a single VB run on a combined model error with all n_data:
  param:         MVN with 
                  ├── mean: [ 6.757e+00  1.002e+01]
                  ├── std:  [ 6.782e-01  4.187e-02]
  noise:         {'noise0': Gamma with 
                  └── mean: (69.11101249683999, 9)}
  • n_data VB runs with one of the data sets and taking the posterior as prior for the next run (thus "chaining" 😄 ):
  param:         MVN with 
                  ├── mean: [ 6.672e+00  1.003e+01]
                  ├── std:  [ 7.352e-01  4.543e-02]
  noise:         {'noise0': Gamma with 
                  └── mean: (67.5849803323776, 9)}

Potential issue: The results do not match. Should they?

Unstable number of VB-iterations

I am not 100% sure, but this is what I have observed in a case study I am currently performing: the VB inference of a gradient-damage material model using FEMU-F.

If I do either of:

  • scale some part of my forward-model output (and adjust the corresponding part of the Jacobian and the prior noises accordingly)
  • use different parametrizations of the input parameters, e.g. par_actual = some_reference_constant * exp(par_dimensionless) OR par_actual = some_reference_constant * par_dimensionless

then I get different outputs from the VB, as a result of the different number of iterations the VB performs.

Again, I emphasize that I am not 100% sure that I have not made a mistake somewhere. Just something to keep in mind in case anyone else sees such behavior in their own examples.

Potential bug in our F formula

In Chappell's paper, F contains the terms:

F = ... + (N/2 + c0 -1)*stuff + ... - (N/2 + c -1)*stuff

Our F formula contains the terms

F = ... + (N/2 + c0 -1)*stuff + ... - (c -1)*stuff

which, using the update equation c = N/2 + c0, would expand to

F = ... + (N/2 + c0 -1)*stuff + ... - (N/2 + c0 -1)*stuff

and cancel out. Which one is correct?! 😟

Remove Gamma.Noninformative()?

We currently have Gamma.Noninformative() --> shape=0, scale = 1/3.

Several alternatives are discussed here that point out that a Gamma(alpha=eps, beta=eps) (or Gamma(shape=eps, scale=1/eps)) with eps --> 0 is a more reasonable choice for a vague prior, but it comes with its own difficulties at the limit eps=0. Plus, the mean is always 1.

I am tempted to just remove the method such that every user has to make up his own mind...

What do you think?

Separate VB algorithm

It seems that the focus of this package is shifting from being built around the Chappell VB implementation towards our idea of "convenience code" to define somewhat general inference problems. Nothing against that!

Should we then, for clarity, separate out our VB implementation? Especially if the "convenience code" is about to be moved elsewhere.
