
Comments (6)

lacerbi commented on June 15, 2024

Thanks @matt-graham - I am aware of the GPry approach, but we have not tried it out. It is an interesting idea that might work in practice, and I was planning to give it a try as well.

Still, it remains an ad hoc heuristic, since it doesn't really include the measurement in the GP surrogate; the SVM and the GP model are entirely distinct. For example, I am not sure how it avoids ruling out whole areas of parameter space just because one or two computations failed, or how it stops the acquisition function from repeatedly trying to go to the border of the infeasible region (since it was a good region to explore before, it likely remains a good region to explore now, as the GP has not changed), unless e.g. some additional heuristic "repulsion" term is added to the acquisition function; although the noise introduced by the batch acquisition might help.

The approach I was talking about would be more principled (but, of course, more costly), in that you do include the observation in the GP model; instead of making it a standard observation with a Gaussian likelihood, you make it a "censored" observation that merely informs the model that "the observed value is somewhere below this threshold", where the threshold could be some low density value (but not an insanely low one). The point is that we just want the log-density to be very low; we don't care about its exact value. However, a censored-observation model is non-Gaussian, so it requires approximate inference for the GP, which is a pain.
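For concreteness, a minimal sketch of what such a censored-observation likelihood could look like (an illustration, not PyVBMC code; the Gaussian-CDF censoring, the placeholder threshold, and the noise scale are all assumptions on my part; a logistic sigmoid would be analogous):

```python
import numpy as np
from scipy.stats import norm

def censored_log_likelihood(f, y, censored, sigma=1.0, threshold=-20.0):
    """Log-likelihood of observations y given latent GP values f.

    Regular observations use a standard Gaussian likelihood; censored
    observations only state "the true value lies below `threshold`",
    via the Gaussian CDF (their y entries are ignored).
    """
    f, y, censored = np.asarray(f), np.asarray(y), np.asarray(censored)
    ll = np.where(
        censored,
        norm.logcdf((threshold - f) / sigma),  # P(value <= threshold | f)
        norm.logpdf(y, loc=f, scale=sigma),    # exact observation
    )
    return ll.sum()
```

Since this likelihood is non-Gaussian, the GP posterior loses its closed form, which is exactly the approximate-inference pain mentioned above (e.g., Laplace, EP, or variational approximations become necessary).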

I have been thinking about this for a very long time (since the original VBMC paper more or less), but never got to it mostly because implementing it in MATLAB was out of the question (no autodiff). Incidentally, a recent paper worked out a nice way to implement censored observations in GP models, exactly along the lines we have been thinking about (not for VBMC of course; they just present it for general GP regression): https://link.springer.com/article/10.1007/s11222-023-10225-3

PS: For now, we got to the stage of "unlocking" sparse variational GPs for VBMC in the preprint I linked above; this is a stepping stone towards more (once you can do approximate inference effectively, you open the door to more complex surrogate GP models).


matt-graham commented on June 15, 2024

On further exploration of the documentation, I found a relevant note in the VBMC FAQ, which also suggests that my proposed approach of outputting a finite negative value as a proxy for negative infinity is not a good one.


lacerbi commented on June 15, 2024

Thanks for the comment! This is actually a non-trivial issue to deal with when working with surrogate models of the log-density.
The original VBMC algorithm does not deal with Infs and NaNs, assuming that the output of the log-density is always well-behaved. This is indeed a limitation; for now, we should probably mention it more explicitly in the documentation (besides the FAQ), as you also suggest above:

it may be worth explicitly mentioning this is disallowed in the documentation

As for future plans, we do want to address the issue of zero-density (or NaN) outputs.

A naive workaround of outputting an arbitrarily large negative number will break the GP surrogate, but a heuristic dynamic approach that sets this value based on the currently observed values might work, combined with (fake) observation noise that informs (Py)VBMC not to take that observation too seriously (for the latter, see this preprint, Section 3.5 on noise shaping).
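To make the heuristic concrete, here is a rough sketch (not PyVBMC's implementation; the offset and noise magnitudes are placeholders):

```python
import numpy as np

def patch_failed_evaluations(log_densities, offset=20.0, big_noise_sd=10.0):
    """Replace non-finite log-density values with a dynamic floor.

    The floor tracks the worst finite value observed so far, and patched
    observations get a large (fake) noise standard deviation so the GP
    surrogate does not take them too seriously ("noise shaping", crudely).
    Assumes at least one finite observation exists.
    """
    ld = np.asarray(log_densities, dtype=float)
    finite = np.isfinite(ld)
    floor = ld[finite].min() - offset           # dynamic, not hard-coded
    values = np.where(finite, ld, floor)
    noise_sd = np.where(finite, 0.0, big_noise_sd)
    return values, noise_sd
```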

In an ideal world, we would use a logistic non-Gaussian likelihood for the GP model that informs the GP surrogate that the observation is below a threshold. However, introducing non-Gaussian likelihoods requires approximate inference for the GP, which is a major modification of the algorithm and a separate research project in itself.


matt-graham commented on June 15, 2024

Thanks @lacerbi for the detailed explanation. I think having something slightly more prominent in the documentation would be good, though I did end up finding the relevant FAQ entry, so it's not essential.

Out of interest, regarding the possible longer-term fixes: is the latter approach you suggest (informing the GP surrogate that the target is below a threshold) in a similar vein to the one the authors of the GPry package/paper employ for this issue, namely simultaneously fitting a classifier (they use a support vector machine) that partitions the latent space into regions with well-defined target (log-density) values and regions with non-finite/undefined values? It seemed an interesting idea, but I haven't had a chance to try out GPry yet to see how well it works in practice.
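For readers unfamiliar with the idea, the classifier scheme in rough form (a sketch using scikit-learn's SVC, which is my assumption here; GPry's actual implementation may differ):

```python
import numpy as np
from sklearn.svm import SVC

# Evaluated points and their log-density values (some failed).
X = np.array([[0.0, 0.0], [1.0, 0.5], [3.0, 3.0], [-2.5, 4.0]])
y = np.array([-1.2, -0.7, -np.inf, np.nan])

# Label each point by whether the target was well-defined there,
# and fit an SVM that partitions feasible vs. infeasible regions.
finite = np.isfinite(y)
clf = SVC(kernel="rbf").fit(X, finite)

# The GP surrogate is fit only on the finite observations; at
# acquisition time, candidates predicted infeasible are skipped
# or penalized before querying the expensive log-density.
candidates = np.array([[0.5, 0.2], [2.9, 3.1]])
feasible = clf.predict(candidates).astype(bool)
```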


Bobby-Huggins commented on June 15, 2024

Regarding the extraneous np.any call, you are correct @matt-graham: it has been removed in pull request #140.


matt-graham commented on June 15, 2024

Thanks @Bobby-Huggins, and thanks for the additions to the documentation making the requirements for the target log-density function more visible. Closing this as resolved from my perspective.

