Comments (6)
Thanks @matt-graham - I am aware of the GPry approach but we have not tried it out yet - it is an interesting idea that might work in practice, and I was also planning to give it a try.
Still, it remains an ad hoc heuristic, since it doesn't really include the measurement in the GP surrogate; the SVM and the GP model are entirely distinct. For example, I am not sure how it avoids ruling out whole areas of parameter space just because one or two computations failed; or how it prevents the acquisition function from repeatedly probing the border of the unfeasible region (since it was a good region to explore before, it likely remains a good region to explore now, as the GP has not changed, unless e.g. some additional heuristic "repulsion" term is added to the acquisition function; although the noise introduced by the batch acquisition might help).
The approach I was talking about would be more principled (but, of course, more costly), in that you do include the observation in the GP model, but instead of making it a standard observation with a Gaussian likelihood, you make it a "censored" observation that just informs the model that "the observed value is somewhere below this threshold", where the threshold could be some low density value (but not insanely low). The point is that we just want the log-density to be very low; we don't care about its exact value. However, a censored-observation model is non-Gaussian, so it requires approximate inference for the GP, which is a pain.
I have been thinking about this for a very long time (since the original VBMC paper, more or less), but never got to it, mostly because implementing it in MATLAB was out of the question (no autodiff). Incidentally, a recent paper worked out a nice way to implement censored observations in GP models, exactly along the lines we have been thinking about (not for VBMC of course; they just present it for general GP regression): https://link.springer.com/article/10.1007/s11222-023-10225-3
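To make the idea concrete, here is a minimal sketch (hypothetical code, not PyVBMC's or the paper's implementation) of a Tobit-style censored Gaussian likelihood: exact observations contribute a Gaussian log-density term, while a "failed" evaluation only contributes the log-probability that the value lies below the threshold.

```python
import numpy as np
from scipy.stats import norm

def censored_log_likelihood(f, y, censored, threshold, sigma=1.0):
    """Log-likelihood of latent GP values `f` with censored observations.

    f         : latent function values at the inputs
    y         : observed log-density values (ignored where censored)
    censored  : boolean mask, True where the evaluation failed
    threshold : at censored points we only know the value is below this
    sigma     : observation noise standard deviation
    """
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    ll = np.empty_like(f)
    obs = ~censored
    # Exact observations: standard Gaussian likelihood.
    ll[obs] = norm.logpdf(y[obs], loc=f[obs], scale=sigma)
    # Censored observations: P(value < threshold | f) = Phi((t - f) / sigma).
    ll[censored] = norm.logcdf((threshold - f[censored]) / sigma)
    return float(ll.sum())
```

The censored term is non-Gaussian in `f`, which is exactly why conjugacy is lost and approximate inference (e.g. Laplace, EP, or variational) becomes necessary.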
PS: For now, we got to the stage of "unlocking" sparse variational GPs for VBMC in the preprint I linked above; this is a stepping stone towards more stuff (since once you can do approximate inference effectively, then you open up to more complex surrogate GP models).
from pyvbmc.
On further exploration of the documentation, I found a relevant note in the VBMC FAQ, which also suggests that my proposed approach of outputting a finite negative value as a proxy for negative infinity is not a good one.
Thanks for the comment! This is actually a non-trivial issue to deal with when working with surrogate models of the log-density.
The original VBMC algorithm does not deal with `Inf`s and `NaN`s, assuming the output of the log-density is always well-behaved. This is indeed a limitation; for now we should probably mention it more explicitly in the documentation (besides the FAQ), as you also suggest above:
it may be worth explicitly mentioning this is disallowed in the documentation
As for future plans, we do want to address the issue of zero density (or `NaN`s).
A naive workaround of outputting an arbitrary large negative number will break the GP surrogate, but a heuristic dynamic approach that sets this value based on the currently observed values might work, combined with (fake) observation noise that tells (Py)VBMC not to take that observation too seriously (for the latter, see this preprint, Section 3.5, on noise shaping).
In an ideal world, we would use a logistic (non-Gaussian) likelihood for the GP model that informs the surrogate that the observation is below a threshold. However, introducing non-Gaussian likelihoods requires approximate inference for the GP, which is a major modification of the algorithm and a separate research project in itself.
Thanks @lacerbi for the detailed explanation. Having something slightly more prominent in the documentation would be good, though I did end up finding the relevant FAQ entry, so it's not essential.
Out of interest, regarding the possible longer-term fixes: is the latter approach you suggest, informing the GP surrogate that the target is below a threshold, in a similar vein to the one the authors of the GPry package / paper employ for this issue? They simultaneously fit a classifier (a support vector machine) that partitions the latent space into regions with well-defined target (log-density) values and regions with non-finite / undefined values. That seemed an interesting idea, but I haven't had a chance to try out GPry yet to see how well it works in practice.
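For illustration, the classifier idea can be sketched as follows (a toy example, not GPry's actual code; the synthetic "undefined region" is invented for the demo): fit an SVM that predicts whether the log-density is finite at a point, and use it to filter candidate points before the acquisition step.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))

# Toy setup: pretend the log-density is undefined outside the band
# |x0| <= 2 (label 1 = well-defined, 0 = non-finite / undefined).
finite = (np.abs(X[:, 0]) <= 2).astype(int)

# Classifier partitioning parameter space into feasible / unfeasible.
clf = SVC(kernel="rbf", gamma="scale").fit(X, finite)

# Candidate points for the acquisition step: keep only those the
# classifier predicts to have a well-defined log-density.
candidates = rng.uniform(-3, 3, size=(50, 2))
feasible = candidates[clf.predict(candidates) == 1]
```

Note the classifier lives entirely outside the GP surrogate, which is the "ad hoc" aspect discussed above: the failed evaluations never enter the GP model itself.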
Regarding the extraneous `np.any` call, you are correct @matt-graham: it has been removed in pull #140.
Thanks @Bobby-Huggins, and thanks for the additions to the documentation making the requirements for the target log-density function more visible. Closing this as resolved from my perspective.