Comments (6)
Thanks @matt-graham - I am aware of the GPry approach but we have not tried it out yet - it is an interesting idea that might work in practice, and I was also planning to give it a try.
Still, it remains an ad hoc heuristic, since it doesn't really include the measurement in the GP surrogate; the SVM and the GP model are entirely distinct. For example, I am not sure how it avoids ruling out whole areas of parameter space just because one or two computations failed; or how it prevents the acquisition function from repeatedly probing the border of the unfeasible region (since it was a good region to explore before, it likely remains a good region to explore now, as the GP has not changed, unless e.g. some additional heuristic "repulsion" term is added to the acquisition function; although the noise introduced by the batch acquisition might help).
The approach I was talking about would be more principled (but, of course, more costly), in that you do include the observation in the GP model, but instead of making it a standard observation with a Gaussian likelihood, you make it a "censored" observation that just informs the model that "the observed value is somewhere below this threshold", where the threshold could be some low density value (but not insanely low). The point is that we just want the log-density to be very low; we don't care about its exact value. However, a censored-observation model is non-Gaussian, so it requires approximate inference for the GP, which is a pain.
I have been thinking about this for a very long time (since the original VBMC paper, more or less), but never got to it, mostly because implementing it in MATLAB was out of the question (no autodiff). Incidentally, a recent paper worked out a nice way to implement censored observations in GP models, exactly along the lines we have been thinking about (not for VBMC of course; they just present it for general GP regression): https://link.springer.com/article/10.1007/s11222-023-10225-3
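To make the idea concrete, here is a minimal sketch (hypothetical code, not PyVBMC's or the paper's implementation) of a Tobit-style censored Gaussian likelihood: exact observations contribute a Gaussian log-density term, while a "failed" evaluation only contributes the log-probability that the value lies below the threshold.

```python
import numpy as np
from scipy.stats import norm

def censored_log_likelihood(f, y, censored, threshold, sigma=1.0):
    """Log-likelihood of latent GP values `f` with censored observations.

    f         : latent function values at the inputs
    y         : observed log-density values (ignored where censored)
    censored  : boolean mask, True where the evaluation failed
    threshold : at censored points we only know the value is below this
    sigma     : observation noise standard deviation
    """
    f = np.asarray(f, dtype=float)
    y = np.asarray(y, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    ll = np.empty_like(f)
    obs = ~censored
    # Exact observations: standard Gaussian likelihood.
    ll[obs] = norm.logpdf(y[obs], loc=f[obs], scale=sigma)
    # Censored observations: P(value < threshold | f) = Phi((t - f) / sigma).
    ll[censored] = norm.logcdf((threshold - f[censored]) / sigma)
    return float(ll.sum())
```

The censored term is non-Gaussian in `f`, which is exactly why conjugacy is lost and approximate inference (e.g. Laplace, EP, or variational) becomes necessary.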
PS: For now, we got to the stage of "unlocking" sparse variational GPs for VBMC in the preprint I linked above; this is a stepping stone towards more stuff (since once you can do approximate inference effectively, then you open up to more complex surrogate GP models).
from pyvbmc.
On further exploration of the documentation, I found a relevant note in the VBMC FAQ, which also suggests that my proposed approach of outputting a finite negative value as a proxy for negative infinity is not a good one.
Thanks for the comment! This is actually a non-trivial issue to deal with when working with surrogate models of the log-density.
The original VBMC algorithm does not deal with `Inf`s and `NaN`s, assuming the output of the log-density is always well-behaved. This is indeed a limitation; for now we should probably mention it more explicitly in the documentation (besides the FAQ), as you also suggest above:
it may be worth explicitly mentioning this is disallowed in the documentation
As for future plans, we do want to address the issue of zero density (or `NaN`s).
A naive workaround of outputting an arbitrary large negative number will break the GP surrogate, but a heuristic dynamic approach that sets this value based on the currently observed values might work, combined with (fake) observation noise that tells (Py)VBMC not to take that observation too seriously (for the latter, see this preprint, Section 3.5, on noise shaping).
In an ideal world, we would use a logistic (non-Gaussian) likelihood for the GP model that informs the surrogate that the observation is below a threshold. However, introducing non-Gaussian likelihoods requires approximate inference for the GP, which is a major modification of the algorithm and a separate research project in itself.
Thanks @lacerbi for the detailed explanation. Having something slightly more prominent in the documentation would be good, though I did end up finding the relevant FAQ entry, so it's not essential.
Out of interest, regarding the possible longer-term fixes: is the latter approach you suggest, informing the GP surrogate that the target is below a threshold, in a similar vein to the one the authors of the GPry package / paper employ for this issue? They simultaneously fit a classifier (a support vector machine) that partitions the latent space into regions with well-defined target (log-density) values and regions with non-finite / undefined values. That seemed an interesting idea, but I haven't had a chance to try out GPry yet to see how well it works in practice.
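For illustration, the classifier idea can be sketched as follows (a toy example, not GPry's actual code; the synthetic "undefined region" is invented for the demo): fit an SVM that predicts whether the log-density is finite at a point, and use it to filter candidate points before the acquisition step.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 2))

# Toy setup: pretend the log-density is undefined outside the band
# |x0| <= 2 (label 1 = well-defined, 0 = non-finite / undefined).
finite = (np.abs(X[:, 0]) <= 2).astype(int)

# Classifier partitioning parameter space into feasible / unfeasible.
clf = SVC(kernel="rbf", gamma="scale").fit(X, finite)

# Candidate points for the acquisition step: keep only those the
# classifier predicts to have a well-defined log-density.
candidates = rng.uniform(-3, 3, size=(50, 2))
feasible = candidates[clf.predict(candidates) == 1]
```

Note the classifier lives entirely outside the GP surrogate, which is the "ad hoc" aspect discussed above: the failed evaluations never enter the GP model itself.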
Regarding the extraneous `np.any` call, you are correct @matt-graham: it has been removed in pull #140.
Thanks @Bobby-Huggins, and thanks for the additions to the documentation making the requirements for the target log-density function more visible. Closing this as resolved from my perspective.