
Comments (8)

asmasca commented on July 20, 2024

Here's the code I'm using to run it: https://1drv.ms/u/s!AvRFptpPOuRiwrN40-Q3CRuFZAxCrw?e=VLiZ8z

The zip file includes the data (and a script to create the data) for the cases of 1, 2 and 3 signals, and the scripts and functions needed to run it. All this discussion was about the 1 signal case. I plan to test the performance with multiple signals (which I expect will be harder for the sampler).

PS: For reference, and in case it is useful for you, I'm running the test with Ultranest (Reactive NS). The logZ consistency is similar to that of rslice in dynamic nested sampling. From my understanding of Ultranest, this is normal.

Cheers and thanks for the help


segasai commented on July 20, 2024

Hi,

Thanks for the detailed question.
There are multiple points/questions/possible issues here.

  • Run-to-run variation in logZ -- that is expected from nested sampling, as it relies on prior volume estimates which have irreducible noise in them. Increasing the number of points does help, but I don't think you'll get an easy sqrt(N) scaling of the logz errors.
  • If you want more accurate logz estimates, you should try using dynamic sampling, which will add points where needed to improve accuracy (see the sketch after this list).
  • Can you please share your full testing code, as this looks like a good test case.
  • Regarding the logz uncertainties. I did some testing a year or two ago, and was reasonably confident that the errors are somewhat realistic, but there are certainly a lot of approximations done when computing them (and we had a long discussion on their calculation in #306).
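
In case it helps, here is a minimal sketch of switching from the static to the dynamic sampler (the likelihood and prior below are placeholders, not your model):

```python
import numpy as np
from dynesty import NestedSampler, DynamicNestedSampler

ndim = 5  # illustrative; use your actual model dimensionality

def loglike(theta):
    # placeholder Gaussian log-likelihood; substitute your signal model here
    return -0.5 * np.sum(theta ** 2)

def prior_transform(u):
    # map the unit cube to a uniform prior on [-10, 10] (placeholder)
    return 20.0 * u - 10.0

# static run: a fixed number of live points throughout
sampler = NestedSampler(loglike, prior_transform, ndim, nlive=500)
sampler.run_nested()
print(sampler.results.logz[-1], sampler.results.logzerr[-1])

# dynamic run: extra live points are allocated where they improve the estimate
dsampler = DynamicNestedSampler(loglike, prior_transform, ndim)
dsampler.run_nested(nlive_init=500)
print(dsampler.results.logz[-1], dsampler.results.logzerr[-1])
```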

Do you know the true value of the evidence for this particular problem?

Thanks


asmasca commented on July 20, 2024

Hi, thanks for the suggestions. I tested a few of them, and the results improve.

First I tried going up to 1000x Ndim live points for the regular NS, and nothing really changed except smoother trace plots (which I guess is expected).

Run-to-run variation in logZ -- that is expected from nested sampling, as it relies on prior volume estimates which have irreducible noise in them.

I expected some run-to-run variation, but not that large. We usually rely on the logZ for model comparison, and having a peak-to-peak spread of 5, sometimes up to 10, makes things difficult. A difference in logZ of 5 between models is usually considered strong evidence in favour of the higher-logZ model.
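
For reference, a difference of delta logZ = 5 corresponds to a Bayes factor of e^5 ≈ 148 (assuming equal prior odds for the two models), so run-to-run scatter of the same size is enough to wash out that kind of comparison.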

If you want more accurate logz estimates, you should try using dynamic sampling, which will add points where needed to improve accuracy.

I tried dynamic sampling with different configurations. The difference can be quite large. The results are more in line with my expectations and needs.
I could not test all configurations, as the time it takes grows a lot, but what I have is:

NS Rwalk logZ stdev: 2.21 +- 0.92
NS Rslice logZ stdev: 1.69 +- 0.56
NS Slice logZ stdev: 0.89 +- 0.32

dNS Rwalk 0/100 logZ stdev: 1.96 +- 0.75 (0% evidence, 100% posterior)
dNS Rwalk 35/65 logZ stdev: 2.26 +- 0.96
dNS Rwalk 65/35 logZ stdev: 1.53 +- 0.68
dNS Rwalk 80/20 logZ stdev: 1.12 +- 0.36 (Beyond 80% evidence, it gets too slow to be practical)

dNS Slice 80/20 logZ stdev: 0.5 +- 0.12 (incomplete, will be running for 6 more hours).

With these results, I would say the regular NS works well for exploratory work (it's super fast), but for model comparison the dynamic NS with slice sampling and an 80% weight/stop for the evidence calculation is necessary (for my case). Which is not a problem on its own; one of the goals of this exercise was finding the sweet spot between efficiency and accuracy. Although I have to say, watching it move from a couple of seconds to 2-3 minutes per run made me sad.
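
For reference, a sketch of how such an 80/20 evidence/posterior run could be configured (`loglike`, `prior_transform` and `ndim` stand in for the actual model from the shared scripts, and I'm assuming `pfrac` is the posterior fraction used by the weight and stopping functions, so 80% evidence corresponds to pfrac=0.2):

```python
from dynesty import DynamicNestedSampler

# sketch of the 80/20 evidence/posterior setup; loglike, prior_transform and
# ndim are assumed to be defined as in the earlier example / shared scripts
dsampler = DynamicNestedSampler(loglike, prior_transform, ndim, sample='slice')
dsampler.run_nested(nlive_init=500,
                    wt_kwargs={'pfrac': 0.2},    # how new batches are weighted
                    stop_kwargs={'pfrac': 0.2})  # when to stop adding batches
res = dsampler.results
print(res.logz[-1], '+-', res.logzerr[-1])
```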

Currently, I set the number of initial live points (nlive_init), but I don't touch the number of live points per batch and number of batches. Do you have any recommendation about how to configure these? The code spends most of the time in this part.

Can you please share your full testing code, as this looks like a good test case.

Sure, no problem. I'll clean and comment the code, and I'll share it with you.

Do you know the true value of the evidence for this particular problem?

Unfortunately no. The data also has some white and correlated noise components (to simulate real data), which I guess would make it difficult to estimate.

I will compare the results with the output of Ultranest, Multinest and Polychord, because my collaborators use them. I'll update here once I have the results.

Cheers and thank you


ajdittmann commented on July 20, 2024

  • Regarding the logz uncertainties. I did some testing a year or two ago, and was reasonably confident that the errors are somewhat realistic, but there are certainly a lot of approximations done when computing them (and we had a long discussion on their calculation in #306).

My experience has been that this is usually the case, but sometimes the uncertainties will be too small. I was testing this the other day with a variation of the log-gamma problem, and there were a few cases where the true errors in logZ were ~5 but the reported uncertainties were ~0.4. If this is of interest I can try to investigate this more systematically and share the results. MultiNest would report smaller uncertainties while having actual logZ errors on the order of 100s or 1000s, which was my main interest at the time.


segasai commented on July 20, 2024

NS Rwalk logZ stdev: 2.21 +- 0.92
NS Rslice logZ stdev: 1.69 +- 0.56
NS Slice logZ stdev: 0.89 +- 0.32
dNS Rwalk 0/100 logZ stdev: 1.96 +- 0.75 (0% evidence, 100% posterior)
dNS Rwalk 35/65 logZ stdev: 2.26 +- 0.96
dNS Rwalk 65/35 logZ stdev: 1.53 +- 0.68
dNS Rwalk 80/20 logZ stdev: 1.12 +- 0.36 (Beyond 80% evidence, it gets too slow to be practical)
dNS Slice 80/20 logZ stdev: 0.5 +- 0.12 (incomplete, will be running for 6 more hours).

I personally prefer to use rslice, as it has better tuning properties (but I also rarely go after the evidence).

With these results, I would say the regular NS works well for exploratory work (it's super fast), but for model comparison the dynamic NS with slice sampling and an 80% weight/stop for the evidence calculation is necessary (for my case). Which is not a problem on its own; one of the goals of this exercise was finding the sweet spot between efficiency and accuracy. Although I have to say, watching it move from a couple of seconds to 2-3 minutes per run made me sad.

Hard to say if that's really too slow or not. I'd say for a single CPU and a high-dimensional problem with an estimate of the integral, 2-3 minutes seems sensible.

Currently, I set the number of initial live points (nlive_init), but I don't touch the number of live points per batch and number of batches. Do you have any recommendation about how to configure these? The code spends most of the time in this part.

The number of batches should be decided automatically by the stopping criterion.
Regarding the number of live points for init/batch: I'd say the initial number of live points needs to be increased when you worry about missing posterior modes. If that's not an issue, I think the defaults would be fine.
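
Concretely, a rough sketch (with `loglike`, `prior_transform`, and `ndim` standing in for your model):

```python
from dynesty import DynamicNestedSampler

# raise nlive_init if missing modes is a concern, and leave the per-batch
# allocation to the stopping criterion
dsampler = DynamicNestedSampler(loglike, prior_transform, ndim, sample='rslice')
dsampler.run_nested(nlive_init=1000)  # larger initial run mainly buys mode coverage
# nlive_batch and maxbatch are left at their defaults; batches keep being added
# until the stopping criterion is satisfied
```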

Can you please share your full testing code, as this looks like a good test case.

Sure, no problem. I'll clean and comment the code, and I'll share it with you.

great, thanks

Do you know the true value of the evidence for this particular problem?

Unfortunately no. The data also has some white and correlated noise components (to simulate real data), which I guess would make it difficult to estimate.

I will compare the results with the output of Ultranest, Multinest and Polychord, because my collaborators use them. I'll update here once I have the results.


segasai commented on July 20, 2024

  • Regarding the logz uncertainties. I did some testing a year or two ago, and was reasonably confident that the errors are somewhat realistic, but there are certainly a lot of approximations done when computing them (and we had a long discussion on their calculation in #306).

My experience has been that this is usually the case, but sometimes the uncertainties will be too small. I was testing this the other day with a variation of the log-gamma problem, and there were a few cases where the true errors in logZ were ~5 but the reported uncertainties were ~0.4. If this is of interest I can try to investigate this more systematically and share the results. MultiNest would report smaller uncertainties while having actual logZ errors on the order of 100s or 1000s, which was my main interest at the time.

I would certainly be interested in seeing cases of wildly incorrect logz errors (assuming not too pathological a posterior).
A couple of years ago I derived a different equation for the logz error uncertainty, but I didn't have a good test case to see whether it is an improvement compared to what we have now. The only case where a logz deviation from the true value much larger than the uncertainty is somewhat expected is when using MCMC (rwalk/slice) proposals and not enough steps are made, or the steps are too small -- that is guaranteed to bias the logz (and there is no easy fix for that other than doing more MCMC steps).
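
For reference, a sketch of the relevant knobs (the `walks`/`slices` keywords; `loglike`, `prior_transform` and `ndim` are placeholders as above):

```python
from dynesty import NestedSampler

# the main lever is the number of MCMC steps per live-point replacement,
# at the cost of extra likelihood calls
sampler = NestedSampler(loglike, prior_transform, ndim,
                        sample='rwalk', walks=100)   # more random-walk steps than the default

# or, for random slice sampling, more slice updates per iteration:
sampler = NestedSampler(loglike, prior_transform, ndim,
                        sample='rslice', slices=10)
```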


asmasca commented on July 20, 2024

Just a quick comment, after some tests finished last night. The final values for the stability I got for the rwalk, rslice and slice are

dNS Rwalk 80/20 logZ stdev: 1.12 +- 0.36
dNS Random slice 80/20 logZ stdev: 0.67 +- 0.28
dNS Slice 80/20 logZ stdev: 0.4 +- 0.15

With these results, the 3 samplers are valid for our usual criteria (Delta LogZ 5 ~ 3 sigma), with no obvious dependence on the number of live points (tested 50 to 1000, for 5 dim).

Here's the same plot as in the original post, with the evolution of the logZ and the parameters as a function of live points, this time for the DNS, with the slice sampler. The vertical axis of the logZ is the same as before, for a better visual comparison.

With this, I think my original question can be considered solved. It was all a matter of using a sub-optimal configuration for my problem. Thank you for pointing me in the right direction.

[figure: params_evol — evolution of logZ and the parameters as a function of live points (dynamic NS, slice sampler)]

Hard to say if that's really too slow or not. I'd say for a single CPU and a high-dimensional problem with an estimate of the integral, 2-3 minutes seems sensible.

It's certainly within reason, and still much faster than doing a regular MCMC. Although the scaling to more "realistic" problems worries me a bit. This is a 5-dimensional model, and we usually work with up to 30-50 dimensions.

*This was on 2x Epyc system with 112 cores, although with such a small problem most of the CPU was idling.

A question regarding the time it takes in different configurations: in the regular NS, the time increases with the number of live points (which I understand). In the dynamic NS, the time decreases with an increased number of initial live points. Is this because the integral is better defined by the time the "dynamic" part starts, and then it can be solved with fewer batches?


segasai commented on July 20, 2024

With this, I think my original question can be considered solved. It was all a matter of using a sub-optimal configuration for my problem. Thank you for pointing me in the right direction.

That's great!
(I saw earlier that you were testing the 2.1.2 version. I would suggest double-checking with 2.1.3; I am not expecting significant improvements/regressions, but it is better to test against the latest version of the code.)

I would still appreciate it if you could share your test problem. I would still like to understand what's happening with the static nested run. And it's good to have a collection of difficult problems for testing.


Hard to say if that's really too slow or not. I'd say for a single CPU and a high-dimensional problem with an estimate of the integral, 2-3 minutes seems sensible.

It's certainly within reason, and still much faster than doing a regular MCMC. Although the scaling to more "realistic" problems worries me a bit. This is a 5-dimensional model, and we usually work with up to 30-50 dimensions.

My main worry in sampling really high-dimensional spaces is usually less the dimensionality itself and more whether there are multiple modes or very complex posterior shapes, as it is very easy to miss modes in such a huge volume; but if the posterior is reasonably benign, I think it is not too hard to sample.

*This was on 2x Epyc system with 112 cores, although with such a small problem most of the CPU was idling.

A question regarding the time it takes in different configurations: in the regular NS, the time increases with the number of live points (which I understand). In the dynamic NS, the time decreases with an increased number of initial live points. Is this because the integral is better defined by the time the "dynamic" part starts, and then it can be solved with fewer batches?

I don't have a formula at hand, but:

  1. The initial run of dynamic sampling will take about the same time as a static run with the same number of live points.
  2. The number of required batches will typically be lower the more live points you have in your initial run.
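
As a rough back-of-envelope (standard nested-sampling scalings, so take it as approximate): the initial run needs on the order of K_init * H iterations (H being the information in nats), so its cost grows linearly with nlive_init, while its logZ uncertainty shrinks roughly as sqrt(H / K_init). Starting the dynamic phase with a smaller uncertainty means fewer batches are needed to reach the stopping threshold, which is why the total time can go down even though the initial run itself gets longer.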

