
Comments (8)

asmasca commented on July 20, 2024

Here's the code I'm using to run it: https://1drv.ms/u/s!AvRFptpPOuRiwrN40-Q3CRuFZAxCrw?e=VLiZ8z

The zip file includes the data (and a script to create the data) for the cases of 1, 2 and 3 signals, and the scripts and functions needed to run it. All this discussion was about the 1 signal case. I plan to test the performance with multiple signals (which I expect will be harder for the sampler).

PS: For reference, and in case it is useful for you, I'm running the test with Ultranest (Reactive NS). The logZ consistency is similar to that of rslice in dynamic nested sampling. From my understanding of Ultranest, this is normal.

Cheers and thanks for the help


segasai commented on July 20, 2024

Hi,

Thanks for the detailed question.
There are multiple points/questions/possible issues here.

  • Run-to-run variation in logZ -- that is expected from nested sampling, as it relies on prior volume estimates which have irreducible noise in them. Increasing the number of points does help, but I don't think you'll get an easy sqrt(N) scaling of the logz errors.
  • If you want more accurate logz estimates, you should try using dynamic sampling, which will add points where needed to improve accuracy (see the sketch after this list).
  • Can you please share your full testing code, as this looks like a good test case.
  • Regarding the logz uncertainties. I did some testing a year or two ago, and was reasonably confident that the errors are somewhat realistic, but there are certainly a lot of approximations done when computing them (and we had a long discussion on their calculation in #306).
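
In case it helps, here is a minimal sketch of switching from the static to the dynamic sampler (the likelihood and prior below are placeholders, not your model):

```python
import numpy as np
from dynesty import NestedSampler, DynamicNestedSampler

ndim = 5  # illustrative; use your actual model dimensionality

def loglike(theta):
    # placeholder Gaussian log-likelihood; substitute your signal model here
    return -0.5 * np.sum(theta ** 2)

def prior_transform(u):
    # map the unit cube to a uniform prior on [-10, 10] (placeholder)
    return 20.0 * u - 10.0

# static run: a fixed number of live points throughout
sampler = NestedSampler(loglike, prior_transform, ndim, nlive=500)
sampler.run_nested()
print(sampler.results.logz[-1], sampler.results.logzerr[-1])

# dynamic run: extra live points are allocated where they improve the estimate
dsampler = DynamicNestedSampler(loglike, prior_transform, ndim)
dsampler.run_nested(nlive_init=500)
print(dsampler.results.logz[-1], dsampler.results.logzerr[-1])
```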

Do you know the true value of the evidence for this particular problem?

Thanks


asmasca commented on July 20, 2024

Hi, thanks for the suggestions. I tested a few of them, and the results improve.

First I tried going up to 1000x Ndim live points for the regular NS, and nothing really changed except smoother trace plots (which I guess is expected).

Run-to-run variation in logZ -- that is expected from nested sampling, as it relies on prior volume estimates which have irreducible noise in them.

I expected some run-to-run variation, but not that large. We usually rely on the logZ for model comparison, and having a peak-to-peak spread of 5, sometimes up to 10, makes things difficult. A difference in logZ of 5 between models is usually considered strong evidence in favour of the higher-logZ model.
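
For reference, a difference of delta logZ = 5 corresponds to a Bayes factor of e^5 ≈ 148 (assuming equal prior odds for the two models), so run-to-run scatter of the same size is enough to wash out that kind of comparison.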

If you want more accurate logz estimates, you should try using dynamic sampling, which will add points where needed to improve accuracy.

I tried dynamic sampling with different configurations. The difference can be quite large. The results are more in line with my expectations and needs.
I could not test all configurations, as the time it takes grows a lot, but what I have is:

NS Rwalk logZ stdev: 2.21 +- 0.92
NS Rslice logZ stdev: 1.69 +- 0.56
NS Slice logZ stdev: 0.89 +- 0.32

dNS Rwalk 0/100 logZ stdev: 1.96 +- 0.75 (0% evidence, 100% posterior)
dNS Rwalk 35/65 logZ stdev: 2.26 +- 0.96
dNS Rwalk 65/35 logZ stdev: 1.53 +- 0.68
dNS Rwalk 80/20 logZ stdev: 1.12 +- 0.36 (Beyond 80% evidence, it gets too slow to be practical)

dNS Slice 80/20 logZ stdev: 0.5 +- 0.12 (incomplete, will be running for 6 more hours).

With these results, I would say the regular NS works well for exploratory work (it's super fast), but for model comparison the dynamic NS with slice sampling and an 80% weight/stop for the evidence calculation is necessary (for my case). Which is not a problem on its own; one of the goals of this exercise was finding the sweet spot between efficiency and accuracy. Although I have to say, watching it move from a couple of seconds to 2-3 minutes per run made me sad.
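
For reference, a sketch of how such an 80/20 evidence/posterior run could be configured (`loglike`, `prior_transform` and `ndim` stand in for the actual model from the shared scripts, and I'm assuming `pfrac` is the posterior fraction used by the weight and stopping functions, so 80% evidence corresponds to pfrac=0.2):

```python
from dynesty import DynamicNestedSampler

# sketch of the 80/20 evidence/posterior setup; loglike, prior_transform and
# ndim are assumed to be defined as in the earlier example / shared scripts
dsampler = DynamicNestedSampler(loglike, prior_transform, ndim, sample='slice')
dsampler.run_nested(nlive_init=500,
                    wt_kwargs={'pfrac': 0.2},    # how new batches are weighted
                    stop_kwargs={'pfrac': 0.2})  # when to stop adding batches
res = dsampler.results
print(res.logz[-1], '+-', res.logzerr[-1])
```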

Currently, I set the number of initial live points (nlive_init), but I don't touch the number of live points per batch and number of batches. Do you have any recommendation about how to configure these? The code spends most of the time in this part.

Can you please share your full testing code, as this looks like a good test case.

Sure, no problem. I'll clean and comment the code, and I'll share it with you.

Do you know the true value of the evidence for this particular problem?

Unfortunately no. The data also has some white and correlated noise components (to simulate real data), which I guess would make it difficult to estimate.

I will compare the results with the output of Ultranest, Multinest and Polychord, because my collaborators use them. I'll update here once I have the results.

Cheers and thank you


ajdittmann commented on July 20, 2024

  • Regarding the logz uncertainties. I did some testing a year or two ago, and was reasonably confident that the errors are somewhat realistic, but there are certainly a lot of approximations done when computing them (and we had a long discussion on their calculation in #306).

My experience has been that this is usually the case, but sometimes the uncertainties will be too small. I was testing this the other day with a variation of the log-gamma problem, and there were a few cases where the true errors in logZ were ~5 but the reported uncertainties were ~0.4. If this is of interest I can try to investigate this more systematically and share the results. MultiNest would report smaller uncertainties while having actual logZ errors on the order of 100s or 1000s, which was my main interest at the time.


segasai commented on July 20, 2024

NS Rwalk logZ stdev: 2.21 +- 0.92
NS Rslice logZ stdev: 1.69 +- 0.56
NS Slice logZ stdev: 0.89 +- 0.32
dNS Rwalk 0/100 logZ stdev: 1.96 +- 0.75 (0% evidence, 100% posterior)
dNS Rwalk 35/65 logZ stdev: 2.26 +- 0.96
dNS Rwalk 65/35 logZ stdev: 1.53 +- 0.68
dNS Rwalk 80/20 logZ stdev: 1.12 +- 0.36 (Beyond 80% evidence, it gets too slow to be practical)
dNS Slice 80/20 logZ stdev: 0.5 +- 0.12 (incomplete, will be running for 6 more hours).

I personally prefer to use rslice, as it has better tuning properties (but I also rarely go after the evidence).

With these results, I would say the regular NS works well for exploratory work (it's super fast), but for model comparison the dynamic NS with slice sampling and an 80% weight/stop for the evidence calculation is necessary (for my case). Which is not a problem on its own; one of the goals of this exercise was finding the sweet spot between efficiency and accuracy. Although I have to say, watching it move from a couple of seconds to 2-3 minutes per run made me sad.

Hard to say if that's really too slow or not. I'd say for a single CPU and a high-dimensional problem with an estimate of the integral, 2-3 minutes seems sensible.

Currently, I set the number of initial live points (nlive_init), but I don't touch the number of live points per batch and number of batches. Do you have any recommendation about how to configure these? The code spends most of the time in this part.

The number of batches should be decided automatically by the stopping criterion.
Regarding the number of live points for init/batch: I'd say the initial number of live points needs to be increased when you worry about missing posterior modes. If that's not an issue, I think the defaults would be fine.
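
Concretely, a rough sketch (with `loglike`, `prior_transform`, and `ndim` standing in for your model):

```python
from dynesty import DynamicNestedSampler

# raise nlive_init if missing modes is a concern, and leave the per-batch
# allocation to the stopping criterion
dsampler = DynamicNestedSampler(loglike, prior_transform, ndim, sample='rslice')
dsampler.run_nested(nlive_init=1000)  # larger initial run mainly buys mode coverage
# nlive_batch and maxbatch are left at their defaults; batches keep being added
# until the stopping criterion is satisfied
```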

Can you please share your full testing code, as this looks like a good test case.

Sure, no problem. I'll clean and comment the code, and I'll share it with you.

great, thanks

Do you know the true value of the evidence for this particular problem?

Unfortunately no. The data also has some white and correlated noise components (to simulate real data), which I guess would make it difficult to estimate.

I will compare the results with the output of Ultranest, Multinest and Polychord, because my collaborators use them. I'll update here once I have the results.


segasai commented on July 20, 2024

  • Regarding the logz uncertainties. I did some testing a year or two ago, and was reasonably confident that the errors are somewhat realistic, but there are certainly a lot of approximations done when computing them (and we had a long discussion on their calculation in #306).

My experience has been that this is usually the case, but sometimes the uncertainties will be too small. I was testing this the other day with a variation of the log-gamma problem, and there were a few cases where the true errors in logZ were ~5 but the reported uncertainties were ~0.4. If this is of interest I can try to investigate this more systematically and share the results. MultiNest would report smaller uncertainties while having actual logZ errors on the order of 100s or 1000s, which was my main interest at the time.

I would certainly be interested in seeing cases of wildly incorrect logz errors (assuming not too pathological a posterior).
A couple of years ago I derived a different equation for the logz error uncertainty, but I didn't have a good test case to see whether it is an improvement compared to what we have now. The only case where a logz deviation from the true value much larger than the uncertainty is somewhat expected is when using MCMC (rwalk/slice) proposals and not enough steps are made, or the steps are too small -- that is guaranteed to bias the logz (and there is no easy fix for that other than doing more MCMC steps).
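
For reference, a sketch of the relevant knobs (the `walks`/`slices` keywords; `loglike`, `prior_transform` and `ndim` are placeholders as above):

```python
from dynesty import NestedSampler

# the main lever is the number of MCMC steps per live-point replacement,
# at the cost of extra likelihood calls
sampler = NestedSampler(loglike, prior_transform, ndim,
                        sample='rwalk', walks=100)   # more random-walk steps than the default

# or, for random slice sampling, more slice updates per iteration:
sampler = NestedSampler(loglike, prior_transform, ndim,
                        sample='rslice', slices=10)
```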


asmasca commented on July 20, 2024

Just a quick comment, after some tests finished last night. The final values for the stability I got for the rwalk, rslice and slice are

dNS Rwalk 80/20 logZ stdev: 1.12 +- 0.36
dNS Random slice 80/20 logZ stdev: 0.67 +- 0.28
dNS Slice 80/20 logZ stdev: 0.4 +- 0.15

With these results, the 3 samplers are valid for our usual criteria (Delta LogZ 5 ~ 3 sigma), with no obvious dependence on the number of live points (tested 50 to 1000, for 5 dim).

Here's the same plot as in the original post, with the evolution of the logZ and the parameters as a function of live points, this time for the DNS, with the slice sampler. The vertical axis of the logZ is the same as before, for a better visual comparison.

With this, I think my original question can be considered solved. It was all a matter of using a sub-optimal configuration for my problem. Thank you for pointing me in the right direction.

[figure: params_evol — evolution of logZ and the parameters as a function of live points (dynamic NS, slice sampler)]

Hard to say if that's really too slow or not. I'd say for a single CPU and a high-dimensional problem with an estimate of the integral, 2-3 minutes seems sensible.

It's certainly within reason, and still much faster than doing a regular MCMC. Although the scaling to more "realistic" problems worries me a bit. This is a 5-dimensional model, and we usually work with up to 30-50 dimensions.

*This was on 2x Epyc system with 112 cores, although with such a small problem most of the CPU was idling.

A question regarding the time it takes in different configurations: in the regular NS, the time increases with the number of live points (which I understand). In the dynamic NS, the time decreases with an increased number of initial live points. Is this because the integral is better defined by the time the "dynamic" part starts, and then it can be solved with fewer batches?


segasai commented on July 20, 2024

With this, I think my original question can be considered solved. It was all a matter of using a sub-optimal configuration for my problem. Thank you for pointing me in the right direction.

That's great!
(I saw earlier that you were testing the 2.1.2 version. I would suggest double-checking with 2.1.3; I am not expecting significant improvements/regressions, but it is better to test against the latest version of the code.)

I would still appreciate it if you could share your test problem. I would still like to understand what's happening with the static nested run. And it's good to have a collection of difficult problems for testing.


Hard to say if that's really too slow or not. I'd say for a single CPU and a high-dimensional problem with an estimate of the integral, 2-3 minutes seems sensible.

It's certainly within reason, and still much faster than doing a regular MCMC. Although the scaling to more "realistic" problems worries me a bit. This is a 5-dimensional model, and we usually work with up to 30-50 dimensions.

My main worry in sampling really high-dimensional spaces is usually less the dimensionality itself and more whether there are multiple modes or very complex posterior shapes, as it is very easy to miss modes in such a huge volume; but if the posterior is reasonably benign, I think it is not too hard to sample.

*This was on 2x Epyc system with 112 cores, although with such a small problem most of the CPU was idling.

A question regarding the time it takes in different configurations: in the regular NS, the time increases with the number of live points (which I understand). In the dynamic NS, the time decreases with an increased number of initial live points. Is this because the integral is better defined by the time the "dynamic" part starts, and then it can be solved with fewer batches?

I don't have a formula at hand, but:

  1. The initial run of dynamic sampling will take about the same time as a static run with the same number of live points.
  2. The number of required batches will typically be lower the more live points you have in your initial run.
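
As a rough back-of-envelope (standard nested-sampling scalings, so take it as approximate): the initial run needs on the order of K_init * H iterations (H being the information in nats), so its cost grows linearly with nlive_init, while its logZ uncertainty shrinks roughly as sqrt(H / K_init). Starting the dynamic phase with a smaller uncertainty means fewer batches are needed to reach the stopping threshold, which is why the total time can go down even though the initial run itself gets longer.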

