
Comments (8)

segasai commented on August 20, 2024

Hi,

I'll start with this example, which shows that the uniform sampler does utilise all the cores if you have a computationally heavy likelihood function:

import numpy as np
import dynesty
import multiprocessing as mp


nlive = 1000
printing = True

ndim = 10
gau_s = 0.01


def loglike_gau(x):
    # busy-work loop to make the likelihood artificially expensive
    for i in range(1000000):
        1 + 1
    return (-0.5 * np.log(2 * np.pi) * ndim - np.log(gau_s) * ndim -
            0.5 * np.sum((x - 0.5)**2) / gau_s**2)


def prior_transform_gau(x):
    return x



LOGZ_TRUTH_GAU = 0


def test_pool_samplers():
    # this tests how the samplers deal with queue_size > 1
    rstate = np.random.default_rng(2)

    with mp.Pool(10) as pool:
        sampler = dynesty.NestedSampler(loglike_gau,
                                        prior_transform_gau,
                                        ndim,
                                        nlive=nlive,
                                        sample='unif',
                                        pool=pool,
                                        queue_size=10,
                                        rstate=rstate,
                                        first_update={
                                            'min_eff': 90,
                                            'min_ncall': 10
                                        })
        sampler.run_nested(print_progress=printing)
        assert (abs(LOGZ_TRUTH_GAU - sampler.results['logz'][-1]) <
                5. * sampler.results['logzerr'][-1])


if __name__ == '__main__':
    test_pool_samplers()

Regarding what's happening in your case: there is not enough information at the moment to diagnose it (one needs a reproducible example).
But the key difference of uniform sampling is that the proposal of points within ellipsoids is done in a single thread; only afterwards are the likelihoods evaluated in parallel.

One possibility for what you're seeing is that the bounding ellipsoid for your problem is extreme (i.e. very elongated, much longer than the side of the unit cube). In that case most of the points proposed within the ellipsoid will fall outside the cube, and the code will try again and again; all of that happens in a single thread, before the likelihood function is ever evaluated.

Another possibility is that your function is just too heavy to pickle and that dominates the overheads.

You can test the first hypothesis by lowering the threshold for this warning in the dynesty source:

if niter > threshold_warning:

You can test the second hypothesis by using dynesty.pool.Pool, which eliminates the per-call pickling overhead.

Finally, if you cannot share a reproducible example, you should at least share the exact way you call dynesty (with all the information, such as nlive, ndim, etc.) and ideally all the output from the problematic unif run.


MikhailBeznogov commented on August 20, 2024

Hello,

Thank you for your help. I will address your questions below.

  1. Here is how I call dynesty with "over-subscription" (ndim=7, npoints=1000):
with Pool() as MP_pool:
    sampler  = dynesty.DynamicNestedSampler(Chi2_Tot,Prior,ndim,bound='multi',
                                            sample='unif', 
                                            first_update={'min_eff':0.50},
                                            update_interval=500.5,
                                            bootstrap=50,enlarge=1.0,
                                            pool = MP_pool,queue_size=1600)
    sampler.run_nested(n_effective=10000,dlogz_init=0.05,nlive_init=npoints,
                       nlive_batch=500)
  2. I do not think that pickling is an issue, as dynesty works efficiently in parallel with other samplers (see the rslice examples in my opening post). Moreover, emcee, as well as my tests where I evaluate the likelihood in parallel using a multiprocessing pool's map, also work fine with high parallel efficiency. The former requires enough (>1000) walkers and the latter requires setting chunksize to at least 100 to be really parallel efficient, but I suppose that is because the likelihood takes only ~1 ms to evaluate and a multiprocessing pool inevitably introduces some overhead.
  3. I think that very elongated ellipsoids might indeed be the cause of the issue. The posterior distributions of some of the parameters have very heavy tails extending to the limits of the model parameters' ranges. If this is the case, is there any way to circumvent the issue? I really prefer the unif sampler, as it is more robust and does not require estimating the proper number of walks or slices.

I will send a working example privately.


segasai commented on August 20, 2024

Hi,

I don't quite agree with your comment number 2. The reason is that when you use rslice or rwalk with, say, walks=50, each worker executes at least 50 likelihood calls per pickle; with the uniform sampler it is always 1 call per pickle. Because of that, rslice/rwalk will always look better in terms of parallelisation.
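To make that argument concrete, here is a back-of-the-envelope sketch (the millisecond figures are made-up illustrative numbers, not measurements from this thread) of the fraction of per-point wall time spent on serialisation rather than likelihood work:

```python
def serial_overhead_fraction(pickle_ms, like_ms, calls_per_pickle):
    # fraction of time spent on (de)serialisation rather than likelihood work
    return pickle_ms / (pickle_ms + like_ms * calls_per_pickle)


# illustrative numbers: 1 ms to pickle a point, 1 ms per likelihood call
print(serial_overhead_fraction(1.0, 1.0, 1))   # unif: 1 call per pickle -> 0.5
print(serial_overhead_fraction(1.0, 1.0, 50))  # rwalk/rslice, walks=50 -> ~0.02
```

With a ~1 ms likelihood, the uniform sampler's 1-call-per-pickle pattern can spend a large share of its time on overhead, while 50 calls per pickle makes the same overhead nearly negligible.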

But certainly option 3 is quite possible.
And I think there are a few possible improvements in that area: one is doing the ellipsoidal sampling in parallel (it would have the overhead of sending over the ellipsoidal bounds); the other is detecting the degenerate ellipsoids and perhaps reshaping them somewhat.


segasai commented on August 20, 2024

An update on my side.
I had a bit of time to look into this. A couple of points:

  • If you make your likelihood function computationally heavier, the parallelization works as it should, so this is clearly an inefficiency that only shows up for fast likelihood functions.
  • There are some known issues (see #427), and possibly others, that lead to inefficient sampling in some cases. I'm still investigating.


MikhailBeznogov commented on August 20, 2024

Hello,

Thanks for the update.

If fast likelihood functions cause the sampler to lose parallel efficiency, perhaps adding an option to vectorize them (i.e., evaluate the likelihood at more than one point per call) would help? For example, this is implemented in JohannesBuchner/UltraNest and seems to be efficient if the number of points requested per call is large enough. If the likelihood does not support vectorization itself, it can easily be "wrapped" by mapping:

def Chi2_Tot_Vect(points):
    return np.array(list(map(Chi2_Tot,points)))


segasai commented on August 20, 2024

I think #427 addresses some of the issues with updating bounds, so I am reasonably convinced that things are now functioning as they should.
But it is clear that in this case

  • the ellipsoidal sampling in your problem ends up being inefficient (I don't have time to investigate that, but I assume it must be related to the shape of the posterior);
  • if the likelihood is very fast, the parallelisation of the uniform sampler is not very efficient. That has various solutions, but it's a long-term project.


segasai commented on August 20, 2024

I'm closing this issue for the time being.
A different parallelization scheme for the unif sampler would still be good, but I don't think there is a bug per se.


MikhailBeznogov commented on August 20, 2024

Hello,

Sorry for not replying earlier.

Yes, I agree, it is not a bug, just a specific feature of the implementation.

Thank you for your help.

