
xyzpy's Introduction

xyzpy logo



xyzpy is a Python library for efficiently generating, manipulating and plotting data with many dimensions, of the type that often occurs in numerical simulations. It stands wholly atop the labelled N-dimensional array library xarray. The project's documentation is hosted on readthedocs.

The aim is to take the pain and errors out of generating and exploring data with a high number of possible parameters. This means:

  • you don't have to write super nested for loops
  • you don't have to remember which arrays/dimensions belong to which variables/parameters
  • you don't have to parallelize over or distribute runs yourself
  • you don't have to worry about loading, saving and merging disjoint data
  • you don't have to guess when a set of runs is going to finish
  • you don't have to write batch submission scripts or leave the notebook to use SGE, PBS or SLURM

As well as the ability to automatically parallelize over runs, xyzpy provides the Crop object, which allows runs and results to be written to disk. These can then be executed by any process with access to the files - e.g. a batch system such as SGE, PBS or SLURM - or simply serve as a convenient persistent progress mechanism.
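
As an illustration, here is a minimal sketch of the core pattern (the toy function and parameter values are hypothetical; Runner and run_combos are from the documented API):

import numpy as np
import xyzpy

def simulate(amplitude, frequency):
    # hypothetical toy 'simulation' returning a single scalar
    t = np.linspace(0, 1, 100)
    return np.trapz(amplitude * np.sin(frequency * t), t)

runner = xyzpy.Runner(simulate, var_names=['signal'])

combos = {
    'amplitude': [0.5, 1.0, 2.0],
    'frequency': np.linspace(1, 10, 5),
}

# run every combination of parameters (optionally in parallel) and
# collect the results into a fully labelled xarray.Dataset
ds = runner.run_combos(combos)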

Once your data has been aggregated into an xarray.Dataset or pandas.DataFrame, there exist many powerful visualization tools such as seaborn, altair and holoviews / hvplot. To these, xyzpy also adds a simple 'one-liner' interface for interactively plotting the data using bokeh, or for producing static, publication-ready figures using matplotlib, whilst being able to show the dependence on up to 4 dimensions at once.

[example figure]

Please see the docs for more information.

xyzpy's People

Contributors

adamcallison, jcmgray, toddrme2178


xyzpy's Issues

Merging case_runner crop fails

Hi @jcmgray,

I noticed that crop.reap() fails when there are several crops created by a Harvester that uses a case_runner instance. The reason seems to be that case_runners return pandas DataFrames instead of xarray Datasets (unlike combo_runners), and hence the merge function is called on the resulting DataFrame object, which fails accordingly:

TypeError: merge() got an unexpected keyword argument 'compat'

My current workaround is to merge all cases manually so that I can avoid creating multiple crops in the first place.
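
For reference, the mismatch is easy to see outside xyzpy: xarray's top-level merge accepts a compat keyword, but pandas.DataFrame.merge does not. A minimal illustration (the dataset contents are arbitrary):

import pandas as pd
import xarray as xr

ds1 = xr.Dataset({'a': ('x', [1, 2])})
ds2 = xr.Dataset({'b': ('x', [3, 4])})

# xarray's merge accepts a `compat` keyword...
xr.merge([ds1, ds2], compat='no_conflicts')

# ...but pandas.DataFrame.merge does not, hence the TypeError above
df = pd.DataFrame({'a': [1, 2]})
df.merge(df, compat='no_conflicts')  # raises TypeError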

Using holoviews

This package looks great!

You seem to have implemented an interface on top of bokeh and matplotlib, are you aware of the existence of holoviews?

I think that using HoloViews will greatly reduce the amount of code here :)

Accessing intermediate results

Hi jcmgray,

this package is really fantastic, it solves exactly the problems that I've been struggling with for years! Thanks for your work!

I've just started using the package, though, and I have a question concerning batch processing: is there any straightforward way to access intermediate results of the computation by storing them on the disk? I've thought about two ways in particular:

  1. Accessing the on-disk dataset created by the harvester. However, by default, the dataset is created only after all combos are evaluated. Is there some workaround / flag to set?
  2. Using the crop functionality. However, I cannot reap the results during the computation since it gives the following error: This crop is not ready to reap yet - results are missing

Any thoughts?
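
For reference, a minimal sketch of what option 2 might look like with the allow_incomplete flag that Crop.reap accepts (see also the RecursionError issue further down this page); the setup names here are placeholders:

import xyzpy

# hypothetical setup mirroring the question
h = xyzpy.Harvester(my_labelled_func, 'results.h5')
crop = h.Crop('my_crop')
crop.sow_combos(combos)

# ... while batches are still being grown elsewhere ...

# reap whatever has finished so far; unfinished combos become NaN
ds_partial = crop.reap(allow_incomplete=True)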

Can't pickle ...

Hi @jcmgray, I'm currently using your awesome package to automate my experiments and noticed a problem related to pickling certain data types. While the cloudpickle backend of joblib should handle, for example, lambda functions fine, I get an error when working with certain modules based on torch.

Here is a minimal example:

import xyzpy
import botorch
import torch

@xyzpy.label(['model'])
def fun(a):
    x = torch.tensor([[0.]])
    y = torch.tensor([[0.]])
    return botorch.models.SingleTaskGP(x, y)

combos = dict(
    a=range(10)
)

h = xyzpy.Harvester(fun, 'result')
c = h.Crop('test')
c.sow_combos(combos)
c.grow_missing()
c.reap()

It produces the following error:

_pickle.PicklingError: Can't pickle <function _HomoskedasticNoiseBase.__init__.<locals>.<lambda> at 0x14cd89e18>: it's not found as gpytorch.likelihoods.noise_models._HomoskedasticNoiseBase.__init__.<locals>.<lambda>

Tested with Python 3.7.3 and

botorch==0.2.1
torch==1.6.0
xyzpy==1.0.0

After a short search, I found this related post: cornellius-gp/gpytorch#907
A potential solution seems to be using dill instead of pickle. Do you think this option can be added to xyzpy?
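
For reference, a minimal illustration (plain Python, not an xyzpy API) of why dill helps here: it can serialize locally defined lambdas that the standard pickle module rejects:

import pickle
import dill

def make_closure():
    # a local lambda, analogous to the one inside gpytorch's noise model
    return lambda x: x + 1

f = make_closure()

try:
    pickle.dumps(f)
except Exception as e:
    print('pickle failed:', e)  # "Can't pickle <function ... <lambda>>"

# dill serializes the lambda by value instead of by reference
g = dill.loads(dill.dumps(f))
assert g(1) == 2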

For now, my workaround is to remove all problematic attributes from the object returned by the function being evaluated, after all internal computations have been completed. However, it would of course be much nicer if such objects could be handled naturally by xyzpy.

Kind regards,
Adrian

Failing tests related to netcdf

Hi,

I recently reinstalled xyzpy (from my up-to-date fork of the develop branch) into a fresh anaconda environment and ran the test suite to make sure I had things set up properly. After installing any missing libraries that had caused tests to fail, I ran the tests again and saw that two tests were still failing with the following error messages:

FAILED tests/test_manage.py::TestSaveAndLoad::test_io_complex_data[h5netcdf-h5netcdf] - h5netcdf.core.CompatibilityError: complex dtypes are not a supported NetCDF feature, and are not allowed by h5net...
FAILED tests/test_manage.py::TestSaveAndLoad::test_save_merge_ds - h5netcdf.core.CompatibilityError: complex dtypes are not a supported NetCDF feature, and are not allowed by h5net...

The actual tests in question look as follows:

    @mark.parametrize(("engine_save, engine_load"),
                      [('h5netcdf', 'h5netcdf'),
                       ('zarr', 'zarr'),
                       ('joblib', 'joblib'),
                       param('h5netcdf', 'netcdf4', marks=mark.xfail),
                       param('netcdf4', 'h5netcdf', marks=mark.xfail),
                       param('netcdf4', 'netcdf4', marks=mark.xfail)])
    def test_io_complex_data(self, ds1, engine_save, engine_load):
        with tempfile.TemporaryDirectory() as tmpdir:
            save_ds(ds1, os.path.join(tmpdir, "test.h5"), engine=engine_save)
            ds2 = load_ds(os.path.join(tmpdir, "test.h5"), engine=engine_load)
            assert ds1.identical(ds2)

    def test_save_merge_ds(self, ds1, ds2, ds3):
        with tempfile.TemporaryDirectory() as tmpdir:
            fname = os.path.join(tmpdir, "test.h5")
            save_merge_ds(ds1, fname)
            save_merge_ds(ds2, fname)
            with raises(xr.MergeError):
                save_merge_ds(ds3, fname)
            save_merge_ds(ds3, fname, overwrite=True)
            exp = ds3.combine_first(xr.merge([ds1, ds2]))
            assert load_ds(fname).identical(exp)

Is there perhaps a specific version requirement for netcdf that is not specified in the docs?
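
For context, complex dtypes are not part of the NetCDF standard, and h5netcdf only writes them when explicitly told to. A minimal illustration outside xyzpy, using xarray's documented invalid_netcdf flag:

import numpy as np
import xarray as xr

ds = xr.Dataset({'z': ('x', np.array([1 + 2j, 3 + 4j]))})

# fails with the same CompatibilityError as the tests above:
# ds.to_netcdf('test.h5', engine='h5netcdf')

# works: opt in to non-standard ("invalid") NetCDF features
ds.to_netcdf('test.h5', engine='h5netcdf', invalid_netcdf=True)

So the failures may come down to how save_ds sets (or doesn't set) that flag, rather than a specific netcdf version requirement.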

RecursionError when using `allow_incomplete` option on Crop.reap

When I use the allow_incomplete option it sometimes fails with this error (output shown below). I have another dataset of equal size whose runs all completed, so I did not need the allow_incomplete option there, and it reaped fine. I tried increasing the recursion limit with sys.setrecursionlimit() in the Jupyter notebook I was using, and it still failed at 100,000, which is when I thought I should not push my luck any more. I am using the latest GitHub version of xyzpy.

---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-6-d4907a788b7f> in <module>
----> 1 crop.reap(allow_incomplete=True,) #allow_incomplete=True,overwrite=True,

~/anaconda3/envs/qcoptim-qiskit-up-to-date/lib/python3.8/site-packages/xyzpy/gen/batch.py in reap(self, wait, sync, overwrite, clean_up, allow_incomplete)
    745         if isinstance(self.farmer, Harvester):
    746             opts['overwrite'] = overwrite
--> 747             return self.reap_harvest(self.farmer, **opts)
    748 
    749         if isinstance(self.farmer, Sampler):

~/anaconda3/envs/qcoptim-qiskit-up-to-date/lib/python3.8/site-packages/xyzpy/gen/batch.py in reap_harvest(self, harvester, wait, sync, overwrite, clean_up, allow_incomplete)
    685             raise ValueError("Cannot reap and harvest if no Harvester is set.")
    686 
--> 687         ds = self.reap_runner(harvester.runner, wait=wait, clean_up=clean_up,
    688                               allow_incomplete=allow_incomplete)
    689 

~/anaconda3/envs/qcoptim-qiskit-up-to-date/lib/python3.8/site-packages/xyzpy/gen/batch.py in reap_runner(self, runner, wait, clean_up, allow_incomplete)
    664         # Can ignore `Runner.resources` as they play no part in desecribing the
    665         #   output, though they should be supplied to sow and thus grow.
--> 666         ds = self.reap_combos_to_ds(
    667             var_names=runner._var_names,
    668             var_dims=runner._var_dims,

~/anaconda3/envs/qcoptim-qiskit-up-to-date/lib/python3.8/site-packages/xyzpy/gen/batch.py in reap_combos_to_ds(self, var_names, var_dims, var_coords, constants, attrs, parse, wait, clean_up, allow_incomplete)
    616         check_ready_to_reap(self, allow_incomplete, wait)
    617 
--> 618         clean_up, default_result = calc_clean_up_default_res(
    619             self, clean_up, allow_incomplete
    620         )

~/anaconda3/envs/qcoptim-qiskit-up-to-date/lib/python3.8/site-packages/xyzpy/gen/batch.py in calc_clean_up_default_res(crop, clean_up, allow_incomplete)
    137 
    138     if allow_incomplete:
--> 139         default_result = crop.all_nan_result
    140     else:
    141         default_result = None

~/anaconda3/envs/qcoptim-qiskit-up-to-date/lib/python3.8/site-packages/xyzpy/gen/batch.py in all_nan_result(self)
    422                                "one finished result.")
    423             reference_result = joblib.load(result_files[0])[0]
--> 424             self._all_nan_result = nan_like_result(reference_result)
    425 
    426         return self._all_nan_result

~/anaconda3/envs/qcoptim-qiskit-up-to-date/lib/python3.8/site-packages/xyzpy/gen/batch.py in nan_like_result(res)
    124 
    125     try:
--> 126         return tuple(np.broadcast_to(np.nan, infer_shape(x)) for x in res)
    127     except TypeError:
    128         return np.nan

~/anaconda3/envs/qcoptim-qiskit-up-to-date/lib/python3.8/site-packages/xyzpy/gen/batch.py in <genexpr>(.0)
    124 
    125     try:
--> 126         return tuple(np.broadcast_to(np.nan, infer_shape(x)) for x in res)
    127     except TypeError:
    128         return np.nan

~/anaconda3/envs/qcoptim-qiskit-up-to-date/lib/python3.8/site-packages/xyzpy/gen/batch.py in infer_shape(x)
    102     try:
    103         shape += (len(x),)
--> 104         return shape + infer_shape(x[0])
    105     except TypeError:
    106         return shape

... last 1 frames repeated, from the frame below ...

~/anaconda3/envs/qcoptim-qiskit-up-to-date/lib/python3.8/site-packages/xyzpy/gen/batch.py in infer_shape(x)
    102     try:
    103         shape += (len(x),)
--> 104         return shape + infer_shape(x[0])
    105     except TypeError:
    106         return shape

RecursionError: maximum recursion depth exceeded while calling a Python object
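
For what it's worth, here is a plausible standalone reproduction of the loop: for a single-character string, x[0] is the string itself, so the shape inference shown in the traceback never bottoms out:

def infer_shape(x):
    # mirrors the recursion in xyzpy/gen/batch.py from the traceback above
    shape = ()
    try:
        shape += (len(x),)
        return shape + infer_shape(x[0])
    except TypeError:
        return shape

# 'a'[0] == 'a', so len() keeps succeeding and the recursion never
# terminates, raising RecursionError just as above
infer_shape('a')

So a string-valued (or similarly self-indexing) entry in the reference result would trigger exactly this error.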

`var_names` not needed when DataArrays are passed

@dcherian pointed me towards this library in pydata/xarray#7498, it looks awesome!

Very small point — when I have something like the linked example, but add a cast to a DataArray in the func:

import numpy as np
import xarray as xr

def generate_timeseries(x, y):
    return xr.DataArray(np.random.normal(loc=x, scale=y, size=100))

...then var_names doesn't seem to do anything; I get back a DataArray anyway (which is great!). But it still requires passing something, so I'm just passing Runner(gen, var_names=["foo"]) atm.

Could we avoid having to pass anything there? Or is there a different construction I should be choosing?

cross-pollination from/to `adaptive`

I am impressed with this package!

In my field we very often do these loops over multiple dimensions and generate many curves for different dimensions.

We (my colleagues and I) tried to tackle a very similar problem to the one xyzpy is trying to solve. We wrote adaptive, which does things similar to xyzpy; the biggest difference is that it can adaptively sample one (or two) of the dimensions.

As an example I adapted your Basic Output Example to do the same but with adaptive:

import adaptive
import holoviews as hv
from functools import partial
from itertools import product
from scipy.special import eval_jacobi
import numpy as np
adaptive.notebook_extension()

def jacobi(x, n, alpha, beta):
    return eval_jacobi(n, alpha, beta, x)

combos = {
    'n': [1, 2, 4, 8, 16],
    'alpha': np.linspace(0, 2, 3),
    'beta': np.linspace(0, 1, 5),
}

def named_product(**items):
    names = items.keys()
    vals = items.values()
    return [dict(zip(names, res)) for res in product(*vals)]

learners = {}
for combo in named_product(**combos):
    learners[tuple(combo.values())] = adaptive.Learner1D(partial(jacobi, **combo), bounds=[0, 1])
    
balancing_learner = adaptive.BalancingLearner(list(learners.values()))

which creates "learners", which are essentially objects from which you can request new points and to which you can tell (feed back) evaluated points.

then you "learn" the function by creating a Runner (this doesn't block the kernel and runs on all the cores, optionally you provide a excecutor to run it on a cluster)

runner = adaptive.Runner(balancing_learner, goal=lambda learner: learner.loss() < 0.01)
runner.live_info()

[screenshot: runner.live_info() progress widget]

Then plot the data with:

balancing_learner.plot(cdims=named_product(**combos)).overlay('beta').grid()

[screenshot: resulting grid of plots]

As you can see, it is not nearly as short as your code, nor do we provide the functionality to save the data. Also, our interface is not really optimized for easily generating the combos, but this is where we can learn from xyzpy. On the other hand, I think there is probably something useful for you in adaptive too.

(P.S. this is not really an "issue", but more of a place to hopefully exchange some ideas)

EDIT
Inspired by your work, I've created this PR, after which one can just do:

learner = adaptive.BalancingLearner.from_combos(
    jacobi, adaptive.Learner1D, dict(bounds=(0, 1)), combos)
runner = adaptive.BlockingRunner(learner, goal=lambda l: l.loss() < 0.01)
learner.plot(cdims=adaptive.utils.named_product(**combos)).overlay('beta').grid()

conda package missing?

The documentation states that there should be a conda package.
I cannot find it, though... Am I missing something?

petrucci ➜  ~ conda search xyzpy
Loading channels: done

PackagesNotFoundError: The following packages are not available from current channels:

  - xyzpy

Current channels:

  - https://conda.anaconda.org/conda-forge/linux-64
  - https://conda.anaconda.org/conda-forge/noarch
  - https://repo.anaconda.com/pkgs/main/linux-64
  - https://repo.anaconda.com/pkgs/main/noarch
  - https://repo.anaconda.com/pkgs/free/linux-64
  - https://repo.anaconda.com/pkgs/free/noarch
  - https://repo.anaconda.com/pkgs/r/linux-64
  - https://repo.anaconda.com/pkgs/r/noarch
  - https://repo.anaconda.com/pkgs/pro/linux-64
  - https://repo.anaconda.com/pkgs/pro/noarch

To search for alternate channels that may provide the conda package you're
looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

Need numba in install_requires

Installing xyzpy with

pip install xyzpy

does not install the required numba package since it is not included in the install_requires argument in setup.py.
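
A minimal sketch of the suggested fix (excerpt only; which other dependencies to list is of course xyzpy's call):

# setup.py (excerpt)
from setuptools import setup

setup(
    name='xyzpy',
    install_requires=[
        'numpy',
        'xarray',
        'numba',  # currently missing, so `pip install xyzpy` doesn't pull it in
    ],
)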

ps. Thank you for working on this project, it looks very useful!
