
xesmf's Introduction

xESMF: Universal Regridder for Geospatial Data


xESMF is a Python package for regridding. It is

  • Powerful: It uses ESMF/ESMPy as its backend and can regrid between general curvilinear grids with all ESMF regridding algorithms, such as bilinear, conservative and nearest neighbour.
  • Easy-to-use: It abstracts away ESMF's complicated infrastructure and provides a simple, high-level API, compatible with xarray as well as basic numpy arrays.
  • Fast: It is faster than ESMPy's original Fortran regridding engine in the serial case, and also supports dask for out-of-core, parallel computation.

Please see the online documentation, or play with the example notebooks on Binder.

For new users, I also recommend reading How to ask for help and How to support xESMF.

xesmf's People

Contributors

ajueling, jiaweizhuang, raphaeldussin, raspstephan


xesmf's Issues

 MPI Errors when regridding multi-dimensional data

The following code just creates the example data used by the regridder:

import pandas as pd
import numpy as np
import xarray as xr

def make_dataset():
    time = pd.date_range('2010-01-01','2018-12-31',freq='M')
    lat = np.linspace(-5.175003, -4.7250023, 10)
    lon = np.linspace(33.524994, 33.97499, 10)
    precip = np.random.normal(0, 1, size=(len(time), len(lat), len(lon)))

    ds = xr.Dataset(
        {'precip': (['time', 'lat', 'lon'], precip)},
        coords={
            'lon': lon,
            'lat': lat,
            'time': time,
        }
    )
    return ds


def make_fcast_dataset(date_str):
    initialisation_date = pd.date_range(start=date_str, periods=1, freq='M')
    number = [i for i in range(0, 10)]
    lat = np.linspace(-5.175003, -5.202, 5)
    lon = np.linspace(33.5, 42.25, 5)
    forecast_horizon = np.array(
        [ 2419200000000000,  2592000000000000,  2678400000000000,
          5097600000000000,  5270400000000000,  5356800000000000,
          7689600000000000,  7776000000000000,  7862400000000000,
          7948800000000000, 10368000000000000, 10454400000000000,
          10540800000000000, 10627200000000000, 12960000000000000,
          13046400000000000, 13219200000000000, 15638400000000000,
          15724800000000000, 15811200000000000, 15897600000000000,
          18316800000000000, 18489600000000000, 18576000000000000 ],
          dtype='timedelta64[ns]'
    )
    valid_time = initialisation_date[:, np.newaxis] + forecast_horizon
    precip = np.random.normal(
        0, 1, size=(len(number), len(initialisation_date), len(forecast_horizon), len(lat), len(lon))
    )

    ds = xr.Dataset(
        {'precip': (['number', 'initialisation_date', 'forecast_horizon', 'lat', 'lon'], precip)},
        coords={
            'lon': lon,
            'lat': lat,
            'initialisation_date': initialisation_date,
            'number': number,
            'forecast_horizon': forecast_horizon,
            'valid_time': (['initialisation_date', 'step'], valid_time)
        }
    )
    return ds

# make the example datasets
regrid = make_dataset().precip
ds = make_fcast_dataset(date_str='2000-01-01')

When using the regridder I get the following error.

import xesmf as xe

xe.Regridder(ds, regrid, 'nearest_s2d')

ipdb> xe.Regridder(d_, regrid.precip, 'nearest_s2d')
[mpiexec@linux1.ouce.ox.ac.uk] match_arg (utils/args/args.c:159): unrecognized argument pmi_args
[mpiexec@linux1.ouce.ox.ac.uk] HYDU_parse_array (utils/args/args.c:174): argument matching returned error
[mpiexec@linux1.ouce.ox.ac.uk] parse_args (ui/mpich/utils.c:1596): error parsing input array
[mpiexec@linux1.ouce.ox.ac.uk] HYD_uii_mpx_get_parameters (ui/mpich/utils.c:1648): unable to parse user arguments
[mpiexec@linux1.ouce.ox.ac.uk] main (ui/mpich/mpiexec.c:149): error parsing parameters

I am very inexperienced with MPI and so would really appreciate any pointers you might have!

Thanks very much

Setting the coordinate system in xESMF

Hi Jiawei,

First of all, thanks a lot for developing and maintaining this great package. I was wondering if it is possible to switch between planar and spherical coordinates in xESMF. I guess it is important how the coordinates are interpreted (as planar or spherical) when e.g. area weights are computed for the conservative regridding. It seems possible to set this in ESMPy (see: http://www.earthsystemmodeling.org/esmf_releases/last_built/esmpy_doc/html/CoordSys.html#ESMF.api.constants.CoordSys.SPH_DEG). Is this feature also available in xESMF (and if so, how)?

Cheers & thanks,
Christian
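For reference, the choice is made in ESMPy when the grid object is constructed; xESMF does not appear to expose it in its public API yet. A minimal ESMPy sketch, only to illustrate the knob (not an xESMF feature):

import numpy as np
import ESMF

# planar (Cartesian) coordinates
grid_cart = ESMF.Grid(np.array([100, 50]),
                      staggerloc=ESMF.StaggerLoc.CENTER,
                      coord_sys=ESMF.CoordSys.CART)

# spherical coordinates in degrees (what xESMF's backend appears to use)
grid_sph = ESMF.Grid(np.array([100, 50]),
                     staggerloc=ESMF.StaggerLoc.CENTER,
                     coord_sys=ESMF.CoordSys.SPH_DEG)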

Kernel dies running tutorial example

I am running the tutorial at https://xesmf.readthedocs.io/en/latest/Curvilinear_grid.html and the kernel dies. I updated all packages and still have issues. Does anybody have any idea where the incompatibility is coming from? Here is the environment I am running in:

name: pangeo
channels:
  - conda-forge
  - defaults
dependencies:
  - asn1crypto=0.24.0=py36_3
  - bleach=2.1.3=py_0
  - bokeh=0.13.0=py36_0
  - bottleneck=1.2.1=py36h7eb728f_1
  - cffi=1.11.5=py36h5e8e0c9_1
  - cftime=1.0.1=py36h7eb728f_0
  - chardet=3.0.4=py36_3
  - click=6.7=py_1
  - cloudpickle=0.5.2=py_0
  - cryptography=2.3.1=py36hdffb7b8_0
  - cryptography-vectors=2.3.1=py36_0
  - curl=7.59.0=0
  - cycler=0.10.0=py36_0
  - cytoolz=0.9.0.1=py36_0
  - dask=0.17.1=py_2
  - dask-core=0.17.1=py_0
  - dask-jobqueue=0.1.1=py_0
  - decorator=4.2.1=py36_0
  - distributed=1.21.3=py36_0
  - docrep=0.2.3=py36_0
  - entrypoints=0.2.3=py36_1
  - esmf=7.1.0r=1
  - esmpy=7.1.0r=py36_1
  - expat=2.2.5=0
  - fontconfig=2.13.1=h65d0f4c_0
  - freetype=2.9.1=h6debe1e_4
  - geos=3.6.2=hfc679d8_3
  - gettext=0.19.8.1=0
  - gmp=6.1.2=0
  - h5netcdf=0.6.2=py_0
  - h5py=2.8.0=py36h470a237_0
  - hdf4=4.2.13=0
  - hdf5=1.10.1=2
  - heapdict=1.0.0=py36_0
  - html5lib=1.0.1=py_0
  - icu=58.2=0
  - idna=2.7=py36_2
  - ipykernel=4.8.2=py36_0
  - ipython=6.2.1=py36_1
  - ipython_genutils=0.2.0=py36_0
  - jedi=0.11.1=py36_0
  - jinja2=2.10=py36_0
  - jpeg=9c=h470a237_1
  - jsonschema=2.6.0=py36_1
  - jupyter_client=5.2.3=py36_0
  - jupyter_core=4.4.0=py_0
  - jupyterlab=0.31.12=py36_1
  - jupyterlab_launcher=0.10.5=py36_0
  - kiwisolver=1.0.1=py36_1
  - krb5=1.14.2=0
  - libedit=3.1.20170329=0
  - libffi=3.2.1=3
  - libiconv=1.15=0
  - libnetcdf=4.5.0=3
  - libpng=1.6.34=0
  - libsodium=1.0.15=1
  - libssh2=1.8.0=2
  - libtiff=4.0.9=he6b73bb_1
  - libuuid=2.32.1=h470a237_2
  - libxcb=1.13=0
  - libxml2=2.9.8=0
  - libxslt=1.1.32=h88dbc4e_2
  - locket=0.2.0=py36_1
  - lxml=4.2.4=py36hc9114bc_0
  - markupsafe=1.0=py36_0
  - mistune=0.8.3=py_0
  - mpi=1.0=mpich
  - mpi4py=3.0.0=py36_0
  - mpich=3.2.1=0
  - msgpack-python=0.5.5=py36_0
  - nbconvert=5.3.1=py_1
  - nbformat=4.4.0=py36_0
  - nbserverproxy=0.8.3=py36_0
  - ncurses=5.9=10
  - netcdf-fortran=4.4.4=7
  - netcdf4=1.4.0=py36_0
  - notebook=5.4.1=py36_0
  - olefile=0.45.1=py_1
  - openblas=0.2.20=8
  - owslib=0.16.0=py_1
  - packaging=17.1=py_0
  - pandas=0.23.4=py36hf8a1672_0
  - pandoc=2.1.3=0
  - pandocfilters=1.4.1=py36_0
  - parso=0.1.1=py_0
  - partd=0.3.8=py36_0
  - pexpect=4.4.0=py36_0
  - pickleshare=0.7.4=py36_0
  - pillow=5.2.0=py36hc736899_1
  - pip=9.0.2=py36_0
  - proj4=4.9.3=h470a237_8
  - prompt_toolkit=1.0.15=py36_0
  - psutil=5.4.3=py36_0
  - ptyprocess=0.5.2=py36_0
  - pycparser=2.18=py_1
  - pyepsg=0.3.2=py_1
  - pygments=2.2.0=py36_0
  - pyopenssl=18.0.0=py36_0
  - pyparsing=2.2.0=py36_0
  - pyproj=1.9.5.1=py36h508ed2a_3
  - pyshp=1.2.12=py_0
  - pysocks=1.6.8=py36_2
  - python-dateutil=2.7.0=py_0
  - pytz=2018.3=py_0
  - pyyaml=3.12=py36_1
  - pyzmq=17.0.0=py36_3
  - readline=7.0=0
  - requests=2.19.1=py36_1
  - send2trash=1.5.0=py_0
  - setuptools=39.0.1=py36_0
  - shapely=1.6.4=py36h164cb2d_1
  - simplegeneric=0.8.1=py36_0
  - six=1.11.0=py36_1
  - sortedcontainers=1.5.9=py36_0
  - tblib=1.3.2=py36_0
  - terminado=0.8.1=py36_0
  - testpath=0.3.1=py36_0
  - tk=8.6.8=ha92aebf_0
  - toolz=0.9.0=py_0
  - tornado=5.0.1=py36_1
  - traitlets=4.3.2=py36_0
  - urllib3=1.23=py36_1
  - wcwidth=0.1.7=py36_0
  - webencodings=0.5=py36_0
  - wheel=0.30.0=py36_2
  - xarray=0.10.8=py36_0
  - xorg-libxau=1.0.8=3
  - xorg-libxdmcp=1.1.2=3
  - xz=5.2.4=h470a237_1
  - yaml=0.1.7=0
  - zeromq=4.2.3=2
  - zict=0.1.3=py_0
  - zlib=1.2.11=0
  - blas=1.0=mkl
  - ca-certificates=2018.03.07=0
  - cartopy=0.16.0=py36hf32a85a_0
  - certifi=2018.8.24=py36_1
  - dbus=1.13.2=h714fa37_1
  - glib=2.56.1=h000015b_0
  - gst-plugins-base=1.14.0=hbbd80ab_1
  - gstreamer=1.14.0=hb453b48_1
  - intel-openmp=2018.0.0=hc7b2577_8
  - libgcc=7.2.0=h69d50b8_2
  - libgcc-ng=7.2.0=hdf63c60_3
  - libgfortran=3.0.0=1
  - libgfortran-ng=7.2.0=hdf63c60_3
  - libopenblas=0.2.20=h9ac9557_7
  - libstdcxx-ng=7.2.0=hdf63c60_3
  - matplotlib=2.2.3=py36hb69df0a_0
  - mkl=2019.0=117
  - mkl_fft=1.0.4=py36h4414c95_1
  - mkl_random=1.0.1=py36h4414c95_1
  - numpy=1.15.0=py36h1b885b7_0
  - numpy-base=1.15.0=py36h3dfced4_0
  - openssl=1.0.2p=h14c3975_0
  - pcre=8.42=h439df22_0
  - pyqt=5.9.2=py36h22d08a2_0
  - python=3.6.3=h1284df2_4
  - qt=5.9.6=h52aff34_0
  - scikit-learn=0.19.1=py36h7aa7ec6_0
  - scipy=1.1.0=py36hc49cb51_0
  - sip=4.19.8=py36hf484d3e_0
  - sqlite=3.24.0=h84994c4_0
prefix: /glade/u/home/gloege/miniconda3/envs/pangeo

pip packages

Package              Version
-------------------- -----------
asn1crypto           0.24.0
bleach               2.1.3
bokeh                0.13.0
Bottleneck           1.2.1
Cartopy              0.16.0
certifi              2018.8.24
cffi                 1.11.5
cftime               1.0.1
chardet              3.0.4
click                6.7
cloudpickle          0.5.2
cryptography         2.3.1
cryptography-vectors 2.3.1
cycler               0.10.0
Cython               0.28.3
cytoolz              0.9.0.1
dask                 0.17.1
dask-jobqueue        0.1.1
decorator            4.2.1
distributed          1.21.3
docrep               0.2.3
entrypoints          0.2.3
ESMPy                7.1.0.dev0
h5netcdf             0.6.2
h5py                 2.8.0
heapdict             1.0.0
html5lib             1.0.1
idna                 2.7
ipykernel            4.8.2
ipython              6.2.1
ipython-genutils     0.2.0
jedi                 0.11.1
Jinja2               2.10
jsonschema           2.6.0
jupyter-client       5.2.3
jupyter-core         4.4.0
jupyterlab           0.31.12
jupyterlab-launcher  0.10.5
kiwisolver           1.0.1
locket               0.2.0
lxml                 4.2.4
MarkupSafe           1.0
matplotlib           2.2.3
mistune              0.8.3
mkl-fft              1.0.4
mkl-random           1.0.1
mpi4py               3.0.0
msgpack-python       0.5.5
nbconvert            5.3.1
nbformat             4.4.0
nbserverproxy        0.8.3
netCDF4              1.4.0
notebook             5.4.1
numpy                1.15.0
olefile              0.45.1
OWSLib               0.16.0
packaging            17.1
pandas               0.23.4
pandocfilters        1.4.1
parso                0.1.1
partd                0.3.8
pexpect              4.4.0
pickleshare          0.7.4
Pillow               5.2.0
pip                  18.0
prompt-toolkit       1.0.15
psutil               5.4.3
ptyprocess           0.5.2
pycparser            2.18
pyepsg               0.3.2
Pygments             2.2.0
pyOpenSSL            18.0.0
pyparsing            2.2.0
pyproj               1.9.5.1
pyshp                1.2.12
PySocks              1.6.8
python-dateutil      2.7.0
pytz                 2018.3
PyYAML               3.12
pyzmq                17.0.0
requests             2.19.1
scikit-learn         0.19.1
scipy                1.1.0
Send2Trash           1.5.0
setuptools           39.0.1
Shapely              1.6.4.post1
simplegeneric        0.8.1
six                  1.11.0
sortedcontainers     1.5.9
tblib                1.3.2
terminado            0.8.1
testpath             0.3.1
toolz                0.9.0
tornado              5.0.1
traitlets            4.3.2
urllib3              1.23
wcwidth              0.1.7
webencodings         0.5
wheel                0.30.0
xarray               0.10.8
xesmf                0.1.1
zict                 0.1.3

Using ESMF unstructured grids

I'm wondering what it would take to support using ESMF's unstructured grids with xESMF. I had a conversation with @bekozi today where we discussed this briefly but we weren't sure where xESMF stood on the issue.

Ultimately, it would be great if xESMF could support bi-directional remapping between structured/unstructured grids/meshes.

Gridding sparse data (fine to coarse)

hi @JiaweiZhuang, I am trying to regrid sparse (~1 km) Ice Bridge data (airborne flight transects) to a coarser 25 km grid, using the new conservative_normed method from your master branch. But I am getting only NaNs out.

Even if there is only one non-NaN fine cell within a coarse cell, it should assign that only fine cell's value to the coarse cell, correct?

A runnable example can be found here.

regridding cubed-sphere-type datasets

Does xESMF currently support cubed-sphere-type datasets? For example, reading MITgcm llc grids with xmitgcm, we get an xarray dataset that looks something like this:

<xarray.Dataset>
Dimensions:  (face: 13, i: 1080, j: 1080)
Coordinates:
    XC       (face, j, i) >f4 dask.array<shape=(13, 1080, 1080), chunksize=(1, 1080, 1080)>
    YC       (face, j, i) >f4 dask.array<shape=(13, 1080, 1080), chunksize=(1, 1080, 1080)>
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i        (i) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * j        (j) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
...

where XC and YC are longitude and latitude of the cell centers. There are 13 locally orthogonal rectilinear grid faces indexed by dimension face.

I would like to regrid this data to a regular lat-lon grid. I scanned the docs but couldn't find an answer to this. But presumably @JiaweiZhuang is using this with GEOS5, which uses a cubed sphere grid.
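One hedged sketch, building on the "regrid each tile and add up" idea mentioned elsewhere on this page for multi-tile grids: rename the cell-centre coordinates to the names xESMF expects and regrid face by face. The variable name 'THETA' and the weight filenames below are hypothetical:

import xesmf as xe

ds_out = xe.util.grid_global(1, 1)   # target regular 1-degree lat-lon grid

regridded_faces = []
for face in ds.face.values:
    ds_face = ds.isel(face=int(face)).rename({'XC': 'lon', 'YC': 'lat'})
    regridder = xe.Regridder(ds_face, ds_out, 'bilinear',
                             filename='llc_face%d_to_1deg.nc' % face)
    regridded_faces.append(regridder(ds_face['THETA']))  # 'THETA' is hypothetical

# faces do not overlap, so destination cells not covered by a face come back
# as zeros and the per-face results can simply be summed
result = sum(regridded_faces)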

Handle weight files more elegantly

(continuing #2 and #9)

In the just-released v0.1.1, the default behavior is to write weights to disk, and there is a regridder.clean_weight_file() method to optionally remove it (demo).
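For concreteness, a minimal sketch of that v0.1.1 behaviour (the grids from xe.util.grid_global are just stand-ins):

import xesmf as xe

ds_in = xe.util.grid_global(5, 4)    # coarse source grid (stand-in)
ds_out = xe.util.grid_global(2, 2)   # finer destination grid (stand-in)

# creating the regridder writes a weight file to disk as a side effect...
regridder = xe.Regridder(ds_in, ds_out, 'bilinear')

# ...which has to be removed explicitly once it is no longer needed
regridder.clean_weight_file()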

This exactly matches how ESMPy works (it dumps weights directly to disk rather than returning a numpy array) but feels a bit odd. A more intuitive workflow is to get only in-memory weights by default, and have a regridder.save() method to optionally write them to disk (like the model.save() method in Keras).

With current ESMPy (up to 7.1.0dev41) I need to do very weird things to implement this new design:

  1. Use ESMPy to write weights to disk
  2. Read it back to memory
  3. Then automatically delete the weight file by default (what a waste if you actually need it)
  4. Finally, have a method to optionally re-write the weight file to disk. (duplicates step1)

This incurs a lot of unnecessary disk I/O. If ESMPy plans to return in-memory weights I will definitely use the new design. Otherwise I'd rather keep the current approach.

@bekozi Any plans on that?

Detect when old and new grid don't overlap?

I was getting a strange error when initiating a regridder (from PET0.ESMF_LogFile):

20180926 112440.533 ERROR            PET0 ESMF_IOScrip.F90:211 ESMF_OutputWeightFile Operation not yet supported  - "factorList" has size 0 and PET count is 1. There is nothing to write.
20180926 112440.534 ERROR            PET0 ESMF_IOScrip.F90:136 ESMF_SparseMatrixWrite Operation not yet supported  - Internal subroutine call returned Error
20180926 112440.534 ERROR            PET0 ESMF_Field_C.F90:1176 f_esmf_regridstorefile Operation not yet supported  - Internal subroutine call returned Error

After some messing around I realised it was because I had written my new grid incorrectly and so the error above (There is nothing to write) makes sense 😄 I was wondering if it might be worth adding a check for this to provide a better error message than:

ESMC_FieldRegridStoreFile() failed with rc = 18.    Please check the log files (named "*ESMF_LogFile").

as that could save other people some time too.

I'm happy to add this and create a pull request, but I was just wondering whether you think this would be a good idea and where you would put this check (for example: read the log file from ESMF, use an existing routine inside ESMPy, or check the bounds in xESMF when creating the regridder)?
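Purely as an illustration of the kind of check being proposed (not existing xESMF code, and it ignores longitude wrap-around):

import numpy as np

def warn_if_grids_do_not_overlap(ds_in, ds_out):
    lon_in, lat_in = np.asarray(ds_in['lon']), np.asarray(ds_in['lat'])
    lon_out, lat_out = np.asarray(ds_out['lon']), np.asarray(ds_out['lat'])

    lon_overlap = lon_out.min() <= lon_in.max() and lon_out.max() >= lon_in.min()
    lat_overlap = lat_out.min() <= lat_in.max() and lat_out.max() >= lat_in.min()

    if not (lon_overlap and lat_overlap):
        raise ValueError('Input and output grids do not appear to overlap; '
                         'ESMF would produce an empty ("factorList" size 0) weight matrix.')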

Regridding to physically invalid latitudes should probably warn or raise

I accidentally switched the lat and lon arguments for xe.grid_2d, such that the target grid had latitudes from -180 to 180 and longitudes -90 to 90. Then, using this as the target on xe.regrid led to no crash or warning; the data was in fact interpolated to latitudes spanning from -180 to 180.

I believe it would be helpful to either warn or raise an error when the latitude range is physically invalid. For longitudes, it's harder, since some arrays use (-180, 180), others use (0, 360), etc.
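A rough sketch of such a check (illustrative only, not xESMF code):

import numpy as np

def check_lat_range(ds):
    lat = np.asarray(ds['lat'])
    if (lat < -90).any() or (lat > 90).any():
        raise ValueError('Latitude values fall outside [-90, 90]; '
                         'were the lat and lon arguments swapped?')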

Mimetic Interpolation of Vector Fields (Pletzer and Hayek, 2019)

I just found this fascinating paper:
Pletzer, A. and W. Hayek, 2019: Mimetic Interpolation of Vector Fields on Arakawa C/D Grids. Mon. Wea. Rev., 147, 3–16, https://doi.org/10.1175/MWR-D-18-0146.1

https://journals.ametsoc.org/doi/full/10.1175/MWR-D-18-0146.1

There appears to be a python library!
https://github.com/pletzer/mint
The author is @pletzer.

Interestingly it uses VTK to deal with the grid cell problem.

Just posting here because I think this is the best place for discussing regridding. It would be great to have this tool available in an xarray / dask friendly way.

Input array is not F_CONTIGUOUS

The issue I am having is that I am getting a warning when creating a regridder (with xesmf.Regridder) that the input is not Fortran contiguous. Regardless of what I do with the input, the warning always appears.

Digging through the code I can see that the lat and lon variables are always transposed (to make them Fortran contiguous). However, because as_2d_mesh simply passes the lat/lon arrays through if they are already 2D, these arrays might already be Fortran contiguous, and transposing them then makes them C contiguous instead. My suggestion would be to transpose in ds_to_ESMFgrid only when necessary, or to use np.asfortranarray to ensure the data is always Fortran contiguous.
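For illustration, a minimal numpy sketch of the behaviour in question:

import numpy as np

lat = np.random.rand(10, 20)            # 2-D and C-contiguous
print(lat.T.flags['F_CONTIGUOUS'])      # True: transposing once gives Fortran order
print(lat.T.T.flags['F_CONTIGUOUS'])    # False: transposing an already-suitable
                                        # array flips it back to C order

lat_f = np.asfortranarray(lat)          # copies only when necessary
print(lat_f.flags['F_CONTIGUOUS'])      # True, regardless of the input's layout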

I've put a jupyter notebook here to demonstrate the issue: https://gist.github.com/leifdenby/a91b616ce5fb077e44d28556208626b4

If this sounds reasonable I'd be happy to make a pull requests which would include tests to ensure this change doesn't break anything else.

Thanks for a great tool btw! Love how easy it is to use :)

Masked data is returned as 0.0 after gridding, how can these pixels be identified if zeros exist in input data?

So I'm using the "conservative_normed" algorithm provided in the "masking" branch of xESMF to regrid some MODIS GPP data to a lower spatial resolution (fine to coarse). Being a land-only product, the ocean pixels are invalid and so need to be masked. After masking and running xESMF on the data, these regions now appear as zeroes, as expected.

My problem is that valid zero values also exist in the input data over regions with no vegetation (e.g. deserts). Therefore, in the resulting array I can't readily tell which pixels are invalid and which contain real data. How do I get around this? Is there any way to output a mask showing which pixels contain real data and which received no data at all? Thanks.
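One possible workaround, sketched below (not an official xESMF feature), is to regrid a validity mask alongside the data and use it to restore the NaNs afterwards. Here 'gpp' is the NaN-masked input DataArray and 'regridder' the conservative_normed regridder, both hypothetical names:

import xarray as xr

valid = xr.ones_like(gpp).where(gpp.notnull(), 0.0)   # 1 = real data, 0 = masked

gpp_coarse = regridder(gpp.fillna(0.0))
frac_valid = regridder(valid)       # fraction of each coarse cell holding real data

# keep only coarse cells that received at least some valid fine pixels
gpp_coarse = gpp_coarse.where(frac_valid > 0)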

Installing Issue: dyld: lazy symbol binding failed

Hi Jiawei,
I was trying to install xesmf on a MAC and I had the following error:

(base) host210-186:~ jayant$ pip install xesmf
Collecting xesmf
Downloading https://files.pythonhosted.org/packages/2a/34/b551535704101b7ce4e4cbdeb3806800e860df814d73728e376fdc410a0c/xesmf-0.1.1.tar.gz
dyld: lazy symbol binding failed: Symbol not found: _fdopendir$INODE64
Referenced from: /anaconda3/lib/libpython3.6m.dylib
Expected in: /usr/lib/libSystem.B.dylib

dyld: Symbol not found: _fdopendir$INODE64
Referenced from: /anaconda3/lib/libpython3.6m.dylib
Expected in: /usr/lib/libSystem.B.dylib

I installed xarray and esmpy previously. Do you have any hints here?

Parallel regridding support

"Parallel regridding" could mean two things: (see #2 for more about this two-step regridding)

  1. Generate regridding weights in parallel.
  2. Apply regridding weights in parallel.

The native parallel support in ESMF/ESMPy is based on MPI and horizontal domain decomposition. It works for both generating and applying weights. See https://github.com/nawendt/esmpy-tutorial/blob/master/esmpy_mpi_example.py as an example.

MPI-based horizontal domain decomposition makes perfect sense for earth system model simulation, but for data analysis I would absolutely want to avoid MPI's complexity. With dask, there's a simple way to go:

  1. Avoid horizontal domain decomposition and only parallelize over additional dimensions like time and level. Writing such code with dask.array would be trivial (see the sketch after this list).
  2. Still generate regridding weights in serial. My impression is people tend to have a fixed pair of source and destination grids and regrid a lot of data fields between them. In this case, weights generation only needs to be done once. So we would only care about applying the weights in parallel.
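A rough sketch of point 1 (hypothetical helper, not current xESMF code), assuming the weights are already available as a scipy sparse matrix and the data is chunked only along time:

import dask.array as da

def _apply_weights(block, weights=None, shape_out=None):
    # block: numpy array (ntime_chunk, nlat_in, nlon_in); horizontal dims unchunked
    ntime = block.shape[0]
    flat_in = block.reshape(ntime, -1)       # (ntime, nlat_in*nlon_in)
    flat_out = weights.dot(flat_in.T).T      # (ntime, nlat_out*nlon_out)
    return flat_out.reshape((ntime,) + shape_out)

def regrid_dask(weights, data, shape_out):
    """weights: scipy sparse (n_out, n_in); data: dask array (time, lat_in, lon_in)."""
    return da.map_blocks(
        _apply_weights, data,
        weights=weights, shape_out=shape_out,
        dtype=data.dtype,
        chunks=(data.chunks[0], (shape_out[0],), (shape_out[1],)),
    )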

Is there any case that we have to parallelize over horizontal dimensions?

PS: Need to profile the regridding on very large data sets and figure out the bottleneck (generating vs applying weights) before starting to implement anything.

Possibly incorrect terminology in documentation

This is a documentation issue.

I believe you have mixed up the terms 'downsample' and 'upsample' in documentation. It is said:

When regridding from low-resolution to high-resolution (down-sampling)

And:

When regridding from high-resolution to low-resolution (up-sampling)

But the opposite seems to be the generally accepted definition:
https://en.wikipedia.org/wiki/Decimation_(signal_processing)
https://en.wikipedia.org/wiki/Upsampling

Going from low-res (coarse) grid to high-res (fine) grid increases the sampling rate => 'upsampling'.
Going from high-res (fine) grid to low-res (coarse) grid reduces the sampling rate => 'downsampling'

Let me know if I'm wrong and I have mixed up the terms myself!

reuse weights only checked by shape

When reusing weights, only the shape of the on-disk weights seems to be checked against the new grids, not the coordinate values. This can lead to wrong results. Example:

import numpy as np
import xarray as xr
import xesmf


def sample_dataset(nlon, dlon, lon_start):

    nlat = 2

    data = np.random.randn(nlat, nlon)

    lon = np.arange(lon_start, lon_start + nlon * dlon, dlon)
    lat = np.arange(nlat)

    coords = dict(lon=lon, lat=lat)

    return xr.Dataset(dict(data=(('lat', 'lon'), data)), coords=coords)


# create a dataset to regrid
in_data = sample_dataset(nlon=50, dlon=1, lon_start=1)

# create 2 sample destination grids with same shape but different coords
grid1 = sample_dataset(nlon=10, dlon=2, lon_start=1)
grid2 = sample_dataset(nlon=10, dlon=2, lon_start=10)

If I now regrid with reuse_weights=True the result for grid2 is wrong:

regridder = xesmf.Regridder(in_data, grid1, 'bilinear', reuse_weights=True)
print('-- grid1, reuse=True', regridder.regrid_dataarray(in_data['data']).values[0, 0])

regridder = xesmf.Regridder(in_data, grid2, 'bilinear', reuse_weights=True)
print('-- grid2, reuse=True', regridder.regrid_dataarray(in_data['data']).values[0, 0])

regridder = xesmf.Regridder(in_data, grid1, 'bilinear', reuse_weights=False)
print('-- grid1, reuse=False', regridder.regrid_dataarray(in_data['data']).values[0, 0])

regridder = xesmf.Regridder(in_data, grid2, 'bilinear', reuse_weights=False)
print('-- grid2, reuse=False', regridder.regrid_dataarray(in_data['data']).values[0, 0])

Notice how results 1 and 2 are the same while 3 and 4 are different. I think it is important that this is checked.
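Until xESMF itself verifies this, one defensive workaround (a sketch only; for brevity only the longitude extent and size are encoded) is to build the weight-file name from the grid coordinates, not just the shape, so that different grids can never silently reuse each other's weights:

import os
import numpy as np
import xesmf

def checked_regridder(ds_in, ds_out, method):
    # hypothetical helper: encode grid extents in the weight-file name
    fname = '%s_%g_%g_%d_to_%g_%g_%d.nc' % (
        method,
        float(np.min(ds_in['lon'])), float(np.max(ds_in['lon'])), ds_in['lon'].size,
        float(np.min(ds_out['lon'])), float(np.max(ds_out['lon'])), ds_out['lon'].size,
    )
    return xesmf.Regridder(ds_in, ds_out, method,
                           filename=fname,
                           reuse_weights=os.path.exists(fname))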

Regrid summed variables

I want to regrid total lightning flashcounts (TL) generated by WRF-Chem from 12 km to 1 degree. TL is the sum of flashcounts in each 12 km grid cell.

I've tried xESMF with the bilinear and conservative methods, but both results fail. You can check notebook1 and notebook2.

Do you have any advice on regridding variables like TL?
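A common approach for such extensive (summed) quantities, sketched below, is to convert to a density first, regrid that conservatively, and multiply by the destination cell area afterwards. Here 'area_in', 'area_out' and 'regridder' are hypothetical names for the cell-area fields and a conservative Regridder:

# area_in / area_out: hypothetical cell-area DataArrays on the 12 km and 1-degree grids
flash_density = ds['TL'] / area_in          # counts per unit area (intensive)
density_out = regridder(flash_density)      # conservative regridding
tl_out = density_out * area_out             # back to counts per destination cell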

The best practice for sparse matrix multiplication

The information below is terribly outdated. See the latest comparisons at: https://github.com/JiaweiZhuang/sparse_dot


Since #2 is largely solved, I've done some preliminary tests on sparse matrix multiplication (SMM) using the sparse package. I just realized that the basic scipy.sparse.coo_matrix.dot() is enough: I can flatten the input array to shape [Ntime*Nlev, Nlat*Nlon] so it can be passed to coo_matrix.dot().
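A sketch of that flattening trick (sizes made up; the weight matrix is random, just to have the right shape):

import numpy as np
import scipy.sparse

ntime, nlev, nlat_in, nlon_in = 50, 10, 400, 600
nlat_out, nlon_out = 300, 400

weights = scipy.sparse.random(nlat_out * nlon_out, nlat_in * nlon_in,
                              density=1e-5, format='coo')
data = np.random.rand(ntime, nlev, nlat_in, nlon_in)

flat_in = data.reshape(ntime * nlev, nlat_in * nlon_in)   # [Ntime*Nlev, Nlat*Nlon]
flat_out = weights.dot(flat_in.T).T                       # apply the sparse weights
data_out = flat_out.reshape(ntime, nlev, nlat_out, nlon_out)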

The test input data has the shape (600, 400, 10, 50), which is then regridded to shape (400, 300, 10, 50). The timing results are

  • Calculating weights: ~14 s
    (This only needs to be done once, so I am not trying to optimize it)
  • Applying weights with ESMPy: 1.46 s
    (transferring numpy data to ESMPy internals takes an additional 6.6 s, due to converting the C-ordered array to Fortran order)
  • Applying weights with sparse.tensordot(): 579 ms

It is quite surprising that sparse.tensordot() is 3 times faster than ESMPy's internal Fortran routine.

  • Applying weights: (The previous comparison was not fair. I've removed all unnecessary array copying and re-ordering, and here's the new timing)

    Method                 Applying weights      Write output data to file
    ESMPy                  0.3 s [1] + 1 s [2]   2 s [3]
    coo_matrix.dot         0.6 s                 2 s
    sparse.tensordot [4]   1 s                   4 s
    loop with numba        3 s                   2 s
    loop with pure numpy   6 s                   2 s

[1] : Passing input data to Fortran internal. It also doubles memory use, which is not a problem in other methods.
[2] : Actual SMM.
[3] : Slow, due to the use of Docker. Would be ~0.3 s on native machine.
[4] : The slow-down compared to scipy is due to memory layout (C-ordering vs Fortran-ordering). See the notebook below.

The notebooks are available at

It might be useful to have both ESMPy and scipy as different backends. ESMPy results can be used as reference. At least one non-ESMPy method is needed since ESMPy cannot reuse offline weights right now.

@bekozi If you have time (no hurry), could you run my notebooks to see whether it is true that ESMPy's SMM is slower than Python's? I want to make sure the comparison was fair and I was not using ESMPy in a wrong way.


This thread is for discussing the best practice for SMM (i.e. applying weights).

Before developing high-level APIs I want to make sure I am using the most efficient way to perform SMM. Any suggestions will be appreciated. @mrocklin @pelson

Support for unmeshed lat/lon w/ different names and in Datasets w/ multiple data_vars

Thanks for xESMF! It's been great for my recent re-gridding needs.

The data I need to regrid generally has lat and lon each stored as 1D arrays, rather than as meshes, and with names 'lat' and 'lon', respectively, rather than 'x' and 'y'. Also, I am regridding Datasets with multiple data_vars.

Each of these issues required a little bit of extra code before it would work with xESMF. Since I think these are common use cases, I thought I'd paste it here. Hopefully they're pretty self-explanatory...I didn't take the time to write full docstrings. The logic could likely be improved also, especially the last one.

You're welcome to incorporate whatever you'd like of them into xESMF if you'd like. But no problem if not.

import numpy as np
import xarray as xr
import xesmf as xe

LON_STR = 'lon'
LAT_STR = 'lat'

def _replace_dim(da, olddim, newdim):
    """From https://github.com/pydata/xarray/issues/1273."""
    renamed = da.rename({olddim: newdim.name})

    # note that alignment along a dimension is skipped when you are overriding
    # the relevant coordinate values
    renamed.coords[newdim.name] = newdim
    return renamed


def meshed_latlon_arr(arr, lon_str=LON_STR, lat_str=LAT_STR):
    """Create new array with meshed lat and lon coordinates."""
    lonmesh, latmesh = np.meshgrid(arr[lon_str], arr[lat_str])
    renamed = arr.rename({lat_str: 'y', lon_str: 'x'})
    renamed.coords[lat_str] = (('y', 'x'), latmesh)
    renamed.coords[lon_str] = (('y', 'x'), lonmesh)
    return renamed
       

def meshed_latlon_ds(ds, lon_str=LON_STR, lat_str=LAT_STR):
    """Create new dataset with meshed lat and lon coordinates."""
    meshed_arrs = {}
    for name, arr in ds.data_vars.items():
        meshed_arrs.update({name: meshed_latlon_arr(arr)})
    return xr.Dataset(data_vars=meshed_arrs, attrs=ds.attrs)
    

def regrid_unmeshed_arr(arr, target_grid, method='bilinear'):
    """Regrid the DataArray that has 1D lat and lon coordinates."""
    meshed = meshed_latlon_ds(arr)
    return xe.regrid(meshed, target_grid, 
                     meshed, method=method, verbose=True)

    
def regrid_unmeshed_ds(ds, target_grid, method='bilinear'):
    """Regrid the Dataset that has 1D lat and lon coordinates."""
    meshed = meshed_latlon_ds(ds)
    
    data_vars = {}
    for name, data_var in meshed.data_vars.items():
        regridded = xe.regrid(meshed, target_grid, 
                              data_var, method=method)
        data_vars.update({name: regridded})
    
    coords = {}
    xy_set = False
    for name, coord in meshed.coords.items():
        if ('y' in coord or 'x' in coord) and name not in ('y', 'x', 'lat', 'lon'):
            regridded = xe.regrid(meshed, target_grid, 
                                  coord, method=method)
            coords.update({name: regridded})
            if not xy_set:
                coords.update({'y': regridded['y']})
                coords.update({'x': regridded['x']})
                coords.update({'lat': regridded['lat']})
                coords.update({'lon': regridded['lon']})
                xy_set = True
        elif name not in ('y', 'x', 'lat', 'lon'):
            coords.update({name: coord})

    return xr.Dataset(data_vars=data_vars,
                      coords=coords, attrs=ds.attrs) 

Adding logically rectangular curvilinear grid generation to xesmf.util

I’m new to using xESMF, and I’ve been finding it very useful in my research workflow, particularly in regridding from lat-lon coordinates to a conformal or equal area projection in order to perform image analysis. So, thank you for all your work!

One utility I’ve had to write myself, however, is generating those logically rectangular curvilinear grids to supply to xESMF. I think this could be a useful utility for more than just my own work, and so, would something like this be within scope to add to xesmf.util alongside grid_2d()?

If so, I’d be glad to submit a PR, but there could be a couple issues to discuss. First, I’ve been using pyproj to handle the projections and coordinate transformations. And so, would having that as an optional dependency be acceptable? (I could also implement it using another CRS handling library, but pyproj seemed simplest as I was working on it.) Second, I’m not sure what the best API would be for this utility. Perhaps something like

projection = pyproj.Proj(proj='lcc', ...)
ds_out = xe.util.grid_rectangular(projection, nx=850, ny=600, dx=3000, dy=3000)

to create a 850x600 LCC grid centered on the origin with 3 km grid spacing?

Value of cells in the new grid that are outside the old grid's domain

Hi,

xESMF seems like a great tool, I've just started testing it!
I have one issue though. I am regridding a limited domain with UTM-projection to a larger domain with a regular lat lon grid. In my case it seems that the cells in the new grid that are outside the old grid's domain are set to zero and not nan. Are there any options concerning what the default values are outside the input-grid's domain?

Regards, Helene

I'm using Python 3.6.3, xarray 0.10.0, xesmf 0.1.1, scipy 0.19.1

The script I am using:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from pyproj import Proj,transform
import xesmf as xe

SeN=xr.open_dataset('http://thredds.met.no/thredds/dodsC/metusers/senorge2/seNorge2/archive/PREC1d/seNorge2_PREC1d_grid_1957.nc')
projsrs=SeN['UTM_Zone_33'].proj4

LowerLeftEast= -75000
LowerLeftNorth= 6450000
UpperRightEast= 1120000
UpperRightNorth= 8000000
dx=1000.
ZoneNo = "33"

myP=Proj(SeN['UTM_Zone_33'].proj4)

Xcorners=np.arange(SeN['X'].data[0]-dx/2., SeN['X'].data[-1]+3*dx/2., dx)
Ycorners=np.flipud(np.arange(SeN['Y'].data[-1]-dx/2., SeN['Y'].data[0]+3*dx/2., dx))
Lon2, Lat2 = myP(*np.meshgrid(SeN['X'].data,SeN['Y'].data),inverse=True)
Lon2b, Lat2b = myP(*np.meshgrid(Xcorners,Ycorners),inverse=True) #

lons=np.asarray(Lon2)
lats=np.asarray(Lat2)
SeN.coords['lat'] = (('Y','X'),Lat2)
SeN.coords['lon'] = (('Y','X'),Lon2)
SeN.set_coords(['lat','lon'])

SeN.coords['Xb'] = (Xcorners)
SeN.coords['Yb'] = (Ycorners)
SeN.set_coords(['Xb','Yb'])

SeN.coords['lat_b'] = (('Yb','Xb'),Lat2b)
SeN.coords['lon_b'] = (('Yb','Xb'),Lon2b)
SeN.set_coords(['lat_b','lon_b'])
res=0.025
reslon=0.05
ds_out = xe.util.grid_2d(lons.min()-0.06,lons.max()+0.2,0.05,lats.min()-0.08,lats.max()+0.03, 0.025)
dr=SeN['precipitation_amount'][0:110]

regridder_cons = xe.Regridder(SeN, ds_out, 'conservative',reuse_weights=True)
dr_out = regridder_cons(dr)
plt.imshow(np.isnan(dr_out[0].data),origin='lower')

[attached figure: cons_regridding_xesmf_0_not_nan_outside_input_data_dom]

Check if grid coordinate values have discontinuity

To make issues like #14 easier to discover, xESMF should check if the input grid coordinates are well-defined, monotonically increasing ("smoothly-changing" would be more appropriate), single-tile quadrilateral grids. Basic steps would be:

  1. Convert grid coordinate to Cartesian (x-y-z) space so that polar singularities and periodic boundaries can pass the check.
  2. Compute grid sizes by something like np.diff()
  3. If there's an abrupt change in grid size (criteria to be decided), throw a warning with element index.

Make it an option if it incurs visible slow-down.
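A rough sketch of such a check (illustrative only; the warning threshold is arbitrary):

import warnings
import numpy as np

def check_grid_discontinuity(lon, lat, rel_tol=10.0):
    lon_r, lat_r = np.radians(lon), np.radians(lat)

    # 1. convert to Cartesian so the dateline and the poles pass the check
    x = np.cos(lat_r) * np.cos(lon_r)
    y = np.cos(lat_r) * np.sin(lon_r)
    z = np.sin(lat_r)

    for axis in (0, 1):
        # 2. grid sizes as chordal distances between neighbouring cell centres
        d = np.sqrt(np.diff(x, axis=axis)**2 +
                    np.diff(y, axis=axis)**2 +
                    np.diff(z, axis=axis)**2)
        # 3. warn if any step is much larger than the typical step
        if d.max() > rel_tol * np.median(d):
            bad = np.unravel_index(np.argmax(d), d.shape)
            warnings.warn('Possible discontinuity in grid coordinates near '
                          'element %s along axis %d' % (bad, axis))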

Error AttributeError: module 'ESMF' has no attribute 'api'

I installed the required libraries but got the following error when importing xesmf; the problem occurs when importing ESMF.
I am working under CentOS 6.5. I installed xESMF using conda install -c conda-forge esmpy.

Below is the error:

AttributeError Traceback (most recent call last)
in ()
4 import numpy as np
5 import xarray as xr
----> 6 import xesmf as xe

~/local/bin/miniconda3/envs/pyenv/lib/python3.6/site-packages/xesmf/__init__.py in ()
1 from . import util
2 from . import data
----> 3 from . frontend import Regridder

~/local/bin/miniconda3/envs/pyenv/lib/python3.6/site-packages/xesmf/frontend.py in ()
7 import os
8
----> 9 from . backend import (esmf_grid, add_corner,
10 esmf_regrid_build, esmf_regrid_finalize)
11

~/local/bin/miniconda3/envs/pyenv/lib/python3.6/site-packages/xesmf/backend.py in ()
17
18 import numpy as np
---> 19 import ESMF
20 import warnings
21 import os

~/local/bin/miniconda3/envs/pyenv/lib/python3.6/site-packages/ESMF/__init__.py in ()
54 #### IMPORT LIBRARIES #########################################################
55
---> 56 from ESMF.api.esmpymanager import *
57 from ESMF.api.grid import *
58 from ESMF.api.mesh import *

~/local/bin/miniconda3/envs/pyenv/lib/python3.6/site-packages/ESMF/api/esmpymanager.py in ()
9 from ESMF.api.constants import *
10 from ESMF.util.exceptions import *
---> 11 from ESMF.interface.cbindings import *
12 from ESMF.util.decorators import initialize
13

~/local/bin/miniconda3/envs/pyenv/lib/python3.6/site-packages/ESMF/interface/cbindings.py in ()
9 import sys
10
---> 11 import ESMF.api.constants as constants
12 from ESMF.util.decorators import deprecated, netcdf
13 from ESMF.interface.loadESMF import _ESMF

AttributeError: module 'ESMF' has no attribute 'api'

Contributor guide?

So far I've been developing xESMF on my own. There are sporadic PRs (#23, #27), but I am actually not sure how to best handle them. Given the increasing community interest, it would be useful to talk about how people can contribute to xESMF.

Contributing to xarray is a good reference on the software engineering side (style, testing, documentation, bug fixes...). Here I am thinking more about the science & usability & algorithm sides that are specific to xESMF.

There are several types of contributions I can think of:

1. Contribution to examples and tutorials

This is the easiest one and is highly welcome. I am very interested in how people regrid all kinds of data in different Earth science disciplines (environmental, atmospheric, oceanic, land, remote sensing...). I often see xESMF being successfully used to deal with grid meshes that I haven't seen before (e.g. the "tri-polar grid" #14).

An example can be just a Jupyter notebook, focusing on one or more of the following aspects:

  • A specific scientific application (e.g. converting CMIP5 data from multiple model grids to a common grid for comparison pangeo-data/pangeo#309)
  • The type of grid mesh (e.g. WRF's lambert conformal projection, MITgcm's lat-lon-cap, or even the Yin-Yang grid that most people have little experience with. Grid meshes are fun!)
  • The choice of algorithm (e.g. while bilinear and conservative are most common, the nearest neighbor method is actually great for categorical data such as land type index)

Guideline for tutorials/examples:

  • Full reproducibility is required. NCL has a wonderful page of regridding examples, but I have trouble running most scripts due to missing data. For the xESMF docs, I only use data from xr.tutorial.load_dataset() or data computed on the fly. Small data used in an example (say, less than 20 MB?) can be submitted and added to an xESMF-data repo, just like the xarray-data repo. For large data, a stable link must be provided. The Pangeo platform on GCP/AWS seems a good place for hosting large data.
  • A brief introduction to the scientific problem and why regridding is needed would be very useful.

2. Contribution to standalone, small utilities

Many issues on GitHub belong to "small utilities" (e.g. #15, #16). They do not have a large impact on the core regridding, but are crucial for usability and user experience. Developing those small utilities is much easier than hacking the regridding core, and they often do not require ESMF/ESMPy knowledge. It is also much easier for me to handle dependencies.

General principles:

  • The functionality should be closely related to regridding. An example is computing cell area, which is useful for checking mass conservation before/after conservative regridding. Code for computing a particular grid mesh (unless it is an extremely common one, like regular lat-lon) should go to examples, not utilities.
  • Avoid complicated data structure. Stick to xarray.Dataset and numpy whenever possible. Compatibility with pure numpy arrays is encouraged.
  • Minimize dependency on other functions, especially other "small utilities". xESMF is still young and the code base is in flux.

3. Contribution to core functionalities

I very much welcome hard-core xarray/dask/ESMF/Pangeo developers to tackle some of the most challenging problems. For example:

For those big questions, it is better to discuss on GitHub before starting serious coding.

When & Where to start

I am still planning some significant refactoring of the code base, to better support critical features, notably dask support #3, accepting Datasets #5, and retrieving weights #11. (It is slowly moving because xESMF is my personal, unfunded, side project😐. Have a lot of other projects in hand. My life would probably be easier if I write a GMD/JORS paper on it, so it can count towards my PhD...) At this stage, hacking the core might not be the best choice, because it is very likely to change (talking about internal code, not the user API). Contributing examples & tutorials & use cases is the safe bet.

TODO:

  • Add a Contributor Guide to online docs

conservative method doesn't work

I'm regridding CMIP5-CCSM4 ocean output with a dipole grid (in Greenland) onto a 1x1 grid. I created the grid as follows

sss_out = xe.util.grid_global(1, 1)
sss_out  # contains lat/lon values of cell centers and boundaries. 

which has grid corners

<xarray.Dataset>
Dimensions:  (x: 360, x_b: 361, y: 180, y_b: 181)
Coordinates:
    lon      (y, x) float64 -179.5 -178.5 -177.5 -176.5 -175.5 -174.5 -173.5 ...
    lat      (y, x) float64 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 ...
    lon_b    (y_b, x_b) int64 -180 -179 -178 -177 -176 -175 -174 -173 -172 ...
    lat_b    (y_b, x_b) int64 -90 -90 -90 -90 -90 -90 -90 -90 -90 -90 -90 ...
Dimensions without coordinates: x, x_b, y, y_b
Data variables:
    *empty*

However, when I try to use the conservative method there seem to be problems with the corners: lon_b is not recognized.

regridder = xe.Regridder(sss, sss_out, 'conservative')
sss_out = regridder(sss)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/miniconda3/lib/python3.6/site-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
    460         try:
--> 461             var = self._coords[key]
    462         except KeyError:

KeyError: 'lon_b'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-140-83962fdabcdd> in <module>()
----> 1 regridder = xe.Regridder(sss, sss_out, 'conservative')
      2 sss_out = regridder(sss)

~/miniconda3/lib/python3.6/site-packages/xesmf/frontend.py in __init__(self, ds_in, ds_out, method, periodic, filename, reuse_weights)
    128         self._grid_in, shape_in = ds_to_ESMFgrid(ds_in,
    129                                                  need_bounds=self.need_bounds,
--> 130                                                  periodic=periodic
    131                                                  )
    132         self._grid_out, shape_out = ds_to_ESMFgrid(ds_out,

~/miniconda3/lib/python3.6/site-packages/xesmf/frontend.py in ds_to_ESMFgrid(ds, need_bounds, periodic, append)
     59 
     60     if need_bounds:
---> 61         lon_b = np.asarray(ds['lon_b'])
     62         lat_b = np.asarray(ds['lat_b'])
     63         lon_b, lat_b = as_2d_mesh(lon_b, lat_b)

~/miniconda3/lib/python3.6/site-packages/xarray/core/dataarray.py in __getitem__(self, key)
    469     def __getitem__(self, key):
    470         if isinstance(key, basestring):
--> 471             return self._getitem_coord(key)
    472         else:
    473             # xarray-style array indexing

~/miniconda3/lib/python3.6/site-packages/xarray/core/dataarray.py in _getitem_coord(self, key)
    463             dim_sizes = dict(zip(self.dims, self.shape))
    464             _, key, var = _get_virtual_variable(
--> 465                 self._coords, key, self._level_coords, dim_sizes)
    466 
    467         return self._replace_maybe_drop_dims(var, name=key)

~/miniconda3/lib/python3.6/site-packages/xarray/core/dataset.py in _get_virtual_variable(variables, key, level_vars, dim_sizes)
     71         ref_var = dim_var.to_index_variable().get_level_variable(ref_name)
     72     else:
---> 73         ref_var = variables[ref_name]
     74 
     75     if var_name is None:

KeyError: 'lon_b'

The other four methods all work fine, but I'd like to use the conservative method. Any idea why this happens?
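For the conservative method xESMF needs cell-corner coordinates (lon_b/lat_b, shaped (ny+1, nx+1) as in the grid_global output above) on the input grid as well; the KeyError simply says that sss has no lon_b. A hedged sketch of what would have to be supplied, where 'lon_corners'/'lat_corners' are hypothetical (ny+1, nx+1) arrays built from the model's cell-vertex variables:

import xarray as xr
import xesmf as xe

# grid description for the curvilinear input; centres taken from the data itself
grid_in = xr.Dataset(
    coords={
        'lon':   (('y', 'x'), sss['lon'].values),
        'lat':   (('y', 'x'), sss['lat'].values),
        'lon_b': (('y_b', 'x_b'), lon_corners),   # hypothetical corner arrays
        'lat_b': (('y_b', 'x_b'), lat_corners),
    }
)
regridder = xe.Regridder(grid_in, sss_out, 'conservative')
sss_regridded = regridder(sss)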

Option to normalize conservative regridding result by overlapping area

There are actually two types of conservative regridding, controlled by the NormType option in ESMPy.

Say that's the input data covering a limited region:
[figure: input data]

The option currently used by xESMF's conservative regridding will "dilute" the boundary cells. That's because the cells on the input grid's boundary are "averaged" with non-existing "cells" outside the boundary (which are assumed to have zero values).
[figure: result with the current (default) option]

The other option (not in xESMF yet) will normalize boundary cells so they are not "diluted":
[figure: result with the normalized option]

The two options are only different at boundaries:
[figure: difference between the two options]

(Full code available in this gist, which was about an already-fixed bug)

I chose the first option mainly for multi-tile grids (regrid one by one and add up). The second option perhaps makes more sense for single-tile grids, but will double count the boundary cells in multi-tile grids. If the second option is useful I can add a new 'conservative_normed' method.
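For reference, the underlying knob is the norm_type argument of ESMF.Regrid in ESMPy; a rough sketch with the source/destination fields elided ('srcfield'/'dstfield' are placeholders, and this is illustrative rather than xESMF code):

import ESMF

# behaviour described above as "diluting": normalize by the full destination cell area
regrid_dstarea = ESMF.Regrid(srcfield, dstfield,
                             regrid_method=ESMF.RegridMethod.CONSERVE,
                             norm_type=ESMF.NormType.DSTAREA,
                             unmapped_action=ESMF.UnmappedAction.IGNORE)

# the proposed 'conservative_normed': normalize by the overlapped fraction only
regrid_fracarea = ESMF.Regrid(srcfield, dstfield,
                              regrid_method=ESMF.RegridMethod.CONSERVE,
                              norm_type=ESMF.NormType.FRACAREA,
                              unmapped_action=ESMF.UnmappedAction.IGNORE)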

Generate SCRIP grid file from scratch

To solve #2 (the biggest, or perhaps the only barrier for xESMF), I would like a Python version of ESMF_RegridWeightGen. According to @bekozi and @rokuingh, this function will eventually be available in ESMPy. Before that, we can actually use the Fortran version of ESMF_RegridWeightGen to generate offline regridding weights. This command line utility comes with conda ESMF so users don't need to worry about compiling Fortran source code. In the future we can easily replace it by the Python equivalent, without affecting other parts of xESMF.

However, ESMF_RegridWeightGen requires the grid information to be stored in the SCRIP Grid File Format. Here are possible ways to get this format:

  • Use SCRIP itself? Not a good approach because SCRIP is an old, no-longer-maintained software.
  • Use ESMPy? I can only find ways to read this format, but not to create this format.
  • Use this NCL function? This is perhaps the only easy way right now. I want to write a Python version of it, but its source code looks unnecessarily complicated.

I am wondering what's the best & quickest way to convert pure numpy arrays to SCRIP format, which can then be fed to ESMF_RegridWeightGen? xESMF only focuses on quadrilateral (i.e. logically rectilinear) grids so there's no need to worry about unstructured meshes. @bekozi and @rokuingh do you have minimal Python scripts to make this format?
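One possible minimal sketch of writing a single-tile SCRIP grid file directly with xarray (untested; variable and dimension names follow the SCRIP convention, and the corner ordering assumes a standard north-up, east-right grid orientation):

import numpy as np
import xarray as xr

def to_scrip(lon, lat, lon_b, lat_b, filename):
    """lon/lat: (ny, nx) cell centres; lon_b/lat_b: (ny+1, nx+1) cell corners."""
    ny, nx = lon.shape
    grid_size = ny * nx

    # gather the four corners of every cell, counter-clockwise
    corner_lon = np.stack([lon_b[:-1, :-1], lon_b[:-1, 1:],
                           lon_b[1:, 1:], lon_b[1:, :-1]], axis=-1)
    corner_lat = np.stack([lat_b[:-1, :-1], lat_b[:-1, 1:],
                           lat_b[1:, 1:], lat_b[1:, :-1]], axis=-1)

    ds = xr.Dataset({
        'grid_dims': (('grid_rank',), np.array([nx, ny], dtype='i4')),
        'grid_center_lon': (('grid_size',), lon.ravel()),
        'grid_center_lat': (('grid_size',), lat.ravel()),
        'grid_corner_lon': (('grid_size', 'grid_corners'),
                            corner_lon.reshape(grid_size, 4)),
        'grid_corner_lat': (('grid_size', 'grid_corners'),
                            corner_lat.reshape(grid_size, 4)),
        'grid_imask': (('grid_size',), np.ones(grid_size, dtype='i4')),
    })
    for name in ['grid_center_lon', 'grid_center_lat',
                 'grid_corner_lon', 'grid_corner_lat']:
        ds[name].attrs['units'] = 'degrees'
    ds.to_netcdf(filename)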

integration with xgcm.Grid

xESMF is awesome!

I have spent quite a bit of effort in creating an xarray-friendly data model for finite volume GCM data in xgcm:
http://xgcm.readthedocs.io/en/latest/grids.html

It seems like this data model could help resolve some other issues in xESMF, in particular, the generalization of coordinate names (#5) and the specification of cell boundaries vs centers.

I wanted to open this issue just to discuss how we can make these two packages more mutually compatible and potentially avoid solving the same problems twice.

Cannot allocate very large (global ~1km) ESMF grid

@sdeastham failed to use xESMF to regrid global ~1 km resolution data. The issue is that ESMPy cannot create a very large grid object. A minimal example is:

import numpy as np
import ESMF

# not-so-large grid works
grid = ESMF.Grid(np.array([20000, 10000]), 
                 staggerloc=ESMF.StaggerLoc.CENTER, 
                 coord_sys=ESMF.CoordSys.SPH_DEG)

# larger grid breaks
grid = ESMF.Grid(np.array([20000, 15000]), 
                 staggerloc=ESMF.StaggerLoc.CENTER, 
                 coord_sys=ESMF.CoordSys.SPH_DEG)

The last command leads to TypeError: buffer is too small for requested array:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-18-e629ff74bc4d> in <module>()
      1 grid = ESMF.Grid(np.array([20000, 15000]), 
      2                  staggerloc=ESMF.StaggerLoc.CENTER,
----> 3                  coord_sys=ESMF.CoordSys.SPH_DEG)

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/util/decorators.py in new_func(*args, **kwargs)
     62 
     63         esmp = esmpymanager.Manager(debug = False)
---> 64         return func(*args, **kwargs)
     65     return new_func
     66 

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/api/grid.py in __init__(self, max_index, num_peri_dims, periodic_dim, pole_dim, coord_sys, coord_typekind, staggerloc, filename, filetype, reg_decomp, decompflag, is_sphere, add_corner_stagger, add_user_area, add_mask, varname, coord_names, tilesize, regDecompPTile, name)
    451         # Add coordinates if a staggerloc is specified
    452         if staggerloc is not None:
--> 453             self.add_coords(staggerloc=staggerloc, from_file=from_file)
    454 
    455         # Add items if they are specified, this is done after the

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/api/grid.py in add_coords(self, staggerloc, coord_dim, from_file)
    797 
    798                 # and now for Python
--> 799                 self._allocate_coords_(stagger, from_file=from_file)
    800 
    801                 # set the staggerlocs to be done

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/api/grid.py in _allocate_coords_(self, stagger, localde, from_file)
   1022         if (self.ndims == self.rank) or (self.ndims == 0):
   1023             for xyz in range(self.rank):
-> 1024                 self._link_coord_buffer_(xyz, stagger, localde)
   1025         # and this way if we have 1d coordinates
   1026         elif self.ndims < self.rank:

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/api/grid.py in _link_coord_buffer_(self, coord_dim, stagger, localde)
   1072         lb, ub = ESMP_GridGetCoordBounds(self, staggerloc=stagger, localde=localde)
   1073 
-> 1074         gridCoordP = ndarray_from_esmf(data, self.type, ub-lb)
   1075 
   1076         # alias the coordinates to a grid property

~/Research/Computing/miniconda3/envs/sci/lib/python3.6/site-packages/ESMF/util/esmpyarray.py in ndarray_from_esmf(data, dtype, shape)
     37 
     38     esmfarray = np.ndarray(tuple(shape[:]), constants._ESMF2PythonType[dtype],
---> 39                            buffer, order="F")
     40 
     41     return esmfarray

TypeError: buffer is too small for requested array

@bekozi Any idea about this?

High memory usage

The memory usage of xESMF seems quite high, higher than the standalone ESMF CLI, and I wonder if this is due to a possible leak, since loading from a saved weights file uses much less RAM. Additionally, perhaps it could be improved by using lower precision (32-bit vs 64-bit).
I also note that it doesn't seem possible to destroy a Regridder object to free memory?

import xesmf
import numpy as np
import gc
@profile
def test():
    src_ds = {'lat': np.arange(29.5,70.5,0.05), 'lon': np.arange(-23.5,45.0,0.05)} 
    dst_ds = xesmf.util.grid_2d(29, 70, 0.03, -23, 45, 0.03)
    regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear')
    regridder = None
    gc.collect()
    print("collected")
test()
python3 -m memory_profiler memtest.py
Line #    Mem usage    Increment   Line Contents
================================================
     4   65.500 MiB   65.500 MiB   @profile
     5                             def test():
     6   65.500 MiB    0.000 MiB       src_ds = {'lat': np.arange(29.5,70.5,0.05), 'lon': np.arange(-23.5,45.0,0.05)} 
     7  160.461 MiB   94.961 MiB       dst_ds = xesmf.util.grid_2d(29, 70, 0.03, -23, 45, 0.03)
     8 1246.199 MiB 1085.738 MiB       regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear')
     9 1246.203 MiB    0.004 MiB       regridder = None
    10 1246.203 MiB    0.000 MiB       gc.collect()
    11 1246.203 MiB    0.000 MiB       print("collected")

Comparison with xarray's built-in interp()

Recently realized that xarray has a nice new interpolation module added by @crusaderky and @fujiisoup in pydata/xarray#2079 (also cc @shoyer, @rabernat).

It does have some overlap with xESMF (fortunately not too much), so I think it would be necessary to:

  • Compare their advantages & limitations
  • Identify their proper use cases
  • Better define the development roadmap and avoid duplication of effort

interp() wraps scipy.interpolate, while xESMF wraps ESMPy's regridding functionality. The subtle difference between "interpolation" and "regridding" is that the former often refers to traditional 1D (sometimes N-D) interpolation, mostly in Cartesian space, while the latter specifically refers to geospatial data on Earth's sphere.

I personally think interp() is a great fit for:

  • Interpolation over a 1D coordinate (e.g. vertical layers, time). I've been using SciPy for vertical interpolation, too. ESMPy does support 3D grids, but this is generally overkill.
  • Data that are not on Earth's sphere, say the output of other physical models. ESMPy can actually handle Cartesian coordinates, but everyone seems to use only spherical coordinates.
  • High-dimensional regular grids (>=4D? rarely seen in Earth science but they can occur in other physical sciences or machine learning). It seems like interp's API tries to generalize to arbitrary dimensions, while xESMF is specific to horizontal regridding on the sphere.
  • Sampling over a trajectory via "Advanced Interpolation". ESMPy does have similar support via LocStream but I personally don't use it. Glad that xarray has native support for this feature.

For geospatial regridding tasks, xESMF has some important strengths. Many already reviewed in the docs, but more specifically:

  • Performance, especially with large data. xESMF reuses weights but scipy.interpolate does not.
    This simple test shows that xESMF is 16x faster than interp(), once the weights are computed (computing weights is also faster than interp). Indeed, this performance gap will be narrowed down on distributed cloud platforms, where the I/O time dominates (pangeo-data/pangeo#334).
  • Curvilinear grid (pydata/xarray#2281). This is the major reason why I wrote xESMF...
  • Conservative algorithm, to conserve the integral for density-like fields such as air density, heat flux, emission intensity... (This algorithm is used everywhere in Earth science but is never taught in numerical analysis classes. Scipy is basically "everything in a numerical analysis textbook", so unsurprisingly it has no conservative scheme.)

In short, interp() is a general-purpose interpolation module, while xESMF is a geospatial regridding package targeting Earth science needs. Their objectives look distinguishable. Should we consider merging some efforts, or just let them evolve independently?

ImportError: The ESMF shared library did not load properly.

The error

Traceback (most recent call last):
  File "/soge-home/users/chri4118/.conda/envs/crp/lib/python3.7/site-packages/ESMF/interface/loadESMF.py", line 118, in <module>
    mode=ct.RTLD_GLOBAL)
  File "/soge-home/users/chri4118/.conda/envs/crp/lib/python3.7/ctypes/__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /soge-home/users/chri4118/.conda/envs/crp/lib/./libmpicxx.so.12: undefined symbol: MPIR_Keyval_set_proxy

----> 3 import xesmf as xe
      4 import numpy as np
      5

~/.conda/envs/crp/lib/python3.7/site-packages/xesmf/__init__.py in <module>
      1 from . import util
      2 from . import data
----> 3 from . frontend import Regridder

~/.conda/envs/crp/lib/python3.7/site-packages/xesmf/frontend.py in <module>
      7 import os
      8
----> 9 from . backend import (esmf_grid, add_corner,
     10                        esmf_regrid_build, esmf_regrid_finalize)
     11

~/.conda/envs/crp/lib/python3.7/site-packages/xesmf/backend.py in <module>
     17
     18 import numpy as np
---> 19 import ESMF
     20 import warnings
     21 import os

~/.conda/envs/crp/lib/python3.7/site-packages/ESMF/__init__.py in <module>
     68 #### IMPORT LIBRARIES #########################################################
     69
---> 70 from ESMF.api.esmpymanager import *
     71 from ESMF.api.grid import *
     72 from ESMF.api.mesh import *

~/.conda/envs/crp/lib/python3.7/site-packages/ESMF/api/esmpymanager.py in <module>
      9 from ESMF.api.constants import *
     10 from ESMF.util.exceptions import *
---> 11 from ESMF.interface.cbindings import *
     12 from ESMF.util.decorators import initialize
     13

~/.conda/envs/crp/lib/python3.7/site-packages/ESMF/interface/cbindings.py in <module>
     11 import ESMF.api.constants as constants
     12 from ESMF.util.decorators import deprecated, netcdf
---> 13 from ESMF.interface.loadESMF import _ESMF
     14
     15 def copy_struct(src):

~/.conda/envs/crp/lib/python3.7/site-packages/ESMF/interface/loadESMF.py in <module>
    119 except:
    120     traceback.print_exc(file=sys.stdout)
--> 121     raise ImportError('The ESMF shared library did not load properly.')

ImportError: The ESMF shared library did not load properly.

My environment

Conda list:

conda (crp) chri4118@linux1:/soge-home/projects/crop_yield/ml_drought$ conda list
# packages in environment at /soge-home/users/chri4118/.conda/envs/crp:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
affine                    2.2.2                      py_0    conda-forge
argparse                  1.4.0                     <pip>
asn1crypto                0.24.0                    <pip>
asn1crypto                0.24.0                   py37_0
atomicwrites              1.3.0                      py_0    conda-forge
atomicwrites              1.3.0                     <pip>
attrs                     19.1.0                     py_0    conda-forge
aws-sam-translator        1.11.0                    <pip>
aws-xray-sdk              2.4.2                     <pip>
backcall                  0.1.0                    py37_0
basemap                   1.2.0           py37h673bf1a_1001    conda-forge
basemap-data-hires        1.2.0                         0    conda-forge
bcrypt                    3.1.6                     <pip>
beautifulsoup4            4.7.1                     <pip>
blas                      1.0                         mkl
bleach                    3.1.0                    py37_0
bokeh                     1.0.4                    py37_0
boltons                   19.1.0                     py_0    conda-forge
boltons                   19.1.0                    <pip>
boost-cpp                 1.68.0            h11c811c_1000    conda-forge
boto                      2.49.0                    <pip>
boto3                     1.9.153                   <pip>
botocore                  1.12.153                  <pip>
Bottleneck                1.2.1                     <pip>
bottleneck                1.2.1           py37h3010b51_1001    conda-forge
bs4                       0.0.1                     <pip>
bzip2                     1.0.6             h14c3975_1002    conda-forge
ca-certificates           2018.11.29           ha4d7672_0    conda-forge
cachetools                3.0.0                     <pip>
cairo                     1.14.12              h8948797_3
cartopy                   0.17.0           py37hbb7e04d_1
cdsapi                    0.1.4                     <pip>
certifi                   2019.6.16                py37_0    conda-forge
cffi                      1.12.3           py37h8022711_0    conda-forge
cfgrib                    0.9.7                      py_0    conda-forge
cfn-lint                  0.21.3                    <pip>
cftime                    1.0.3.4                   <pip>
cftime                    1.0.3.4          py37hdd07704_0
chardet                   3.0.4                    py37_1
chardet                   3.0.4                     <pip>
click                     7.0                        py_0    conda-forge
click-plugins             1.0.4                      py_0    conda-forge
cligj                     0.5.0                      py_0    conda-forge
climate-indices           1.0.4                     <pip>
cloudpickle               0.6.1                    py37_0
colorcet                  1.0.0                    py37_0
cryptography              2.3.1            py37hc365091_0
cryptography              2.4.2                     <pip>
cudatoolkit               10.0.130                      0
curl                      7.64.1               hf8cf82a_0    conda-forge
cycler                    0.10.0                   py37_0
Cython                    0.29.3                    <pip>
cytoolz                   0.9.0.1          py37h14c3975_1
dask                      1.2.2                      py_2    conda-forge
dask-core                 1.2.2                      py_0    conda-forge
datashader                0.6.8                      py_0    pyviz
datashape                 0.5.4                    py37_1
dbus                      1.13.2               h714fa37_1
decorator                 4.3.0                    py37_0
descartes                 1.1.0                      py_2    conda-forge
dill                      0.2.9                     <pip>
distributed               1.28.1                   py37_0    conda-forge
docker                    4.0.1                     <pip>
docutils                  0.14                      <pip>
earthengine-api           0.1.164                   <pip>
eccodes                   2.12.3               h1ba12e4_2    conda-forge
ecdsa                     0.13.2                    <pip>
entrypoints               0.3                      py37_0
esmf                      7.1.0r            hdfb41a0_1003    conda-forge
esmpy                     7.1.0r           py37h24bf2e0_2    conda-forge
expat                     2.2.5                he0dffb1_0
ffmpeg                    3.1.3                         0    menpo
fiona                     1.8.4           py37h1c6dbfb_1002    conda-forge
fire                      0.1.3                      py_1    conda-forge
flake8                    3.7.7                    py37_0    conda-forge
fontconfig                2.13.1            h2176d3f_1000    conda-forge
freetype                  2.9.1                h8a8886c_1
freexl                    1.0.5                h14c3975_0
future                    0.17.1                    <pip>
gdal                      2.4.0           py37heb36068_1001    conda-forge
geojson                   2.4.1                      py_0    conda-forge
geopandas                 0.4.0                      py_1    conda-forge
geoplot                   0.2.4                      py_0    conda-forge
geos                      3.7.1                he6710b0_0
geotiff                   1.4.3             h1105359_1000    conda-forge
gettext                   0.19.8.1             hd7bead4_3
giflib                    5.1.4                h14c3975_1
glib                      2.56.2            had28632_1001    conda-forge
gmp                       6.1.2                h6c8ec71_1
google-api-python-client  1.7.7                     <pip>
google-auth               1.6.2                     <pip>
google-auth-httplib2      0.0.3                     <pip>
gst-plugins-base          1.12.5            h3865690_1000    conda-forge
gstreamer                 1.12.5            h0cc0488_1000    conda-forge
hdf4                      4.2.13            h9a582f1_1002    conda-forge
hdf5                      1.10.5          nompi_h3c11f04_1100    conda-forge
heapdict                  1.0.0                    py37_2
holoviews                 1.11.2                    <pip>
holoviews                 1.11.0             pyh39e3cac_0    pyviz
httplib2                  0.12.0                    <pip>
icu                       58.2                 h9c2bf20_1
idna                      2.8                       <pip>
idna                      2.8                      py37_0
imageio                   2.4.1                    py37_0
inspect2                  0.1                       <pip>
intel-openmp              2019.1                      144
ipdb                      0.11                       py_2    conda-forge
ipykernel                 5.1.0            py37h39e3cac_0
ipython                   7.2.0            py37h39e3cac_0
ipython_genutils          0.2.0                    py37_0
ipywidgets                7.4.2                    py37_0
jasper                    1.900.1           h07fcdf6_1006    conda-forge
jedi                      0.13.2                   py37_0
Jinja2                    2.10.1                    <pip>
jinja2                    2.10                     py37_0
Jinja2                    2.10                      <pip>
jmespath                  0.9.4                     <pip>
joblib                    0.13.1                    <pip>
jpeg                      9c                h14c3975_1001    conda-forge
json-c                    0.13.1               h1bed415_0
jsondiff                  1.1.2                     <pip>
jsonpatch                 1.23                      <pip>
jsonpickle                1.2                       <pip>
jsonpointer               2.0                       <pip>
jsonschema                2.6.0                    py37_0
jupyter                   1.0.0                    py37_7
jupyter_client            5.2.4                    py37_0
jupyter_console           6.0.0                    py37_0
jupyter_core              4.4.0                    py37_0
kealib                    1.4.10            he7154bc_1002    conda-forge
kiwisolver                1.0.1            py37hf484d3e_0
krb5                      1.16.3            h05b26f9_1001    conda-forge
libaec                    1.0.4                hf484d3e_0    conda-forge
libboost                  1.67.0               h46d08c1_4
libcurl                   7.64.1               hda55be3_0    conda-forge
libcxx                    7.0.0             h6bb024c_1002    conda-forge
libdap4                   3.19.1            hd48c02d_1000    conda-forge
libedit                   3.1.20170329      hf8c457e_1001    conda-forge
libffi                    3.2.1             he1b5a44_1006    conda-forge
libgcc-ng                 9.1.0                hdf63c60_0
libgdal                   2.4.0             h982c1cc_1001    conda-forge
libgfortran-ng            7.3.0                hdf63c60_0
libiconv                  1.15                 h63c8f33_5
libkml                    1.3.0                h590aaf7_4
libnetcdf                 4.6.2             h056eaf5_1002    conda-forge
libpng                    1.6.37               hed695b0_0    conda-forge
libpq                     10.6              h13b8bad_1000    conda-forge
libsodium                 1.0.16               h1bed415_0
libspatialindex           1.8.5             hf484d3e_1003    conda-forge
libspatialite             4.3.0a              hb08deb6_19
libssh2                   1.8.2                h22169c7_2    conda-forge
libstdcxx-ng              8.2.0                hdf63c60_1
libtiff                   4.0.10            h2733197_1001
libuuid                   2.32.1            h14c3975_1000    conda-forge
libxcb                    1.13                 h1bed415_1
libxml2                   2.9.8                h26e45fe_1
libxslt                   1.1.33               h7d1a2b0_0
linecache2                1.0.0                     <pip>
llvm-meta                 7.0.0                         0    conda-forge
llvmlite                  0.27.0           py37hd408876_0
lmoments3                 1.0.4                     <pip>
locket                    0.2.0                    py37_1
lxml                      4.3.0            py37hefd8a0e_0
MarkupSafe                1.1.0                     <pip>
markupsafe                1.1.0            py37h7b6447c_0
matplotlib                3.0.2           py37h8a2030e_1001    conda-forge
matplotlib-base           3.0.2           py37h167e16e_1001    conda-forge
mccabe                    0.6.1                      py_1    conda-forge
mistune                   0.8.4            py37h7b6447c_0
mkl                       2019.1                      144
mkl_fft                   1.0.11           py37h14c3975_0    conda-forge
mkl_random                1.0.2            py37h637b7d7_2    conda-forge
mock                      3.0.5                     <pip>
more-itertools            7.0.0                     <pip>
more-itertools            4.3.0                 py37_1000    conda-forge
moto                      1.3.8                     <pip>
mpi                       1.0                       mpich    conda-forge
mpich                     3.2.1             h1c2f66e_1008    conda-forge
mpl-animation             0.1                       <pip>
msgpack-python            0.6.1            py37hfd86e86_1
multipledispatch          0.6.0                    py37_0
multiprocess              0.70.7                    <pip>
munch                     2.3.2                      py_0    conda-forge
nbconvert                 5.3.1                    py37_0
nbformat                  4.4.0                    py37_0
nco                       1.0.0                     <pip>
ncurses                   6.1               hf484d3e_1002    conda-forge
netcdf-fortran            4.4.5             hea25ff8_1000    conda-forge
netcdf4                   1.4.2            py37h808af73_0    anaconda
networkx                  2.2                      py37_1
ninja                     1.9.0            py37hfd86e86_0
notebook                  5.7.4                    py37_0
numba                     0.42.0           py37h962f231_0
numpy                     1.16.2           py37h7e9f1db_0
numpy-base                1.16.2           py37hde5b4d6_0
olefile                   0.46                     py37_0
openjpeg                  2.3.0                h05c96fa_1
openssl                   1.1.1b               h14c3975_1    conda-forge
osmnx                     0.8.2                     <pip>
owslib                    0.17.0                   py37_0
packaging                 18.0                     py37_0
pandas                    0.24.2           py37hb3f55d8_0    conda-forge
pandoc                    2.2.3.2                       0
pandocfilters             1.4.2                    py37_1
param                     1.8.2                     <pip>
param                     1.8.2                      py_0    pyviz
paramiko                  2.4.2                     <pip>
parso                     0.3.1                    py37_0
partd                     0.3.9                    py37_0
pathos                    0.2.3                     <pip>
patsy                     0.5.1                      py_0    conda-forge
pcre                      8.41                 hc27e229_1
pexpect                   4.6.0                    py37_0
pickleshare               0.7.5                    py37_0
pillow                    5.4.1            py37h34e0f95_0
pint                      0.9                      py37_2    conda-forge
pip                       19.1.1                   py37_0    conda-forge
pixman                    0.34.0               hceecf20_3
pluggy                    0.11.0                     py_0    conda-forge
poppler                   0.67.0            h2fc8fa2_1002    conda-forge
poppler-data              0.4.9                         0
postgresql                10.6              h66cca7a_1000    conda-forge
pox                       0.2.5                     <pip>
ppft                      1.6.4.9                   <pip>
proj4                     5.2.0                he6710b0_1
prometheus_client         0.5.0                    py37_0
prompt_toolkit            2.0.7                    py37_0
psutil                    5.4.8            py37h7b6447c_0
psycopg2                  2.7.6.1         py37hb7f436b_1000    conda-forge
ptyprocess                0.6.0                    py37_0
py                        1.8.0                      py_0    conda-forge
pyasn1                    0.4.5                     <pip>
pyasn1-modules            0.2.3                     <pip>
pycodestyle               2.5.0                      py_0    conda-forge
pycparser                 2.19                     py37_1    conda-forge
pycparser                 2.19                      <pip>
pyct                      0.4.6                      py_0    pyviz
pyct-core                 0.4.6                      py_0    pyviz
pyepsg                    0.4.0                    py37_0
pyflakes                  2.1.1                      py_0    conda-forge
pygments                  2.3.1                    py37_0
pykdtree                  1.3.1            py37hdd07704_2
PyNaCl                    1.3.0                     <pip>
pyopenssl                 18.0.0                   py37_0
pyparsing                 2.3.1                    py37_0
pyproj                    1.9.6           py37hc0953d3_1000    conda-forge
pyqt                      5.6.0            py37h22d08a2_6
pysal                     1.14.4.post2          py37_1001    conda-forge
pysheds                   0.2.6            py37h14c3975_0    conda-forge
pyshp                     1.2.12                   py37_0    anaconda
pysocks                   1.6.8                    py37_0
pytest                    4.4.1                     <pip>
pytest                    4.5.0                    py37_0    conda-forge
pytest-flake8             1.0.4                    py37_0    conda-forge
python                    3.7.3                h33d41f4_1    conda-forge
python-dateutil           2.8.0                      py_0    conda-forge
python-jose               3.0.1                     <pip>
pytorch                   1.1.0           py3.7_cuda10.0.130_cudnn7.5.1_0    pytorch
pytz                      2019.1                     py_0    conda-forge
pyviz-comms               0.7.0                     <pip>
pyviz_comms               0.7.0                      py_0    pyviz
pywavelets                1.0.1            py37hdd07704_0
pyyaml                    3.13             py37h14c3975_0
pyzmq                     17.1.2           py37he6710b0_2
qt                        5.6.2                hf70d934_9    conda-forge
qtconsole                 4.4.3                    py37_0
rasterio                  1.0.13                    <pip>
rasterio                  1.0.22           py37h5b3f9e8_0    conda-forge
rasterstats               0.13.0                    <pip>
readline                  8.0                  hf8c457e_0    conda-forge
requests                  2.21.0                   py37_0
responses                 0.10.6                    <pip>
rsa                       4.0                       <pip>
rtree                     0.8.3                 py37_1000    conda-forge
s3transfer                0.2.0                     <pip>
salem                     0.2.4                     <pip>
scikit-image              0.15.0                    <pip>
scikit-image              0.14.1           py37he6710b0_0
scikit-learn              0.20.2           py37hd81dba3_0
scipy                     1.2.1            py37h7c811a0_0
scipy                     1.3.0                     <pip>
seaborn                   0.9.0                      py_0    conda-forge
seaborn                   0.9.0                     <pip>
send2trash                1.5.0                    py37_0
setuptools                41.0.1                   py37_0    conda-forge
shap                      0.28.5                    <pip>
shapely                   1.6.4            py37h86c5351_0
simplejson                3.16.0                    <pip>
sip                       4.18.1           py37hf484d3e_2
six                       1.12.0                py37_1000    conda-forge
snuggs                    1.4.2                     <pip>
snuggs                    1.4.3                      py_0    conda-forge
sortedcontainers          2.1.0                    py37_0
soupsieve                 1.9.1                     <pip>
sqlalchemy                1.2.16          py37h14c3975_1000    conda-forge
sqlite                    3.28.0               hcee41ef_1    conda-forge
statsmodels               0.9.0           py37h3010b51_1000    conda-forge
tblib                     1.3.2                    py37_0
terminado                 0.8.1                    py37_1
testpath                  0.4.2                    py37_0
tk                        8.6.9             hed695b0_1002    conda-forge
toolz                     0.9.0                    py37_0
torchvision               0.2.2                      py_3    pytorch
tornado                   5.1.1            py37h7b6447c_0
tqdm                      4.30.0                    <pip>
traceback2                1.4.0                     <pip>
traitlets                 4.3.2                    py37_0
tzcode                    2018g             h14c3975_1001    conda-forge
unittest2                 1.1.0                     <pip>
uritemplate               3.0.0                     <pip>
urllib3                   1.24.1                   py37_0
urllib3                   1.24.1                    <pip>
wcwidth                   0.1.7                    py37_0
webencodings              0.5.1                    py37_1
websocket-client          0.56.0                    <pip>
Werkzeug                  0.15.4                    <pip>
wheel                     0.33.4                   py37_0    conda-forge
widgetsnbextension        3.4.2                    py37_0
wrapt                     1.11.1                    <pip>
xarray                    0.12.2                     py_0    conda-forge
xclim                     0.9b0                      py_0    conda-forge
xerces-c                  3.2.2                h780794e_0
xesmf                     0.1.1                     <pip>
xmltodict                 0.12.0                    <pip>
xz                        5.2.4             h14c3975_1001    conda-forge
yaml                      0.1.7                had09818_2
zeromq                    4.3.1                he6710b0_3
zict                      0.1.3                    py37_0
zlib                      1.2.11            h14c3975_1004    conda-forge

Need a test suite

Progress here seems to be moving along quite nicely. I'm sensing a growing acknowledgement that the approach you are developing may be the path forward for many in the xarray/geoscience community. That said, to be fully adopted by the community, xESMF is going to need a comprehensive test suite. I'll suggest three key components:

  1. Continuous integration (e.g. TravisCI)
  2. Unit Tests (e.g. pytest)
  3. Benchmarks (e.g. asv)

If you are not familiar with how to work with these tools, xarray is going to be a good reference and I think you'll find many of us are willing to chip in.
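
As a concrete starting point, here is a minimal sketch of the kind of unit test such a suite could begin with (pytest-style; the helper name, grid sizes and tolerance are arbitrary choices of mine, not an existing xESMF test): bilinear regridding of a constant field should return the same constant.

import numpy as np
import xarray as xr
import xesmf as xe

def _global_grid(step):
    # simple rectilinear lat/lon grid (hypothetical helper)
    return xr.Dataset({'lat': (['lat'], np.arange(-89, 90, step)),
                       'lon': (['lon'], np.arange(0, 360, step))})

def test_bilinear_preserves_constant_field():
    ds_in, ds_out = _global_grid(2.0), _global_grid(5.0)
    data = xr.DataArray(np.ones((ds_in.lat.size, ds_in.lon.size)),
                        coords=[ds_in.lat, ds_in.lon], dims=['lat', 'lon'])
    regridder = xe.Regridder(ds_in, ds_out, 'bilinear', periodic=True)
    out = regridder(data)
    # bilinear weights are normalized, so a constant field must stay constant
    np.testing.assert_allclose(out.values, 1.0, rtol=1e-6)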

tripolar grid

I'm trying to regrid CMIP5 data from the ocean model NEMO, which uses a tripolar (still curvilinear) grid. It seems that none of the bilinear, patch, or conservative methods work. I thought bilinear and patch do not require grid corners, but do they require some other specific grid configuration? The conservative method doesn't work because I can't figure out the corners. The corners in the data are specified as 4 vertices, but their orientation does not seem to be constant: in a tripolar grid the four faces of a cell are not necessarily oriented east-west or north-south. What orientation is acceptable? The notebook is uploaded here.
https://github.com/JianghuiDu/cmip5

Making silent the reuse of weights message...

Hey @JiaweiZhuang

I have been using xESMF for a few weeks now and I am very happy with it. It's very fast compared to my former regridding tool, ESMPy. Well done.

I was just wondering whether there is a way to avoid the message printed when weights are reused. I tried adding ; at the end of the code snippet, but the message is still printed out. I am looping over many layers and really want to avoid the printing. This is an excerpt of the loop I have:

for time in range(co.shape[0]):
    for lev in range(co.shape[1]):
        co_xr                  = xr.Dataset(data_vars={'co': (('lat','lon'),co[time,lev]),
                                                        'pres':(('lat','lon'),pi_ecmwf[time,lev])},
                                             coords={'lat':co_lat,'lon':co_lon })
        regridder              = xe.Regridder(co_xr,ds_out,'bilinear',reuse_weights=True)
        co_re                  = regridder(co_xr['co'])
        pi_re                  = regridder(co_xr['pres'])
        co_fields_re[time,lev] = co_re.data
        pi_fields_re[time,lev] = pi_re.data

This is printed after each iteration:
Reuse existing file: bilinear_451x900_184x309.nc

Thanks in advance..
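
For reference, one way to silence the print without changing xESMF is to redirect stdout while the Regridder is constructed; a minimal sketch using only the standard library (I am not aware of a dedicated xESMF option for this):

import contextlib
import io

# suppress anything printed during Regridder construction
with contextlib.redirect_stdout(io.StringIO()):
    regridder = xe.Regridder(co_xr, ds_out, 'bilinear', reuse_weights=True)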

Retrieve regridding weights as numpy arrays

Most regridding schemes are linear, i.e. the output data field depends linearly on the input data field. Any linear transform can be viewed as a matrix-vector multiplication y = A*x, where A is a (typically sparse) matrix containing the regridding weights, and x and y are the input and output data fields flattened to 1D.

In practice, linear regridding schemes are broken into two steps:

  1. Calculate regridding weights (i.e. get the matrix A), from the knowledge of source and destination grids. In ESMPy, this is done by regrid = ESMF.Regrid(...).
  2. Apply regridding weights to the data field (i.e. perform A*x). In ESMPy, this is done by destfield = regrid(sourcefield, destfield), where regrid was created in the previous step.

However, ESMPy's regrid object is like a black box. It knows how to perform regridding but there's no way to explicitly view the regridding weights (the matrix A).

In the Fortran version of ESMF, the function ESMF_RegridWeightGen dumps the regridding weights to a NetCDF file. The content of the file is described in the "12.8 Regrid Interpolation Weight File Format" section of the ESMF documentation (link). The matrix A is stored in sparse form by the variables row, col and S in that NetCDF file. But ESMPy has no function equivalent to ESMF_RegridWeightGen.
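
To illustrate, applying weights stored in that format takes only a few lines with scipy.sparse. A minimal sketch, with the file name and grid sizes as placeholders (row and col are the 1-based Fortran-style indices from the weight file):

import numpy as np
import scipy.sparse as sps
import xarray as xr

w = xr.open_dataset('weights.nc')              # placeholder weight file name
n_in, n_out = 180 * 360, 90 * 180              # placeholder grid sizes

# build the sparse matrix A from row/col/S (convert 1-based to 0-based indices)
A = sps.coo_matrix((w['S'].values,
                    (w['row'].values - 1, w['col'].values - 1)),
                   shape=(n_out, n_in)).tocsr()

data_in = np.random.rand(180, 360)             # dummy input field on the source grid
data_out = (A @ data_in.ravel()).reshape(90, 180)   # y = A*x, reshaped to the target grid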

Being able to view the regridding weights in the Python-level will solve many troubles:

  • Dimension matching. To broadcast the regridding operation across additional dimensions (call them N1, N2, ...), ESMPy requires the input data to have the shape [Nlat, Nlon, N1, N2, ...]. Instead of using xarray's transpose() function to match ESMPy's expectation, we can write the sparse matrix multiplication directly in numpy, taking care of dimension broadcasting ourselves.
  • Dask integration. By using numpy instead of the underlying Fortran routine to apply the regridding weights, we can use dask.array easily and natively. Otherwise, we need to let each dask worker call the underlying Fortran routine separately to regrid each hyperslab -- a very ugly solution.
  • Allowing other programs to use ESMF regridding. One use case is NASA-GEOS5's MAPL software. Calculating regridding weights is very hard and we don't want to reinvent the wheel, but applying the weights to a data field takes roughly 5 lines of code in most languages.

@bekozi seems to have some Python tools for ESMF_RegridWeightGen. Maybe we could start with that.

Using xESMF with WRFChem

Thank you for your work!

I'm trying to regrid variables in the wrfout* files generated by WRFChem. To do so, I have modified the lon and lat handling in frontend.py for bilinear regridding.

Here's my plot script and result:

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import numpy as np
import xarray as xr
import xesmf as xe

ds = xr.open_dataset("sample")

CG = ds['CG_FLASHCOUNT']
CG_scaled = CG / 1E3
CG_scaled.attrs['units'] = 'kFlashes'

fig = plt.figure(figsize=[8, 4])

proj = ccrs.LambertConformal(central_latitude=20,central_longitude=-95,
                            standard_parallels = (20, 20))

ax = plt.axes(projection=proj)
ax.coastlines()
ax.gridlines(linestyle='--')

nested_grid = {'XLONG': np.linspace(-110, -80, 61),
               'XLAT': np.linspace(20, 50, 61)
              }

regridder_bilinear = xe.Regridder(ds, nested_grid, method='bilinear')

# regridder_bilinear = xe.Regridder(ds, nested_grid, method='conservative')

dr_nested = regridder_bilinear(CG_scaled)
dr_nested.plot(ax=ax, cmap='coolwarm', cbar_kwargs={'label': 'kFlashes'})
ax.set_title('CG_FLASHCOUNT');

plt.show()

[image: regridded CG_FLASHCOUNT plot]

The coastlines and domain boundary aren't shown correctly.

By the way, I want to use conservative regridding, but lon_b and lat_b aren't variables in the wrfout files. How do you get the values of lon_b and lat_b?
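
For what it's worth, when a file only provides cell centers, one common approximation is to estimate the corners by averaging the four surrounding centers and extrapolating along the edges. A rough sketch of my own (assuming XLONG/XLAT become 2D after selecting one time step; this is an approximation, not a WRF or xESMF utility):

import numpy as np

def center_to_corner(centers):
    # estimate (ny+1, nx+1) cell corners from (ny, nx) cell centers
    ny, nx = centers.shape
    corners = np.zeros((ny + 1, nx + 1))
    # interior corners: mean of the four neighbouring centers
    corners[1:-1, 1:-1] = 0.25 * (centers[:-1, :-1] + centers[:-1, 1:] +
                                  centers[1:, :-1] + centers[1:, 1:])
    # edges: linear extrapolation (rows first, then columns)
    corners[0, :] = 2 * corners[1, :] - corners[2, :]
    corners[-1, :] = 2 * corners[-2, :] - corners[-3, :]
    corners[:, 0] = 2 * corners[:, 1] - corners[:, 2]
    corners[:, -1] = 2 * corners[:, -2] - corners[:, -3]
    return corners

lon_b = center_to_corner(ds['XLONG'].isel(Time=0).values)
lat_b = center_to_corner(ds['XLAT'].isel(Time=0).values)

These estimated lon_b/lat_b arrays could then be supplied alongside the cell centers in the grid passed to the conservative regridder.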

After interpolation, 0s show up on edges

Is it possible to have it return np.nan instead?

[images: regridded OLR field showing zeros along the edges]

!wget https://www.ncei.noaa.gov/data/outgoing-longwave-radiation-daily/access/olr-daily_v01r02-preliminary_20170101_20171231.nc

import numpy as np
import xarray as xr
import xesmf as xe

ds = xr.open_dataset('olr-daily_v01r02-preliminary_20170101_20171231.nc').olr
ds_out = xr.Dataset({'lat': (['lat'], np.arange(-90, 92.5, 2.5)),
                     'lon': (['lon'], np.arange(0, 360, 2.5)),
                    }
                   )
regridder = xe.Regridder(ds, ds_out, 'bilinear')
regridder(ds).isel(time=0).plot()
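
A possible workaround until there is a built-in option (just a sketch, not an existing xESMF flag): regrid a field of ones with the same regridder and mask destination cells that receive no source contribution.

ones = xr.ones_like(ds.isel(time=0))
coverage = regridder(ones)                  # 0 where no source cell maps in
out = regridder(ds).where(coverage > 0)     # edge zeros become NaN
out.isel(time=0).plot()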

xESMF python2 support?

Looks like a great interface to a daunting package, ESMF. I'm working towards migrating to Python 3, but is it possible to support both Python 2 and 3 with this package? I tried pip installing, but it stopped, as expected from your documentation. Thanks,

Regridding API design

I was looking at Pangeo's Regridding Design Document, which suggests an interface like da.remap.remap_like(da_target, how='bilinear'). It looks clean but doesn't match the "two-step" procedure for most regridding algorithms.

Regridding weight calculation and weight application should be done separately, as reviewed by #2. The weights only depend on the source and target grids, not the input data. As long as the grids are not changed, users only need to calculate the weights once and can apply them to any data. Applying weights is often orders of magnitude faster than calculating weights (see the timing in #6 as an example), so separating the two steps has a huge impact on performance, especially when the task is to regrid a lot of data between a fixed pair of grids.

xESMF's current design, dr_out = xe.regrid(ds_in, ds_out, dr_in), basically re-computes the weights every time. Using two steps should boost performance by 10-100x.


Thus I am thinking about an sklearn-like API, which is also two-step:

  1. In sklearn you train a model by
    model.fit(x_train, y_train)

  2. then make new predictions by
    y_pred = model.predict(x_test)

xESMF can do similar things:

  1. Calculate regridding weights by
    weights = xe.compute_weights(ds_in, ds_out, method='bilinear')
    where ds_in and ds_out are xarray Datasets containing the input and output grid information.

  2. then apply weights to data by
    dr_out = weights.apply(dr_in)
    where dr_in is the input DataArray

  3. Because ESMPy writes the weights to a file, next time they can be read from the file instead of being computed again:
    weights = xe.read_weights("weights.nc")
    The I/O time is negligible compared to re-computing the weights.

Here weights is a tiny class that holds the weights and knows how to apply them to data. Alternatively, weights could simply be an xarray Dataset read by raw_weights = xr.open_dataset("weights.nc"); step 2 would then become dr_out = xe.apply_weights(raw_weights, dr_in). I prefer the first approach because it feels more like sklearn, so people might find it more familiar.
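
For illustration, the weights class could be as small as the following (a rough sketch with hypothetical names, not a final implementation):

import numpy as np

class Weights(object):
    """Hold a (sparse) weight matrix A and know how to apply it to data."""
    def __init__(self, matrix, shape_in, shape_out):
        self.matrix = matrix          # e.g. a scipy.sparse matrix of shape (N_out, N_in)
        self.shape_in = shape_in      # (Nlat_in, Nlon_in)
        self.shape_out = shape_out    # (Nlat_out, Nlon_out)

    def apply(self, data_in):
        # y = A*x on the flattened horizontal dimensions
        flat_out = self.matrix @ np.asarray(data_in).ravel()
        return flat_out.reshape(self.shape_out)

Extra dimensions (time, level) could later be handled by reshaping, or by mapping apply over them with dask.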

Any comments are welcome; otherwise I'll proceed this way. @rabernat @jhamman @spencerahill Please also @ anyone who is interested in regridding with xarray.

python 3.7 support

With Python 3.7 having been around for quite a while now, it would be great if xesmf supported this Python version. Currently,
conda install xesmf
works only for Python 3.6.x. Perhaps only the installation files on the conda-forge channel need to be updated.

Thanks a lot!
Leo

How does xESMF treat missing values?

I have a use case where I am regridding from/to grids that contain missing values (a water mask over land). I'm curious how xESMF treats these grid cells. I am noticing a large difference between the outputs of the bilinear and conservative regridding methods. For example:

bilinear:
[image: bilinear result]

conservative:
[image: conservative result]

I should note that this regridding is a coarse-graining operation, so up to 64 1/8-degree source grid cells go into each 1-degree destination grid cell.
