(continuing <a class="issue-link js-issue-link" data-error-text="Failed to load title"

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Talked with <a class="user-mention notranslate" data-hovercard-type="user" data-hoverc

That's great news <a class="user-mention notranslate" data-hovercard-type="user" data-

Handle weight files more elegantly,about jiaweizhuang/xesmf

Comments (21)

bekozi commented on June 6, 2024 2

@JiaweiZhuang, the in-memory weights branch is ready for testing. A Dockerfile for building ESMF from that branch is available here.

For reference, here are links to the docstrings for the new interfaces. ESMF doesn't build docs for branches...

In our testing, the factor arrays contain zero elements with no spatial overlap (disjoint geometries) and unmapped action is ignore. Again, sorry for the delay on this. Hopefully we won't encounter too many issues during your implementation.

from xesmf.

bekozi commented on June 6, 2024 1

We have factors in the Python interface now and are working on testing. There are still some compiler-related things to address.

Before charging forward, I want to get some input on the interface. The factors themselves are allocated in Fortran, so we need to make sure their deallocation is handled smartly. With bigger grids, we want to minimize copies too. Nothing is fixed now, so open to any suggestions.

regrid = Regrid(..., factors=True)  # factors is False by default
factor_list, factor_index_list = regrid.get_factors(deep_copy=True)  # deep_copy is False by default
# Preferred deallocation method for factors but __del__ will also do this.
regrid.destroy()

I am thinking it makes sense to add an option for zero-based factor indexing. Maybe a start_index option to Regrid that defaults to 1? It needs to happen in the Regrid call so we can adjust the indexing in ESMF and not in the Python layer - again to avoid copies.

from xesmf.

bekozi commented on June 6, 2024 1

This slows down the sparse MM by ~40%. See this demo. It can be fixed by copying and rearranging the memory layout, anyway.

This is a downer. I think what we'll need is a follow on issue where we adapt the regrid store interface in ESMF to return vectors for the index lists directly.

Looks good me if you don't mind adding & maintaining one more function.

Don't mind.

Even though the old ESMF.Regrid object is not referenced by any variable identifiers, the memory will not be freed (in contrast to the automatic garbage collection for normal numpy arrays).

It's hard to guarantee how the Python garbage collection will run. As usual, SO provides a good overview. This is why we introduced the destroy method. The memory will be cleaned up eventually, it may just not happen when you really need it. destroy will be called via __del__ / the destructor on the ESMF.Regrid object when gc runs.

In summary

Thanks for this.

My mistake -- field.get_area() is to compute the area and add to field.data, not directly returning the area array.

You are correct, and you are forgiven. 😄 Not like I caught it anyway.

I'm really hoping to ship these features soon.

from xesmf.

JiaweiZhuang commented on June 6, 2024

Also the weight file needs to have more complete metadata, not just numbers (can be done outside of ESMPy)

from xesmf.

jhamman commented on June 6, 2024

@JiaweiZhuang / @bekozi - has there been any update on this issue? As I understand it, we need a way to have ESMPy return in memory weights. Has that issue been raised upstream?

from xesmf.

JiaweiZhuang commented on June 6, 2024

I raised this issue a while ago; not sure if there's any progress

also @rokuingh

from xesmf.

bekozi commented on June 6, 2024

Talked with @rokuingh today, and we have a plan for moving this forward. I hope it won't be longer than a few weeks.

from xesmf.

JiaweiZhuang commented on June 6, 2024

That's great news @bekozi ! Is it planned for some patch version of 7.1.0 or 8.0.0?

from xesmf.

bekozi commented on June 6, 2024

Good question. It will be in 8 but rolled out in a beta snapshot much sooner...

from xesmf.

JiaweiZhuang commented on June 6, 2024

Glad to see progress on this important issue! I have several comments & questions:

What's in factor_list and factor_index_list? I guess factor_list is an 1D numpy array of weights (S in the offline weights file). Is factor_index_list a 2D array containing col and row, or a list of two 1D arrays? I guess it is a list so you need deep_copy, not just copy?

I would prefer just returning 3 numpy arrays and use copy :

row, col, S = regrid.get_factors(copy=True)

Those arrays will then construct a Scipy sparse matrix by scipy.sparse.coo_matrix(S, (row, col), copy=False) without copying.

Zero-based indexing: Currently in xESMF I simply subtract 1 from default grid indices. Will the start_index implementation be more computationally efficient? If not, this correction can be done in Python level, in-place, so not extra copies will be made.

My general feeling is that direct ESMF/ESMPy calls should only deal with the very core functionalities that must be done in Fortran-level. Then we rarely, rarely need to update ESMPy's API. Once a new ESMPy function like get_factors() is implemented, it can remain stable for many versions. Additional small fixes can be quickly done in the Python level, which is much easier to debug and update.

Naming: would regrid.get_weights() be more intuitive than regrid.get_factors()?
For the sake of consistency, the Field.get_area() method should probably need a copy option as well. It can indeed by done by Field.get_area().copy(), but an copy option is pretty common in numpy/scipy functions.

from xesmf.

JiaweiZhuang commented on June 6, 2024

With bigger grids, we want to minimize copies too.

A zero-copying workflow would be:

row, col, S = regrid.get_weights(copy=False)
row -= 1  # modify the Fortran array in-place. Any potential issue?
col -= 1
weights = scipy.sparse.coo_matrix(S, (row, col), copy=False)
regridder = xesmf.Regridder()
regridder.weights = weights  # will be done in class __init__ instead
# do not call regrid.destroy()

In this case, the xesmf.Regridder directly uses the internal Fortran arrays in the ESMF.Regrid object. regrid.destroy() releases the Fortran array, which will destroy the xESMF regridder as well. Is there any potential danger with this approach, even if explicit destroy() is prohibited in xESMF user API? Will other ESMF error mess up the Fortran array? If so, the default behavior should be copy=True followed by destroy(), so the numpy array for weights is decoupled from ESMF; while copy=False without destroy() is only an option to minimize copying. The eventual memory use of the two cases are actually the same; the former uses more peak memory during copying.

from xesmf.

bekozi commented on June 6, 2024

Thanks for the input! All good stuff.

Is factor_index_list a 2D array containing col and row, or a list of two 1D arrays? I guess it is a list so you need deep_copy, not just copy?

Out of habit, I prefer to be explicit between shallow and deep copies in Python even when working with NumPy arrays. However, we don't have a policy for naming these in ESMPy. I agree copy is cleaner...need to do some thinking on this...

The function will return a np.float64 ndarray with shape (nfactors,) for the factor list and an np.int32 ndarray with shape (2, nfactors) for the factor index list. The first column in the factor index list is the source index with the second column being the destination index. We'll make sure the docs are very clear about this. My head spins sometimes when trying to map Fortran onto Python and vice versa.

With ESMPy, we have tried to stay close to how ESMF implements things under the hood. Hence, the factor and factor index list will take the same form as the regrid store call in ESMF. Mostly this is an attempt to reduce duplication with the docs.

I would prefer just returning 3 numpy arrays and use copy. Those arrays will then construct a Scipy sparse matrix by scipy.sparse.coo_matrix(S, (row, col), copy=False) without copying.

We could add a convenience function or options that would do this. We may want to expose the ESMF sparse matrix utility through Python at some point. In that case, we would want the factor array forms to be compatible with ESMF.

I thought at one point to consider a get_factors_dict which would return a dictionary with keys (row, col, S) mapping to their associated values (numpy arrays). What do you think? This function would also take a copy parameter. We could definitely name this one get_weights_dict as I expect it could be the preferred entry point for xESMF.

Will the start_index implementation be more computationally efficient?

I don't really know the answer to this. My sense is that it is not worth implementing as in-place NumPy stuff is pretty fast, but it was a question worth posing at least.

My general feeling is that direct ESMF/ESMPy calls should only deal with the very core functionalities that must be done in Fortran-level. Then we rarely, rarely need to update ESMPy's API.

This. 😄 Summed up the ESMPy's design mentality. It doesn't feel Pythonic most of the time!

Naming: would regrid.get_weights() be more intuitive than regrid.get_factors()

It would. It's mostly an attempt to establish consistency with the named factor arguments to ESMF. I don't feel strongly about it though. In my mind, we should consider a get_weights method that just passes through to get_factors...

For the sake of consistency, the Field.get_area() method should probably need a copy option as well.

Good point. The copy flags to these functions are meant to be additional indicators to users that they need to be conscious of ESMF memory.

Is there any potential danger with this approach, even if explicit destroy() is prohibited in xESMF user API? Will other ESMF error mess up the Fortran array?

The safest bet is to deep copy everything as you know. Code that does not explicitly call destroy but creates numerous ESMF.Regrid(..., factors=True) objects in the same scope can get some serious memory accumulations. One option could be to introduce a context manager for ESMF.Regrid to ensure destroys are appropriately called. I hope to answer this better with additional testing.

I'm not quite sure what you mean by other ESMF errors. I think once these arrays are created they should be stable independent of other ESMF usage (in ESMF these arrays are not attached to anything). The same goes for the deallocation. I'm probably not understanding your second question.

from xesmf.

JiaweiZhuang commented on June 6, 2024

The function will return a np.float64 ndarray with shape (nfactors,) for the factor list and an np.int32 ndarray with shape (2, nfactors) for the factor index list.

We could add a convenience function or options that would do this.

(2, nfactors) looks good to me. No need to change it or add additional wrappers. The ESMPy API can be as lean as possible.

A minor concern is that factor_index_list probably has Fortran-ordering as other ESMPy output, so row = factor_index_list[0,:] and col = factor_index_list[1,:] are not contiguous in memory. This slows down the sparse MM by ~40%. See this demo. It can be fixed by copying and rearranging the memory layout, anyway.

It would. It's mostly an attempt to establish consistency with the named factor arguments to ESMF. I don't feel strongly about it though. In my mind, we should consider a get_weights method that just passes through to get_factors...

It makes perfect sense for ESMPy to stay close to ESMF. I am OK with the current name. Multiple aliases to the same method will probably confuse pure-ESMPy users.

I thought at one point to consider a get_factors_dict which would return a dictionary with keys (row, col, S)

Looks good me if you don't mind adding & maintaining one more function😀

The safest bet is to deep copy everything as you know. Code that does not explicitly call destroy but creates numerous ESMF.Regrid(..., factors=True) objects in the same scope can get some serious memory accumulations.

This is exactly what I am concerning about. I am not super familiar with ESMPy's memory management, but from what you said I guess repeated calls to regrid = ESMF.Regrid(...) without regrid.destroy() will lead to lots of memory leak. Even though the old ESMF.Regrid object is not referenced by any variable identifiers, the memory will not be freed (in contrast to the automatic garbage collection for normal numpy arrays).

If true, with the zero-copying implementation mentioned above, there will be memory leak if a user repeatedly calls regridder=xesmf.Regridder(...). With copying and explicit destroy(), the regridder will only hold normally-created numpy arrays, and when theregridder variable identifier is overwritten (i.e. points to a new regridder object), the old memory should be automatically garbage-collected.

I'm not quite sure what you mean by other ESMF errors.

I guess I more mean "mis-uses" (like the above case) than "errors"...

In summary:

Current API and naming conventions are all fine to me, as I understand that many of them come from ESMF conventions.
I don't particularly need any convenience wrappers (a stable and simple API is more important to me, not matter how it looks). But if you want to add some wrappers for pure-ESMPy users, I am happy to take advantage of them.
xESMF should make copies of the weights and then explicitly destroy the ESMF object, to safely manage memory. Directly referencing the Fortran array might be an (unrecommended) option for extremely large grids.

from xesmf.

JiaweiZhuang commented on June 6, 2024

@bekozi My mistake -- field.get_area() is to compute the area and add to field.data, not directly returning the area array. So it behaves differently from get_factors() and there is no way to add copy option...

from xesmf.

bekozi commented on June 6, 2024

Quick update. We hit an impasse with compilers that we are working through. This was not unexpected. The implementation is working with the standard GNU compiler stack but is unstable on others (i.e. Discover standard modules). The rub with this feature is that the weight arrays are dynamically allocated in Fortran complicating pointer management. @rokuingh and I are working through how best to move the arrays up to Python using Fortran's iso_c_binding module. We used a temporary solution to allow Python development to move forward.

You can see the ESMPy method for retrieving the weight dictionary on SF if you like. The good news is that deep copying out of this method will result in contiguous memory from what I can tell.

from xesmf.

JiaweiZhuang commented on June 6, 2024

Thanks very much for the work! I think all xESMF users use conda (with GNU compiler) and won't be affected by this compiler issue.

from xesmf.

JiaweiZhuang commented on June 6, 2024

@bekozi Aha, finally!!

I can use the Dockerfile without problems. Will let you know if I hit any issues.

from xesmf.

JiaweiZhuang commented on June 6, 2024

@bekozi One minor documentation thing: create_rh is not documented in ESMF.Regrid's docstring.

Simply using the one in ESMF docs should be fine:

create_rh
Specifies whether or not to create a routehandle, or just write weights to file. If not specified, defaults to ESMF_TRUE.

I assume that it should be set to False if the Regrid call is only for weight retrieval, either offline (filename = '...') or online (factors=True) .

from xesmf.

bekozi commented on June 6, 2024

One minor documentation thing: create_rh is not documented in ESMF.Regrid's docstring.

Noted! Thanks.

I assume that it should be set to False if the Regrid call is only for weight retrieval, either offline (filename = '...') or online (factors=True) .

Yes, it makes sense to use create_rh=False for best performance when returning factors. This should be noted in the documentation. ~~I'll update the test loop too.~~ Edit: It looks like this is already tested appropriately.

from xesmf.

JiaweiZhuang commented on June 6, 2024

I already have a mostly-working code for in-memory weight retrieval, but it relies on ESMF 8.0.0 that is not publicly released yet. So far I've been building the development branch of ESMF at https://github.com/JiaweiZhuang/Docker_ESMPy. Will push my new code when ESMPy/ESMF 8.0.0 becomes conda-installable.

This is of top-priority as the current offline weight approach is quite awkward and hard to test.

from xesmf.

JiaweiZhuang commented on June 6, 2024

ESMF 8.0.0 is released today and supports online weight retrieval (see release notes):

The following capabilities were added to the ESMF Python interface (ESMPy):
Added an in-memory weight generation option to the Regrid class, allowing re-use of weight vectors without writing them to netCDF. The weight arrays can be returned as NumPy objects or Python dictionary of weight vectors. This allows retrieval of the weights by source and destination key/value pairs.

from xesmf.

Handle weight files more elegantly about xesmf HOT 21 OPEN

Comments (21)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent