
Comments (13)

mohoch1 commented on May 26, 2024

Hi.

Is there any news regarding this issue?
We are experiencing similar problems.

Our application needs to perform regridding many times, and we have found that each use of the Regridder causes a massive increase in memory usage that is never released.

Has this issue come to any resolution?

Thanks

from xesmf.

JiaweiZhuang commented on May 26, 2024

Thanks for reporting this issue. I've been wanting to diagnose the memory problem for a long time, and have just taken a closer look at this.

As background, ESMPy relies on an explicit destroy() call to release the Fortran array memory for almost every ESMF object. I do release the memory after regridder construction (code), but there still seem to be module-level memory allocations that are not cleaned up. The next version of ESMPy (v8.0.0) adds a new ESMF.Manager().destroy() call, which should clean up more of the memory.

The higher-level xesmf.Regridder object is little more than a SciPy sparse matrix, so garbage collection should work as it does for normal NumPy/SciPy objects.
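To illustrate what "almost just a sparse matrix" means, here is a minimal sketch of applying regridding weights as a sparse matrix-vector product. It is written in plain Python so it runs without SciPy; the apply_weights helper and the tiny 2-cell to 3-cell weight table are hypothetical and are not xESMF's actual internals.

```python
def apply_weights(weights, data_in, n_out):
    """Compute data_out = W @ data_in for W stored as (row, col, value) triplets."""
    data_out = [0.0] * n_out
    for row, col, value in weights:
        # Each nonzero weight adds a share of one source cell to one target cell.
        data_out[row] += value * data_in[col]
    return data_out


# Hypothetical weights: the middle output cell is the average of both inputs.
weights = [(0, 0, 1.0), (1, 0, 0.5), (1, 1, 0.5), (2, 1, 1.0)]
data_in = [10.0, 20.0]

data_out = apply_weights(weights, data_in, n_out=3)
print(data_out)  # [10.0, 15.0, 20.0]
```

Once the weights exist, applying them involves only ordinary Python objects, which is why this step should not depend on ESMPy's Fortran memory management.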

If you use xesmf.Regridder(src_ds, dst_ds, 'bilinear', reuse_weights=True), the memory usage will be much lower because it doesn't involve ESMPy calls.

More details

To demonstrate that the memory issue comes from the underlying ESMPy calls, consider this esmpy_memory.py script:

"""A minimum script to test ESMPy memory allocation."""
import numpy as np
import ESMF
from memory_profiler import profile


def create_grid(shape):
    grid = ESMF.Grid(np.array(shape),
                     staggerloc = ESMF.StaggerLoc.CENTER,
                     coord_sys = ESMF.CoordSys.SPH_DEG)
    
    return grid

def fill_grid(grid, lons, lats):
    lon_pointer = grid.get_coords(coord_dim=0, 
                                  staggerloc=ESMF.StaggerLoc.CENTER)
    lat_pointer = grid.get_coords(coord_dim=1, 
                                  staggerloc=ESMF.StaggerLoc.CENTER)
    lon_pointer[:] = lons
    lat_pointer[:] = lats

@profile
def test_esmpy():
    # define test grids
    lons_in, lats_in = np.meshgrid(
        np.arange(-120, 120, 0.4), 
        np.arange(-60, 60, 0.3)
        )

    lons_out, lats_out = np.meshgrid(
        np.arange(-120, 120, 0.6), 
        np.arange(-60, 60, 0.4)
        )

    # build ESMPy regridder
    sourcegrid = create_grid(lons_in.shape)
    destgrid = create_grid(lons_out.shape)
    
    fill_grid(sourcegrid, lons_in, lats_in)
    fill_grid(destgrid, lons_out, lats_out)

    sourcefield = ESMF.Field(sourcegrid)
    destfield = ESMF.Field(destgrid)

    regrid = ESMF.Regrid(sourcefield, destfield, filename=None,
                         regrid_method=ESMF.RegridMethod.BILINEAR,
                         unmapped_action=ESMF.UnmappedAction.IGNORE)

    # release underlying Fortran memory
    sourcegrid.destroy()
    destgrid.destroy()
    sourcefield.destroy()
    destfield.destroy()
    regrid.destroy()

    # de-reference Python objects
    sourcegrid = None
    destgrid = None
    sourcefield = None
    destfield = None
    regrid = None
    
    lons_in = None
    lats_in = None
    lons_out = None
    lats_out = None

if __name__ == '__main__':
    test_esmpy()

python -m memory_profiler esmpy_memory.py generates:

Filename: esmpy_memory.py

Line #    Mem usage    Increment   Line Contents
================================================
    21     59.7 MiB     59.7 MiB   @profile
    22                             def test_esmpy():
    23                                 # define test grids
    24     59.7 MiB      0.0 MiB       lons_in, lats_in = np.meshgrid(
    25     59.7 MiB      0.0 MiB           np.arange(-120, 120, 0.4), 
    26     63.6 MiB      3.8 MiB           np.arange(-60, 60, 0.3)
    27                                     )
    28                             
    29     63.6 MiB      0.0 MiB       lons_out, lats_out = np.meshgrid(
    30     63.6 MiB      0.0 MiB           np.arange(-120, 120, 0.6), 
    31     65.4 MiB      1.8 MiB           np.arange(-60, 60, 0.4)
    32                                     )
    33                             
    34                                 # build ESMPy regridder
    35     76.3 MiB     11.0 MiB       sourcegrid = create_grid(lons_in.shape)
    36     78.4 MiB      2.1 MiB       destgrid = create_grid(lons_out.shape)
    37                                 
    38     78.4 MiB      0.0 MiB       fill_grid(sourcegrid, lons_in, lats_in)
    39     78.4 MiB      0.0 MiB       fill_grid(destgrid, lons_out, lats_out)
    40                             
    41     78.4 MiB      0.0 MiB       sourcefield = ESMF.Field(sourcegrid)
    42     78.4 MiB      0.0 MiB       destfield = ESMF.Field(destgrid)
    43                             
    44     78.4 MiB      0.0 MiB       regrid = ESMF.Regrid(sourcefield, destfield, filename=None,
    45     78.4 MiB      0.0 MiB                            regrid_method=ESMF.RegridMethod.BILINEAR,
    46    434.2 MiB    355.8 MiB                            unmapped_action=ESMF.UnmappedAction.IGNORE)
    47                             
    48                                 # release underlying Fortran memory
    49    430.8 MiB      0.0 MiB       sourcegrid.destroy()
    50    430.8 MiB      0.0 MiB       destgrid.destroy()
    51    430.8 MiB      0.0 MiB       sourcefield.destroy()
    52    430.8 MiB      0.0 MiB       destfield.destroy()
    53    390.2 MiB      0.0 MiB       regrid.destroy()
    54                             
    55                                 # de-reference Python objects
    56    390.2 MiB      0.0 MiB       sourcegrid = None
    57    390.2 MiB      0.0 MiB       destgrid = None
    58    390.2 MiB      0.0 MiB       sourcefield = None
    59    390.2 MiB      0.0 MiB       destfield = None
    60    390.2 MiB      0.0 MiB       regrid = None
    61                                 
    62    388.3 MiB      0.0 MiB       lons_in = None
    63    386.5 MiB      0.0 MiB       lats_in = None
    64    385.6 MiB      0.0 MiB       lons_out = None
    65    384.7 MiB      0.0 MiB       lats_out = None

The regrid.destroy() call reduced memory usage only slightly (from 430.8 MiB to 390.2 MiB). This profiling result should be correct, as free -h and docker stats report similar memory usage.
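As a further cross-check alongside free -h or docker stats, the process can report its own peak resident set size from the standard library. This is a minimal sketch assuming a Unix-like system (the resource module is not available on Windows); peak_rss_mib is a hypothetical helper name. It reports the peak only, so it cannot show memory being released.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


print(f"peak RSS so far: {peak_rss_mib():.1f} MiB")
```

Calling it before and after the ESMF.Regrid() step would give a process-level figure to compare with the memory_profiler table above.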

I am going to test the new module-level ESMF.Manager().destroy() to see if it improves things.


JiaweiZhuang commented on May 26, 2024

So it seems like ESMF.Manager().destroy() is still not implemented in the latest version of ESMF (just checked with ESMF_8_0_0_beta_snapshot_40 built by this script). Fortunately it has a __del__() method. For most objects, __del__() simply calls destroy(), for example see ESMF.Grid.

I added this extra code to the end of my original test script:

mg = ESMF.Manager()
mg.__del__()

Then, memory_profiler gives:

    69    384.5 MiB      0.0 MiB       mg = ESMF.Manager()
    70    201.6 MiB      0.0 MiB       mg.__del__()

So __del__() frees about half of the memory (384.5 MiB down to 201.6 MiB), but not all of it.

This top-level destroy also has a serious side effect: later attempts to build new regridders lead to a segmentation fault, because we have lost the connection to the Fortran internals.


JiaweiZhuang commented on May 26, 2024

Still, my current suggestion is to restart the kernel and load existing weights, if memory usage becomes a problem.

I will need to check with the ESMF team on the proper use of __del__()/destroy().


Plantain commented on May 26, 2024

How do we restart the kernel with the xESMF API? Or should that not leak memory?


JiaweiZhuang commented on May 26, 2024

> How do we restart the kernel with the xESMF API?

I mean restart the Python kernel, then set reuse_weights=True to load the weights you generated previously.
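In a script, one way to approximate "restart the kernel" is to generate the weights in a short-lived child process, so that whatever it allocated goes back to the operating system when it exits. The sketch below shows only the process-isolation pattern: the child writes a small JSON file as a stand-in for the real weight file, and build_weights_in_child and load_weights are hypothetical helper names. In real use the child would run the xesmf.Regridder(src_ds, dst_ds, 'bilinear', filename=...) call, and the parent would then open the same file with reuse_weights=True.

```python
import json
import os
import subprocess
import sys
import tempfile

# Code run by the child interpreter. The list written here is a stand-in for
# the expensive weight generation that would happen in a real workflow.
CHILD_CODE = """
import json, sys
weights = [0.25] * 4
with open(sys.argv[1], "w") as f:
    json.dump(weights, f)
"""


def build_weights_in_child(path):
    """Generate the weight file in a separate Python process."""
    subprocess.run([sys.executable, "-c", CHILD_CODE, path], check=True)


def load_weights(path):
    """Read the previously generated weights back in the parent process."""
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmpdir:
    weight_file = os.path.join(tmpdir, "weights.json")
    build_weights_in_child(weight_file)
    loaded = load_weights(weight_file)
    print(loaded)  # [0.25, 0.25, 0.25, 0.25]
```

The parent process never runs the expensive step, so its memory footprint is limited to what loading the file requires.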


Plantain commented on May 26, 2024

That doesn't behave as I expected; it still seems the regridder is never freed.

Line #    Mem usage    Increment   Line Contents
================================================
     4   65.508 MiB   65.508 MiB   @profile
     5                             def test():
     6   65.508 MiB    0.000 MiB       src_ds = {'lat': np.arange(29.5,70.5,0.05,dtype=np.float32), 'lon': np.arange(-23.5,45.0,0.05,dtype=np.float32)} 
     7  160.383 MiB   94.875 MiB       dst_ds = xesmf.util.grid_2d(np.float32(29), np.float32(70), np.float32(0.03), np.float32(-23), np.float32(45), np.float32(0.03))
     8 1246.383 MiB 1086.000 MiB       regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear', filename="out/weights")
     9 1246.383 MiB    0.000 MiB       regridder = None
    10 1246.387 MiB    0.004 MiB       gc.collect()
    11 1270.051 MiB   23.664 MiB       regridder = xesmf.Regridder(src_ds, dst_ds, 'bilinear', reuse_weights=True, filename="out/weights")
    12 1222.730 MiB    0.000 MiB       dst_ds = None
    13 1222.730 MiB    0.000 MiB       src_ds = None
    14 1222.730 MiB    0.000 MiB       gc.collect()
    15 1222.730 MiB    0.000 MiB       print("done")


JiaweiZhuang commented on May 26, 2024

@Plantain Remove the first xesmf.Regridder() call in your test script.


bolliger32 commented on May 26, 2024

@JiaweiZhuang @Plantain I'm curious whether any more work has been done on this. We just encountered this issue when running repeated tasks that use different regridders with reuse_weights=True. Even if we never call xesmf.Regridder without reuse_weights=True, our memory use builds with each call that builds a new regridder from a saved file (even if we remove the previous regridder from the namespace, e.g. by loading each regridder into the same variable name or calling del regridder).


JiaweiZhuang commented on May 26, 2024

> our memory use builds with each call to build a new regridder from a saved file

By how much does the memory use increase?

With reuse_weights=True, there is no call to ESMF.Regrid(), so the huge 400 MB allocation won't occur (see #53 (comment)). Version 0.2.0 can still have a ~10 MB memory leak due to ESMF grid objects, but that should be fixed in 0.2.1 (9963d95).

#75 should completely solve this problem. The new load_regridder() call won't involve any call into the ESMF module at all.


bolliger32 commented on May 26, 2024

Here's an example where I load a series of regridder files and then go back to the first one. The memory use keeps growing (for the most part). Does this seem unexpected to you?

Line #    Mem usage    Increment   Line Contents
================================================
     4   2085.2 MiB   2085.2 MiB   def test(srtm_tile_ds, ds_out_grid, regridder_files):
     5   2085.2 MiB      0.0 MiB       gc.collect()
     6   2224.0 MiB    138.8 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][0],'bilinear', filename=str(regridder_files[0]), reuse_weights=True)
     7   2224.0 MiB      0.0 MiB       gc.collect()
     8   2321.3 MiB     97.3 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][1],'bilinear', filename=str(regridder_files[1]), reuse_weights=True)
     9   2321.3 MiB      0.0 MiB       gc.collect()
    10   2377.0 MiB     55.7 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][2],'bilinear', filename=str(regridder_files[2]), reuse_weights=True)
    11   2377.0 MiB      0.0 MiB       gc.collect()
    12   2432.6 MiB     55.6 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][3],'bilinear', filename=str(regridder_files[3]), reuse_weights=True)
    13   2432.6 MiB      0.0 MiB       gc.collect()
    14   2377.1 MiB      0.0 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][0],'bilinear', filename=str(regridder_files[0]), reuse_weights=True)
    15   2377.1 MiB      0.0 MiB       gc.collect()
    16   2488.1 MiB    111.1 MiB       regridder = xesmf.Regridder(srtm_tile_ds,ds_out_grid[0][1],'bilinear', filename=str(regridder_files[1]), reuse_weights=True)
    17   2488.1 MiB      0.0 MiB       gc.collect()
    18   2488.1 MiB      0.0 MiB       return None


JiaweiZhuang commented on May 26, 2024

> load a series of regridder files and then go back to the first regridder file.

Interesting that line 14 has no memory increment. If it were an ESMF memory leak, there would be a steady increment.

The problem might be related to ESMF objects that were not cleaned up, to xarray.open_dataset when reading the weight file (e.g. pydata/xarray#2186), or to Python's own garbage collection of NumPy/SciPy objects.

Garbage collection of NumPy arrays seems to be a tricky issue in itself, and gc.collect() doesn't necessarily work as one might naively expect:
https://stackoverflow.com/questions/23977904/how-to-implement-garbage-collection-in-numpy
https://stackoverflow.com/questions/16261240/releasing-memory-of-huge-numpy-array-in-ipython
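One common reason rebinding a name and calling gc.collect() does not free an object is that another reference to it is still alive somewhere. The sketch below demonstrates this in CPython with a hypothetical FakeRegridder class and a plain list standing in for whatever else might hold the reference. It makes no claim about where any lingering reference sits in xESMF, nor about whether freed memory is returned to the operating system.

```python
import gc
import weakref


class FakeRegridder:
    """Hypothetical stand-in for an object holding a large weight array."""

    def __init__(self, n):
        self.weights = [0.0] * n


regridder = FakeRegridder(1000)
probe = weakref.ref(regridder)  # lets us observe the object without keeping it alive
cache = [regridder]             # a second, easily forgotten reference

# Rebinding the name and collecting garbage is not enough while the cache exists.
regridder = None
gc.collect()
still_alive = probe() is not None
print(still_alive)  # True

# Dropping the last reference lets CPython reclaim the object.
cache.clear()
freed = probe() is None
print(freed)  # True
```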

If there is still a problem after #75 is implemented, then it will be a NumPy/SciPy/xarray issue that is out of my control.


rokuingh commented on May 26, 2024

I just became aware of this issue, and thought I would chime in from the ESMPy perspective (ESMPy is the engine behind xESMF). The ESMF 8.1.0 release, expected at the end of March '21, will include a fix for a memory leak in the search algorithm of the regridding code. This may resolve the memory issues discussed in this thread. There should be a new conda package version of ESMPy 8.1.0 by the first of April.
