esma-baselibs's Introduction

GEOS-ESM

Metarepository for Organization-wide information

Please also check out the wiki.

esma-baselibs's People

Contributors

mathomp4, tclune

esma-baselibs's Issues

Update config.guess in sublibraries

Testing on AWS Graviton3 showed that many of the config.guess files in Baselibs sublibraries are quite old and need updating.

Per testing:

> ##   Then due to the age of some config.guess files in the sub-libraries, I had to do:
> ##      cp config.guess antlr2/scripts/config.guess
> ##      cp config.guess gsl-2.7/config.guess
> ##      cp config.guess szlib/bin/config.guess
> ##      cp config.guess hdf4/bin/config.guess
> ##      cp config.guess hdf5/bin/config.guess
> ##      cp config.guess netcdf-cxx4/config.guess
> ##      cp config.guess nccmp/config.guess
> ##      cp config.guess hdf-eos2-3.0/config/config.guess
> ##      cp config.guess hdf-eos5-2.0/config/config.guess
> ##      cp config.guess TOOLKIT/config/config.guess

Now, I'm no autotools expert, so I'll invoke the names of @WardF and @DennisHeimbigner here, as they are the autotools gurus of netCDF, and ask:

Should I instead just run autoreconf -f -v -i in these libraries if they use autotools? Perhaps that is the "right" way to get a good config.guess when building on a new machine.
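The manual copies quoted above could be wrapped in a small loop. This is only a sketch of that workaround, not something Baselibs provides: the function name `refresh_config_guess` is hypothetical, and the paths are the ones from the list above, taken relative to the Baselibs source root. It does not answer the autoreconf question.

```shell
#!/bin/sh
# Paths copied from the list above, relative to the Baselibs source root.
CONFIG_GUESS_TARGETS="
antlr2/scripts/config.guess
gsl-2.7/config.guess
szlib/bin/config.guess
hdf4/bin/config.guess
hdf5/bin/config.guess
netcdf-cxx4/config.guess
nccmp/config.guess
hdf-eos2-3.0/config/config.guess
hdf-eos5-2.0/config/config.guess
TOOLKIT/config/config.guess
"

# refresh_config_guess NEW_FILE ROOT  (hypothetical helper)
# Overwrite each listed config.guess under ROOT with NEW_FILE.
# Targets that do not exist are skipped rather than created.
refresh_config_guess() {
    new_file=$1
    root=$2
    for rel in $CONFIG_GUESS_TARGETS; do
        if [ -f "$root/$rel" ]; then
            cp "$new_file" "$root/$rel" && echo "updated $rel"
        else
            echo "skipped $rel (not found)"
        fi
    done
}

# Example, assuming it is run from the Baselibs root:
#   refresh_config_guess ./config.guess .
```

Skipping missing targets keeps the loop safe when a sublibrary directory is renamed or absent.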

min os version out of date in ESMF?

ESMF_CXXCOMPILEOPTS    += -x c++ -mmacosx-version-min=10.7 -stdlib=libc++

On my laptop I got link errors that imply 10.7 should be changed to 11.6.

Really an ESMF bug (or Apple's).
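As a rough illustration of the change being suggested, the minimum-version value in a flag string can be swapped with sed. The helper name `set_macos_min` is hypothetical, and 11.6 is simply the value from the report above; the right value for another machine may differ.

```shell
# set_macos_min FLAGS VERSION  (hypothetical helper)
# Print FLAGS with any -mmacosx-version-min=<x> value replaced by VERSION.
set_macos_min() {
    printf '%s\n' "$1" | sed "s/-mmacosx-version-min=[0-9.]*/-mmacosx-version-min=$2/"
}

# The flag string is the one quoted in this issue.
set_macos_min "-x c++ -mmacosx-version-min=10.7 -stdlib=libc++" 11.6
# prints: -x c++ -mmacosx-version-min=11.6 -stdlib=libc++
```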

Update to ESMF 8.5.0b12

Per @theurich:

The current HConfig work is already merged into develop, and will be part of the 8.5.0 release. Right now we are at beta tag v8.5.0b12, and that is what I'd recommend if you wanted to move to a mainline develop beta tag right now.

So we get hconfig (and a bug fix). Score!

Add Support for NVHPC

I'm going to use this as an omnibus issue to track support for NVHPC in Baselibs.

One issue was that HDF5 could not build. A workaround was to build Open MPI without zlib support (as it was having issues with the zlib in Baselibs). So the current modules being used to build on Discover are:

comp/gcc/10.3.0
comp/nvhpc/21.5-nompi
other/localrc/nvhpc-21.5/gcc-10.3.0
mpi/openmpi/4.0.6/nvhpc-21.5-carl-nozlib

Current libraries with issues:

  • CDO
  • ESMF
  • GFE
  • FLAP → Fixed with new FLAP version (see #33)
  • SDPToolkit → Turned off for NVHPC (see #33)

cc: @tclune @cponder

Error raised by Zoltan_Malloc when running GEOSgcm

Hi Matt (@mathomp4),
Thank you very much for preparing a test case for me to run on the ESSIC server!
@sanAkel kindly taught me how to make a test run with GEOSgcm this morning,
but GEOSgcm exited with an error when I ran it with 6 processes. Could you give me some suggestions? Thank you!

Building env:

The same environment as in GEOS-ESM/GEOSgcm#446, which is

  • Machine:
    • Memory: 8GB
    • CPU: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz (8 cores)
  • Building compiler:
    • Fortran compiler: gcc-12.1.0
    • MPI: openmpi-4.1.4
  • GEOSgcm:
    • Baselibs v7.5.0
    • GEOSgcm v10.22.5

Running command

(cda_suite) [cda@measures1 scratch]$ /data2/cda/pkg/openmpi-4.1.4-gcc12.1.0/bin/mpirun  -np 6 ./GEOSgcm.x

Error message

 EXTDATA: Updating R bracket for TR_LAI_FRAC
   EXTDATA:  ... file processed: ExtData/g5chem/sfc/LAI/lai_x720_y360_v72_t12_2008.nc
Zoltan_Malloc (from /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c,89) No space on proc 0 - number of bytes requested = 116655624
[0] Zoltan ERROR in Zoltan_RB_Build_Structure (line 92 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c):  Insufficient memory.
[0] Zoltan ERROR in Zoltan_RCB_Build_Structure (line 91 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb_util.c):  Error returned from Zoltan_RB_Build_Structure.
[0] Zoltan ERROR in rcb_fn (line 440 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb.c):  Error returned from Zoltan_RCB_Build_Structure.
[0] Zoltan ERROR in Zoltan_LB (line 388 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/lb_balance.c):  Partitioning routine returned code -2.
Zoltan_Malloc (from /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c,89) No space on proc 1 - number of bytes requested = 116655624
[1] Zoltan ERROR in Zoltan_RB_Build_Structure (line 92 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c):  Insufficient memory.
[1] Zoltan ERROR in Zoltan_RCB_Build_Structure (line 91 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb_util.c):  Error returned from Zoltan_RB_Build_Structure.
[1] Zoltan ERROR in rcb_fn (line 440 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb.c):  Error returned from Zoltan_RCB_Build_Structure.
[1] Zoltan ERROR in Zoltan_LB (line 388 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/lb_balance.c):  Partitioning routine returned code -2.
Zoltan_Malloc (from /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c,89) No space on proc 2 - number of bytes requested = 116655624
[2] Zoltan ERROR in Zoltan_RB_Build_Structure (line 92 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c):  Insufficient memory.
[2] Zoltan ERROR in Zoltan_RCB_Build_Structure (line 91 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb_util.c):  Error returned from Zoltan_RB_Build_Structure.
[2] Zoltan ERROR in rcb_fn (line 440 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb.c):  Error returned from Zoltan_RCB_Build_Structure.
[2] Zoltan ERROR in Zoltan_LB (line 388 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/lb_balance.c):  Partitioning routine returned code -2.
Zoltan_Malloc (from /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c,89) No space on proc 3 - number of bytes requested = 116655624
[3] Zoltan ERROR in Zoltan_RB_Build_Structure (line 92 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c):  Insufficient memory.
[3] Zoltan ERROR in Zoltan_RCB_Build_Structure (line 91 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb_util.c):  Error returned from Zoltan_RB_Build_Structure.
[3] Zoltan ERROR in rcb_fn (line 440 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb.c):  Error returned from Zoltan_RCB_Build_Structure.
[3] Zoltan ERROR in Zoltan_LB (line 388 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/lb_balance.c):  Partitioning routine returned code -2.
Zoltan_Malloc (from /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c,89) No space on proc 4 - number of bytes requested = 116655624
[4] Zoltan ERROR in Zoltan_RB_Build_Structure (line 92 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c):  Insufficient memory.
[4] Zoltan ERROR in Zoltan_RCB_Build_Structure (line 91 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb_util.c):  Error returned from Zoltan_RB_Build_Structure.
[4] Zoltan ERROR in rcb_fn (line 440 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb.c):  Error returned from Zoltan_RCB_Build_Structure.
[4] Zoltan ERROR in Zoltan_LB (line 388 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/lb_balance.c):  Partitioning routine returned code -2.
Zoltan_Malloc (from /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c,89) No space on proc 5 - number of bytes requested = 116655624
[5] Zoltan ERROR in Zoltan_RB_Build_Structure (line 92 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/shared.c):  Insufficient memory.
[5] Zoltan ERROR in Zoltan_RCB_Build_Structure (line 91 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb_util.c):  Error returned from Zoltan_RB_Build_Structure.
[5] Zoltan ERROR in rcb_fn (line 440 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/rcb.c):  Error returned from Zoltan_RCB_Build_Structure.
[5] Zoltan ERROR in Zoltan_LB (line 388 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/lb_balance.c):  Partitioning routine returned code -2.
[-1] Zoltan ERROR in Zoltan_RB_Box_Assign (line 77 of /data2/cda/model/ESMA-Baselibs/esmf/src/Infrastructure/Mesh/src/Zoltan/box_assign.c):  No Decomposition Data available; use KEEP_CUTS parameter.
...<repeat [-1] lines infinitely>

My debug trials

The first thing I did was run ulimit -s unlimited,
which gives

vim gcm_run.j	(wd: ~/geos5/test1)
(cda_suite) [cda@measures1 scratch]$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31172
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Then I reran GEOSgcm. The Zoltan error no longer appears, but the program still stops at

 EXTDATA: Updating R bracket for TR_LAI_FRAC
   EXTDATA:  ... file processed: ExtData/g5chem/sfc/LAI/lai_x720_y360_v72_t12_2008.nc

My guess

  1. Considering that my machine has only 8 GB of memory, is the GEOSgcm failure an out-of-memory problem, as Zoltan's error message suggests?
  2. Is there a way to change the number of processes used to run GEOSgcm, so that the total memory in use is reduced?
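For scale, the size of the failing request can be worked out from the numbers in the error message. This is back-of-the-envelope arithmetic only: it sizes the single allocation that Zoltan_Malloc reported, and says nothing about how much memory the model already held at that point. The helper name `bytes_to_mib` is hypothetical.

```shell
# bytes_to_mib BYTES: print BYTES as whole MiB (integer division).
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

request=116655624   # bytes requested per process, from the Zoltan_Malloc lines
nprocs=6            # from `mpirun -np 6`

echo "per process: $(bytes_to_mib "$request") MiB"
echo "all processes: $(bytes_to_mib $(( request * nprocs ))) MiB"
# prints:
#   per process: 111 MiB
#   all processes: 667 MiB
```

So the request that failed is roughly 111 MiB per process, or about 667 MiB across the six processes, which is small next to 8 GB on its own.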

Full log

The full log is available at this Dropbox link: test1_err.log

Add neural-fortran

This is a tracking issue for adding neural-fortran to Baselibs.

The main blocker at the moment is that neural-fortran uses FetchContent when building with CMake, which I think does not play well with compute nodes. I have an issue on their repo (modern-fortran/neural-fortran#128) where I lay out some thoughts on how we might get this to work (though with the help of @milancurcic for sure!).
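One possible direction, sketched here under assumptions rather than as the agreed plan, is to pre-download the dependencies and point FetchContent at them with CMake's `FETCHCONTENT_SOURCE_DIR_<uppercaseName>` variables, with `FETCHCONTENT_FULLY_DISCONNECTED=ON` to suppress download steps. The helper name `fetchcontent_offline_flags` is hypothetical, and `somedep` with its path is a placeholder, not one of neural-fortran's real dependency names.

```shell
# fetchcontent_offline_flags NAME=DIR ...  (hypothetical helper)
# Print CMake -D flags that point FetchContent at pre-downloaded sources.
fetchcontent_offline_flags() {
    printf '%s' "-DFETCHCONTENT_FULLY_DISCONNECTED=ON"
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first '='
        dir=${pair#*=}       # text after the first '='
        # CMake expects the dependency name in upper case in the variable name.
        upper=$(printf '%s' "$name" | tr '[:lower:]' '[:upper:]')
        printf ' %s' "-DFETCHCONTENT_SOURCE_DIR_${upper}=${dir}"
    done
    printf '\n'
}

# "somedep" and its path are placeholders for illustration only.
fetchcontent_offline_flags somedep=/path/to/somedep-src
# prints: -DFETCHCONTENT_FULLY_DISCONNECTED=ON -DFETCHCONTENT_SOURCE_DIR_SOMEDEP=/path/to/somedep-src
```

The printed flags would then be appended to the cmake configure command on a node without network access.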

Support NVHPC

Some time soon we are likely to need to run GEOS on GPUs, which requires the PGI compiler.

To facilitate porting GEOS, we will need a current build of Baselibs using PGI. Some Baselibs components are known to break under PGI 19.10: gFTL-shared and pFUnit. A patch has been released for gFTL-shared (v1.0.2+pgi.19.10), and pFUnit should simply be skipped, at least for now.

No super urgency, though I'm sure the need will come fast when it comes.

Add FRE-NCtools to Baselibs

Per @sanAkel:

Good Morning, Matt, Tom,

Matt, would you please help me get the following built with your latest g5_modules/Baselibs.
https://github.com/NOAA-GFDL/FRE-NCtools
This is our first step to follow up from yesterday's meetings.

Good news is that they (GFDL) will maintain our (GEOS) specific customizations (CMake, plug/interface, etc.) in MOM6 branches. In the next couple of weeks we'll hash out these details so that our code will join the 'mainstream' and become part of their master!

Thanks a lot for all the help!
Santha

Add zlib-ng

Per a note on HDF5, zlib-ng seems to be faster than "old" zlib:

I have done some quick serial comparisons between “normal” zlib and zlib-ng and I get anywhere from 50% faster to 80% faster (1/2 to 1/5 the runtime) with zlib-ng. Again, not rigorous in any way, just some quick tests on some medium to very large files.

Update to curl 8.0.1

The latest version is curl 8.0.1

Note, per @Badger:

Today we celebrate curl's 25th birthday and we do this partly by releasing
curl 8.0.0. While this is a major version number bump, there is no API or ABI
breakage or change.

So hopefully all will be simple. 😄

Update config.guess

Testing on Graviton3 at AWS found that many sublibraries have a decrepit config.guess file in them. With Baselibs 7.8.0, I found these needed to be updated:

  • antlr2/scripts/config.guess
  • gsl-2.7/config.guess
  • szip/bin/config.guess
  • hdf4/bin/config.guess
  • hdf5/bin/config.guess
  • netcdf-cxx4/config.guess
  • nccmp/config.guess
  • hdf-eos2-3.0/config/config.guess
  • hdf-eos5-2.0/config/config.guess
  • TOOLKIT/config/config.guess

I think Baselibs itself needs a newer config.guess file; then we can use an "if arm" check or similar to trigger copies in the .config stages of these libraries.
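The "if arm" check could look something like the following in shell. It is only a sketch: the function name `is_arm_host` is hypothetical, and how such a check would be wired into the .config stages is left open. It relies on `uname -m` reporting `aarch64` on Arm Linux and `arm64` on Apple silicon.

```shell
# is_arm_host [MACHINE]  (hypothetical helper)
# Succeeds when MACHINE (default: output of `uname -m`) names a 64-bit Arm CPU.
is_arm_host() {
    machine=${1:-$(uname -m)}
    case "$machine" in
        aarch64|arm64) return 0 ;;
        *) return 1 ;;
    esac
}

if is_arm_host; then
    echo "Arm host: sublibrary config.guess files should be refreshed"
else
    echo "non-Arm host: leaving config.guess files alone"
fi
```

Taking the machine name as an optional argument makes the check easy to exercise on a non-Arm machine.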

Update to ESMF v8.5.0b18

Per @theurich, the fixes in ESMF v8.4.1:

All of the fixes that come with patch release 8.4.1 are also available in beta tag v8.5.0b18 and newer.

...

A bug in the implementation of method ESMF_FieldGet() was fixed. The problematic code was accessing the optional, intent(out) “name” argument without the proper present() check. As a consequence, code not specifying an actual “name” argument when making this call was at risk of suffering from memory corruption issues. Due to the fact that the ESMF library internally is making calls to ESMF_FieldGet() without passing the “name” argument, it must be assumed that all user code is at risk of memory corruption issues when using ESMF 8.4.0.

will also be in v8.5.0b18.

We currently have v8.5.0b13 in ESMA-Baselibs, so once v8.5.0b18 is available in the repo, we need to update Baselibs.
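A quick way to check whether the tag in the repo is older than the one wanted is a version sort. This sketch assumes GNU coreutils `sort -V`, and the helper name `tag_is_older` is hypothetical. It only compares beta tags of the same release correctly: a plain release tag such as v8.5.0 would sort before its own betas, so it should not be used across that boundary.

```shell
# tag_is_older CURRENT WANTED  (hypothetical helper)
# Succeeds when CURRENT sorts strictly before WANTED in version order.
# Relies on `sort -V` (GNU coreutils), so it is not portable to every sort.
tag_is_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Tags taken from this issue.
if tag_is_older v8.5.0b13 v8.5.0b18; then
    echo "update needed"
fi
# prints: update needed
```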

Return HDF4 to essential libraries

It turns out HDF4 is an essential library: GEOSldas uses it, so when HDF4 was removed from the essentials and the CI image, GEOSldas could no longer build in CI!

7.8 Branch: Update to ESMF 8.4.1

There is a bug in ESMF 8.4.0, so we need to patch Baselibs 7.8.0 to have this update. I don't think we have ever hit it, but just in case.
