e3sm-unified's Issues

latest ncremap failed with vertical interpolation

This happened on cori with latest e3sm_unified. The vertical interpolation was for an EAM initial file. The error message is
/global/cfs/cdirs/e3sm/software/anaconda_envs/base/envs/e3sm_unified_1.3.1/bin/ncremap: line 2767: 60854 Segmentation fault

The command to run and reproduce the error

cd /global/cscratch1/sd/wlin/share/testVrtInt
./map_ne30np4_to_ne120np4_vert_L72.sh  # contains the actual ncremap cmd and options

The script used to work well, and it still works if I activate e3sm_unified_1.3.0 instead of the latest e3sm_unified.

This is not urgent since I can stick with 1.3.0 for now. Thanks,

Deploy to a new path on acme1 LLNL

Hi @xylar,

@tomvothecoder and I are looking into a new path to deploy e3sm_unified at LLNL, so that more people at Livermore can use the e3sm_unified environment. Based on the instruction you sent for deploying e3sm_unified on acme1:

cd e3sm_supported_machine/
./deploy_e3sm_unified.py --version 1.6.0 --conda ~/miniconda3/ --release

It seems like Tom and I need to send a new path to you to update the deploy script for acme1?

Mac OSX build of e3sm-unified

I've had several recent requests for (or, more accurately, some confusion about) an OSX version of e3sm-unified. All (or at least nearly all) conda-forge packages should support it. I think many CDAT packages do as well. I don't think any (or not many) of the e3sm channel packages do.

Packages that support OSX:

  • cdat_info ==8.0
  • cdms2 ==3.1.2
  • cdtime ==3.1.2
  • cdutil ==8.0
  • genutil ==8.1
  • vtk-cdat ==8.0.1.8.0
  • dv3d ==8.0
  • vcs ==8.0
  • vcsaddons ==8.0
  • output_viewer ==1.2.5
  • e3sm_diags ==1.6.0
  • xarray ==0.11.3
  • dask ==1.1.1
  • nco ==4.7.9
  • lxml
  • sympy
  • pyproj
  • pytest
  • shapely
  • cartopy
  • progressbar2
  • pillow
  • numpy >1.13
  • scipy
  • matplotlib
  • basemap
  • blas
  • jupyter
  • nb_conda
  • ipython
  • plotly
  • bottleneck
  • hdf5 ==1.10.3
  • netcdf4 ==1.4.2
  • evtk ==1.1.1
  • f90nml
  • globus-cli
  • globus-sdk
  • mpas_analysis ==1.2
  • processflow ==2.1.1
  • tabulate
  • cmocean
  • gsw
  • libnetcdf ==4.6.1
  • livvkit ==2.1.6
  • pyflann
  • scikit-image
  • shapely
  • zstash ==0.3
  • ilamb >=2.3.1 [NEW]

ESMF_RegridWeightGen on Anvil

Perhaps as a result of recent updates on Blues, the following errors appear:

ESMF_RegridWeightGen: /usr/lib64/libgfortran.so.3: version `GFORTRAN_1.4' not found (required by ESMF_RegridWeightGen)
ESMF_RegridWeightGen: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found (required by /lcrc/soft/climate/e3sm-unified/base/envs/e3sm_unified_1.2.0_py2.7_nox/bin/../lib/libesmf.so)
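One way to confirm the mismatch is to list the GLIBCXX symbol versions that the system libstdc++ actually provides, which is roughly what `strings libstdc++.so.6 | grep GLIBCXX` would report. A minimal sketch, not a tested fix: `glibcxx_versions` is a hypothetical helper name, and the library path in the usage comment is simply the one from the error message above.

```python
import re


def glibcxx_versions(lib_path):
    """Return the GLIBCXX version tags embedded in a shared library, sorted numerically."""
    with open(lib_path, "rb") as f:
        data = f.read()
    # Version tags are stored as plain ASCII strings such as b"GLIBCXX_3.4.15".
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)*)", data))
    return sorted(
        (v.decode() for v in found),
        key=lambda v: tuple(int(part) for part in v.split(".")),
    )


# Usage (path taken from the error message; adjust for your system):
# versions = glibcxx_versions("/usr/lib64/libstdc++.so.6")
# print("3.4.15" in versions)
```

If `3.4.15` is missing from the list, the system library is older than the one libesmf.so was built against.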

Enable better CI

@xylar Once the CI build is fixed, we need to work out the chicken and egg problem with CI. Currently, testing the e3sm-unified Conda metapackage depends on Conda being updated. For actual CI, we should be doing conda create using the metapackage definition in the repo or pull request.

Nightly builds do basically what we want by checking that changes in dependencies aren't breaking things. But PRs aren't testing that changes to the metapackage definition don't break anything.

A fast compromise is to just unpin the e3sm-unified version in the container definition file. That would make the nightly build do the check after the metapackage is updated. The file will need to be updated for the change to Python 3 anyway.

    conda create -n e3sm-unified-nox -c conda-forge -c e3sm -c cdat/label/v81 python=2.7 e3sm-unified=1.2.6 mesalib

processflow 2.0.2 hanging on both anvil/blues and rhea

On both systems, I'm attempting to run processflow.py -v. In both cases, the command hangs in a way that I'm not able to interrupt with ^C. Eventually, I have to kill the process.

On blues, I attempt to run

source /lcrc/soft/climate/e3sm-unified/base/etc/profile.d/conda.sh
conda activate e3sm_unified_1.2.1_py2.7_nox
unset LD_LIBRARY_PATH
processflow.py -v

and on rhea:

module use /ccs/proj/cli900/sw/rhea/modulefiles/all
module load e3sm-unified/1.2.1
processflow.py -v

Basemap not working in csh version

For some reason, the environment variable PROJ_LIB is not getting defined as part of conda activate ... in csh/tcsh/etc. (Thanks, @milenaveneziani, for pointing me to this issue.)

A solution may be to define this variable manually in the load_latest...csh script.

Add f90nml package

Add the f90nml package to the e3sm-unified conda environment. This package is needed for MPAS-Seaice unit testing.

module load e3sm-unified broken on theta

The e3sm-unified modules on theta seem to be screwed up now. They try to point to paths of the form /projects/ClimateEnergy_2/software/e3sm_unified/e3sm_unified_1.1.X_py2_nox/bin but now only /projects/ClimateEnergy_2/software/e3sm_unified/ exists and it only has a subdirectory called "base".

MPI_Init error running ilamb on LCRC

I'm trying to set up an ilamb run on LCRC, but a test run resulted in an MPI_Init error as follows:

ilamb-run --config /home/ac.zhang40/ILAMB/src/ILAMB/data/cmip.cfg --model_root /lcrc/group/e3sm/ac.zhang40/ilamb_test_data/ --regions global bona

--------------------------------------------------------------------------
PMI2_Init failed to intialize.  Return code: 14
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The application appears to have been direct launched using "srun",
but OMPI was not built with SLURM's PMI support and therefore cannot
execute. There are several options for building PMI support under
SLURM, depending upon the SLURM version you are using:

  version 16.05 or later: you can use SLURM's PMIx support. This
  requires that you configure and build SLURM --with-pmix.

  Versions earlier than 16.05: you must use either SLURM's PMI-1 or
  PMI-2 support. SLURM builds PMI-1 by default, or you can manually
  install PMI-2. You must then build Open MPI using --with-pmi pointing
  to the SLURM PMI library location.

Please configure as appropriate and try again.
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
[chr-0369:1658739] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

v1.3.0 not building with numpy 1.18.1

CI is currently failing because of the issue described in CDAT/cdms#384.

The issue has been fixed with a patch in build 7 of cdms2 3.1.4, but E3SM-Unified 1.3.0 isn't picking up that build (probably because of pinnings of other packages). This needs investigation unless we want to pin E3SM-Unified to numpy < 1.18.

mpi4py on HPC

mpi4py (used by ilamb) doesn’t work properly on cori and is not likely to be performant on other HPC systems.

slow on NERSC compute nodes

Hi,
My E3SM diagnostics jobs aren't running. Could the e3sm unified environment be bogging it down?

Interactive jobs on NERSC knl and haswell slow to a crawl after I load the e3sm unified environment, e.g

salloc --nodes=1 --partition=debug --time=00:30:00 -C knl
source /global/cfs/cdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified.sh

After this, everything slows down and my diagnostic script hangs on the import statements.

These problems do not occur on the login node.
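To narrow down whether the slowdown comes from loading packages, it may help to time individual imports on both a login node and a compute node and compare the numbers. A minimal sketch: `time_import` is a hypothetical helper, and the module names in the usage comment are only examples. Python 3.7 and later also offer `python -X importtime` for a per-module breakdown.

```python
import importlib
import time


def time_import(module_name):
    """Import a module by name and return the elapsed wall-clock time in seconds.

    A module that was already imported in this process returns almost
    immediately, so run this in a fresh interpreter for meaningful numbers.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# Usage (example module names; substitute whatever your script imports):
# for name in ("numpy", "xarray", "matplotlib"):
#     print(name, round(time_import(name), 2), "s")
```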

e3sm-unified no longer supports high res analysis (cdf5 not supported)

I just went back to running mpas-analysis on the high res coupled output on theta. Using the latest e3sm-unified, I get the following error:

Traceback (most recent call last):
  File "./compute_transects.py", line 163, in <module>
    output=args.output_filename_pattern)
  File "./compute_transects.py", line 36, in compute_transport
    mesh = xr.open_dataset(mesh)
  File "/lus/theta-fs0/projects/ccsm/acme/tools/e3sm-unified/base/envs/e3sm_unified_1.2.2_py2.7_nox/lib/python2.7/site-packages/xarray/backends/api.py", line 320, in open_dataset
    **backend_kwargs)
  File "/lus/theta-fs0/projects/ccsm/acme/tools/e3sm-unified/base/envs/e3sm_unified_1.2.2_py2.7_nox/lib/python2.7/site-packages/xarray/backends/netCDF4_.py", line 332, in open 
    ds = opener()
  File "/lus/theta-fs0/projects/ccsm/acme/tools/e3sm-unified/base/envs/e3sm_unified_1.2.2_py2.7_nox/lib/python2.7/site-packages/xarray/backends/netCDF4_.py", line 231, in _open_netcdf4_group
    ds = nc4.Dataset(filename, mode=mode, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2126, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1368, in netCDF4._netCDF4._get_format
ValueError: format not supported by python interface

It seems like libnetcdf in this version doesn't support cdf5. I went back through older versions, since I know I've used mpas-analysis with e3sm-unified on theta before, and the last version in which cdf5 worked was v1.2.0.
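Before blaming the library, it can be worth confirming that the file really is CDF-5. The netCDF classic family identifies itself in the first four bytes (`CDF` followed by a version byte), while netCDF-4 files start with the HDF5 signature. A minimal sketch that reads only those bytes; `netcdf_kind` is a hypothetical helper and does not depend on netCDF4 or xarray.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

CDF_VERSIONS = {
    b"\x01": "CDF-1",  # classic format
    b"\x02": "CDF-2",  # 64-bit offset format
    b"\x05": "CDF-5",  # 64-bit data format
}


def netcdf_kind(path):
    """Classify a file by its leading magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(8)
    if magic[:3] == b"CDF":
        return CDF_VERSIONS.get(magic[3:4], "unknown")
    # Assumption: the HDF5 signature sits at offset 0, which is the usual case.
    if magic == HDF5_SIGNATURE:
        return "HDF5"
    return "unknown"


# Usage (hypothetical file name):
# print(netcdf_kind("mesh.nc"))
```

A result of `CDF-5` together with the `ValueError` above would support the suspicion that the installed libnetcdf was built without CDF-5 support.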

Library link problems on new conda install

In a fresh conda install, I am getting link errors when trying to run NCO tools:

ncks: error while loading shared libraries: libssl.so.1.0.0: cannot open shared object file: No such file or directory

It looks like openssl is installed, but maybe ncks was built with a different version? Commands to install e3sm-unified:

conda create --name e3sm-unified python=2.7
conda activate e3sm-unified
conda install -c conda-forge -c e3sm -c cdat e3sm-unified mesalib
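To see every unresolved library at once rather than only the first one the loader reports, the output of `ldd` can be scanned for `not found` entries. A minimal sketch for Linux, assuming `ldd` is available; `missing_libs` and `check_tool` are hypothetical helper names.

```python
import shutil
import subprocess


def missing_libs(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like: "\tlibssl.so.1.0.0 => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def check_tool(name="ncks"):
    """Run ldd on an executable found on PATH and list its unresolved libraries."""
    exe = shutil.which(name)
    if exe is None:
        raise FileNotFoundError(f"{name} is not on PATH")
    result = subprocess.run(["ldd", exe], capture_output=True, text=True)
    return missing_libs(result.stdout)


# Usage, inside the activated environment:
# print(check_tool("ncks"))
```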

v. 1.1.3

Changes in the next version of e3sm-unified:

  • nco 4.7.3
  • acme_diags 1.2.1
  • mpas_analysis 0.7.0
  • processflow 1.0.1

not able to add latest processflow and e3sm_unified versions

When building, I'm getting errors like this:

conda.exceptions.UnsatisfiableError: The following specifications were found to be in conflict:
  - e3sm_diags=1.6.1

A local build of processflow and e3sm_diags appears to take care of this, so I suspect that this could be related to using an old version of conda-build. Not sure.

Change from acme to e3sm on NERSC

In
/global/project/projectdirs/e3sm/software/anaconda_envs/load_latest_e3sm_unified.sh

can you please change

source /global/cfs/cdirs/acme/software/anaconda_envs/base/etc/profile.d/conda.sh

to

source /global/cfs/cdirs/e3sm/software/anaconda_envs/base/etc/profile.d/conda.sh

cdms2 3.0.1 incompatible with latest conda-build

The conda recipe for cdms2=3.0.1 has syntax that the latest conda-build doesn't support:

Traceback (most recent call last):
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda_build/environ.py", line 749, in get_install_actions
    actions = install_actions(prefix, index, specs, force=True)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/common/io.py", line 85, in decorated
    return f(*args, **kwds)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/plan.py", line 473, in install_actions
    txn = solver.solve_for_transaction(prune=prune, ignore_pinned=not pinned)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/core/solve.py", line 107, in solve_for_transaction
    force_remove, force_reinstall)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/core/solve.py", line 145, in solve_for_diff
    force_remove)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/core/solve.py", line 242, in solve_final_state
    ssc = self._run_sat(ssc)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/common/io.py", line 85, in decorated
    return f(*args, **kwds)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/core/solve.py", line 465, in _run_sat
    conflicting_specs = ssc.r.get_conflicting_specs(tuple(final_environment_specs))
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/resolve.py", line 852, in get_conflicting_specs
    reduced_index = self.get_reduced_index(specs)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/common/io.py", line 85, in decorated
    return f(*args, **kwds)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/resolve.py", line 356, in get_reduced_index
    specs, features = self.verify_specs(specs)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda/resolve.py", line 244, in verify_specs
    raise ResolvePackageNotFound(bad_deps)
conda.exceptions.ResolvePackageNotFound: 
  - cdms2==3.0.1 -> esmf[version='>=7.1.*']
  - cdms2==3.0.1 -> esmpy[version='>=7.1.*']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xylar/miniconda3/bin/conda-metapackage", line 11, in <module>
    sys.exit(main())
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda_build/cli/main_metapackage.py", line 125, in main
    return execute(sys.argv[1:])
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda_build/cli/main_metapackage.py", line 121, in execute
    api.create_metapackage(channel_urls=channel_urls, **args.__dict__)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda_build/api.py", line 374, in create_metapackage
    license_name=license_name, summary=summary, config=config)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda_build/metapackage.py", line 29, in create_metapackage
    return build(m, config=config, need_source_download=False)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda_build/api.py", line 207, in build
    notest=notest, need_source_download=need_source_download, variants=variants)
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda_build/build.py", line 2300, in build_tree
    notest=notest,
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda_build/build.py", line 1399, in build
    raise e
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda_build/build.py", line 1390, in build
    channel_urls=tuple(m.config.channel_urls))
  File "/home/xylar/miniconda3/lib/python3.6/site-packages/conda_build/environ.py", line 751, in get_install_actions
    raise DependencyNeedsBuildingError(exc, subdir=subdir)
conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfiable dependencies for platform linux-64: {"esmpy[version='>=7.1.*']", "esmf[version='>=7.1.*']"}

The newest version of cdms2 on conda-forge (3.1.2) doesn't have this problem but isn't compatible with e3sm_diags.

Tempest-extreme in e3sm-unified

Hi Xylar, I'm trying to call tempest-extreme from e3sm-unified and realized that it is not available. I think you have already made it a conda package on conda-forge. Could you make it available in the next e3sm-unified release?

python 3.7 version of e3sm-unified

Packages that support python 3.7:

  • cdat_info
  • cdms2
  • cdtime
  • cdutil
  • genutil
  • vtk-cdat
  • dv3d
  • vcs
  • vcsaddons
  • output_viewer
  • e3sm_diags ==1.6.1
  • xarray
  • dask
  • nco
  • lxml
  • sympy
  • pyproj
  • pytest
  • shapely
  • cartopy
  • progressbar2
  • pillow
  • numpy >1.13
  • scipy
  • matplotlib
  • basemap
  • blas
  • jupyter
  • nb_conda
  • ipython
  • plotly
  • bottleneck
  • hdf5 ==1.10.3
  • netcdf4 ==1.4.2
  • evtk ==1.1.1
  • f90nml
  • globus-cli
  • globus-sdk
  • mpas_analysis ==1.2
  • processflow
  • tabulate
  • cmocean
  • gsw
  • libnetcdf ==4.6.1
  • livvkit ==2.1.6
  • pyflann
  • scikit-image
  • shapely
  • zstash ==0.2
  • ilamb ==2.4 [NEW]

The ones in bold are nearly there.

cdscan fails on grib control files

cdscan has been used to merge multiple grib control files into a single file. However, it fails with the newer versions in e3sm_unified. The following error appears with e3sm_unified_1.2.6 and 1.3.0.

This is likely an issue with the newer cdscan version. An older version from uvcdat 2.4.1 on another machine works fine.

The files to reproduce this issue are at cori:/global/cscratch1/sd/tang30/ECMWF_Interim/download/2014

[tang30@cori06 2014]$ cdscan -x tst.xml *.ctl
Finding common directory ...
Common directory:
Scanning files ...
UVTQPS-20140101_T255.grib.ctl
Setting reference time units to
Traceback (most recent call last):
  File "/global/cfs/cdirs/e3sm/software/anaconda_envs/base/envs/e3sm_unified_1.2.6/bin/cdscan", line 1840, in <module>
    main(sys.argv)
  File "/global/cfs/cdirs/e3sm/software/anaconda_envs/base/envs/e3sm_unified_1.2.6/bin/cdscan", line 1282, in main
    timeIsLinear = (referenceTime[0].lower().split() in
IndexError: string index out of range

Running create_test fails with v1.9.0

If I have the latest e3sm_unified environment loaded on perlmutter, I get failures running create_test that I don't get when I do not have that environment loaded. Is this expected from how e3sm_unified was put together?

I am using 2236937c71ab0c4ef67c2574e58e01a0e46714d8 hash for the E3SM code and am running
./create_test SMS_Ln5.ne4pg2_oQU480.F2010 from cime/scripts/

If I have not loaded e3sm_unified, everything passes.
If I load e3sm_unified v1.9.0, I get the following

Using project from config_machines.xml: e3sm
create_test will do up to 1 tasks simultaneously
create_test will use up to 320 cores simultaneously
Creating test directory /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz
RUNNING TESTS:
  SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel
Starting CREATE_NEWCASE for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel with 1 procs
Finished CREATE_NEWCASE for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel in 1.424639 seconds (PASS)
Starting XML for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel with 1 procs
copying /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/env_run.xml -> /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/LockedFiles/env_run.orig.xml
Finished XML for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel in 0.518013 seconds (PASS)
Starting SETUP for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel with 1 procs
Finished SETUP for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel in 7.509040 seconds (FAIL). [COMPLETED 1 of 1]
    Case dir: /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz
    Errors were:
        ERROR: Command: '/pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/components/elm/bld/build-namelist -infile /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/Buildconf/elmconf/namelist  -csmdata /global/cfs/cdirs/e3sm/inputdata -inputdata /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/Buildconf/elm.input_data_list -ignore_ic_year -namelist " &elm_inparm  start_ymd=00010101  /" -use_case 2010_CMIP6_control  -res ne4np4.pg2  -clm_start_type default -envxml_dir /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz -l_ncpl 24 -r_ncpl 6 -lnd_frac /global/cfs/cdirs/e3sm/inputdata/share/domains/domain.lnd.ne4pg2_oQU480.200527.nc -glc_nec 0 -co2_ppmv 388.717 -co2_type diagnostic  -ncpl_base_period day  -config /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/Buildconf/elmconf/config_cache.xml -bgc sp -mask oQU480' failed with error 'Can't locate XML/LibXML.pm in @INC (you may need to install the XML::LibXML module) (@INC contains: /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/components/elm/bld /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/components/elm/bld /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/cime/utils/perl5lib /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/cime/utils/perl5lib/Config/ /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/components/elm/bld /global/cfs/cdirs/e3sm/perl/lib/perl5-only-switch/x86_64-linux-thread-multi /global/cfs/cdirs/e3sm/perl/lib/perl5-only-switch /global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/5.32/site_perl /global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/site_perl /global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/5.32/vendor_perl 
/global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/vendor_perl /global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/5.32/core_perl /global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/core_perl .) at /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/cime/utils/perl5lib/Config/SetupTools.pm line 5.
        BEGIN failed--compilation aborted at /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/cime/utils/perl5lib/Config/SetupTools.pm line 5.
        Compilation failed in require at /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/components/elm/bld/ELMBuildNamelist.pm line 440.' from dir '/pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/Buildconf/elmconf'

Waiting for tests to finish
FAIL SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel (phase SETUP)
    Case dir: /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz
Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
test-scheduler took 10.824334383010864 seconds

Just for fun, I checked if I could run this with e3sm_unified v1.8.1, and everything passes again. Was this an expected change going from v1.8.1 to v1.9.0?

Also, for my own best practice cheat sheet, should we not have the unified environment loaded to set up or build the model?
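The @INC list in the error includes directories under the e3sm_unified_1.9.0_login environment, which suggests (but does not prove) that the environment's perl is being picked up instead of the system one. A minimal sketch for checking where a tool on PATH resolves; `tool_from_env` is a hypothetical helper, and the use of `CONDA_PREFIX` as the environment root is an assumption about how the environment was activated.

```python
import os
import shutil


def tool_from_env(tool, env_prefix=None, path=None):
    """Return True if `tool` on PATH resolves to a file inside `env_prefix`."""
    if env_prefix is None:
        # Assumption: an activated conda environment exports CONDA_PREFIX.
        env_prefix = os.environ.get("CONDA_PREFIX")
    exe = shutil.which(tool, path=path)
    if exe is None or env_prefix is None:
        return False
    exe = os.path.realpath(exe)
    prefix = os.path.realpath(env_prefix)
    return os.path.commonpath([exe, prefix]) == prefix


# Usage, with the unified environment loaded:
# print(tool_from_env("perl"))
```

If this prints `True` with the environment loaded, the build-namelist script is running under a perl that may not have XML::LibXML installed.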

e3sm_diags doesn't work in sysmpi environment

At least on compy, no luck with e3sm_diags in the sysmpi environments.

From @chengzhuzhang:

e3sm_unified_1.3.0_sysmpi: it seems this environment doesn’t work with esmf. I tested e3sm_diags in an interactive session. I got following:

“Traceback (most recent call last):
  File "/compyfs/software/e3sm-unified/base/envs/e3sm_unified_1.3.0_py3.7_sysmpi/lib/python3.7/site-packages/ESMF/interface/loadESMF.py", line 118, in <module>
    mode=ct.RTLD_GLOBAL)
  File "/compyfs/software/e3sm-unified/base/envs/e3sm_unified_1.3.0_py3.7_sysmpi/lib/python3.7/ctypes/__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /compyfs/software/e3sm-unified/base/envs/e3sm_unified_1.3.0_py3.7_sysmpi/lib/./libmpicxx.so.12: undefined symbol: MPII_Errhandler_set_cxx
Traceback (most recent call last):
  File "/compyfs/software/e3sm-unified/base/envs/e3sm_unified_1.3.0_py3.7_sysmpi/lib/python3.7/site-packages/ESMF/interface/loadESMF.py", line 118, in <module>
    mode=ct.RTLD_GLOBAL)
  File "/compyfs/software/e3sm-unified/base/envs/e3sm_unified_1.3.0_py3.7_sysmpi/lib/python3.7/ctypes/__init__.py", line 356, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /compyfs/software/e3sm-unified/base/envs/e3sm_unified_1.3.0_py3.7_sysmpi/lib/./libmpicxx.so.12: undefined symbol: MPII_Errhandler_set_cxx”

The diagnostics run completed anyway because cdms is able to pick up another regridder in this case.

These errors likely result from the custom-built esmf I included in this environment to remove MPI dependence. It may be that no packages that rely on esmf will work in the sysmpi environment or the problem may just lie with esmpy. More testing is needed.

Add the processflow to unified

I would like to add the latest version of the processflow to the next version of the e3sm-unified. Please let me know what you need to get it into the package.

How to use e3sm after activating through conda

I have installed e3sm using conda (conda create -n e3sm-unified -c conda-forge -c e3sm e3sm-unified)
and activated it using conda activate e3sm-unified.
After that, I don't understand how to start running the model. I checked /home/ssk/software/anaconda3/envs/e3sm-unified;
it contains directories such as bin, cmake, include, etc. (screenshot attached).
About the machine:
HPC: [screenshot]
OS: CentOS
Could you please suggest a possible way forward?

many packages in e3sm-unified are clobbering other packages

I have turned on the conda option path_conflict: prevent and discovered to my horror just now that many packages in e3sm-unified are overwriting other packages. Guilty parties include:

This is kind of a mess and could lead to a lot of broken packages. I will try to work with the makers of these packages to address these conflicts but will have to ignore them for the time being in order to make headway.

Build failing due to Dask dependency

I saw that the Travis CI build was failing so I triggered a no-change build to confirm. This error happens when trying to activate the environment:

ImportError: Dask's distributed scheduler is not installed.

Please either conda or pip install dask distributed:

  conda install dask distributed          # either conda install
  pip install dask distributed --upgrade  # or pip install
3sm_diags help failed\n

e3sm-unified 1.2.3 issue on NERSC

I tried using e3sm-unified for MPAS-Analysis on NERSC this morning and am getting an error, see below

lvroekel:MPAS-Analysis$ module load e3sm-unified/1.2.3
lvroekel:MPAS-Analysis$ ./run_mpas_analysis --list
Traceback (most recent call last):
  File "./run_mpas_analysis", line 37, in <module>
    from mpas_analysis.shared.plot.plotting import _register_custom_colormaps, \
  File "/global/u2/l/lvroekel/MPAS-Analysis/mpas_analysis/shared/plot/plotting.py", line 30, in <module>
    from mpl_toolkits.basemap import Basemap
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison/base/envs/e3sm_unified_1.2.3_py2.7_nox/lib/python2.7/site-packages/mpl_toolkits/basemap/__init__.py", line 155, in <module>
    pyproj_datadir = os.environ['PROJ_LIB']
  File "/global/project/projectdirs/acme/software/anaconda_envs/edison/base/envs/e3sm_unified_1.2.3_py2.7_nox/lib/python2.7/UserDict.py", line 40, in __getitem__
    raise KeyError(key)
KeyError: 'PROJ_LIB'

e3sm-unified/1.2.2 works fine. If this issue is more appropriate for the MPAS-analysis repo, let me know.
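Until the activation step defines the variable, one possible workaround on the Python side is to set PROJ_LIB before basemap is imported. A minimal sketch, assuming the PROJ data files live under `<environment prefix>/share/proj` (typical for conda environments, but not verified for this installation); `ensure_proj_lib` is a hypothetical helper name.

```python
import os
import sys


def ensure_proj_lib(prefix=None):
    """Set PROJ_LIB if it is missing and return its value.

    Assumption: PROJ data lives in <prefix>/share/proj. An existing
    PROJ_LIB value is left untouched.
    """
    if "PROJ_LIB" not in os.environ:
        if prefix is None:
            prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
        os.environ["PROJ_LIB"] = os.path.join(prefix, "share", "proj")
    return os.environ["PROJ_LIB"]


# Usage: call this before the import that fails in the traceback above.
# ensure_proj_lib()
# from mpl_toolkits.basemap import Basemap
```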

Permission denied activating e3sm_unified on perlmutter

Running source /global/common/software/e3sm/anaconda_envs/load_latest_e3sm_unified_pm-cpu.sh gives the following error message:

PermissionError: [Errno 13] Permission denied: '/global/common/software/e3sm/anaconda_envs/base/lib/python3.8/site-packages/idna/__init__.py'

wlin@perlmutter:login16:~> ls -l /global/common/software/e3sm/anaconda_envs/base/lib/python3.8/site-packages/boltons/__init__.py
-rw-rw---- 2 xylar xylar 0 Mar 31 18:55 /global/common/software/e3sm/anaconda_envs/base/lib/python3.8/site-packages/boltons/__init__.py
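The listing shows a file with mode `-rw-rw----`, so users outside the owning group cannot read it. To find every file with the same problem, the installation tree can be walked and each file's permission bits checked. A minimal sketch; `not_world_readable` is a hypothetical helper, and it checks only the "others" read bit, not group ownership.

```python
import os
import stat


def not_world_readable(root):
    """Return a sorted list of files under `root` that lack the others-read bit."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.stat(path).st_mode
            except OSError:
                # Skip entries that cannot be inspected (e.g. broken symlinks).
                continue
            if not mode & stat.S_IROTH:
                bad.append(path)
    return sorted(bad)


# Usage (path taken from the error message above):
# root = "/global/common/software/e3sm/anaconda_envs/base/lib/python3.8/site-packages"
# for path in not_world_readable(root):
#     print(path)
```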

problem packages for e3sm-unified 1.2.6

A number of packages are giving me trouble for e3sm-unified 1.2.6:

  • output_viewer - no conda-forge build for python 3.7. Need this PR merged: conda-forge/output_viewer-feedstock#4. This is also a dependency of e3sm_diags, which is a dependency of processflow, so neither of those can be included without this fix.
  • thermo - conflicting package in conda-forge channel is taking priority. This package probably needs to be renamed. Issue here: CDAT/thermo#11. Dependency of cdat, so that also currently cannot be included. Moved to vcsaddons by @doutriaux1, thanks!
  • globus-cli - dependency jmespath=0.9.2 not built for python 3.7. Pull request here: conda-forge/jmespath-feedstock#7. Issue requesting an update in dependency here: globus/globus-cli#461
  • cdp - built incorrectly with pinned python. Fixed via: conda-forge/cdp-feedstock#7
  • processflow - depends on globus-cli 1.1.2 and globus-sdk 1.1.1, neither of which is available on conda-forge. With strict channel priority (needed to make builds work with the latest conda), these versions from the e3sm channel are no longer available to e3sm-unified. Here's the build of globus-sdk 1.1.1: conda-forge/globus-sdk-feedstock#8 and of globus-cli 1.1.2: conda-forge/globus-cli-feedstock#6
  • ncl vs. basemap - ncl is a dependency of processflow; basemap is a dependency of ilamb, mpas-analysis and cdat. ncl is not compatible with pyproj>=1.9.6,<2.0.0 and proj4>=5.2.0,<6.0.0 for any build with hdf5=1.10.5. basemap has not yet been released with a version compatible with pyproj>=2.0.0 and proj4>=6.0.0. A request was made over a year ago to have a new release of basemap with no luck: matplotlib/basemap#405. This issue was addressed with a new release of processflow without the ncl dependency. Users will need to load the local system version manually.
  • hdf5=1.10.5 - Not a show-stopper but a pain is the dependence of various packages on the cdat/label/v81 channel on hdf5=1.10.4, whereas conda-forge packages are being built with hdf5=1.10.5. I had to rebuild mpas_tools=0.0.3 and nco=4.8.1 with this older version. A request to have cdat/label/v81 packages rebuilt with the more recent hdf5 is here: CDAT/cdat#2234

Update: I tried using hdf5=1.10.5 this morning and it "just worked"! But there's a remaining incompatibility between basemap and ncl that I don't have a good solution for.

cdat using old libnetcdf

When I install the latest e3sm-unified (v 1.2.1), I see:

python -c "import vcs"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/python2.7/site-packages/vcs/__init__.py", line 102, in <module>
    from .utils import *  # noqa
  File "/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/python2.7/site-packages/vcs/utils.py", line 20, in <module>
    from . import boxfill
  File "/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/python2.7/site-packages/vcs/boxfill.py", line 20, in <module>
    from . import VCS_validation_functions
  File "/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/python2.7/site-packages/vcs/VCS_validation_functions.py", line 5, in <module>
    import genutil
  File "/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/python2.7/site-packages/genutil/__init__.py", line 5, in <module>
    from .grower import grower  # noqa
  File "/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/python2.7/site-packages/genutil/grower.py", line 3, in <module>
    import cdms2 as cdms
  File "/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/python2.7/site-packages/cdms2/__init__.py", line 33, in <module>
    from .dataset import createDataset, openDataset, useNetcdf3  # noqa
  File "/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/python2.7/site-packages/cdms2/dataset.py", line 7, in <module>
    from . import Cdunif
ImportError: libnetcdf.so.11: cannot open shared object file: No such file or directory

If I list the lib directory, the version actually present is 13, not 11:

ls /home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/libnet*
/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/libnetcdf.a
/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/libnetcdff.a
/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/libnetcdff.so
/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/libnetcdff.so.6
/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/libnetcdff.so.6.1.1
/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/libnetcdf.settings
/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/libnetcdf.so
/home/xylar/test_e3sm_unified/base/envs/e3sm_unified_1.2.1_py2.7_nox/lib/libnetcdf.so.13
