
ekat's Introduction

EKAT Version 1.0: E3SM Kokkos Applications Toolkit

EKAT is a small library of tools for writing Kokkos-enabled HPC C++ code in the E3SM ecosystem. The tools include C++/Fortran code (data structures, functions, macros) as well as CMake scripts.

Please read our guidelines if you are interested in contributing to EKAT.

This project is licensed under the terms of the BOOST license. See LICENSE.txt for details.

Overview

Although originally created as a home for common Kokkos-related kernels, EKAT has evolved to contain a variety of C++ and CMake utilities. Here is a list of some of the utilities provided in EKAT.

  • C++ structures to enhance vectorization (Pack and Mask).
  • C++ structures for kernel-local scratch memory (Workspace).
  • C++ interface to read and store the contents of YAML files.
  • C++ routines for team-level solvers for diagonally dominant tridiagonal systems.
  • C++ routines for efficient team-level linear interpolation of 1d data.
  • Backports of some C++17 features (such as std::any), and several std-like meta-utilities.
  • CMake utilities for creating a suite of unit tests over multiple MPI rank/OpenMP thread combinations.
  • CMake machine files to configure Kokkos for common architectures and threading frameworks.

Configuring EKAT

EKAT uses CMake to generate the build system. To make life easier, we provide machine files for the setup of Kokkos properties, located in cmake/machine-files/kokkos. In particular, we provide machine files for architecture specs, as well as machine files for the threading framework. In cmake/machine-files you can see some examples of how to combine architecture/threading machine files into a single one, which can later be used in your cmake configuration script.

For instance, the following my-mach.cmake file combines Intel Skylake arch specs with an OpenMP threading backend.

set (EKAT_MACH_FILES_PATH /path/to/ekat-src/cmake/machine-files/kokkos)
include (${EKAT_MACH_FILES_PATH}/openmp.cmake)
include (${EKAT_MACH_FILES_PATH}/intel-skx.cmake)

And this do-cmake.sh script would configure EKAT for a debug build using the machine file above:

#!/bin/bash

SRC_DIR=${WORK_DIR}/libs/ekat/ekat-src/branch
INSTALL_DIR=${WORK_DIR}/libs/ekat/ekat-install/branch

rm -rf CMakeFiles
rm -f  CMakeCache.txt

cmake \
  -C /path/to/my-mach.cmake                         \
  \
  -D CMAKE_BUILD_TYPE:STRING=DEBUG                  \
  -D CMAKE_CXX_COMPILER:STRING=g++                  \
  -D CMAKE_Fortran_COMPILER:STRING=gfortran         \
  -D CMAKE_INSTALL_PREFIX:PATH=${INSTALL_DIR}       \
  \
  -D EKAT_DISABLE_TPL_WARNINGS:BOOL=ON              \
  \
  -D EKAT_ENABLE_TESTS:BOOL=ON                      \
  -D EKAT_TEST_DOUBLE_PRECISION:BOOL=ON             \
  -D EKAT_TEST_SINGLE_PRECISION:BOOL=ON             \
  -D EKAT_TEST_MAX_THREADS:STRING=8                 \
  \
  ${SRC_DIR}

Here are some of EKAT's cmake config options:

  • EKAT_ENABLE_TESTS: enables EKAT testing.
  • EKAT_TEST_[DOUBLE|SINGLE]_PRECISION: whether to test single and/or double precision.
  • EKAT_TEST_MAX_[THREADS|RANKS]: maximum number of threads/ranks to use in unit tests.
  • EKAT_TEST_THREADS_INC: increment in number of threads for each test.
  • EKAT_DEFAULT_BFB: in certain kernels, whether to default to a BFB, serialized, implementation.
  • EKAT_DISABLE_TPL_WARNINGS: whether warnings from TPLs should be disabled.
  • EKAT_ENABLE_VALGRIND: whether tests should be run through valgrind.
  • EKAT_ENABLE_CUDA_MEMCHECK: whether tests should be run through cuda-memcheck.
  • EKAT_ENABLE_COVERAGE: whether to enable code coverage in the compiler.

Once CMake configuration has completed, you can build and test EKAT with the usual make and ctest commands.

ekat's People

Contributors

aarondonahue, abagusetty, ambrad, bartgol, brhillman, e3sm-autotester, jeff-cohere, jgfouca, mjs271, pbosler, petercaldwell, pressel, sarats, singhbalwinder, tcclevenger, xyuan


ekat's Issues

Macro EKAT_FPE never defined.

Describe the bug
The macro EKAT_FPE is never set, since the script EkatBuildEkat.cmake is not setting the EKAT_FPE option (and not setting EKAT_FPE in the config.h file).

To Reproduce
It's easy to see it from SCREAM:

  1. Hack any test to add a 0/0 exception. E.g., double n,d; n=d=0.0; std::cout << "nan: " << n/d << "\n";
  2. Run test-all-scream -t fpe ...
  3. Tests pass. Also, inspect the output in ctest-build/debug_nopack_fpe/Testing/Temporary/LastTest.log, and notice that the nan indeed happened.

Expected behavior
The EkatBuildEkat.cmake script honors cmake vars like ${PREFIX}_blah (with PREFIX configurable; PREFIX=SCREAM for scream). However, this particular cmake var is not handled, so the EKAT_FPE macro is never defined.

Let EkatCreateUnitTest use `mpiexec` (etc) for `nproc = 1`

Some platforms seem to require the use of mpiexec or srun even when running "serially" (nproc = 1). This is a wrinkle that drives some folks to support a fake stubbed MPI implementation for libraries like EKAT. Since we've decided not to fake MPI for serial builds, we always need it, and this means we have to accept the constraints that it poses on our target platforms.

Given this, I think we should modify EkatCreateUnitTest to use the appropriate MPI run command even for "serial" processes, since it is necessary in the general case. It's an easy modification to accommodate mpiexec: we'd just comment out the checks for nproc > 1. However, if we wanted to do it "correctly", we'd also have to detect a SLURM environment and use srun (or whatever other environments we support for E3SM).

EkatCreateUnitTest still expects the test sources, even if TEST_EXE is provided

Describe the bug
The signature of the macro is

function(EkatCreateUnitTest test_name test_srcs)

which means that the test sources are mandatory. When providing a test exec, it makes no sense to force the user to provide sources. While that works in our current use cases, in a more general context one might want to create the exec once in a "root" folder, then use it several times in subfolders, to create unit tests with different setups/configs but the same exec.

A possible solution would be to make sources optional, with a keyword, as in EkatCreateUnitTest(my_test SOURCES <src list> ...). This is not backward compatible, though it might be more intuitive. Another solution is to create more macros (that recycle one another):

# create exec
macro (EkatCreateUnitTestExec exec_name test_srcs)
  # create the exec (e.g., take care of catch2 stuff)
endmacro ()

# create tests
macro (EkatCreateUnitTestFromExec test_name test_exec)
  # Given a test exec, create the unit tests combination (e.g., THREADS, RANKS, valg...)
endmacro()

# create exec and tests
macro (EkatCreateUnitTest test_name test_srcs)
   EkatCreateUnitTestExec (${test_name} ${test_srcs} <other optional args, like libs, flags>)
   EkatCreateUnitTestFromExec (${test_name} ${test_name} <other optional args, like RANKS, THREADS, valg>)
endmacro()

This would be backward compatible.

Update to Kokkos 3.5?

Is your feature request related to a problem? Please describe.
The EAGLES computation team is using TChem for aerosol and gas chemical kinetics. TChem makes pretty aggressive use of Kokkos, and recent updates use features exposed in Kokkos 3.5.

Describe the solution you'd like
We'd like to update EKAT to use Kokkos 3.5. We've tested that Kokkos 3.5 doesn't cause issues with EKAT the way we use it (in a branch of EKAT we're using within our own repository).

Add capabilities to Comm structure

Is your feature request related to a problem? Please describe.
Right now the Comm struct is simply storing rank and size of the comm group. But it might be helpful to add some utility functions, like reduceAll, or split. This would make downstream code simpler to read (hide heavy MPI calls).

Describe the solution you'd like
Add "simple" and "common" mpi pattern utilities (reduce, broadcast, split).

Should we remove the FindAVX cmake module and its use in EkatSetCompilersFlags?

Question

All that FindAVX.cmake does is probe /proc/cpuinfo for AVX extensions, and set the AVX_VERSION cmake variable accordingly. Later, if the compiler is either Intel or GNU, EkatSetCompilerFlags adds the correct AVX compiler flag, depending on AVX_VERSION.

I would consider removing this feature, for the following reasons:

  1. It only works if the node where we compile has the same architecture as the compute node.
  2. It only works for Intel and GNU compilers.
  3. Kokkos_ARCH_XYZ already provides the vectorization flags, including for other compilers.
  4. It's yet another low-level detail to maintain (rather than relying on Kokkos).

Add device-friendly numeric_limits-like struct

It often happens that one needs something from std::numeric_limits<ScalarT>, like epsilon or max. Unfortunately, those methods are not device friendly. EKAT could implement a device-friendly version of them, which defaults to std::numeric_limits on host.

This would allow us to avoid several nvcc warnings of the form

warning: calling a constexpr __host__ function("min") from a __host__ __device__ function("operator()") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
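A minimal sketch of the idea, with hypothetical names (EKAT's actual struct may differ): route the queries through static functions that can carry a device annotation (here reduced to an empty macro, as in a host-only build), and simply forward to std::numeric_limits on host.

```cpp
#include <cassert>
#include <limits>

// On a CUDA build this macro would expand to __host__ __device__;
// for a host-only build it expands to nothing.
#ifndef EKAT_DEVICE_FN
#define EKAT_DEVICE_FN
#endif

template <typename T>
struct ScalarTraits {
  EKAT_DEVICE_FN static constexpr T epsilon () { return std::numeric_limits<T>::epsilon(); }
  EKAT_DEVICE_FN static constexpr T max     () { return std::numeric_limits<T>::max();     }
  EKAT_DEVICE_FN static constexpr T min     () { return std::numeric_limits<T>::min();     }
};
```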

Implement specialized templates for pow<2>, pow<3>, pow<4>.

Is your feature request related to a problem? Please describe.
In order to make bit-for-bit testing easier, it would be nice to have specialized implementations of the pow function for low integer exponents. In particular, see this conversation.

Describe the solution you'd like
C++ template specializations for pow<2>, pow<3>, and pow<4>. Perhaps we should include some Fortran support for these and other functions in EKAT as well.
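A sketch of what such specializations could look like (names and namespace are hypothetical, not EKAT's actual API): small integer exponents become plain multiplications, which are easier to keep bit-for-bit across compilers than generic pow calls.

```cpp
#include <cassert>

namespace ekat_sketch {

// Primary template intentionally undefined: only small exponents are supported.
template <int N, typename T> struct SmallPow;

template <typename T> struct SmallPow<2,T> { static T apply (const T& x) { return x*x;   } };
template <typename T> struct SmallPow<3,T> { static T apply (const T& x) { return x*x*x; } };
template <typename T> struct SmallPow<4,T> { static T apply (const T& x) { const T x2 = x*x; return x2*x2; } };

// Usage: pow<2>(x), pow<3>(x), pow<4>(x).
template <int N, typename T>
T pow (const T& x) { return SmallPow<N,T>::apply(x); }

} // namespace ekat_sketch
```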

Add Pack::set(mask, const Pack pack_true, const Pack pack_false) function.

I would like to have an if/then/else type set, in order to replace
A.set(mask, B);
A.set(!mask, C);
with
A.set(mask, B, C);
From what I have read in SIMD tutorials, this would directly translate into a SIMD call that is widely available, and could default to the two assignment operators if not.
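A toy model of the proposed three-argument set (not EKAT's actual Pack class): it is just a per-lane select, which is exactly the shape of a SIMD blend instruction.

```cpp
#include <cassert>

// Toy pack, for illustration only.
template <int N>
struct ToyPack {
  double d[N] = {};
  // if/then/else set: lane i gets a[i] where mask[i] is true, else b[i].
  void set (const bool (&mask)[N], const ToyPack& a, const ToyPack& b) {
    for (int i = 0; i < N; ++i) d[i] = mask[i] ? a.d[i] : b.d[i];
  }
};
```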

EkatCreateUnitTest: do not add npX or ompY if only one rank and/or thread configuration is requested

This is related to #151 : when calling EkatCreateUnitTest(foo ...), the names of the tests are of the form foo_ut_npX_ompY, with X and Y being the number of MPI ranks and OpenMP threads used, respectively. If we only have a single MPI rank configuration and/or a single OpenMP threading configuration, it might be easier to omit the _npX and/or the _ompY token(s). Perhaps it is still beneficial to add it if the single configuration is non-trivial (i.e., np>1 and/or omp>1). But for several tests np=1 and omp=1. It might be easier to simply produce a test called foo_ut rather than foo_ut_np1_omp1.

Hack catch2.hpp to avoid char lines that can be parsed as markdown

Catch2 can use some char lines to separate sections. I'm referring to seeing something like

blah blah
~~~~~~~~~~~~~~~~~
other stuff

However, when the AT reports output in a github PR (due to test failures), some lines can trigger markdown parsing. In particular, lines with 3+ tildes (~) or backticks (`) will trigger the opening of a code block, which can mess up the rendering of the message. For instance, see E3SM-Project/scream#810: some AT messages reporting failures only show two machines (blake and weaver), but if you expand the weaver section you will notice that it also contains output from the mappy section (the missing one). The reason is that the weaver output contained a line of ~ chars, which github parses as opening a code block. Since no matching line is found, the rest of the message is parsed as that code block, which therefore swallows everything the AT was trying to forward to the PR.

Possible solution: hack line 15777 of catch2.hpp, to print a line of chars other than ~. We could use = or *.

Change Pack implementation to ensure constness of template arg is honored?

The way the pack is implemented now, this code

  Pack<const Real,8> p(1.0);
  p[2] = 2.0;
  std::cout << "p:";
  for (int i=0; i<8; ++i) {
    std::cout << " " << p[i];
  }
  std::cout << "\n";

compiles, and prints

p: 1 1 2 1 1 1 1 1

The reason is that, internally, Pack only deals with a non-const version of ScalarType, delegating const-correctness to the constness of *this alone. While that can help keep code complexity under control (see below), it can also cause a bit of puzzlement for new devs.

The current behavior is handy in fcns like this:

template<typename T, int n>
Pack<T,n> square (const Pack<T,n>& p) {
  Pack<T,n> p2;
  for (int i=0; i<n; ++i)
      p2[i] = p[i]*p[i];
  return p2;
}

If T=const S, for some S, the return type is ok only because Pack<T,n> stores an array of std::remove_const<T>::type. Without this trick, the correct implementation of square would be


template<typename T, int n>
Pack<typename std::remove_const<T>::type,n> square (const Pack<T,n>& p) {
  Pack<typename std::remove_const<T>::type,n> p2;
  for (int i=0; i<n; ++i)
      p2[i] = p[i]*p[i];
  return p2;
}

which is arguably a bit more complex. It would, however, avoid accidentally changing the entries of a Pack<const T,n> pack...

Note: we could also have, inside the Pack struct declaration, something like:

  using non_const_type = Pack<typename std::remove_const<T>::type,n>;

which would turn the square signature into

template<typename T, int n>
typename Pack<T,n>::non_const_type square(const Pack<T,n>& p);

Not a huge improvement, but at least shorter, if still somewhat verbose.

tl;dr Allowing a Pack<const double, 8> to be modified is counter-intuitive at best, and dangerous at worst. We should avoid bugs down the road due to this.

Make host_to_device and device_to_host functions more general.

The functions require a bunch of Kokkos::Arrays. However, since they must be called from host (they call deep_copy), there's no need to use Kokkos::Array; any container would do. Taken to the extreme, one could use something like

template<typename ViewT>
void host_to_device (typename ViewT::value_type::scalar_type** data,
                     ViewT* views,
                     int dim1, int dim2)

but that is a tad too C-like, maybe. A more flexible implementation could be

template<typename ViewArrayT,
         typename SizeArrayT,
         typename DataArrayT>
void host_to_device (DataArrayT& data, ViewArrayT& views, SizeArrayT& sizes)

Inside the impl, we would have some checks:

using ViewT = decltype(views[0]);
using PackT = typename ViewT::non_const_value_type;
using ScalarT = typename PackT::scalar_type;
using DataT = typename std::remove_cv<typename std::remove_pointer<decltype(data[0])>::type>::type;

static_assert (std::is_same<DataT,ScalarT>::value, "Error! Incompatible data types.");
EKAT_ASSERT_MSG(views.size()==data.size(), "Error! Incompatible input sizes.");
EKAT_ASSERT_MSG(views.size()==sizes.size(), "Error! Incompatible input sizes.");
// Inside the loop:
EKAT_ASSERT_MSG(views[i].size()==pack::npack<PackT>(sizes[i]), "Error! Input view badly sized.");

The benefit of this impl is that we can use whatever array/vector implementation is easiest, as long as it provides a size() method and overloads operator[].

Also, host_to_device should not allocate the views, in case the user already took note of the stored pointers. We could optionally allocate them if explicitly asked, via an extra bool arg (defaulting to false).

Add parallel scan/reduce dispatch routines, that run serially under certain configurations.

Describe the solution you'd like
We implemented something like this in Homme. The idea is to force the for loop into a Kokkos::single in case some macro is defined (or we could use template specialization, so that EKAT doesn't need yet another macro). Something like this:

template<typename DeviceType, bool Serialized = false>
struct Dispatch {
  template<typename TeamMember, typename Lambda, typename ValueType>
  static void team_reduce (const TeamMember& team,
                           const int& begin, const int& end,
                           const Lambda& lambda,
                           ValueType& result) {
    Kokkos::parallel_reduce(Kokkos::TeamThreadRange(team, begin, end), lambda, result);
  }
  // Similar code for thread vector range and scan
};

template<typename DeviceType>
struct Dispatch<DeviceType,true> {
  template<typename TeamMember, typename Lambda, typename ValueType>
  static void team_reduce (const TeamMember& team,
                           const int& begin, const int& end,
                           const Lambda& lambda,
                           ValueType& result) {
    Kokkos::single(Kokkos::PerTeam(team), [&] {
      result = ValueType();
      for (int k=begin; k<end; ++k) {
        lambda(k, result);
      }
#ifdef KOKKOS_ENABLE_CUDA
      // Broadcast result to all threads by doing a sum of one thread's
      // non-0 value and the rest of the 0s.
      Kokkos::Impl::CudaTeamMember::vector_reduce(Kokkos::Sum<ValueType>(result));
#endif
    });
  }
  // Similar code for thread vector range and scan
};
..

Describe alternatives you've considered
Currently, one needs to manually put serial/parallel logic in the code.

Additional context
See Homme's implementation for a possible impl example.

Add fopenmp-simd/qopenmp-simd to CXX flags, to use pragma omp simd

Both gcc and intel support the pragma omp simd for vectorization. This pragma is enabled with -qopenmp-simd and -fopenmp-simd for intel and gcc respectively. These flags do not require the OpenMP runtime, and no other OpenMP feature is needed/linked; they simply allow use of that pragma for vectorization.

Note: this should get rid of the warning on serial intel builds

externals/ekat/src/ekat/ekat_pack.hpp(182): warning #3948: simd pragma has been deprecated, and will be removed in a future release.  Please refer to release notes for details and recommended alternatives
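For reference, this is the kind of loop the pragma applies to. With -fopenmp-simd (gcc) or -qopenmp-simd (intel) the loop may be vectorized; without any flag the pragma is simply ignored, so the code stays correct either way.

```cpp
#include <cassert>

// The pragma asks the compiler to vectorize this loop; it needs no OpenMP
// runtime, only the -fopenmp-simd/-qopenmp-simd compile flag.
void scale (double* a, int n, double s) {
#pragma omp simd
  for (int i = 0; i < n; ++i) a[i] *= s;
}
```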

Do not use `CMAKE_INSTALL_BINDIR` in EkatSetNvccWrapper.cmake

Describe the bug

The following output shows the problem better than I could.

CMake Warning (dev) at /home/projects/ppc64le/cmake/3.18.0/share/cmake-3.18/Modules/GNUInstallDirs.cmake:225 (message):
  Unable to determine default CMAKE_INSTALL_LIBDIR directory because no
  target architecture is known.  Please enable at least one language before
  including GNUInstallDirs.
Call Stack (most recent call first):
  /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/621/scream/externals/ekat/cmake/mpi/EkatSetNvccWrapper.cmake:23 (include)
  CMakeLists.txt:39 (EkatSetNvccWrapper)
This warning is for project developers.  Use -Wno-dev to suppress it. 

To Reproduce
Should be reproducible on any GPU build.

Expected behavior
We don't want warnings in our config output.

Possible solution
The bin dir is universally ${CMAKE_INSTALL_PREFIX}/bin, so we can probably hardcode that, and avoid including GNUInstallDirs in the script.

EkatDisableAllWarning uses CMake 3.14.0+ feature

Describe the bug
The variable Fortran_COMPILER_ID used in the generator expression was introduced in cmake 3.14.0. Hence, with older cmake versions, $<Fortran_COMPILER_ID:XYZ> is not recognized as a valid generator expression.

To Reproduce

  1. Make sure you use cmake 3.13.0 or older
  2. Create a target that builds a F90 file.
  3. Call EkatDisableAllWarnings(<target_name>).
  4. Enjoy.

Expected behavior
Do not error out.

Additional info
Possible solutions:

  • Force CMake 3.14.0 or newer: may have impact on downstream apps, so not the best option.
  • Use the string expression $<STREQUAL: "${CMAKE_Fortran_COMPILER_ID}", "XYZ">: a bit clunkier than the C (and CXX) $<C_COMPILER_ID:XYZ>.

Should EKAT have macros for defaults floating point behaviors?

This discussion came up during the review of #34 . The issue is the following. Some operations need to be done serially to achieve BFB. Obviously, we don't want a runtime switch, so we instead add a template argument (like bool Serialize). The question now is: should Ekat provide a default for Serialize, which depends on some config options? The code would look like this:

// With config-dependent defaults
#ifdef EKAT_BFB_DEFAULTS
template<typename T, bool Serialize = true>
#else
template<typename T, bool Serialize = false>
#endif
void foo (T& t) {
...
}

// Without config-dependent defaults (keep deducible template args last)
template<bool Serialize, typename T>
void bar (T& t) {
...
}           

Pros:

  • defaults make downstream code smaller; provided you configured EKAT with the correct flags, you can call foo(); rather than foo<true>();
  • people who read the code don't have to find out what the true/false template argument is for.

Cons:

  • if the library is installed, switching implementation (e.g., quickly checking lack of serialization yields a bug) requires reconfiguring rather than just recompiling (experienced cmake/make folks might simply hack a bunch of flags.make and link.txt files in the CMake directories, but it could be risky too, and/or be needed in a lot of cmake files).
  • if you have several template args, Serialize must be last (args with defaults must come after args with no defaults), so if you want/need to override the default at a specific location (perhaps for debugging) it might get complicated.

My original opinion was to not add defaults, and force customers to be explicit about what they want. If foo is used a lot, the app can define a wrapper that sets the Serialize argument depending on some app-specific configuration option. However, I can see that there is value in providing the user with simpler/shorter code.

Thoughts?

Should we remove all namespaces in ekat, except ekat itself?

The ekat namespace should already be enough to encapsulate ekat's stuff in any downstream application. Is there really a point in having, say, ekat::util? After all, isn't everything in ekat a utility function/class anyway? The ekat::pack namespace also seems like overkill.

The reason why I would purge nested namespaces like util is that it would help keep downstream code shorter. Sure, one can always issue a using ekat::util::blah or a using blah = ekat::util::blah, or even a using namespace ekat::util, but it all seems overly complicated.

Note: any sort of impl or details namespace should be kept, since they have a completely different purpose (isolating stuff that the user should not try to use/access).

Mixed-precision packs

Some ill-conditioned algorithms require double precision, even in single precision configurations.

We'd like to be able to copy-construct ekat::Pack types of the same size but possibly different scalar types.

Update kokkos submodule to mirror e3sm

PR E3SM-Project/E3SM#3893 is updating the kokkos submodule, to fix an error related to kokkos use of the dl library on cray systems. The error is also visible in SCREAM (see E3SM-Project/scream#679).

We should update the kokkos submodule in ekat to point to the libdl-fix branch on the E3SM-Project/kokkos repo.

@jeff-cohere @jgfouca any reason why ekat should keep a different version than e3sm? I mean, if EKAT becomes the main provider of kokkos for e3sm, it must provide a version that satisfies e3sm's needs, after all.

Add 'none()' method to Mask structure

Feature description
Add a none() method to the Mask structure, that returns true if all the entries in the mask are false, and returns false otherwise.

Describe the solution you'd like
Same implementation as any(), with true and false swapped.

Describe alternatives you've considered
Strictly speaking, it does not increase the set of capabilities, since my_mask.none() would be equivalent to !my_mask.any(). However, I would argue the code would be faster to parse.
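A toy mask (again, not EKAT's actual Mask) showing the proposed relationship: none() is simply the complement of any().

```cpp
#include <cassert>

template <int N>
struct ToyMask {
  bool m[N] = {};
  // true if at least one entry is true
  bool any  () const { for (int i = 0; i < N; ++i) if (m[i]) return true; return false; }
  // true iff every entry is false
  bool none () const { return !any(); }
};
```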

Vectorization inconsistency between packed min/max

Describe the bug
I cannot understand why min and max have different vectorization macros (one has vector_disabled, while the other has vector_simd). It appears to me as a potential bug.

To Reproduce
Take a look at the two pairs of max/min implementation (masked and unmasked) around this line.

Expected behavior
Either both vector_simd or both vector_disabled.

@jgfouca might know more about this, and either confirm or deny that this is a bug.

Physical constants

Question

What would be the best (or at least acceptable to a majority) way to handle physical constants in a consistent manner throughout scream?

This issue has come up in several recent discussions (e.g., @PeterCaldwell, @AaronDonahue ) and at least 2 issues:
haero issue #6 ( @jeff-cohere and @pbosler )
scream pr #579 (@tcclevenger, @ambrad)

For discussion:

  1. Currently, different parameterizations often define the same constants, but with different numbers of significant figures and/or units. The ideal solution is to use NIST-defined values for each constant, defined in a single location (file or struct) that dynamics and parameterization can use/alias with local variables as needed.
  2. Suppose people agree with that statement; where should such code live?
    Physical constants have nothing to do with Kokkos, but EKAT already contains other supporting features like ekat::units that are also unrelated to performance portability, but generally useful to everything in E3SM, so I'm inclined to put physical constants in EKAT (@bartgol).
  3. We've found that some constants require double precision (e.g., they may be very small or very large, or they may be required by possibly ill-conditioned algorithms) whereas others can be single precision. How should these requirements be handled in scream & ekat?
  4. What should the timeline be for this work?
  5. Should this be a forward looking c++ only change (to avoid BFB issues), or should it include fortran as well?

Add scalar-only overloads for `ekat::max`, `ekat::min`, `ekat::abs`

Is your feature request related to a problem? Please describe.
One of the nice things about EKAT is that it provides functions that should work for both packs and scalar types. Interestingly, the min, max, and abs functions offered don't seem to include cases for pairs of scalar arguments. Let's add these!
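A sketch of what the scalar overloads could look like (the namespace here is hypothetical), so that generic code can call max/min/abs uniformly on packs and scalars:

```cpp
#include <cassert>

namespace ekat_sketch {

// Scalar overloads mirroring the pack versions the issue refers to.
template <typename T> const T& max (const T& a, const T& b) { return a < b ? b : a; }
template <typename T> const T& min (const T& a, const T& b) { return b < a ? b : a; }
template <typename T> T        abs (const T& a)             { return a < T(0) ? -a : a; }

} // namespace ekat_sketch
```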

Potential enhancement to Mask interface to help avoid FPEs

As I'm cleaning up FPEs in SHOC, I'm encountering this pattern over and over (this pattern is widespread in P3 too):

Spack pack1(data1), pack2(data2);
pack2.set(pack1 != 0, val / pack1);

... which, to avoid the FPE (if FPEs are enabled), needs to be changed to:

Spack pack1(data1), pack2(data2);
const auto mask = pack1 != 0;
if (mask.any()) {
  pack2.set(mask, val / pack1);
}

Because, even if pack2 is protected from getting infs by the mask, the divide-by-zero still occurs when val/pack1 is evaluated. The code needed to protect against this makes the code uglier, and may impact performance since it introduces another branch.

I was thinking a macro like this would be nice:

#if defined(SCREAM_FPE)
#define FPE_SAFE_SET(pack, mask, val) \
  if ((mask).any()) {                 \
    (pack).set(mask, val);            \
  }
#else
#define FPE_SAFE_SET(pack, mask, val) (pack).set(mask, val)
#endif

Which would be used like this:

Spack pack1(data1), pack2(data2);
FPE_SAFE_SET(pack2, pack1 != 0, val / pack1);

Thoughts?

Scope out usage of CUDA as cmake language

Idea: do not rely on nvcc_wrapper for the cuda-vs-host compiler magic, but use native CMake stuff (assuming cmake version is 3.18 or higher). Possible approach: set a var CXX_LANG which evaluates to CXX for non-cuda builds, and to CUDA for cuda builds. Then manually set the language of all src files to ${CXX_LANG}, to force compilation with the proper compiler.

I speculate that this would also allow to get stuff like

target_compile_features(mylib PRIVATE cxx_constexpr)

to work correctly. I think the nvcc_wrapper layer is somehow blocking communication between the compiler and cmake, preventing correct detection of cxx features. By using native CMake support for CUDA, I suspect this could be fixed, since cmake would invoke nvcc to compile CUDA sources, followed by an invocation of the host compiler on the generated intermediate cpp source.

Trim down EKAT, and restore trimmed stuff inside scream.

There is quite a bit of stuff in EKAT that really does not need to be in EKAT (e.g., the RepoState enum, or the TimeStamp class). It should be restored inside SCREAM, and removed from EKAT.

If (and only if) in the future, other E3SM apps have an interest in bringing back some common code in EKAT, we can do it. Until then, let's keep EKAT relatively slim, so that it is easier to maintain (and promote!).

SCREAM compilation fails with CMAKE version 3.11

Describe the bug
cmake_policy CMP0074 is only available with CMake 3.12.0 or higher, so configuration might fail on machines such as compy, which has cmake version 3.11.

To Reproduce
Steps to reproduce the behavior:
Compile Scream on the Compy machine (or any machine with an older CMake version)

Expected behavior
Model should compile with versions older than CMAKE 3.12.0
The following code if added to CMakeLists.txt fixes this issue:

if (${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.12.0")
    cmake_policy(SET CMP0074 NEW)
endif ()

Odd behavior for reciprocal units.

Describe the bug
Odd behavior for reciprocal units, but pow is working correctly.

To Reproduce
Steps to reproduce the behavior: in ekat/tests/units/units.cpp

  1. Add REQUIRE(one/mol == pow(mol,-1));
  2. Or (same result) REQUIRE(1/mol == pow(mol,-1));
  3. run the test

/Users/pabosle/ekat/tests/units/units.cpp:69: FAILED:
REQUIRE( one/mol == pow(mol,-1) )
with expansion:
m s kg K A cd == mol^-1

Hard code NEW (or OLD) behavior for policy CMP0074

Is your feature request related to a problem? Please describe.
For compatibility with older versions, CMake 3.12+ ignores env/cmake vars named <PackageName>_ROOT unless policy CMP0074 is set. CMake 3.18+ goes further, and issues a warning if this policy is not explicitly set. From the cmake documentation:

In CMake 3.12 and above the ``find_package(<PackageName>)`` command now
searches prefixes specified by the ``<PackageName>_ROOT`` CMake
variable and the ``<PackageName>_ROOT`` environment variable.
Package roots are maintained as a stack so nested calls to all ``find_*``
commands inside find modules and config packages also search the roots as
prefixes.  This policy provides compatibility with projects that have not been
updated to avoid using ``<PackageName>_ROOT`` variables for other purposes.

The ``OLD`` behavior for this policy is to ignore ``<PackageName>_ROOT``
variables.  The ``NEW`` behavior for this policy is to use
``<PackageName>_ROOT`` variables.

This policy was introduced in CMake version 3.12.  CMake version
3.18.0 warns when the policy is not set and uses ``OLD`` behavior.
Use the ``cmake_policy()`` command to set it to ``OLD`` or ``NEW``
explicitly.

Describe the solution you'd like
Since many test beds have plenty of env vars named BLAH_ROOT, we should explicitly set this policy. I lean toward setting it to NEW, so that we can more easily let cmake find packages (it will simply detect the env var, and follow the path stored within to find the proper installation of the package).

Describe alternatives you've considered
There really are no alternatives, since we have no control over the environment on our testing machines. That is, unless we change our testing scripts to purge all the BLAH_ROOT variables from the environment, but that seems overly complicated and intrusive.

Unit assign operator fails to compile

Describe the bug
Can't assign to an existing ekat::units::Units object. Use case: multiply a number/unit pair LHS with a number/unit pair RHS,
for example, R_gas = Avogadro * Boltzmann (for work related to #48 )

error: object of type 'ekat::units::Units' cannot be assigned because its copy assignment operator
is implicitly deleted
m_units = lhs.m_units * rhs.m_units;
^
/Users/pabosle/ekat/src/ekat/util/ekat_units.hpp:126:27: note: copy assignment operator of 'Units' is implicitly deleted because field 'm_scaling' has no copy
assignment operator
const ScalingFactor m_scaling;

To Reproduce
Steps to reproduce the behavior: In ekat/tests/units/units.cpp

Units u1 = kg/m/pow(s,2);
Units u2 = 1/K;
Units u3 = Units::nondimensional();
u3 = u1*u2;

Expected behavior
u3 = kg/m/s^2/K

Add optional re-run argument to EkatCreateUnitTest

Sometimes, an unlucky pick of inputs might lead to a result that exceeds the tolerance of the test. We've decided that, in some cases, it's better to try re-running the test instead of increasing the tolerance.

Something like this:

EkatCreateUnitTest(my_test MAX_RERUN 2 ..) # allows 2 runs

YAML utilization tool

Would it be possible to generate a tool, most likely in EKAT, that could check if all valid entries in a YAML file are used, and then issue a warning to screen if any entries are ignored?

Maybe a general approach would be to store all entries in a yaml file as a map to a bool, and flip the bool to true the first time an entry is retrieved with param_list.get?

The use case I can think of is that currently the YAML parser ignores anything that isn't requested, and in a lot of cases we use "if" statements to check whether something is in the YAML file at all before taking some action. A typo in the yaml file is enough to make the "if" statement false, which can lead a user to believe something is happening when it isn't. A warning printed to screen would be enough to alert the user that they've made a mistake in the YAML file.

Kokkos::Serial execution space init check won't build

Describe the bug
ekat_arch.cpp line 35: DefaultDevice::execution_space::impl_is_initialized() doesn't work with Kokkos::Serial.

To Reproduce
Steps to reproduce the behavior:

  1. Configure only Kokkos::Serial
  2. Try to build
  3. See error

Bug fix
Remove "impl" (just use DefaultDevice::execution_space::is_initialized()).

PR coming.

Have Pack::set() return Pack &.

I would like to chain set() with other operators, e.g.
p1 = log(p0.set(p0==0, -large_val))
p2 = exp(p1).set(mask, x0)

The set() is like operator= which returns a reference and the return value can always be ignored if not used.

Consider adding support for half-precision floating point numbers.

Is your feature request related to a problem? Please describe.
Some projects have been toying with the idea of using half-precision numbers, in addition to single- and double-precision ones. It might be nice to enable EKAT to work with these.

Describe the solution you'd like
To let EKAT work with half-precision numbers, we'd generalize some of the logic in ekat_type_traits.hpp near line 29. Instead of using a double to indicate that double precision should be used, we'd use an enum to select the half, single, or double precision. See this conversation.

Get a Mac build working

EKAT fails to build on Mac; gcc has a few non-standard goodies that aren't available on macOS.

With this issue, we identify and collect the barriers to an ekat/mac build, and their solutions.

Alternatives: Don't use a Mac. This was easier pre-COVID; with virtual work and VPN hassles, it would be great to have a laptop build.

Add indexing utilities for packed arrays

Is your feature request related to a problem? Please describe.
When going back and forth between packed and unpacked quantities, one needs to continuously do index arithmetic. Besides being tedious, it is error prone.

Describe the solution you'd like
Create a set of utilities, that perform this kind of index arithmetic for you. E.g.:

template<int PackSize>
struct PackInfo {
   // Which pack does array_idx map to?
   KOKKOS_INLINE_FUNCTION
   static int pack_idx (const int array_length, const int array_idx);
   // What entry *within the pack* does array_idx map to?
   KOKKOS_INLINE_FUNCTION
   static int vec_idx (const int array_length, const int array_idx);
   // How many packs are needed for the array
   KOKKOS_INLINE_FUNCTION
   static int num_packs (const int array_length);
   // Does the packed array contain padding?
   KOKKOS_INLINE_FUNCTION
   static bool has_padding (const int array_length);
   ..
};

and possibly others. One would then do

using Info = PackInfo<MY_PACK_SIZE>;
...
const int pack_id = Info::pack_idx(nlev,k);
const int vec_id   = Info::vec_idx(nlev,k);
if (Info::has_padding(nlev)) {
  // handle last pack carefully (namely, unpack it)
} else {
  // keep on rolling with packed operations
}

Describe alternatives you've considered
We could template on the actual pack type (which might be easier to use), but it would be somewhat wrong (and would re-compile the functions for packs of the same length but different scalar type). More interesting might be to offer, in addition to the above, a compile-time version, where the array length is a compile-time constant.

New Kokkos versions don't work with EKAT

Kokkos release 3.2 (and the Kokkos master branch) fail to build within EKAT.

Newer Kokkos versions enforce case-sensitive CMake variables, split libkokkos.a into 2 (libkokkoscore.a and libkokkoscontainers.a), and change some memory traits (RandomAccess) that ekat currently uses.

Alternative: Force EKAT to use the same Kokkos SHA as scream.

Add a `sum` function to `ekat_pack.hpp`.

Is your feature request related to a problem? Please describe.
See Andrew Bradley's comment in SCREAM PR 536. It would be helpful to have a sum function that returns the sum of all elements in a pack.

Describe the solution you'd like
The proposed solution is to define the following inline function in ekat_pack.hpp:

template <typename PackType> KOKKOS_INLINE_FUNCTION
OnlyPackReturn<PackType, typename PackType::scalar> reduce_sum (const PackType& p) {
  typename PackType::scalar v(p[0]);
  vector_simd for (int i = 1; i < PackType::n; ++i) v += p[i];
  return v;
}


Ability to call `scream_kerror_msg` with a string variable

I am trying to do the following using the scream_kerror_msg function:

const char* errstr = "some error message";
scream_kerror_msg(errstr);

and getting the following error:

scream/components/scream/src/physics/share/physics_saturation_impl.hpp(21): error: expected a ")"
          detected during:
            instantiation of "scream::physics::Functions<ScalarT, DeviceT>::Spack scream::physics::Functions<ScalarT, DeviceT>::qv_sat(const screa\
m::physics::Functions<ScalarT, DeviceT>::Spack &, const scream::physics::Functions<ScalarT, DeviceT>::Spack &, __nv_bool, const scream::physics::F\
unctions<ScalarT, DeviceT>::Smask &, scream::physics::Functions<ScalarT, DeviceT>::SaturationFcn) [with ScalarT=scream::Real, DeviceT=scream::Defa\
ultDevice]"

Please note that the following code works:

scream_kerror_msg("Some error message");

Therefore, scream_kerror_msg is not able to take a variable as an argument.

Is your feature request related to a problem? Please describe.
A: Yes, I am trying to build an error message dynamically, but this function requires an explicit string.

Describe the solution you'd like
A: Ability to call this function with a variable argument.

Catch2 fails to compile on Fedora Rawhide (more recent version needed)

Describe the bug
One of the EAGLES/HAERO developers encountered an issue described here. This issue has been fixed downstream, so it would be nice to get a more recent version of the Catch2 header into EKAT.

To Reproduce
Steps to reproduce the behavior:

  1. Log into a system running Fedora Rawhide (or another system with a particularly picky constexpr-obsessed C++ compiler?)
  2. Try to build EKAT
  3. See error

Here's what it looks like for us:

In file included from /usr/include/signal.h:328,
                 from /home/asher/workspace/haero/ext/ekat/extern/catch2/include/catch2/catch.hpp:7712,
                 from /home/asher/workspace/haero/ext/ekat/src/ekat/util/ekat_catch_main.cpp:3:
/home/asher/workspace/haero/ext/ekat/extern/catch2/include/catch2/catch.hpp:10453:58: error: call to non-‘constexpr’ function ‘long int sysconf(int)’
10453 |     static constexpr std::size_t sigStackSize = 32768 >= MINSIGSTKSZ ? 32768 : MINSIGSTKSZ;
      |                                                          ^~~~~~~~~~~
In file included from /usr/include/bits/sigstksz.h:24,
                 from /usr/include/signal.h:328,
                 from /home/asher/workspace/haero/ext/ekat/extern/catch2/include/catch2/catch.hpp:7712,
                 from /home/asher/workspace/haero/ext/ekat/src/ekat/util/ekat_catch_main.cpp:3:
/usr/include/unistd.h:640:17: note: ‘long int sysconf(int)’ declared here
  640 | extern long int sysconf (int __name) __THROW;
      |                 ^~~~~~~

Expected behavior

No compile errors, please!

Desktop (please complete the following information):

$ uname -a
Linux fedora 5.14.10-300.fc35.x86_64 #1 SMP Thu Oct 7 20:48:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux


Fix warning regarding calling host function from device in ekat_pack.hpp

The warning looks like this:

ekat_pack.hpp(295): warning: calling a __host__ function from a __host__ __device__ function is not allowed
          detected during:
            instantiation of "ekat::pack::OnlyPack<PackType> ekat::pack::sqrt(const PackType &) [with PackType=ekat::pack::OnlyPack<ekat::pack::Pack<long, 1>>]" 

The warning seems to be innocuous, but it would be nice to put in place some logic (probably involving CPP) that prevents this warning from showing up.

Floating point exception handling test fails in Mac build with AppleClang compiler.

Describe the bug

The debug_tools_ut_np1_omp1 test fails when building EKAT on a Mac using Apple's Clang C/C++ compilers. The failure is related to a problem in the way we handle floating point exceptions on Macs. This compiler doesn't include OpenMP support, which is probably why there aren't more failures like this.

make test
...
        Start 105: debug_tools_ut_np1_omp1
105/107 Test #105: debug_tools_ut_np1_omp1 ..........Child aborted***Exception:   0.23 sec
...

From the testing logs:

105/107 Testing: debug_tools_ut_np1_omp1
105/107 Test: debug_tools_ut_np1_omp1
Command: "/bin/sh" "-c" "./debug_tools"
Directory: /Users/jeff/projects/sandia/EKAT/build/tests/utils
"debug_tools_ut_np1_omp1" start time: Aug 28 10:10 PDT
Output:
----------------------------------------------------------
ExecSpace name: Serial
ExecSpace initialized: yes
 avx  compiler GCC default FPE mask 0 (NONE)
 #host threads 1
ExecSpace name: Serial
ExecSpace initialized: yes
 avx  compiler GCC default FPE mask 0 (NONE)
 #host threads 1
*) testing default fpes...
 tests mask: 63
   has FE_DIVBYZERO: 1
   has FE_INVALID:   1
   has FE_OVERFLOW:  1

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
debug_tools is a Catch v2.9.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
fpes
  default-fpes
-------------------------------------------------------------------------------
/Users/jeff/projects/sandia/EKAT/tests/utils/debug_tools_tests.cpp:128
...............................................................................

/Users/jeff/projects/sandia/EKAT/tests/utils/debug_tools_tests.cpp:135: FAILED:
  REQUIRE( run_fpe_tests () == num_expected_fpes )
with expansion:
  0 == 4

*) testing user-enabled fpes...
 tests mask: 59
   has FE_DIVBYZERO: 0
   has FE_INVALID:   1
   has FE_OVERFLOW:  1
-------------------------------------------------------------------------------
fpes
  user-requested-fpes
-------------------------------------------------------------------------------
/Users/jeff/projects/sandia/EKAT/tests/utils/debug_tools_tests.cpp:144
...............................................................................

/Users/jeff/projects/sandia/EKAT/tests/utils/debug_tools_tests.cpp:65: FAILED:
  REQUIRE( gSignalStatus==has_fe_divbyzero(mask) )
with expansion:
  1 == 0

/Users/jeff/projects/sandia/EKAT/tests/utils/debug_tools_tests.cpp:153: FAILED:
  REQUIRE( run_fpe_tests () == num_expected_fpes )
due to unexpected exception with message:
  Exception translation was disabled by CATCH_CONFIG_FAST_COMPILE

*) testing user-disabled fpes...
 tests mask: 4
   has FE_DIVBYZERO: 1
   has FE_INVALID:   0
   has FE_OVERFLOW:  0
  - 0/0 threw.
  - Invalid arg threw.
  - Overflow threw.
-------------------------------------------------------------------------------
fpes
  user-requested-fpes
-------------------------------------------------------------------------------
/Users/jeff/projects/sandia/EKAT/tests/utils/debug_tools_tests.cpp:162
...............................................................................

/Users/jeff/projects/sandia/EKAT/tests/utils/debug_tools_tests.cpp:171: FAILED:
  REQUIRE( run_fpe_tests () == num_expected_fpes )
with expansion:
  3 == 1

<end of output>
Test time =   0.23 sec
----------------------------------------------------------
Test Failed.
"debug_tools_ut_np1_omp1" end time: Aug 28 10:10 PDT
"debug_tools_ut_np1_omp1" time elapsed: 00:00:00
----------------------------------------------------------

This is not urgent at all--I'm just logging it here so we don't forget about it.

To Reproduce
Steps to reproduce the behavior:

  1. Make your way into Apple's walled garden.
  2. Configure your EKAT build as usual, enabling tests. See script below.
  3. Type make -j to build EKAT.
  4. Type make test to see the fireworks.

Expected behavior
In a perfect world, this test would pass!


Desktop (please complete the following information):

  • OS: MacOS Catalina (10.15.6)

Additional context
Here's a script for configuring EKAT:

#!/bin/sh
PROJ_DIR=/Users/myname/where/I/work
SRC_DIR=${PROJ_DIR}/EKAT
KOKKOS_DIR=${PROJ_DIR}/scream/externals/kokkos
YAMLCPP_DIR=${PROJ_DIR}/scream/externals/yaml-cpp
rm -rf CMakeFiles
rm -f  CMakeCache.txt
cmake \
  -D CMAKE_BUILD_TYPE:STRING=DEBUG            \
  -D CMAKE_CXX_COMPILER:STRING=mpicxx         \
  -D CMAKE_Fortran_COMPILER:STRING=mpifort    \
  \
  -D EKAT_DISABLE_TPL_WARNINGS:BOOL=ON        \
  -D EKAT_ENABLE_TESTS:BOOL=ON                \
  -D EKAT_TEST_DOUBLE_PRECISION:BOOL=ON       \
  -D EKAT_TEST_SINGLE_PRECISION:BOOL=ON       \
  \
  -D Kokkos_SOURCE_DIR:PATH=${KOKKOS_DIR}     \
  -D YAMLCPP_SOURCE_DIR:PATH=${YAMLCPP_DIR}   \
  \
  -D KOKKOS_ENABLE_DEPRECATED_CODE:BOOL=OFF   \
  ${SRC_DIR}

EkatCreateUnitTest: do not add _ut suffix

Currently, when invoking EkatCreateUnitTest (foo ...), the names of the tests are of the form foo_ut_npX_ompY, with X and Y being the number of MPI ranks and OpenMP threads used, respectively. While it is understandable that we add npX_ompY, it doesn't feel right that we change the name of the test from the requested foo to foo_ut.

Add an installation (make install) target

Is your feature request related to a problem? Please describe.
It would be nice if someone who uses ekat could place its libraries/headers into the proper location, given a CMAKE_INSTALL_PREFIX. This is pretty standard practice for libraries these days.

Describe the solution you'd like
I'd like to add a make install target that uses CMake's install command to put the goods where a user wants them.
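A minimal sketch of what such install rules could look like (the target name `ekat` and the src/ekat header layout are assumptions about the repo, not verified):

```cmake
include(GNUInstallDirs)

# Install the library into <prefix>/lib (or lib64, per GNUInstallDirs).
install (TARGETS ekat
         LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR}
         ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR})

# Install the public headers into <prefix>/include, preserving the tree.
install (DIRECTORY ${CMAKE_SOURCE_DIR}/src/ekat
         DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}
         FILES_MATCHING PATTERN "*.hpp")
```

With rules like these in place, CMake generates the `make install` target automatically.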
