[Note: this project is in active development.]
This is a SUNDIALS-based demonstration application to assess and demonstrate the large-scale parallel performance of new capabilities that have been added to SUNDIALS in recent years, namely:

- The new SUNDIALS MPIManyVector implementation, which enables flexibility in how a solution "vector" is partitioned across computational resources, e.g., CPUs and GPUs.
- The new ARKODE multirate integration module, MRIStep, allowing high-order accurate calculations that subcycle "fast" processes within "slow" ones.
- The new flexible SUNDIALS SUNLinearSolver interfaces, which enable streamlined use of problem-specific and scalable linear solver libraries, e.g., hypre, MAGMA, and NVIDIA cuSPARSE.
This code simulates the 3D nonlinear inviscid compressible Euler equations, with advection and reaction of chemical species,

$w_t = -\nabla \cdot F(w) + R(w) + G,$

for independent variables $(X, t) = (x, y, z, t) \in \Omega \times (t_0, t_f]$. The differential equation is completed using initial condition $w(t_0) = w_0(X)$ and boundary conditions on each face of the spatial domain. Here, the "solution" is given by

$w = \begin{bmatrix} \rho & \rho v_x & \rho v_y & \rho v_z & e_t & \mathbf{c} \end{bmatrix}^T = \begin{bmatrix} \rho & m_x & m_y & m_z & e_t & \mathbf{c} \end{bmatrix}^T,$

corresponding to the density, momentum in the x, y, and z directions, total
energy per unit volume, and any number of chemical densities $\mathbf{c}$.
The external force $G(X, t)$ is test-problem-dependent.
We have the physical parameters:

- $R$ is the specific ideal gas constant (287.14 J/kg/K),
- $c_v$ is the specific heat capacity at constant volume (717.5 J/kg/K),
- $\gamma = c_p/c_v = 1 + R/c_v$ is the ratio of specific heats (1.4),

corresponding to air (predominantly an ideal diatomic gas). The speed
of sound in the gas is then given by $c = \sqrt{\gamma p / \rho}$, where $p$ is the gas pressure.
The fluid variables above are non-dimensionalized; in standard SI units these would be:

- $[\rho] = kg / m^3$,
- $[v_x] = [v_y] = [v_z] = m/s$, which implies $[m_x] = [m_y] = [m_z] = kg / m^2 / s$,
- $[e_t] = kg / m / s^2$, and
- $[\mathbf{c}_i] = kg / m^3$.
Note: the fluid portion of the description above follows the presentation here in sections 7.3.1 - 7.3.3.
We discretize this problem using the method of lines, where we first semi-discretize
in space using a regular finite volume grid with dimensions `nx` x `ny` x `nz`, with
fluxes at cell faces calculated using a 5th-order FD-WENO reconstruction. MPI
parallelization is achieved using a standard 3D domain decomposition, using `nprocs`
MPI ranks, with layout `npx` x `npy` x `npz` defined automatically via the
`MPI_Dims_create` utility routine. The minimum size for any dimension is 3, so
to run a two-dimensional test in the yz-plane, one could specify `nx = 3`
and `ny = nz = 200`. When run in parallel, only "active" spatial dimensions (those
with extent greater than 3) will be parallelized.
Each fluid field ($\rho$, $m_x$, $m_y$, $m_z$, and $e_t$) is stored in its own
`N_Vector` object on each MPI rank. Chemical species at all spatial locations on
each MPI rank are collocated into a single serial or RAJA `N_Vector` object when
running on the CPU or GPU, respectively. The five fluid vectors and the chemical
species vector are combined together to form the full "solution" vector $w$ using
the MPIManyVector `N_Vector` module.
After spatial semi-discretization, we are faced with a large IVP system, $\dot{w}(t) = f(t, w(t))$, $w(t_0) = w_0$, where the right-hand side $f$ comprises the discretized flux divergence, the chemical reactions, and the external forcing.
For non-reactive flows, the resulting initial-value problem is evolved in time using an adaptive-step explicit Runge-Kutta method from the ARKStep module in ARKODE. For problems involving (typically stiff) chemical reactions, the problem may be solved using one of two approaches.

1. It may be treated as a multirate initial-value problem that is solved using the MRIStep module in ARKODE, wherein the gas dynamics equations are evolved explicitly at the slow time scale, while the chemical kinetics are evolved at a faster time scale using a temporally-adaptive, diagonally-implicit Runge-Kutta method from the ARKStep module.
2. It may be treated using mixed implicit-explicit (IMEX) methods at a single time scale. Here, the gas dynamics equations are treated explicitly, while the chemical kinetics are treated implicitly, using an additive Runge-Kutta method from the ARKStep module.
For the IMEX approach (2), we use SUNDIALS' modified Newton solver to handle the global nonlinear
algebraic systems arising at each implicit stage of each time step. Since only
the spatially-local chemical kinetics are treated implicitly, the Jacobian of these
systems is block-diagonal, and we solve them using a custom
`SUNLinearSolver` implementation that solves each MPI rank-local linear system
independently. The portion of the Jacobian matrix on each rank is itself
block-diagonal. We further leverage this structure by solving each rank-local
linear system using either the sparse KLU, batched sparse NVIDIA cuSPARSE (GPU-enabled),
or batched dense MAGMA (GPU-enabled) SUNDIALS `SUNLinearSolver` implementations.
The multirate approach (1) can leverage the structure of the problem: because the
chemical kinetics are local in space, once the slow-time-scale right-hand sides
have been evaluated (and all MPI communication necessary to construct the forcing
functions has completed), the fast time scale consists of `nx` x `ny` x `nz`
spatially-decoupled fast IVPs. We construct a custom fast integrator that groups
all the independent fast IVPs on an MPI rank together as a single system evolved
using a rank-local ARKStep instance. The code for this custom integrator itself
is minimal, primarily consisting of steps to access the rank-local subvectors of
the MPIManyVector solution; it reuses the same `SUNLinearSolver` modules listed
above for linear systems that arise within the modified Newton iteration.
Steps showing the process to download this demonstration code, install the relevant dependencies, and build the code in a Linux or OS X environment are as follows. To obtain the demonstration code, simply clone this repository with Git:

```bash
git clone https://github.com/sundials-codes/sundials-manyvector-demo.git
```
To compile the code you will need:

- modern C and C++ compilers
- CMake 3.12 or newer
- the SUNDIALS library of time integrators and nonlinear solvers
- the HDF5 high-performance data management and storage suite

Optionally, when solving problems that involve chemistry on the CPU you will need:

- the SuiteSparse library of sparse direct linear solvers (specifically KLU)

Optionally, for problems that involve chemistry that will run on GPUs you will additionally need:

- the NVIDIA CUDA Toolkit (the `nvcc` compiler and optionally the cuSPARSE library)
- the RAJA performance portability library
- the MAGMA dense linear solver "multicore+GPU" library
To assist in building the code, the `scripts` directory contains shell scripts to set up the environment on specific systems and install some of the required dependencies. For example, if working on Summit, the following commands may be used to set up the environment and install the necessary dependencies:

```bash
cd sundials-manyvector-demo/scripts
export PROJHOME=/css/proj/[projid]
source setup_summit.sh
./build-klu.sh
./build-raja.sh
./build-magma.sh
./build-sundials.sh
```

where `[projid]` is a Summit project ID. For more information on the setup and
build scripts, see the README file in the `scripts` directory. As an
alternative, any of the dependencies for the demonstration code can be installed
with the Spack package manager, e.g.,
```bash
git clone https://github.com/spack/spack.git
spack/bin/spack install mpi
spack/bin/spack install hdf5 +mpi +pic +szip
spack/bin/spack install suitesparse
spack/bin/spack install magma
spack/bin/spack install raja +cuda
spack/bin/spack install sundials +klu +mpi +raja +cuda
```
Once the necessary dependencies are installed, the following CMake variables can be used to configure the demonstration code build:

- `CMAKE_INSTALL_PREFIX` -- the path where executables and input files should be installed, e.g., `path/to/myinstall`. The executables will be installed in the `bin` directory and input files in the `tests` directory under the given path.
- `CMAKE_C_COMPILER` -- the C compiler to use, e.g., `mpicc`
- `CMAKE_C_FLAGS` -- the C compiler flags to use, e.g., `-g -O2`
- `CMAKE_C_STANDARD` -- the C standard to use, defaults to `99`
- `CMAKE_CXX_COMPILER` -- the C++ compiler to use, e.g., `mpicxx`
- `CMAKE_CXX_FLAGS` -- the C++ flags to use, e.g., `-g -O2`
- `CMAKE_CXX_STANDARD` -- the C++ standard to use, defaults to `11`
- `SUNDIALS_ROOT` -- the root directory of the SUNDIALS installation, defaults to the value of the `SUNDIALS_ROOT` environment variable
- `ENABLE_RAJA` -- build with RAJA support, defaults to `OFF`
- `RAJA_ROOT` -- the root directory of the RAJA installation, defaults to the value of the `RAJA_ROOT` environment variable
- `RAJA_BACKEND` -- set the RAJA backend to use with the demonstration code, defaults to `CUDA`
- `ENABLE_HDF5` -- build with HDF5 I/O support, defaults to `OFF`
- `HDF5_ROOT` -- the root directory of the HDF5 installation, defaults to the value of the `HDF5_ROOT` environment variable
When RAJA is enabled with the CUDA backend, the following additional variables may also be set:

- `CMAKE_CUDA_COMPILER` -- the CUDA compiler to use, e.g., `nvcc`
- `CMAKE_CUDA_FLAGS` -- the CUDA compiler flags to use
- `CMAKE_CUDA_ARCHITECTURES` -- the CUDA architecture to target, defaults to `70`
- `ENABLE_MAGMA` -- build with MAGMA linear solver support, defaults to `OFF`. This requires that SUNDIALS was built with MAGMA support; CMake will automatically utilize the same MAGMA library that was used for SUNDIALS.
In-source builds are not permitted; as such, the code should be configured and built from a separate build directory. For example, continuing with the Summit case from above, the following commands can be used to build with RAJA targeting CUDA and with HDF5 output enabled:

```bash
cd sundials-manyvector-demo
mkdir build
cd build
cmake ../. \
  -DCMAKE_INSTALL_PREFIX="${MEMBERWORK}/[projid]/sundials-demo" \
  -DCMAKE_C_COMPILER=mpicc \
  -DCMAKE_C_FLAGS="-g -O2" \
  -DCMAKE_CXX_COMPILER=mpicxx \
  -DCMAKE_CXX_FLAGS="-g -O2" \
  -DENABLE_RAJA="ON" \
  -DENABLE_HDF5="ON"
make
make install
```
The test executables and input files are installed under the `sundials-demo` directory in the member work space for the Summit project ID `[projid]`.

Note: In this example, since the environment was configured using the Summit
setup script, the values for `SUNDIALS_ROOT`, `RAJA_ROOT`, and `HDF5_ROOT` can
be omitted from the `cmake` command, as these values are automatically set from
the corresponding environment variables defined by the setup script.
Several test cases are included with the code, and the necessary input files for
each case are contained in the subdirectories within the `tests` directory. Each
input file is internally documented to discuss all possible input parameters (in
case some have been added since this README was last updated).
The input files contain parameters to set up the physical problem:

- spatial domain, $\Omega$ -- `xl`, `xr`, `yl`, `yr`, `zl`, `zr`
- time interval, $[t_0, t_f]$ -- `t0`, `tf`
- the ratio of specific heats, $\gamma$ -- `gamma`
- spatial discretization dimensions -- `nx`, `ny`, `nz`
- boundary condition types -- `xlbc`, `xrbc`, `ylbc`, `yrbc`, `zlbc`, `zrbc`
Parameters to control the execution of the code:

- desired CFL fraction -- `cfl` (if set to zero, then the time step is chosen purely using temporal adaptivity)
- number of desired solution outputs -- `nout`
- a flag to enable optional output of RMS averages for each field at the frequency specified via `nout` -- `showstats`
Numerous parameters are also provided to control how time integration is performed (these are passed directly to ARKODE). For further information on the ARKODE solver parameters and the meaning of individual values, see the ARKODE documentation.
To specify an input file to the executable, the input filename should be
provided using the `-f` flag, e.g.,

```bash
<executable> -f <input_file>
```

Additionally, any input parameters may also be specified on the command line, e.g.,

```bash
<executable> --nx=100 --ny=100 --nz=400
```
For example, continuing with the Summit case from above, the primordial blast test can be run on one Summit node using four cores and four GPUs with the following commands:

```bash
cd ${MEMBERWORK}/[projid]/sundials-demo/tests/primordial_blast
bsub -q debug -nnodes 1 -W 0:10 -P [projid] -Is $SHELL
jsrun -n4 -a1 -c1 -g1 ../../bin/primordial_blast_mr.exe -f input_primordial_blast_mr_gpu.txt
```

The `bsub` command above will submit a request for an interactive job to the
debug queue, allocating one node for 10 minutes with the compute time charged to
`[projid]`. Once the interactive session starts, the test case is launched using
the `jsrun` command. Solutions are output to disk using parallel HDF5, solution
statistics are optionally output to the screen at specified frequencies, and run
statistics are printed at the end of the simulation.
The parallel HDF5 solution snapshots are written at the frequency specified by
`nout`. Accompanying these `output-#######.hdf5` files is an automatically
generated input file, `restart_parameters.txt`, that stores a complete set of
input parameters to restart the simulation from the most recently generated
output file. This is a "warm" restart, in that it will pick up the calculation
where the previous one left off, using the same initial time step size as
ARKStep would use. This restart may differ slightly from an uninterrupted run,
since other internal ARKStep time adaptivity parameters cannot be reused. We
note that the restart must use the same spatial grid size and number of chemical
tracers as the original run, but it may use a different number of MPI tasks if
desired.
Individual test problems are uniquely specified through an input file and
auxiliary source code file(s) that should be linked with the main routine at
compile time. By default, all codes are built with no chemical species; however,
this may be controlled at compile time using the `NVAR` preprocessor directive,
corresponding to the number of unknowns at any spatial location. The five fluid
fields give the (default) minimum value for `NVAR` of 5, so for a calculation
with 4 chemical species the code should be compiled with the preprocessor
directive `NVAR=9`. See `src/CMakeLists.txt` for examples of how to specify
`NVAR` when adding a new test/executable.
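For illustration only, setting such a definition in plain CMake might look like the sketch below; the target name `my_test.exe` and source `my_test.cpp` are hypothetical, and the project's actual tests use the `sundemo_add_executable` helper in `src/CMakeLists.txt` instead:

```cmake
# Hypothetical target with 4 chemical species: NVAR = 5 fluid fields + 4 species
add_executable(my_test.exe my_test.cpp)
target_compile_definitions(my_test.exe PRIVATE NVAR=9)
```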
The auxiliary source code files for creating a new test must contain three
functions. Each of these must return an integer flag indicating success (0) or
failure (nonzero). These are the initial condition function

```cpp
int initial_conditions(const realtype& t, N_Vector w, const UserData& udata);
```

and the forcing function

```cpp
int external_forces(const realtype& t, N_Vector G, const UserData& udata);
```

Additionally, a function must be supplied to compute/output any desired solution diagnostic information, with the signature

```cpp
int output_diagnostics(const realtype& t, const N_Vector w, const UserData& udata);
```

If no diagnostics information is desired, then this routine may just return 0.
Here, the `initial_conditions` routine will be called once when the simulation
begins, `external_forces` will be called on every evaluation of the ODE
right-hand side function for the Euler equations (it is assumed that this does
not require the results of the MPI exchange performed by `UserData::ExchangeStart`
/ `UserData::ExchangeEnd`), and `output_diagnostics` will be called at the same
frequency as the solution is output to disk.
To add a new executable using these auxiliary source code file(s), update
`src/CMakeLists.txt` to include a new call to `sundemo_add_executable` in a
similar manner as the existing test problems, e.g., `hurricane_yz.exe`.