hp3d's Introduction

hp3D

A Scalable MPI/OpenMP hp-Adaptive Finite Element Software Library for Complex Multiphysics Applications

Downloading the library

  1. Clone the repository
  • via HTTPS: git clone https://github.com/Oden-EAG/hp3d.git
  • via SSH: git clone git@github.com:Oden-EAG/hp3d.git
  2. Access the main directory: cd hp3d/trunk

Compiling the library

  1. Create m_options file in hp3d/trunk/: Copy one of the existing m_options files from hp3d/trunk/m_options_files/ into hp3d/trunk/. For example: cp m_options_files/m_options_linux ./m_options
  2. Modify the m_options file to set the correct path to the main directory: set HP3D_BASE_PATH to the path of hp3d/trunk/.
  3. Link to the external libraries and set the compiler options by modifying the m_options file as described below; then compile the library by typing make in hp3d/trunk/.
  • Note: We recommend configuring PETSc with all of hp3D's dependencies (see below). The default m_options files m_options_files/m_options_linux and m_options_files/m_options_macos then only require setting the corresponding values of PETSC_DIR and PETSC_ARCH to link to the required external libraries (a minimal example is sketched after this list).
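
For illustration, the relevant m_options entries might look as follows (the paths shown are placeholders for your local installation; only the variable names HP3D_BASE_PATH, PETSC_DIR, and PETSC_ARCH are taken from the description above):

# m_options (excerpt) -- example paths only, adjust to your system
HP3D_BASE_PATH = /home/username/hp3d/trunk
PETSC_DIR      = /opt/petsc
PETSC_ARCH     = arch-debian-real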

Linking to external libraries

The m_options file must link to the correct paths for external libraries. The following external libraries are used:

  • Intel MKL [optional]
  • X11 [optional]
  • PETSc (all of the following packages can be installed with PETSc)
    • HDF5/pHDF5
    • MUMPS
    • Metis/ParMetis
    • Scotch/PT-Scotch
    • PORD
    • Zoltan

For example, assuming MPI libraries are installed, PETSc configure may look like this:

./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --with-mpiexec=mpirun \
            --download-fblaslapack=yes \
            --download-scalapack=yes \
            --download-mumps=yes \
            --download-metis=yes \
            --download-parmetis=yes \
            --download-ptscotch=yes \
            --download-zoltan=yes \
            --download-hdf5=yes \
            --with-hdf5-fortran-bindings=1 \
            --with-shared-libraries=0 \
            --with-debugging=0 \
            --with-scalar-type=real \
            --PETSC_ARCH=arch-debian-real

Note: PETSc can also install MPI libraries if needed, e.g. --download-openmpi=yes or --download-mpich=yes.

Compiler options

Compilation is governed by preprocessing flags HP3D_COMPLEX and HP3D_DEBUG.

  • HP3D_COMPLEX = 0: stiffness matrix, load vector(s), and solution DOFs are real-valued
  • HP3D_COMPLEX = 1: stiffness matrix, load vector(s), and solution DOFs are complex-valued
  • HP3D_DEBUG = 0: the compiler uses optimization flags, and the library performs only minimal checks during the computation
  • HP3D_DEBUG = 1: the compiler uses debug flags, and the library performs additional checks during the computation

The library is created under either hp3d/trunk/complex/ or hp3d/trunk/real/, depending on the HP3D_COMPLEX setting.

Additional preprocessing flags enable or disable dependencies on third-party libraries (example settings are sketched after this list):

  • HP3D_USE_INTEL_MKL = 0/1 , disable/enable dependency on Intel MKL package
  • HP3D_USE_MPI_F08 = 0/1 , disable/enable MPI Fortran 2008 binding (module mpi_f08)
  • HP3D_USE_OPENMP = 0/1 , disable/enable OpenMP threading
  • HP3D_USE_X11 = 0/1 , disable/enable dependency on X11
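
As an illustration only, these flags might appear in the m_options file along the following lines (a hedged sketch; consult the distributed m_options files for the exact variable handling):

# m_options (excerpt) -- example flag settings
HP3D_COMPLEX       = 0   # real-valued stiffness matrix, load vector(s), and solution DOFs
HP3D_DEBUG         = 0   # optimized build, minimal runtime checks
HP3D_USE_INTEL_MKL = 0   # no Intel MKL dependency
HP3D_USE_MPI_F08   = 1   # use the MPI Fortran 2008 binding (module mpi_f08)
HP3D_USE_OPENMP    = 1   # enable OpenMP threading
HP3D_USE_X11       = 0   # no X11 dependency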

Verifying build

In addition to the default make that builds and installs the hp3D library, the makefile provides various targets which can be viewed via make help. For example, use make check to run a quick check after building the library, or run more extensive tests using make test.
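
For example, a typical build-and-verify sequence from hp3d/trunk/ might look like this (the target names are the ones listed above):

cd hp3d/trunk
make          # build and install the hp3D library
make check    # quick check after building
make test     # more extensive tests
make help     # list all available targets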

Compiling a problem

Projects are implemented in hp3d/trunk/problems/. A few example projects are provided; for instance, /problems/POISSON/GALERKIN/ is a Galerkin implementation of the classical variational Poisson problem. To compile and run the problem, type make in the project folder, i.e., cd problems/POISSON/GALERKIN; make; ./run.sh.

Citing hp3D

Please add the following citation to any paper, technical report, or article that incorporates the hp3D library:

@article{hp2024joss,
         Author = {Henneking, Stefan and Petrides, Socratis and Fuentes, Federico and Badger, Jacob and Demkowicz, Leszek},
         Title = {{$hp$3D: A Scalable MPI/OpenMP $hp$-Adaptive Finite Element Software Library for Complex Multiphysics Applications}},
         Publisher = {The Open Journal},
         Journal = {Journal of Open Source Software},
         Year = {2024},
         Volume = {9}, 
         Number = {95}, 
         Pages = {5946},
         doi = {10.21105/joss.05946}}

And, optionally,

@article{hpUserManual,
         Author = {Henneking, Stefan and Demkowicz, Leszek},
         Title = {{$hp$3D User Manual}},
         Journal={arXiv preprint arXiv:2207.12211},
         Year = {2022}}
@book{hpBook3,
      Author = {Henneking, Stefan and Demkowicz, Leszek and Petrides, Socratis and Fuentes, Federico and Keith, Brendan and Gatto, Paolo},
      Title = {{Computing with $hp$ Finite Elements. III. Parallel $hp$3D Code}},
      Publisher = {In preparation},
      Year = {2024}}
@book{hpBook2,
      Author = {Demkowicz, Leszek and Kurtz, Jason and Pardo, David and Paszy\'{n}ski, Maciej and Rachowicz, Waldemar and Zdunek, Adam},
      Title = {{Computing with $hp$ Finite Elements. II. Frontiers: Three-Dimensional Elliptic and Maxwell Problems with Applications}},
      Publisher = {Chapman \& Hall/CRC},
      Year = {2007}}
@book{hpBook1,
      Author = {Demkowicz, Leszek},
      Title = {{Computing with $hp$ Finite Elements. I. One- and Two-Dimensional Elliptic and Maxwell Problems}},
      Publisher = {Chapman \& Hall/CRC Press, Taylor and Francis},
      Year = {2006}}
@article{shape2015,
         Author = {Fuentes, Federico and Keith, Brendan and Demkowicz, Leszek and Nagaraj, Sriram},
         Title = {Orientation embedded high order shape functions for the exact sequence elements of all shapes},
         Publisher = {Elsevier},
         Journal = {Computers \& Mathematics with Applications},
         Year = {2015},
         Volume = {70},
         Number = {4},
         Pages = {353--458}}

User Guide

The user manual is continuously updated and maintained here: https://github.com/Oden-EAG/hp3d_user_guide (LaTeX source)

A PDF version of the user manual is available on arXiv: https://arxiv.org/abs/2207.12211

How to Contribute

hp3D is distributed under the terms of the BSD-3 license. All new contributions must be made under this license.

Contributions of all kinds are welcome, including bug fixes, code optimizations, new capabilities, improved documentation, and model problems or applications. The new feature or contribution should be developed on a properly named feature branch based off of hp3d:master in a forked repository. If you would like to propose a contribution, please use a pull request (PR) toward the hp3d:master branch from your forked hp3D repository. Before starting a PR or working on a new feature, we encourage opening an issue first and discussing the new feature or contribution with the developer team.

Support

The issue tracker serves as the primary tool for resolving questions related to code features, etc.

For other inquiries, please contact: [email protected], [email protected]


hp3d's Issues

Change DTYPE from string to integer.

The DTYPE array currently stores the discretization type for each physics variable as a character(len=6) entry ('contin','tangen','normal','discon'). This causes a lot of string comparisons to happen in some functions that are called many times (O(#nodes)), e.g. subroutine get_index. By changing the entries to integer parameters (similar to the type change for node%type and element%type in #69), those calls will be much faster thanks to the cheaper integer comparisons.
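
A minimal sketch of the proposed change is shown below; the parameter names and values are hypothetical, and the actual convention would presumably follow the integer encoding introduced for node%type and element%type in #69:

! sketch: replace character(len=6) DTYPE entries with integer parameters
! (names and values are hypothetical)
module dtype_params
   implicit none
   integer, parameter :: CONTIN = 1   ! 'contin' : H1-conforming
   integer, parameter :: TANGEN = 2   ! 'tangen' : H(curl)-conforming
   integer, parameter :: NORMAL = 3   ! 'normal' : H(div)-conforming
   integer, parameter :: DISCON = 4   ! 'discon' : L2 (discontinuous)
end module dtype_params

program dtype_demo
   use dtype_params
   implicit none
   integer :: idtype
   idtype = CONTIN
   ! a cheap integer comparison replaces the previous string comparison
   ! in routines called O(#nodes) times, such as get_index
   if (idtype == CONTIN) print *, 'H1 attribute'
end program dtype_demo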

Convert node/elem type to integer value

Profiling by @jbadger95 indicates string comparison for character(4) types (e.g., mdlq) takes significant time when routines are called millions of times in large-scale problems. Using integer types instead reduces the cost by a factor of ca. 6–7. This change is in the data structure module.

Boundary domain flags

  • Support boundary IDs in GMP mesh input files for more convenient setting of BCs in set_initial_mesh
  • Optionally, add a mesh converter script to the repository for converting gmsh meshes to hp3D's GMP input format

Higher-order paraview/vtk output

Add support for higher-order paraview/vtk output data (geometry mesh and solution) in addition to the existing linear output on uniformly h-refined geometry meshes.

Move compiler debug flags

Several compiler debug flags (Intel) are currently set in the problem makefiles. Move those flags into the (Intel) m_options files.

Decrease memory footprint of initial mesh elements

Initial mesh elements are stored globally. For large initial meshes, this means that the ELEMS array can come at significant memory cost. We can decrease the memory footprint of the initial mesh by making type(element) more memory efficient.

An issue with solver interface for Maxwell problem

For distributed meshes with hanging nodes, solving the Maxwell Galerkin problem fails with some mesh partitionings -- likely when the hanging node is on the subdomain interface. This can be replicated as follows:

passing partition: uniform h-ref (20), single elem href (26) --> (121), distribute mesh (30), solve (40,45)
failing partition: uniform h-ref (20), distribute mesh (30), single elem href (26) --> (121), solve (40,45)

par_mumps_sc solver interface:

par_mumps_sc: STARTED

 STEP 1 started : Get assembly info
 STEP 1 finished:      0.00265  seconds

[   0] Number of dof  : nrdof_con =          208
[   0]                  nrdof_tot =          208
[   0] Total non-zeros: nnz       =         8298
[   0] Local non-zeros: nnz_loc   =         4204

 STEP 2 started : Global Assembly
[   1] Local non-zeros: nnz_loc   =         4094
At line 394 of file src/solver/par_mumps/par_mumps_sc.F90
Fortran runtime error: Index '0' of dimension 1 of array 'mumps_par%rhs' below lower bound of 1

Error termination. Backtrace:

PETSc solver interface:

STEP 1 started : Get assembly info
[RANK], sum(petsc_dnz) = [   1],       818
[RANK], sum(petsc_onz) = [   1],       540
 nod_subd .le. 0 !!!

Speeding up interpolation operations

Interpolation routines (e.g., hpface, dhpface, etc.) constitute the bulk of the computation in update_gdof and update_Ddof, which can be expensive (although they are less of a bottleneck with the OMP version). These interpolation routines could be accelerated significantly (>10x, see below) with relatively straightforward optimizations. This isn't high priority for me right now, but I'm opening this issue to outline potential improvements in case anyone wants to implement them before I get to it (@ac1512, you might be interested in similar optimizations for your anisotropic refinement work, since IIRC the projections are rather expensive in that case).

Here's a few ways in which they could be optimized:

  • The ordering of the loops over the assembled projection matrix is not optimal (the indices need to be swapped).
  • Test functions are currently pulled back in the innermost loop, which means the pullbacks are applied to all test functions redundantly for each trial function. Since the pulled-back test and trial functions coincide, they could instead be computed once and stored; avoiding this redundant computation alone should yield a significant speedup. The best way to do this would be to apply the pullback to all shape functions at once (inside the quadrature-point loop) using optimized BLAS3 routines (DGEMM).
  • Integration could likely be accelerated by forming a matrix of dimensions (number of pulled-back shape functions) x (number of quadrature points), so that each column holds the values of the pulled-back shape functions at one quadrature point, and then multiplying the matrix by its transpose. Calling optimized BLAS3 routines for the assembly, instead of explicitly forming the product as we do now, will likely be much faster; see the sketch at the end of this issue.

For a p=3 hexahedral mesh, the assembly (integration) in hpface takes ~10-40x longer than the matrix inversion, indicating that a >10x improvement can likely be achieved.
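
As a sketch of the BLAS3 idea above: store the (weighted) pulled-back shape function values at all quadrature points in a matrix and form the projection (Gram) matrix with a single DGEMM call, instead of an explicit triple loop. The sizes and random data below are placeholders; the program links against any BLAS implementation.

! sketch: BLAS3 (DGEMM) assembly of a projection/Gram matrix
! G(i,j) = sum_l w(l) * b_i(x_l) * b_j(x_l)
program gram_dgemm
   implicit none
   integer, parameter :: nrdof = 64, nint = 27     ! placeholder sizes
   real(8) :: B(nrdof,nint)    ! pulled-back shape functions at quadrature points
   real(8) :: Bw(nrdof,nint)   ! same, with quadrature weights folded in
   real(8) :: wa(nint)         ! quadrature weights
   real(8) :: G(nrdof,nrdof)   ! assembled Gram matrix
   integer :: l
   call random_number(B); call random_number(wa)   ! placeholder data
   do l = 1, nint
      Bw(:,l) = wa(l) * B(:,l)
   end do
   ! G = Bw * B^T in one optimized BLAS3 call
   call dgemm('N', 'T', nrdof, nrdof, nint, 1.d0, Bw, nrdof, B, nrdof, 0.d0, G, nrdof)
   print *, 'G(1,1) =', G(1,1)
end program gram_dgemm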

Error handling

Currently, there is no consistent and reliable error handling in the code. For example, instead of calling stop the code should always call an hp3D internal routine (with error code and error message) that then ends the program safely with MPI_Abort.
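
A minimal sketch of such a routine, assuming the mpi_f08 bindings (the routine name and error-code convention are hypothetical):

! sketch: central abort routine ending the program safely with MPI_Abort
subroutine hp3d_abort(msg, ierr_code)
   use mpi_f08
   implicit none
   character(len=*), intent(in) :: msg
   integer,          intent(in) :: ierr_code
   integer :: ierr
   write(*,*) 'hp3D error (code', ierr_code, '): ', trim(msg)
   ! terminate all MPI ranks instead of stopping a single rank
   call MPI_Abort(MPI_COMM_WORLD, ierr_code, ierr)
end subroutine hp3d_abort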

Solver interface

Incorporate the different solvers (PETSc, par_mumps, omp_mumps, mkl_pardiso, frontal, ...) into an hp3D solver interface module with solver options defined and set (by the user) within that module.

ParaView: PVD files for VTU output

This is an extension of the work in #53 (an example PVD file is shown after this list):

  • Enable writing time values in the VTU output format; the PVD file contains the metadata for multiple VTU files in a time series.
  • Some refactoring of the ParaView module is needed to keep track of previously written output files (the PVD metadata file is rewritten each time a new VTU file is exported).
  • Optionally, consider adding ParaView module functionality for writing single-precision output for reduced file sizes (VTU).
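
For reference, a PVD collection file is a small XML metadata file that lists the VTU files of a time series; a minimal example (file names are illustrative) looks like this:

<?xml version="1.0"?>
<VTKFile type="Collection" version="0.1" byte_order="LittleEndian">
  <Collection>
    <DataSet timestep="0.00" part="0" file="output_00000.vtu"/>
    <DataSet timestep="0.05" part="0" file="output_00001.vtu"/>
  </Collection>
</VTKFile>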

Speeding up face integration loops

Currently, many routines (such as elem for DPG formulations) use inefficient face integration by iterating over more shape functions than necessary. Shape functions should only be evaluated on the particular face being integrated over. This can be accomplished by modifying the norder array, in a way specific to the face being integrated, before calling the shape functions package; this is already done in the hpinterp face routines.

Interpolation of existing solution on the refined mesh

  • The functionality is already provided for p-refinements in nodmod via a COPY_DOFS parameter; this should instead be a module variable that can be set at runtime (perhaps in the parameter module).
  • The functionality is not yet provided for h-refinements and should be added; there was a preliminary implementation of this by @LeszekDemkowicz, but it needs to be tested and merged.

Pump ODE in fiber amplifier application

For long fiber simulations (> 10 cm) using the vectorial envelope formulation, the assumption of a constant pump plane wave is no longer valid. As an alternative for these long fiber simulations, I am adding a pump ODE model. Instead of modeling the full vectorial pump field with the Maxwell equations, the pump ODE model describes a pump plane wave decaying along the fiber as the signal laser gains power. This is computationally much cheaper than Maxwell and sufficient if we are not interested in the modal content of the pump field; usually, the pump field is assumed to behave like a plane wave across the entire inner cladding.

Enhancing `dirichlet` routine API

Currently, the only information the user can work with (input arguments) in the dirichlet routine is the element (middle node) number Mdle, the physical coordinates of the point X(3) being evaluated, and the Icase number encoding an element's support for particular physics variables. (A hypothetical extended interface is sketched after the list below.)

  • This setup does not let the user know which face on the boundary X is actually on. That is, the user only has the physical location available to decide which Dirichlet data to set. It may in some cases be helpful to have the actual face number available on which X is being evaluated. Note that dirichlet is also called from hpvert (H1) and hpedge (H1,Hcurl) routines which would need to specify a face accordingly if changing the API.

  • For H(div) and H(curl) vector-valued Dirichlet data, the user is expected to pass the full vector-valued solution even though only the normal and tangential components are actually needed. While this is convenient for manufactured solutions, it may not be a good way to specify Dirichlet data for unknown problems when only the normal or tangential component data is known. In that case, to correctly specify a vector-valued Cartesian solution, the user may need the normal vector which could be passed as an input to the dirichlet routine.
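
For illustration only, a hypothetical extended interface could pass the boundary face number and the outward normal in addition to the current inputs (all names below are placeholders, not the actual hp3D API):

! hypothetical sketch of an extended dirichlet interface
subroutine dirichlet_ext(Mdle, X, Icase, Iface, Rn, ValH, ValE, ValV)
   implicit none
   integer, intent(in)  :: Mdle      ! element (middle node) number
   real(8), intent(in)  :: X(3)      ! physical coordinates of the evaluation point
   integer, intent(in)  :: Icase     ! physics attribute case number
   integer, intent(in)  :: Iface     ! proposed: boundary face X lies on
   real(8), intent(in)  :: Rn(3)     ! proposed: outward unit normal at X
   real(8), intent(out) :: ValH      ! H1 Dirichlet data (placeholder shapes)
   real(8), intent(out) :: ValE(3)   ! H(curl) data; only the tangential part is needed
   real(8), intent(out) :: ValV(3)   ! H(div) data; only the normal part is needed
   ! user-defined boundary data would be set here
   ValH = 0.d0; ValE = 0.d0; ValV = 0.d0
end subroutine dirichlet_ext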

Distribute initial mesh

For large initial meshes, the initial update_gdof and update_Ddof routines take a very long time because the mesh is not yet distributed. We should move to distributing initial meshes by default when MPI parallelism is used.

Support static condensation selectively per variable

With the current setup, static condensation is either enabled or disabled for the interior DOFs of all variables at the same time. In some applications, the user may need to statically condense only some of the variables. Can we adapt the celem routines and the stc module so that the user can control which variables static condensation is applied to?

ParaView: Implement submesh output

  • Enable outputting variables on submeshes, such as:
    • Surfaces
    • 2D cuts through the domain
    • Mesh skeleton (useful for trace visualization)

Blocking directives in mesh migration

Mesh migration can be rather slow on large distributed meshes because each node is sent and received individually with blocking directives. I don't have a large file at the moment, but on a moderate-size acoustics problem run on 64 Frontera nodes, the timing was:

 migration time:      4.14451 seconds 

This becomes increasingly expensive on large meshes.

Indirect communication and/or batching may be needed.
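
A minimal sketch of the non-blocking pattern that could replace per-node blocking sends and receives (a simple ring exchange with placeholder buffers; in practice, node data would be batched per destination rank):

! sketch: batched, non-blocking exchange (mpi_f08 bindings; sizes are placeholders)
program migrate_sketch
   use mpi_f08
   implicit none
   integer, parameter :: nbuf = 1024
   real(8) :: sendbuf(nbuf), recvbuf(nbuf)
   type(MPI_Request) :: req(2)
   integer :: myrank, nranks, dest, src, ierr
   call MPI_Init(ierr)
   call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, nranks, ierr)
   sendbuf = real(myrank, 8)
   dest = mod(myrank + 1, nranks)                 ! ring exchange, for illustration
   src  = mod(myrank - 1 + nranks, nranks)
   ! post receive and send up front, then wait once,
   ! instead of one blocking send/recv per node
   call MPI_Irecv(recvbuf, nbuf, MPI_DOUBLE_PRECISION, src,  0, MPI_COMM_WORLD, req(1), ierr)
   call MPI_Isend(sendbuf, nbuf, MPI_DOUBLE_PRECISION, dest, 0, MPI_COMM_WORLD, req(2), ierr)
   call MPI_Waitall(2, req, MPI_STATUSES_IGNORE, ierr)
   call MPI_Finalize(ierr)
end program migrate_sketch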
