oden-eag / hp3d
MPI/OpenMP hp-adaptive 3D finite element software
Home Page: https://oden.utexas.edu
License: Other
Mesh migration can be rather slow on large distributed meshes because each node is sent and received individually with blocking directives. I don't have a large file at the moment, but on a moderate-size acoustics problem run on 64 Frontera nodes, the timing was:
migration time: 4.14451 seconds
This becomes increasingly expensive on large meshes.
Indirect communication and/or batching may be needed.
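One possible remedy, sketched below with made-up names (this is not the actual migration code): pack all node data destined for a given rank into one buffer and exchange it with a single non-blocking send/receive pair, finishing with a wait on all outstanding requests, rather than issuing one blocking send per node.

! Minimal sketch, F90 mpi binding; buffer packing and the matching receive
! are omitted. All names here are illustrative.
subroutine send_nodes_batched(node_buf, dest, tag, req)
   use mpi
   implicit none
   real(8), intent(in)  :: node_buf(:)   ! packed data for all nodes going to "dest"
   integer, intent(in)  :: dest, tag
   integer, intent(out) :: req
   integer :: ierr
   ! one non-blocking send for the whole batch instead of one blocking send per node
   call MPI_ISEND(node_buf, size(node_buf), MPI_REAL8, dest, tag, MPI_COMM_WORLD, req, ierr)
end subroutine send_nodes_batched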
Currently, many routines (such as elem for DPG formulations) use inefficient face integration by iterating over more shape functions than necessary. Shape functions should only be evaluated on the particular face being integrated over. This can be accomplished by modifying the norder array in a way specific to the face being integrated before calling the shape functions package. This is currently implemented in the hpinterp face routines that integrate over faces.
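A schematic sketch of the idea (the order-array layout, the edge_on_face argument, and the routine name below are illustrative, not hp3D's actual shape-function API):

! Schematic only: restrict the order array to the face being integrated so
! the shape-function package generates only dofs that can be nonzero there.
subroutine restrict_norder_to_face(nre, nrf, ifc, edge_on_face, norder)
   implicit none
   integer, intent(in)    :: nre, nrf          ! number of element edges, faces
   integer, intent(in)    :: ifc               ! face being integrated over
   logical, intent(in)    :: edge_on_face(nre) ! .true. if edge lies on face ifc
   integer, intent(inout) :: norder(nre+nrf+1) ! edge, face, interior orders
   integer :: i
   do i = 1, nre
      if (.not. edge_on_face(i)) norder(i) = 1  ! order-1 edges carry no bubbles
   end do
   do i = 1, nrf
      if (i /= ifc) norder(nre+i) = 1           ! drop other faces' bubbles
   end do
   norder(nre+nrf+1) = 1                        ! interior bubbles vanish on faces
end subroutine restrict_norder_to_face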
In several places in the code, locally initialized variables are implicitly save (see the explanation here: https://www.cs.rpi.edu/~szymansk/OOF90/bugs.html#4). These should be changed to either have an explicit save declaration (if needed) or otherwise be fixed if save is not needed.
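A minimal illustration of the pitfall (not code from hp3D):

subroutine accumulate(x, total)
   implicit none
   real(8), intent(in)  :: x
   real(8), intent(out) :: total
   ! BUG: an initializer in the declaration implies the SAVE attribute, so
   ! "runsum" keeps its value between calls instead of starting at 0.d0
   ! (and it is shared between OpenMP threads).
   real(8) :: runsum = 0.d0
   runsum = runsum + x
   total  = runsum
end subroutine accumulate

If the value really must persist, declare it with an explicit save attribute (real(8), save :: runsum = 0.d0); otherwise separate the declaration from the assignment so the variable is reset on every call.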
Interpolation routines (e.g. hpface, dhpface, etc.) constitute the bulk of the computation in update_gdof and update_Ddof, which can be expensive (although they aren't as much of a bottleneck with the OMP version). These interpolation routines could be accelerated significantly (>10x, see below) with relatively straightforward optimizations. This isn't high priority for me right now, but I'm opening this issue to outline potential improvements, in case anyone wants to implement them before I get to it (@ac1512, you might be interested in similar optimizations for your anisotropic refinement work since IIRC the projections are rather expensive in that case).
Here are a few ways in which they could be optimized:
- Use BLAS routines for the dense linear algebra (e.g., DGEMM).
- Add implicit none where it is missing.
For a p=3 hexahedral mesh, the assembly (integration) in hpface takes ~10-40x longer than the matrix inversion, indicating that a >10x improvement can likely be achieved.
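As a rough illustration of the BLAS route (the array names are made up, not hpface's actual variables): evaluating all projections at the integration points can be cast as a single matrix-matrix product handed to DGEMM instead of nested do loops.

! Rough illustration only: evaluate solution values at all quadrature points
! with one DGEMM call rather than looping over points and shape functions.
subroutine eval_at_points(npts, nshape, ndof, A, B, C)
   implicit none
   integer, intent(in)  :: npts, nshape, ndof
   real(8), intent(in)  :: A(npts, nshape)   ! shape functions tabulated at quadrature points
   real(8), intent(in)  :: B(nshape, ndof)   ! dof coefficients
   real(8), intent(out) :: C(npts, ndof)     ! values at the quadrature points
   call dgemm('N', 'N', npts, ndof, nshape, 1.d0, A, npts, B, nshape, 0.d0, C, npts)
end subroutine eval_at_points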
The DTYPE array currently stores the discretization type for each physics variable as a character(len=6) entry ('contin', 'tangen', 'normal', 'discon'). This causes a lot of string comparisons in some functions that are called many times (O(#nodes)), e.g. subroutine get_index. By changing the entries to integer parameters (similar to the type change for node%type and element%type in #69), those calls will be much faster thanks to the cheaper integer comparisons.
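A minimal sketch of the kind of change proposed (the parameter names and values below are hypothetical, not hp3D's actual encodings):

module dtype_param
   implicit none
   ! hypothetical integer encodings replacing the character(len=6) entries
   ! 'contin','tangen','normal','discon' (values are illustrative only)
   integer, parameter :: DTYPE_CONTIN = 1   ! H1-conforming
   integer, parameter :: DTYPE_TANGEN = 2   ! H(curl)-conforming
   integer, parameter :: DTYPE_NORMAL = 3   ! H(div)-conforming
   integer, parameter :: DTYPE_DISCON = 4   ! L2 / discontinuous
contains
   ! an integer comparison replaces the repeated string comparisons
   logical function is_h1(idtype)
      integer, intent(in) :: idtype
      is_h1 = (idtype == DTYPE_CONTIN)
   end function is_h1
end module dtype_param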
This is an extension of the work in #53: set up CI via GitHub Actions. See comments in the JOSS paper review here: openjournals/joss-reviews#5946
Incorporate the different solvers (PETSc, par_mumps, omp_mumps, mkl_pardiso, frontal, ...) into an hp3D solver interface module, with solver options defined and set (by the user) within that module.
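A minimal sketch of what such an interface module might look like (the module, parameter, and routine names are hypothetical; each case would call the corresponding existing driver such as par_mumps_sc):

module solver_interface
   implicit none
   ! hypothetical solver identifiers; names and values are illustrative only
   integer, parameter :: SOLVER_PETSC       = 1
   integer, parameter :: SOLVER_PAR_MUMPS   = 2
   integer, parameter :: SOLVER_OMP_MUMPS   = 3
   integer, parameter :: SOLVER_MKL_PARDISO = 4
   integer, parameter :: SOLVER_FRONTAL     = 5
   ! user-settable option (could also be read from an input file)
   integer :: SOLVER_CHOICE = SOLVER_PAR_MUMPS
contains
   subroutine solve
      ! single entry point dispatching to the selected backend; the write
      ! statements stand in for calls to the existing solver drivers
      select case (SOLVER_CHOICE)
         case (SOLVER_PETSC);       write(*,*) 'dispatching to PETSc'
         case (SOLVER_PAR_MUMPS);   write(*,*) 'dispatching to par_mumps'
         case (SOLVER_OMP_MUMPS);   write(*,*) 'dispatching to omp_mumps'
         case (SOLVER_MKL_PARDISO); write(*,*) 'dispatching to mkl_pardiso'
         case (SOLVER_FRONTAL);     write(*,*) 'dispatching to frontal'
         case default;              write(*,*) 'solve: unknown SOLVER_CHOICE'
      end select
   end subroutine solve
end module solver_interface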
Currently, the only information the user can work with (input arguments) in the dirichlet routine is the element (middle node) number Mdle, the physical coordinates of the point X(3) being evaluated, and the Icase number encoding an element's support for particular physics variables.

This setup does not let the user know which boundary face X actually lies on. That is, the user only has the physical location available to decide which Dirichlet data to set. It may in some cases be helpful to have available the actual face number on which X is being evaluated. Note that dirichlet is also called from the hpvert (H1) and hpedge (H1, Hcurl) routines, which would need to specify a face accordingly if the API changes.

For H(div) and H(curl) vector-valued Dirichlet data, the user is expected to pass the full vector-valued solution even though only the normal and tangential components are actually needed. While this is convenient for manufactured solutions, it may not be a good way to specify Dirichlet data for unknown problems where only the normal or tangential component data is known. In that case, to correctly specify a vector-valued Cartesian solution, the user may need the normal vector, which could be passed as an input to the dirichlet routine.
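A possible extended interface (the arguments beyond Mdle, X, and Icase are hypothetical, and the existing output arguments are omitted here):

! hypothetical extension of the user-defined dirichlet routine: the face
! number and outward unit normal are passed in addition to Mdle, X, Icase,
! so the user can set normal/tangential trace data directly
subroutine dirichlet_ext(Mdle, X, Icase, Iface, Rn)
   implicit none
   integer, intent(in) :: Mdle      ! element (middle node) number
   real(8), intent(in) :: X(3)      ! physical coordinates of the point
   integer, intent(in) :: Icase     ! physics-variable support encoding
   integer, intent(in) :: Iface     ! boundary face X lies on (0 if called from
                                    ! hpvert/hpedge without a unique face)
   real(8), intent(in) :: Rn(3)     ! outward unit normal on that face
   ! ... user code sets the Dirichlet data here ...
end subroutine dirichlet_ext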
With the current setup, static condensation is either enabled or disabled for the interior DOFs of all variables at the same time. In some applications, the user may need to statically condense only some of the variables. Can we adapt the celem routines and module stc so that they let the user control which variables static condensation is applied to?
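A minimal sketch of one way the option could be exposed (the module and variable names are hypothetical):

module stc_options
   implicit none
   ! hypothetical per-variable switch: STC_VAR(i) = .true. means the interior
   ! dofs of physics variable i are statically condensed; the celem routines
   ! would consult this array instead of a single global on/off flag
   logical, allocatable :: STC_VAR(:)
contains
   subroutine set_stc_vars(flags)
      logical, intent(in) :: flags(:)
      if (allocated(STC_VAR)) deallocate(STC_VAR)
      allocate(STC_VAR(size(flags)))
      STC_VAR = flags
   end subroutine set_stc_vars
end module stc_options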
DOF copying is currently controlled at compile time in nodmod via a COPY_DOFS parameter; this should instead be a module variable that can be set at runtime (perhaps in the parameter module).

Initial mesh elements are stored globally. For large initial meshes, this means that the ELEMS array can come at a significant memory cost. We can decrease the memory footprint of the initial mesh by making type(element) more memory efficient.
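A sketch of the kind of change that would shrink the per-element footprint (the component names below are hypothetical, not hp3D's actual type(element) layout):

! illustrative only: use the smallest integer kinds that fit and allocatable
! components sized to the actual element type (tet/prism/hexa/pyramid)
! instead of fixed maximum-size arrays
type element
   integer(4)              :: etype        ! element type id
   integer(8)              :: nod_mdle     ! middle node number
   integer(8), allocatable :: nodes(:)     ! vertex/edge/face node numbers
   integer(1), allocatable :: orient(:)    ! edge/face orientations
   integer(4), allocatable :: bcond(:)     ! per-face BC flags
   integer(4), allocatable :: neig(:)      ! face neighbors
end type element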
hp3D currently uses the mpi module (F90 / MPI-2.0) binding, which should be upgraded to the mpi_f08 (F08 / MPI-3.0) binding.
Background reading: https://www.mcs.anl.gov/papers/P5139-0514.pdf
Configure PETSc with --with-mpi-f90module-visibility=0 (requires PETSc version >= 3.15, see https://petsc.org/main/changes/315/#changes-3-15).
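The change is mostly mechanical; a minimal sketch of what code looks like under the F08 binding:

program mpi_f08_demo
   ! F08 binding: communicator handles become derived types and the ierror
   ! argument becomes optional (with "use mpi" these are plain integers)
   use mpi_f08
   implicit none
   type(MPI_Comm) :: comm
   integer :: rank
   call MPI_Init()
   comm = MPI_COMM_WORLD
   call MPI_Comm_rank(comm, rank)
   call MPI_Finalize()
end program mpi_f08_demo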
Several compiler debug flags (Intel) are currently set in the problem makefiles. Move those flags into the (Intel) m_options files.
HP3D_USE_OPENMP = 0/1
GMP mesh input files for more convenient setting of BCs in set_initial_mesh.
Converting gmsh meshes to hp3D's GMP input format.

For long fiber simulations (> 10 cm) using the vectorial envelope formulation, the assumption of a constant pump plane wave is no longer a good one. As an alternative for these long fiber simulations, I am adding a pump ODE model: instead of modeling the full vectorial pump field with the Maxwell equations, the pump ODE model describes a pump plane wave decaying along the fiber as the signal laser gains power. This is computationally much cheaper than Maxwell and is sufficient if we are not interested in the modal content of the pump field; usually, the pump field is assumed to behave like a plane wave across the entire inner cladding.
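Schematically, such a model amounts to a decay ODE for the pump power along the fiber; the sketch below shows only the generic form with a made-up coefficient, not the exact equation being implemented:

! Generic illustration only: integrate dP/dz = -g(z) * P with forward Euler.
function pump_power(P0, g, dz) result(P)
   implicit none
   real(8), intent(in) :: P0       ! launched pump power
   real(8), intent(in) :: g(:)     ! local pump absorption per step (set by the signal gain)
   real(8), intent(in) :: dz       ! step size along the fiber
   real(8) :: P
   integer :: i
   P = P0
   do i = 1, size(g)
      P = P - dz * g(i) * P
   end do
end function pump_power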
For large initial meshes, the initial update_gdof and update_Ddof routines take a very long time because the mesh is not yet distributed. We should move to distributing initial meshes by default when MPI parallelism is used.
Profiling by @jbadger95 indicates that string comparison for character(4) types (e.g., mdlq) takes significant time when routines are called millions of times in large-scale problems. Using integer types instead reduces the cost by a factor of ca. 6-7. This change is in the data structure module.
Currently, there is no consistent and reliable error handling in the code. For example, instead of calling stop, the code should always call an hp3D-internal routine (with an error code and error message) that then ends the program safely with MPI_Abort.
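A minimal sketch of such a routine (the name and error-code convention are hypothetical):

subroutine hp3d_abort(msg, icode)
   ! hypothetical internal error handler: print a message and terminate all
   ! MPI ranks safely instead of calling "stop" on a single rank
   use mpi
   implicit none
   character(len=*), intent(in) :: msg
   integer,          intent(in) :: icode
   integer :: ierr
   write(*,'(A,I0,2A)') 'hp3D error (code ', icode, '): ', trim(msg)
   call MPI_Abort(MPI_COMM_WORLD, icode, ierr)
end subroutine hp3d_abort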
I suggest adding the topics multiphysics, finite-elements, and fem in the About section at https://github.com/Oden-EAG/hp3d.
Add support for higher-order paraview/vtk output data (geometry mesh and solution) in addition to the existing linear output on uniformly h-refined geometry meshes.
For distributed meshes with hanging nodes, solving the Maxwell Galerkin problem fails with some mesh partitionings -- likely when the hanging node is on the subdomain interface. This can be replicated as follows:

passing partition: uniform h-ref (20), single elem href (26) --> (121), distribute mesh (30), solve (40,45)
failing partition: uniform h-ref (20), distribute mesh (30), single elem href (26) --> (121), solve (40,45)

With the par_mumps_sc solver interface:
par_mumps_sc: STARTED
STEP 1 started : Get assembly info
STEP 1 finished: 0.00265 seconds
[ 0] Number of dof : nrdof_con = 208
[ 0] nrdof_tot = 208
[ 0] Total non-zeros: nnz = 8298
[ 0] Local non-zeros: nnz_loc = 4204
STEP 2 started : Global Assembly
[ 1] Local non-zeros: nnz_loc = 4094
At line 394 of file src/solver/par_mumps/par_mumps_sc.F90
Fortran runtime error: Index '0' of dimension 1 of array 'mumps_par%rhs' below lower bound of 1
Error termination. Backtrace:
With the PETSc solver interface:
STEP 1 started : Get assembly info
[RANK], sum(petsc_dnz) = [ 1], 818
[RANK], sum(petsc_onz) = [ 1], 540
nod_subd .le. 0 !!!
Continuation of the work on visualization #53
As of now, pyramids aren't supported (only hexas, prisms, tets) in paraview export (both XDMF and VTU).
Some of the optimizations (boundary integrals, Gram matrix storage format) identified in #36 can also be used to speed up the primal DPG formulation of the heat equation in the LASER code.
Currently, the parameters NRCOMS and MAXNRHS and their behavior are not well-defined.