oden-eag / hp3d
MPI/OpenMP hp-adaptive 3D finite element software
Home Page: https://oden.utexas.edu
License: Other
Mesh migration can be rather slow on large distributed meshes because each node is sent and received individually with blocking directives. I don't have a large file at the moment, but on a moderate-size acoustics problem run on 64 Frontera nodes, the timing was:
migration time: 4.14451 seconds
This becomes increasingly expensive on large meshes.
Indirect communication and/or batching may be needed.
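One possible remedy, sketched below with made-up names (this is not the actual migration code): pack all node data destined for a given rank into one buffer and exchange it with a single non-blocking send/receive pair, finishing with a wait on all outstanding requests, rather than issuing one blocking send per node.

! Minimal sketch, F90 mpi binding; buffer packing and the matching receive
! are omitted. All names here are illustrative.
subroutine send_nodes_batched(node_buf, dest, tag, req)
   use mpi
   implicit none
   real(8), intent(in)  :: node_buf(:)   ! packed data for all nodes going to "dest"
   integer, intent(in)  :: dest, tag
   integer, intent(out) :: req
   integer :: ierr
   ! one non-blocking send for the whole batch instead of one blocking send per node
   call MPI_ISEND(node_buf, size(node_buf), MPI_REAL8, dest, tag, MPI_COMM_WORLD, req, ierr)
end subroutine send_nodes_batched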
Currently, many routines (such as elem for DPG formulations) use inefficient face integration by iterating over more shape functions than necessary. Shape functions should only be evaluated on the particular face being integrated over. This can be accomplished by modifying the norder array in a way specific to the face being integrated before calling the shape functions package. This is currently implemented in the hpinterp face routines that integrate over faces.
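A schematic sketch of the idea (the order-array layout, the edge_on_face argument, and the routine name below are illustrative, not hp3D's actual shape-function API):

! Schematic only: restrict the order array to the face being integrated so
! the shape-function package generates only dofs that can be nonzero there.
subroutine restrict_norder_to_face(nre, nrf, ifc, edge_on_face, norder)
   implicit none
   integer, intent(in)    :: nre, nrf          ! number of element edges, faces
   integer, intent(in)    :: ifc               ! face being integrated over
   logical, intent(in)    :: edge_on_face(nre) ! .true. if edge lies on face ifc
   integer, intent(inout) :: norder(nre+nrf+1) ! edge, face, interior orders
   integer :: i
   do i = 1, nre
      if (.not. edge_on_face(i)) norder(i) = 1  ! order-1 edges carry no bubbles
   end do
   do i = 1, nrf
      if (i /= ifc) norder(nre+i) = 1           ! drop other faces' bubbles
   end do
   norder(nre+nrf+1) = 1                        ! interior bubbles vanish on faces
end subroutine restrict_norder_to_face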
In several places in the code, locally initialized variables are implicitly save (see the explanation here: https://www.cs.rpi.edu/~szymansk/OOF90/bugs.html#4). These should be changed to either have an explicit save declaration (if needed) or otherwise be fixed if save is not needed.
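A minimal illustration of the pitfall (not code from hp3D):

subroutine accumulate(x, total)
   implicit none
   real(8), intent(in)  :: x
   real(8), intent(out) :: total
   ! BUG: an initializer in the declaration implies the SAVE attribute, so
   ! "runsum" keeps its value between calls instead of starting at 0.d0
   ! (and it is shared between OpenMP threads).
   real(8) :: runsum = 0.d0
   runsum = runsum + x
   total  = runsum
end subroutine accumulate

If the value really must persist, declare it with an explicit save attribute (real(8), save :: runsum = 0.d0); otherwise separate the declaration from the assignment so the variable is reset on every call.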
Interpolation routines (e.g. hpface, dhpface, etc.) constitute the bulk of the computation in update_gdof and update_Ddof, which can be expensive (although they aren't as much of a bottleneck with the OMP version). These interpolation routines could be accelerated significantly (>10x, see below) with relatively straightforward optimizations. This isn't high priority for me right now, but I'm opening this issue to outline potential improvements, in case anyone wants to implement them before I get to it (@ac1512, you might be interested in similar optimizations for your anisotropic refinement work since IIRC the projections are rather expensive in that case).
Here are a few ways in which they could be optimized:
- Use BLAS routines for the dense linear algebra (e.g., DGEMM).
- Add implicit none where it is missing.
For a p=3 hexahedral mesh, the assembly (integration) in hpface takes ~10-40x longer than the matrix inversion, indicating that a >10x improvement can likely be achieved.
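As a rough illustration of the BLAS route (the array names are made up, not hpface's actual variables): evaluating all projections at the integration points can be cast as a single matrix-matrix product handed to DGEMM instead of nested do loops.

! Rough illustration only: evaluate solution values at all quadrature points
! with one DGEMM call rather than looping over points and shape functions.
subroutine eval_at_points(npts, nshape, ndof, A, B, C)
   implicit none
   integer, intent(in)  :: npts, nshape, ndof
   real(8), intent(in)  :: A(npts, nshape)   ! shape functions tabulated at quadrature points
   real(8), intent(in)  :: B(nshape, ndof)   ! dof coefficients
   real(8), intent(out) :: C(npts, ndof)     ! values at the quadrature points
   call dgemm('N', 'N', npts, ndof, nshape, 1.d0, A, npts, B, nshape, 0.d0, C, npts)
end subroutine eval_at_points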
The DTYPE array currently stores the discretization type for each physics variable as a character(len=6) entry ('contin', 'tangen', 'normal', 'discon'). This causes a lot of string comparisons in some functions that are called many times (O(#nodes)), e.g. subroutine get_index. By changing the entries to integer parameters (similar to the type change for node%type and element%type in #69), those calls will be much faster thanks to the cheaper integer comparisons.
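A minimal sketch of the kind of change proposed (the parameter names and values below are hypothetical, not hp3D's actual encodings):

module dtype_param
   implicit none
   ! hypothetical integer encodings replacing the character(len=6) entries
   ! 'contin','tangen','normal','discon' (values are illustrative only)
   integer, parameter :: DTYPE_CONTIN = 1   ! H1-conforming
   integer, parameter :: DTYPE_TANGEN = 2   ! H(curl)-conforming
   integer, parameter :: DTYPE_NORMAL = 3   ! H(div)-conforming
   integer, parameter :: DTYPE_DISCON = 4   ! L2 / discontinuous
contains
   ! an integer comparison replaces the repeated string comparisons
   logical function is_h1(idtype)
      integer, intent(in) :: idtype
      is_h1 = (idtype == DTYPE_CONTIN)
   end function is_h1
end module dtype_param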
This is an extension of the work in #53: set up CI via GitHub Actions. See comments in the JOSS paper review here: openjournals/joss-reviews#5946
Incorporate the different solvers (PETSc, par_mumps, omp_mumps, mkl_pardiso, frontal, ...) into an hp3D solver interface module, with solver options defined and set (by the user) within that module.
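A minimal sketch of what such an interface module might look like (the module, parameter, and routine names are hypothetical; each case would call the corresponding existing driver such as par_mumps_sc):

module solver_interface
   implicit none
   ! hypothetical solver identifiers; names and values are illustrative only
   integer, parameter :: SOLVER_PETSC       = 1
   integer, parameter :: SOLVER_PAR_MUMPS   = 2
   integer, parameter :: SOLVER_OMP_MUMPS   = 3
   integer, parameter :: SOLVER_MKL_PARDISO = 4
   integer, parameter :: SOLVER_FRONTAL     = 5
   ! user-settable option (could also be read from an input file)
   integer :: SOLVER_CHOICE = SOLVER_PAR_MUMPS
contains
   subroutine solve
      ! single entry point dispatching to the selected backend; the write
      ! statements stand in for calls to the existing solver drivers
      select case (SOLVER_CHOICE)
         case (SOLVER_PETSC);       write(*,*) 'dispatching to PETSc'
         case (SOLVER_PAR_MUMPS);   write(*,*) 'dispatching to par_mumps'
         case (SOLVER_OMP_MUMPS);   write(*,*) 'dispatching to omp_mumps'
         case (SOLVER_MKL_PARDISO); write(*,*) 'dispatching to mkl_pardiso'
         case (SOLVER_FRONTAL);     write(*,*) 'dispatching to frontal'
         case default;              write(*,*) 'solve: unknown SOLVER_CHOICE'
      end select
   end subroutine solve
end module solver_interface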
Currently, the only information the user can work with (input arguments) in the dirichlet routine is the element (middle node) number Mdle, the physical coordinates of the point X(3) being evaluated, and the Icase number encoding an element's support for particular physics variables.

This setup does not let the user know which boundary face X actually lies on. That is, the user only has the physical location available to decide which Dirichlet data to set. It may in some cases be helpful to have available the actual face number on which X is being evaluated. Note that dirichlet is also called from the hpvert (H1) and hpedge (H1, Hcurl) routines, which would need to specify a face accordingly if the API changes.

For H(div) and H(curl) vector-valued Dirichlet data, the user is expected to pass the full vector-valued solution even though only the normal and tangential components are actually needed. While this is convenient for manufactured solutions, it may not be a good way to specify Dirichlet data for unknown problems where only the normal or tangential component data is known. In that case, to correctly specify a vector-valued Cartesian solution, the user may need the normal vector, which could be passed as an input to the dirichlet routine.
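A possible extended interface (the arguments beyond Mdle, X, and Icase are hypothetical, and the existing output arguments are omitted here):

! hypothetical extension of the user-defined dirichlet routine: the face
! number and outward unit normal are passed in addition to Mdle, X, Icase,
! so the user can set normal/tangential trace data directly
subroutine dirichlet_ext(Mdle, X, Icase, Iface, Rn)
   implicit none
   integer, intent(in) :: Mdle      ! element (middle node) number
   real(8), intent(in) :: X(3)      ! physical coordinates of the point
   integer, intent(in) :: Icase     ! physics-variable support encoding
   integer, intent(in) :: Iface     ! boundary face X lies on (0 if called from
                                    ! hpvert/hpedge without a unique face)
   real(8), intent(in) :: Rn(3)     ! outward unit normal on that face
   ! ... user code sets the Dirichlet data here ...
end subroutine dirichlet_ext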
With the current setup, static condensation is either enabled or disabled for the interior DOFs of all variables at the same time. In some applications, the user may need to statically condense only some of the variables. Can we adapt the celem routines and module stc so that they let the user control which variables static condensation is applied to?
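A minimal sketch of one way the option could be exposed (the module and variable names are hypothetical):

module stc_options
   implicit none
   ! hypothetical per-variable switch: STC_VAR(i) = .true. means the interior
   ! dofs of physics variable i are statically condensed; the celem routines
   ! would consult this array instead of a single global on/off flag
   logical, allocatable :: STC_VAR(:)
contains
   subroutine set_stc_vars(flags)
      logical, intent(in) :: flags(:)
      if (allocated(STC_VAR)) deallocate(STC_VAR)
      allocate(STC_VAR(size(flags)))
      STC_VAR = flags
   end subroutine set_stc_vars
end module stc_options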
DOF copying is currently controlled at compile time in nodmod via a COPY_DOFS parameter; this should instead be a module variable that can be set at runtime (perhaps in the parameter module).

Initial mesh elements are stored globally. For large initial meshes, this means that the ELEMS array can come at a significant memory cost. We can decrease the memory footprint of the initial mesh by making type(element) more memory efficient.
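A sketch of the kind of change that would shrink the per-element footprint (the component names below are hypothetical, not hp3D's actual type(element) layout):

! illustrative only: use the smallest integer kinds that fit and allocatable
! components sized to the actual element type (tet/prism/hexa/pyramid)
! instead of fixed maximum-size arrays
type element
   integer(4)              :: etype        ! element type id
   integer(8)              :: nod_mdle     ! middle node number
   integer(8), allocatable :: nodes(:)     ! vertex/edge/face node numbers
   integer(1), allocatable :: orient(:)    ! edge/face orientations
   integer(4), allocatable :: bcond(:)     ! per-face BC flags
   integer(4), allocatable :: neig(:)      ! face neighbors
end type element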
hp3D currently uses the mpi module (F90 / MPI-2.0) binding, which should be upgraded to the mpi_f08 (F08 / MPI-3.0) binding.
Background reading: https://www.mcs.anl.gov/papers/P5139-0514.pdf
Configure PETSc with --with-mpi-f90module-visibility=0 (requires PETSc version >= 3.15, see https://petsc.org/main/changes/315/#changes-3-15).
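The change is mostly mechanical; a minimal sketch of what code looks like under the F08 binding:

program mpi_f08_demo
   ! F08 binding: communicator handles become derived types and the ierror
   ! argument becomes optional (with "use mpi" these are plain integers)
   use mpi_f08
   implicit none
   type(MPI_Comm) :: comm
   integer :: rank
   call MPI_Init()
   comm = MPI_COMM_WORLD
   call MPI_Comm_rank(comm, rank)
   call MPI_Finalize()
end program mpi_f08_demo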
Several compiler debug flags (Intel) are currently set in the problem makefiles. Move those flags into the (Intel) m_options files.
HP3D_USE_OPENMP = 0/1
GMP mesh input files for more convenient setting of BCs in set_initial_mesh.
Converting gmsh meshes to hp3D's GMP input format.

For long fiber simulations (> 10 cm) using the vectorial envelope formulation, the assumption of a constant pump plane wave is no longer a good one. As an alternative for these long fiber simulations, I am adding a pump ODE model: instead of modeling the full vectorial pump field with the Maxwell equations, the pump ODE model describes a pump plane wave decaying along the fiber as the signal laser gains power. This is computationally much cheaper than Maxwell and is sufficient if we are not interested in the modal content of the pump field; usually, the pump field is assumed to behave like a plane wave across the entire inner cladding.
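Schematically, such a model amounts to a decay ODE for the pump power along the fiber; the sketch below shows only the generic form with a made-up coefficient, not the exact equation being implemented:

! Generic illustration only: integrate dP/dz = -g(z) * P with forward Euler.
function pump_power(P0, g, dz) result(P)
   implicit none
   real(8), intent(in) :: P0       ! launched pump power
   real(8), intent(in) :: g(:)     ! local pump absorption per step (set by the signal gain)
   real(8), intent(in) :: dz       ! step size along the fiber
   real(8) :: P
   integer :: i
   P = P0
   do i = 1, size(g)
      P = P - dz * g(i) * P
   end do
end function pump_power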
For large initial meshes, the initial update_gdof and update_Ddof routines take a very long time because the mesh is not yet distributed. We should move to distributing initial meshes by default when MPI parallelism is used.
Profiling by @jbadger95 indicates that string comparison for character(4) types (e.g., mdlq) takes significant time when routines are called millions of times in large-scale problems. Using integer types instead reduces the cost by a factor of ca. 6-7. This change is in the data structure module.
Currently, there is no consistent and reliable error handling in the code. For example, instead of calling stop, the code should always call an hp3D-internal routine (with an error code and error message) that then ends the program safely with MPI_Abort.
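A minimal sketch of such a routine (the name and error-code convention are hypothetical):

subroutine hp3d_abort(msg, icode)
   ! hypothetical internal error handler: print a message and terminate all
   ! MPI ranks safely instead of calling "stop" on a single rank
   use mpi
   implicit none
   character(len=*), intent(in) :: msg
   integer,          intent(in) :: icode
   integer :: ierr
   write(*,'(A,I0,2A)') 'hp3D error (code ', icode, '): ', trim(msg)
   call MPI_Abort(MPI_COMM_WORLD, icode, ierr)
end subroutine hp3d_abort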
I suggest adding the topics multiphysics, finite-elements, and fem in the About section at https://github.com/Oden-EAG/hp3d.
Add support for higher-order paraview/vtk output data (geometry mesh and solution) in addition to the existing linear output on uniformly h-refined geometry meshes.
For distributed meshes with hanging nodes, solving the Maxwell Galerkin problem fails with some mesh partitionings -- likely when the hanging node is on the subdomain interface. This can be replicated as follows:

passing partition: uniform h-ref (20), single elem href (26) --> (121), distribute mesh (30), solve (40,45)
failing partition: uniform h-ref (20), distribute mesh (30), single elem href (26) --> (121), solve (40,45)

With the par_mumps_sc solver interface:
par_mumps_sc: STARTED
STEP 1 started : Get assembly info
STEP 1 finished: 0.00265 seconds
[ 0] Number of dof : nrdof_con = 208
[ 0] nrdof_tot = 208
[ 0] Total non-zeros: nnz = 8298
[ 0] Local non-zeros: nnz_loc = 4204
STEP 2 started : Global Assembly
[ 1] Local non-zeros: nnz_loc = 4094
At line 394 of file src/solver/par_mumps/par_mumps_sc.F90
Fortran runtime error: Index '0' of dimension 1 of array 'mumps_par%rhs' below lower bound of 1
Error termination. Backtrace:
With the PETSc solver interface:
STEP 1 started : Get assembly info
[RANK], sum(petsc_dnz) = [ 1], 818
[RANK], sum(petsc_onz) = [ 1], 540
nod_subd .le. 0 !!!
Continuation of the work on visualization #53
As of now, pyramids aren't supported (only hexas, prisms, tets) in paraview export (both XDMF and VTU).
Some of the optimizations (boundary integrals, Gram matrix storage format) identified in #36 can also be used to speed up the primal DPG formulation of the heat equation in the LASER code.
Currently, the parameters NRCOMS and MAXNRHS and their behavior are not well-defined.