
GridapPETSc.jl's People

Contributors

amartinhuertas, ericneiva, fverdugo, github-actions[bot], jordimanyer, juliatagbot, principejavier, santiagobadia, victorsndvg


GridapPETSc.jl's Issues

Rename init! and finalize!

@victorsndvg

I think it would be a good idea to rename

  • GridapPETSc.init!() -> GridapPETSc.Init()
  • GridapPETSc.finalize!() -> GridapPETSc.Finalize()

in order to be consistent with

  • MPI.Init()
  • MPI.Finalize()

Last time, I suggested using the versions with !. But now, looking at the example in the README file, I clearly see that we want to be consistent with MPI. Sorry for the inconvenience...

Can you rename the functions? Thanks!
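For reference, a minimal usage sketch under the proposed names (hypothetical; only the symmetry with MPI.Init/MPI.Finalize matters here):

using MPI
using GridapPETSc

MPI.Init()
GridapPETSc.Init()       # currently GridapPETSc.init!()

# ... assemble and solve with GridapPETSc here ...

GridapPETSc.Finalize()   # currently GridapPETSc.finalize!()
MPI.Finalize()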

Improving PETSc SNES support

At present, the implementation of PETScNonLinearSolver is not fully optimized. These are the main problems:

  • Residuals and Jacobians are copied back and forth between PArrays and PETSc objects. I guess this can be overcome by assembling directly into PETSc data structures (as we did in GridapDistributed.jl 0.1.0). I would consider this in a future PR if and only if we determine that it becomes a bottleneck/hot spot.
  • The Jacobian copy overhead is even more severe because the local storage format of PSparseMatrix is by default not SparseMatrixCSR{0,PetscInt,PetscReal} but SparseMatrixCSC, so we need a conversion at each Jacobian calculation. I guess this latter issue can be easily overcome. #66
  • The PETScNonLinearSolverCache does not currently store a reference to the KSP/PC objects used for the previous nonlinear solve. I guess it would be convenient to have this in the future. (We can open an issue for it.)

Besides,

  • Partitioned simulations in sequential mode currently trigger a @NotImplemented macro. I did not have enough strength to fully implement this case.

Missing `PCFieldSplit` support

The current implementation does not allow using pc_fieldsplit preconditioners with non-strided block matrices, i.e., ConsecutiveMultiFieldStyle. I believe that to enable this feature we need an implementation of the function PCFieldSplitSetIS and the associated index set struct IS.

Am I missing something? If not, how should we proceed?
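For discussion, a minimal sketch of what the missing low-level binding could look like as a raw ccall (the IS handle wrapper, the "libpetsc" library name, and the error handling are assumptions; GridapPETSc would normally generate such bindings through its own wrapper machinery):

# Hedged sketch only: the PETSc C routine is PCFieldSplitSetIS(PC, const char[], IS).
struct IS                       # opaque PETSc index set handle (assumed layout)
  ptr::Ptr{Cvoid}
end

function PCFieldSplitSetIS(pc::Ptr{Cvoid}, splitname::String, is::IS)
  err = ccall((:PCFieldSplitSetIS, "libpetsc"), Cint,
              (Ptr{Cvoid}, Cstring, Ptr{Cvoid}),
              pc, splitname, is.ptr)
  err == 0 || error("PCFieldSplitSetIS failed with PETSc error code $err")
  nothing
end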

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Add Int64 PETSc Ints tests in GitHub Actions

I think it is highly recommendable to include tests in GitHub Actions with the 64-bit integer build of PETSc.

AFAIK, PETSc_jll artifacts only provide the 32-bit integer compilation.

PETSc Error

Hi, I am running macOS Monterey, version 12.5.1, and I am facing an issue when I try to run code using GridapPETSc. Below is the error from the stack trace:

PETSc Error --- Application was linked against both OpenMPI and MPICH based MPI libraries 
and will not run correctly

After I got this error, I uninstalled MPICH from my machine and currently only have Open MPI, but it still reports being linked against multiple MPI implementations. Any help in this regard will be very much appreciated ...
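One thing that may help the diagnosis (a hedged suggestion, not a confirmed fix): check which MPI implementation MPI.jl itself is linked against, and compare it with the MPI that the PETSc binary was built with.

# MPI.identify_implementation is part of MPI.jl; whether its result matches the
# MPI used to build PETSc is the thing to verify (that part is not shown here).
using MPI
println("MPI.jl implementation: ", MPI.identify_implementation())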

test/sequential/ElasticityDriver.jl fails with PETSc compiled in DEBUG mode on my machine

@fverdugo ... test/sequential/ElasticityDriver.jl fails with PETSc compiled in DEBUG mode on my machine

The error is:

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Invalid argument
[0]PETSC ERROR: Vector 0 must have 2-norm of 1.0, it is 27.
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.15.4, Sep 01, 2021 
[0]PETSC ERROR: GridapPETSc on a x86_64 named sistemas-ThinkPad-X1-Carbon-6th by amartin Sat Nov 13 16:14:08 2021
[0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --with-debugging --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
[0]PETSC ERROR: #1 MatNullSpaceCreate() at /home/amartin/software_installers/petsc-3.15.4-build-dbg-gnu9-mpi/src/mat/interface/matnull.c:265
ElasticityDriver: Error During Test at /home/amartin/git-repos/GridapPETSc.jl/test/sequential/runtests.jl:19
  Got exception outside of a @test
  LoadError: Petsc returned with error code: 62 
  Stacktrace:
    [1] macro expansion
      @ ~/git-repos/GridapPETSc.jl/src/Config.jl:88 [inlined]
    [2] (::Main.GridapPETScSequentialTests.ElasticityDriver.var"#1#9")()
      @ Main.GridapPETScSequentialTests.ElasticityDriver ~/git-repos/GridapPETSc.jl/test/sequential/ElasticityDriver.jl:79
    [3] with(f::Main.GridapPETScSequentialTests.ElasticityDriver.var"#1#9"; kwargs::Base.Iterators.Pairs{Symbol, Vector{SubString{String}}, Tuple{Symbol}, NamedTuple{(:args,), Tuple{Vector{SubString{String}}}}})
      @ GridapPETSc ~/git-repos/GridapPETSc.jl/src/Environment.jl:38
    [4] top-level scope
      @ ~/git-repos/GridapPETSc.jl/test/sequential/ElasticityDriver.jl:50
    [5] include(mod::Module, _path::String)

I tried normalizing the rigid body motion vectors. The error above disappears, but we get a new one, namely:

[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Invalid argument
[0]PETSC ERROR: Vector 0 must have 2-norm of 1.0, it is 27.
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.15.4, Sep 01, 2021 
[0]PETSC ERROR: GridapPETSc on a x86_64 named sistemas-ThinkPad-X1-Carbon-6th by amartin Sat Nov 13 16:14:08 2021
[0]PETSC ERROR: Configure options --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 -with-blaslapack-dir=/opt/intel/compilers_and_libraries_2020.0.166/linux/mkl --download-mumps --download-scalapack --download-parmetis --download-metis --download-ptscotch --with-debugging --with-x=0 --with-shared-libraries=1 --with-mpi=1 --with-64-bit-indices
[0]PETSC ERROR: #1 MatNullSpaceCreate() at /home/amartin/software_installers/petsc-3.15.4-build-dbg-gnu9-mpi/src/mat/interface/matnull.c:265
ElasticityDriver: Error During Test at /home/amartin/git-repos/GridapPETSc.jl/test/sequential/runtests.jl:19
  Got exception outside of a @test
  LoadError: Petsc returned with error code: 62 
  Stacktrace:
    [1] macro expansion
      @ ~/git-repos/GridapPETSc.jl/src/Config.jl:88 [inlined]
    [2] (::Main.GridapPETScSequentialTests.ElasticityDriver.var"#1#9")()
      @ Main.GridapPETScSequentialTests.ElasticityDriver ~/git-repos/GridapPETSc.jl/test/sequential/ElasticityDriver.jl:79
    [3] with(f::Main.GridapPETScSequentialTests.ElasticityDriver.var"#1#9"; kwargs::Base.Iterators.Pairs{Symbol, Vector{SubString{String}}, Tuple{Symbol}, NamedTuple{(:args,), Tuple{Vector{SubString{String}}}}})
      @ GridapPETSc ~/git-repos/GridapPETSc.jl/src/Environment.jl:38
    [4] top-level scope
      @ ~/git-repos/GridapPETSc.jl/test/sequential/ElasticityDriver.jl:50
    [5] include(mod::Module, _path::String)
      @ Base ./Base.jl:386
    [6] include
      @ ~/git-repos/GridapPETSc.jl/test/sequential/runtests.jl:1 [inlined]

Does it ring a bell? The rigid body motions seem to be well defined in the driver (3 rotations + 3 translations).

At present we are not able to reproduce this error in GitHub Actions, as we are using PETSc_jll, which ships PETSc compiled in release mode.

User API to get KSP number of iterations after convergence

In GridapPETSc v0.3.0, the API no longer allows extracting the number of iterations at which the ksp converged.

Old API:

function PETSc_get_number_of_iterations(ps::PETScSolver)
    # Query the underlying KSP object for the iteration count of the last solve
    KSPGetIterationNumber!(ps.ksp)
end

The ksp is no longer an attribute of PETScSolver but of PETScSolverNS. How can the user extract the number of iterations now?
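A minimal sketch of one possible way to do this with the current design, assuming the numerical setup object still carries the ksp handle and that a low-level KSPGetIterationNumber wrapper is available in GridapPETSc.PETSC (both are assumptions that may need to be confirmed or added):

# Hedged sketch; field and wrapper names are assumptions, not confirmed API.
function get_number_of_iterations(ns)  # ns: the PETSc numerical setup object
  niters = Ref{GridapPETSc.PETSC.PetscInt}()
  GridapPETSc.PETSC.KSPGetIterationNumber(ns.ksp[], niters)
  Int(niters[])
end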

Error running tests with external PETSc installation

When installing GridapPETSc with external libraries, i.e. with non-empty JULIA_MPI_BINARY and JULIA_PETSC_LIBRARY, executing the whole set of tests fails even though separate execution of the sequential and parallel tests succeeds.

In other words,

julia --project=. --color=yes --check-bounds=yes test/sequential/runtests.jl
julia --project=. --color=yes --check-bounds=yes test/mpi/runtests.jl

are both successful but

julia --project=. --color=yes --check-bounds=yes test/runtests.jl

fails to execute the parallel tests with the message

...
PartitionedArrays: Error During Test at /home/jprincipe/Prog/GridapPETSc.jl/test/mpi/runtests.jl:3
  Got exception outside of a @test
  LoadError: failed process: Process(`mpiexec -n 3 --allow-run-as-root --oversubscribe /home/jprincipe/System/bin/julia-1.7.2/bin/julia -Cnative -J/home/jprincipe/System/bin/julia-1.7.2/lib/julia/sys.so --check-bounds=yes -g1 --color=yes --project=/home/jprincipe/Prog/GridapPETSc.jl/test/mpi/../.. /home/jprincipe/Prog/GridapPETSc.jl/test/mpi/PartitionedArraysTests.jl`, ProcessExited(1)) [1]
...

The execution of the whole set of tests is successful when the default installation is used, that is, with empty JULIA_MPI_BINARY and JULIA_PETSC_LIBRARY.

@amartinhuertas , @fverdugo any thoughts?

Misc tasks towards large-scale linear and nonlinear problems with GridapDistributed + GridapPETSc

High-priority

  • Refactor PetscNonlinearSolver. Issue: #45 @amartinhuertas #53
  • Fix copy!. Issue #50 @amartinhuertas #53
  • Refactor MPI tests with a TestApp and try to put as many drivers as possible in the same MPI run to reuse the JIT compilation cache. Essentially, all tests requiring the same number of procs can be in the same run. This should reduce the CI time significantly.
  • Garbage collection of petsc objects. Issue #41. To fully fix this, it is likely that we need to fix #51 first.
  • #45 (comment)

It would be nice to have all functionality working in native sequential mode (just as before), as well as in distributed sequential and distributed MPI mode.

Compatibility with PETSc_jll.jl v0.16+

During the recent upgrade to PartitionedArrays.jl v0.3, I've had to limit the compatibility of the Julia binaries to 0.13 and 0.15.
With any binary version above that, errors occur (see for instance).

I am not currently 100% sure of what is causing the issue, but it is probably related to changes to the PETSc library. Hopefully it only involves a change in the tests, but it could be deeper. We should have a look at what is causing the issue and restore compatibility.

Inconsistent definition of copy! for PETScVector?

I have the feeling that the definition of these two functions:

function Base.copy!(vec::AbstractVector,petscvec::PETScVector)

function Base.copy!(petscvec::PETScVector,vec::AbstractVector)

is inconsistent with the Julia definition of copy!:

help?> copy!
search: copy! copyto! circcopy! unsafe_copyto! copy copysign deepcopy

  copy!(dst, src) -> dst

  In-place copy of src into dst, discarding any pre-existing elements in dst.
  If dst and src are of the same type, dst == src should hold after the call.
  If dst and src are multidimensional arrays, they must have equal axes. See
  also copyto!.

In particular, note the sentence "If dst and src are multidimensional arrays, they must have equal axes." As far as I can tell, this means that length(dst) == length(src) should hold, and I think that this is not true in the functions linked above. In other words, both vectors need to be understood either as global vectors or as local ones.

@amartinhuertas Do you remember why you needed to change these functions?

Discussion on GC of PETSc objects on PETSc-Users

Dear PETSc users,
What is the main reason underlying PetscDestroy subroutines having global collective semantics? Is this actually true for all PETSc objects? Can this be relaxed/deactivated by, e.g., compilation macros/configuration options?
We are leveraging PETSc from Julia in a parallel distributed memory context (several MPI tasks running the Julia REPL each). Julia uses Garbage Collection (GC), and we would like to destroy the PETSc objects automatically when the GC decides so along the simulation. In this context, we cannot guarantee deterministic destruction on all MPI tasks as the GC decisions are local to each task, no global semantics guaranteed.
Thanks in advance!
Best regards,
Alberto.

Ahh, this makes perfect sense.
The code for PetscObjectRegisterDestroy() and the actual destruction (called in PetscFinalize()) is very simple and can be found in src/sys/objects/destroy.c (PetscObjectRegisterDestroy(), PetscObjectRegisterDestroyAll()).
You could easily maintain a new array like PetscObjectRegisterGCDestroy_Objects[], add objects with PetscObjectRegisterGCDestroy(), and then destroy them with PetscObjectRegisterDestroyGCAll(). The only tricky part is that you have to make sure, in the context of your Julia MPI, that PetscObjectRegisterDestroyGCAll() is called collectively over all the MPI ranks that have registered objects to destroy (that is, it has to be called where all the ranks have made the same progress on MPI communication), generally PETSC_COMM_ALL. We would be happy to incorporate such a system into the PETSc source with a merge request.
Barry

I think that it is not just MPI_Comm_free that is potentially problematic.
Here are some additional areas off the top of my head:

  1. PetscSF with -sf_type window. Destroy (when the refcount drops to zero) calls MPI_Win_free (which is collective over comm).
  2. Deallocation of MUMPS objects is tremendously collective.

In general, the solution of just punting MPI_Comm_free to PetscFinalize (or some user-defined time) is, I think, insufficient, since it requires us to audit the collectiveness of all XXX_Destroy functions (including in third-party packages).
Barry's suggestion of resurrecting objects in finalisation using PetscObjectRegisterDestroy and then collectively clearing that array periodically is pretty close to the proposal that we cooked up, I think.
Jack can correct any missteps I make in explanation, but perhaps this is helpful for Alberto:

  1. Each PETSc communicator gets two new attributes: "creation_index" [an int64] and "resurrected_objects" [a set-like thing].
  2. PetscHeaderCreate grabs the next creation_index out of the input communicator and stashes it on the object. Since object creation is collective, this is guaranteed to agree on any given communicator across processes.
  3. When the Python garbage collector tries to destroy PETSc objects, we resurrect the C object in finalisation and stash it in "resurrected_objects" on the communicator.
  4. Periodically (as a result of user intervention in the first instance), we do garbage collection collectively on these resurrected objects by performing a set intersection of the creation_indices across the communicator's processes, and then calling XXXDestroy in order on the sorted_by_creation_index set intersection.

I think that most of this infrastructure is agnostic of the managed language, so Jack was doing the implementation in PETSc (rather than petsc4py).
This wasn't a perfect solution (I recall that we could still cook up situations in which objects would not be collected), but it did seem to (in theory) solve any potential deadlock issues.
Lawrence

Hi Everyone,
I cannot fault Lawrence's explanation, that is precisely what I'm implementing. The only difference is I was adding most of the logic for the "resurrected objects map" to petsc4py rather than PETSc. Given that this solution is truly Python agnostic, I will move what I have written to C and merely add the interface to the functionality to petsc4py.
Indeed, this works out better for me as I was not enjoying writing all the code in Cython! I'll post an update once there is a working prototype in my PETSc fork, and the code is ready for testing.
Cheers,
Jack
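To summarise the scheme above in this package's own language, here is a hedged Julia toy sketch (names and data structures are illustrative assumptions, not the PETSc/petsc4py implementation being discussed): finalizers only register handles locally, and a later collective call destroys exactly those objects that every rank has registered, in creation order, so the collective destroy calls match up across ranks.

using MPI

const MAX_OBJECTS = 1024                      # bounded toy registry (assumption)
const registered = Dict{Int,Ptr{Cvoid}}()     # creation_index => PETSc handle

register_for_destruction!(creation_index::Int, handle::Ptr{Cvoid}) =
  (registered[creation_index] = handle)

function collective_gc!(comm::MPI.Comm, destroy!::Function)
  # Mark which creation indices this rank has registered ...
  mine = zeros(Cint, MAX_OBJECTS)
  for idx in keys(registered)
    mine[idx] = one(Cint)
  end
  # ... and keep only those registered on *every* rank (min of 0/1 flags = logical AND).
  common = MPI.Allreduce(mine, MPI.MIN, comm)
  for idx in sort!(collect(keys(registered)))
    if common[idx] == 1
      destroy!(registered[idx])   # collective destroy, same order on all ranks
      delete!(registered, idx)
    end
  end
  nothing
end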

Store MPI.Comm in PetscVector and PetscMatrix

I think it would be convenient to store the MPI.Comm object inside the PetscVector and PetscMatrix.

It would be required, e.g., to check whether comm == MPI.COMM_SELF and select a suitable finalization strategy.

Once this is done, we can also remove MPI.Comm from PetscLinearSolver.
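A minimal sketch of the idea (the field layout is an assumption, not the actual GridapPETSc structs): keep the communicator next to the PETSc handle so that the finalization strategy can be chosen from it.

using MPI

struct PETScVectorSketch            # stand-in for the real PETScVector
  vec::Ptr{Cvoid}                   # the underlying PETSc Vec handle
  comm::MPI.Comm                    # stored so it can drive finalization
end

# Example of the kind of decision the stored communicator enables:
finalize_strategy(v::PETScVectorSketch) =
  v.comm == MPI.COMM_SELF ? :destroy_eagerly : :register_for_collective_destroy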

Efficiency of the Garbage Collection

Hello all,

@amartinhuertas and I have been working on a geometric multigrid (GMG) solver/preconditioner for Gridap. To solve the coarsest-level linear system in parallel, we decided to use MUMPS through a PETScLinearSolver from GridapPETSc.

While profiling the resulting algorithm for a Poisson problem, where we use a preconditioned Conjugate Gradient solver with our GMG preconditioner, we find that a surprisingly high amount of time is spent in the method gridap_petsc_gc(), which is called from within solve!() at every iteration of the conjugate gradient.

Due to this, we think it could be advantageous to introduce a new flag/kwarg/setting that deactivates all these automatic calls to the GC that are scattered around the code. The idea is to give an advanced user/developer the ability to manually call the GC optimally depending on the application.

Thanks!
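As a concrete illustration of the proposal (names are assumptions, not existing GridapPETSc API), the switch could be as simple as a module-level flag consulted before each automatic collection:

const AUTOMATIC_PETSC_GC = Ref(true)          # proposed switch (name assumed)

disable_automatic_petsc_gc!() = (AUTOMATIC_PETSC_GC[] = false; nothing)
enable_automatic_petsc_gc!()  = (AUTOMATIC_PETSC_GC[] = true;  nothing)

# Called wherever the automatic collection currently happens, e.g. inside solve!():
function maybe_petsc_gc()
  AUTOMATIC_PETSC_GC[] && GridapPETSc.gridap_petsc_gc()  # gridap_petsc_gc() as referenced above
  nothing
end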

Misc tasks

cc @fverdugo

  • Upgrade dependency versions. (Gridap 0.17) PR #36
  • Rename PETScSolver to PETScLinearSolver. PR #37
  • Add support for SNES. Wrap it around Gridap NLSolver implementor. PR #38
  • Implement Garbage Collector in the spirit of issue #41
  • Use convert API to transform from PartitionedArrays to PETSc and vice-versa. Done in PR #38
  • Investigate the interplay among GridapPETSc.with(...) and PArrays.prun(). See also #46 (comment)
  • Something else?

Concurrent subdomain-wise assembly?

Sorry for these very naive questions; I don't know the inner workings of Gridap, so they may be inappropriate.

  1. When you do assembly in parallel, do you use ghost elements, or do you use partial assembly and then sum contributions at the interfaces?
  2. I'm guessing there is some underlying data decomposition of the initial mesh depending on the number of MPI processes. Is it possible to assemble a different variational formulation just on the "local" portion of the initial mesh?

PETSc nonlinear solve not producing correct solution for transient problems

For all transient problems in GridapODEs, the stage operator (linear or non-linear) is recreated at each time step. This causes issues in GridapPETSc when using a non-linear solver, as described in issue #81.

@assert cache.op === op

The hack of commenting out this line or setting cache.op = op does not work for all problems. @JordiManyer and I found that the only solution was to set sysslvrcache = nothing prior to solving the NonlinearStageOperator within ode_march!.

This is not a long-term solution, as destroying sysslvrcache means the residual and Jacobian are allocated at every time step in

cache=_setup_cache(x,nls,op)

An alternative is to use the non-linear solvers within GridapSolvers. Additionally, we tested using a PETSc linear solver within a GridapSolvers nonlinear solver, and this recovered the correct behaviour. This isolates the problem to the PETSc nonlinear solvers, in particular:

function Algebra.solve!(x::T,

We believe this function should be changed to pass new Jacobian and residual functions, rather than cached objects, for transient problems requiring a non-linear solver. @JordiManyer will try to create a reproducible example for testing.
