stat-fem

Python tools for solving data-constrained finite element problems


Overview

This package provides a Python implementation of the statistical finite element method (statFEM) described by Girolami et al. [1], which uses data observations to constrain FEM models. The package builds on top of Firedrake [2] to assemble the underlying FEM system and uses PETSc [3, 4] for the sparse linear algebra routines. These tools should allow the user to create efficient, scalable solvers from high-level Python code to address challenging problems in data-driven numerical analysis.

Installation

Installing stat-fem

stat-fem requires a working Firedrake installation. The easiest way to obtain Firedrake is to use the installation script provided on the Firedrake homepage.

curl -O https://raw.githubusercontent.com/firedrakeproject/firedrake/master/scripts/firedrake-install
python3 firedrake-install --install git+https://github.com/alan-turing-institute/stat-fem#egg=stat-fem

This installs Firedrake and places the stat-fem library inside the Firedrake virtual environment. If this does not work, details on manual installation are provided in the documentation.
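If the installation succeeded, the stat-fem package should be importable from within the Firedrake virtual environment. A quick check (the venv path below is the installer's default, and the module name stat_fem is an assumption based on the package layout):

source firedrake/bin/activate
python -c "import stat_fem"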

Using a Docker Container

Alternatively, we provide a working Firedrake Docker container that has the stat-fem code and dependencies installed within the Firedrake virtual environment. See the docker directory in the stat-fem repository.

Testing the installation

The code comes with a full suite of unit tests, collected and run with pytest and pytest-mpi. To run the tests on a single process, activate the virtual environment and enter pytest from anywhere inside the stat-fem directory. To run the test suite in parallel, enter mpiexec -n 2 python -m pytest --with-mpi or mpiexec -n 4 python -m pytest --with-mpi, depending on the desired number of processes. Tests have only been written for 2 and 4 processes, so other process counts may fail.

Example Scripts

An example illustrating the various code capabilities and features is included in the stat-fem/examples directory.

Contact

This software was written by Eric Daub as part of a project with the Research Engineering Group at the Alan Turing Institute.

Any bugs or issues should be filed in the issue tracker on the main GitHub page.

References

[1] Mark Girolami, Eky Febrianto, Ge Yin, and Fehmi Cirak. The statistical finite element method (statFEM) for coherent synthesis of observation data and model predictions. Computer Methods in Applied Mechanics and Engineering, Volume 375, 2021, 113533, https://doi.org/10.1016/j.cma.2020.113533.

[2] Florian Rathgeber, David A. Ham, Lawrence Mitchell, Michael Lange, Fabio Luporini, Andrew T. T. McRae, Gheorghe-Teodor Bercea, Graham R. Markall, and Paul H. J. Kelly. Firedrake: automating the finite element method by composing abstractions. ACM Trans. Math. Softw., 43(3):24:1–24:27, 2016. arXiv:1501.01809, doi:10.1145/2998441.

[3] L. Dalcin, P. Kler, R. Paz, and A. Cosimo, Parallel Distributed Computing using Python, Advances in Water Resources, 34(9):1124-1139, 2011. http://dx.doi.org/10.1016/j.advwatres.2011.04.013

[4] S. Balay, S. Abhyankar, M. Adams, J. Brown, P. Brune, K. Buschelman, L. Dalcin, A. Dener, V. Eijkhout, W. Gropp, D. Karpeyev, D. Kaushik, M. Knepley, D. May, L. Curfman McInnes, R. Mills, T. Munson, K. Rupp, P. Sanan, B. Smith, S. Zampini, H. Zhang, and H. Zhang, PETSc Users Manual, ANL-95/11 - Revision 3.12, 2019. http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf

stat-fem's Issues

Priors

Priors should be available through a shared library with the UQ code, since the framework is identical.

Update Dockerfile

Ensure that the Dockerfile is correctly set up to use the released version on master.

Time dependent problems

The library should be able to handle time-dependent linear problems, since a linear FEM system is solved at each time step in most treatments of time-dependent problems. At the moment this would have to be done manually, but perhaps some convenience functions could handle this for the user.

Related question, which I do not know the best answer to at the moment: How to do estimation for a time-dependent problem?

Add Firedrake '21 Talk to Documentation

Convert the talk into RST files for inclusion in the documentation. I was unable to find a way (at least within the time I had available) to keep the talk as a notebook, or a more version-controllable equivalent of a notebook, and then convert it to RST for inclusion in the documentation, so this will have to be the current solution.

Examples

Code needs some working examples.

Consider matrix-free methods for forcing covariance matrix

We never care about the forcing covariance matrix in isolation, only its action on a vector. Thus, we do not necessarily need to form it explicitly; we only need to implement some version of matrix-vector multiplication. (Looking at the code for the ForcingCovariance class makes it apparent that this is the only operation defined for the class.)
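A minimal sketch of this idea using a petsc4py "shell" (Python-context) matrix is given below; the squared-exponential kernel, node coordinates, and class name are purely illustrative and are not part of the stat-fem API:

    import numpy as np
    from petsc4py import PETSc

    class CovarianceAction:
        # Applies y = K x for a toy squared-exponential kernel. The kernel is
        # built on the fly with numpy for brevity; PETSc itself never stores K,
        # it only ever calls this mult() method.
        def __init__(self, coords, sigma=1.0, l=0.1):
            self.coords = np.asarray(coords)
            self.sigma = sigma
            self.l = l

        def mult(self, mat, x, y):
            xa = x.getArray()
            r = self.coords[:, None] - self.coords[None, :]
            K = self.sigma ** 2 * np.exp(-0.5 * (r / self.l) ** 2)
            y.setArray(K @ xa)

    n = 50
    coords = np.linspace(0.0, 1.0, n)
    A = PETSc.Mat().createPython([n, n], context=CovarianceAction(coords),
                                 comm=PETSc.COMM_SELF)
    A.setUp()

    x, y = A.createVecs()
    x.set(1.0)
    A.mult(x, y)  # covariance action applied without assembling a PETSc matrix

The same interface would let the rest of the code use the forcing covariance through PETSc solvers without ever allocating the full matrix.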

Multiple observations

Allow the model to accept repeated measurements in some fashion and compute the appropriate posterior.

Hessian Computation

The Hessian of the log-likelihood can easily be computed in closed form; add a method for doing this.

Add a changelog

Add a changelog file to keep track of the evolution of the software from the start.

readthedocs build

Building the documentation requires installing stat-fem to get the correct version number. However, the installation requires Firedrake, which has a complex build process. Work on fixing this. A few thoughts:

  1. Could use the docker container the way that the tests currently do. Would be tricky to get the docs out of it, though, and the build scripts would need to be manually configured.

  2. readthedocs builds in a virtual environment, but the Firedrake installer needs to create its own virtual environment. I imagine there is a workaround.

  3. Or, write a manual install script to be run inside the virtualenv? Seems like the wrong solution.

The right answer is probably 2, but I need to test this out.

Make interpolation matrix true interpolation

At the moment, the interpolation matrix does a projection to map the sensor data point to the FEM mesh points. This is correct for piecewise linear elements (1st order continuous Galerkin or Lagrange finite elements), but is not correct for the general case. To be truly correct, this should be an interpolation rather than a projection, but interpolation requires knowledge of the FEM basis functions and depends on more complex workings under the hood to calculate this correctly. Projection is easier to compute in the general case, and thus for the initial work on this I simply implemented projection to get the library working.

However, given that other approximations are made throughout the computation, it is unclear whether this makes a significant difference; the question needs to be studied in more depth to determine how much error the projection introduces.

Fix setup.py license

setuptools no longer allows specifying the license as a list of strings. Need to fix this.
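A possible fix is sketched below; the license string and version are placeholders and should match the project's actual metadata:

    from setuptools import setup

    setup(
        name="stat-fem",
        version="0.1.0",            # placeholder version
        packages=["stat_fem"],
        license="MIT",              # a single string rather than a list of strings
    )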

Boundary Conditions

What is the correct way to handle Dirichlet Boundary Conditions? BCs are disabled when solving the linear system as that was what was done in the other papers, but I don't completely understand why. Worth tracking this down and clearly explaining what is going on in the documentation.

Polish docs Makefile

The Makefile is a bit opaque, as all commands are routed to Sphinx and not all of them show up in the help command (for instance clean). This is worth looking into.

Parallelization issues

At the moment, all processes have to collectively call the expensive _solve_forcing_covariance routine each time, even though some solves only need to be done on the root process of the ensemble. This is presumably best solved by re-configuring the MPI ensemble communicators to create a better master/worker division, where the master does all "base" FEM solves while the "sensor" solves are divided up among the remaining worker processes. The performance hit is probably minimal at the moment, though, as the number of sensor solves is presumably much larger than the number of base solves in most cases.

Prediction

Write routines for predicting values at new data points.

Improve PETSc options

Firedrake gives more control over the available PETSc options; this should extend to the solves done by the LinearSolver class (we can simply instantiate a Firedrake LinearSolver, which is reusable, and pass all solves on to that object).
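A rough illustration of the reusable-solver idea on a toy problem (this is not the stat-fem solve itself, and the solver parameters are only examples):

    from firedrake import *

    mesh = UnitSquareMesh(16, 16)
    V = FunctionSpace(mesh, "CG", 1)
    u, v = TrialFunction(V), TestFunction(V)

    a = (inner(grad(u), grad(v)) + u * v) * dx
    L = Constant(1.0) * v * dx

    A = assemble(a)
    b = assemble(L)

    # PETSc options are passed through solver_parameters; the resulting solver
    # object is reusable, so repeated solves share the same setup
    solver = LinearSolver(A, solver_parameters={"ksp_type": "cg", "pc_type": "sor"})

    uh = Function(V)
    solver.solve(uh, b)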

Cache covariance and Cholesky factor when computing log-likelihood

Estimation routines currently do not cache things that are expensive to compute and are re-used. In particular, the forcing covariance matrix only needs to be computed once (it is currently computed twice at each step in the minimization routine), and the Cholesky factorization of the combined covariance matrix is computed twice (once in the log-likelihood, once in the derivatives). Caching these computations will dramatically improve the speed of estimation routines.
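A rough sketch of the kind of caching that could help; the class and method names are illustrative and not the current stat-fem estimation code:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    class CholeskyCache:
        # Caches the Cholesky factor of the combined covariance per parameter set.
        def __init__(self, build_cov):
            self.build_cov = build_cov   # callable: params -> covariance matrix
            self._cache = {}

        def factor(self, params):
            key = tuple(np.asarray(params, dtype=float))
            if key not in self._cache:
                self._cache[key] = cho_factor(self.build_cov(params))
            return self._cache[key]

        def loglike(self, params, resid):
            c_and_lower = self.factor(params)
            alpha = cho_solve(c_and_lower, resid)
            logdet = 2.0 * np.sum(np.log(np.diag(c_and_lower[0])))
            return -0.5 * (resid @ alpha + logdet + len(resid) * np.log(2.0 * np.pi))

With this pattern, the derivative routines would call factor() with the same parameter values and reuse the cached factorization instead of recomputing it.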

Update statFEM paper

The paper that this software is based on has now been published. The references in the documentation (README and docs) need to be updated to link to the published paper.

Parallel tests still hang

I still have issues with the parallel tests hanging. It occurs fairly reliably with 4 processes when running a sufficient fraction of the test suite, and hangs with no clear reason why (2 processes seems reliably fine). When I run one file at a time it appears to be fine, but once I add enough other tests it causes issues. Changing the test order does not produce a predictable pattern, but it is usually the LinearSolver tests or the ForcingCovariance tests that are most prone to this problem.

I don't think it is a memory usage issue -- the base test suite consistently uses ~100 MB of memory, and the parallel test suite uses roughly 4 times that as expected. This amount is nowhere near what would be needed to cause any issues.

For whatever reason, it never seems to have trouble on the Travis CI, but on my Mac or in my own version of the Docker container it is more problematic.

However, I have no idea at this point what else could be causing this other than some MPI issue that I have failed to uncover.

Coverage reports

I can't seem to figure out why the coverage reports aren't uploading. A couple things to explore:

  • Can I just install Firedrake manually within the Travis build? Coverage reports seem more straightforward if not in a Docker container.
  • I managed to get things working before on the travis branch, then they stopped. Look back at that commit and what changed after it to get some ideas.
  • Keep tweaking? I don't know why I can never get the bash uploader to work...

Computing the forcing covariance

Computing the forcing covariance is expensive as it is a large matrix (that is potentially dense, depending on the covariance parameters that are used). It cannot be stored for FEM problems of interest, so we use a sparse approximation to it in the computations. However, we do not know in advance what the structure of this sparse approximation of the matrix looks like, so we still have to compute all matrix elements in order to form the sparse version. A further complication is that memory allocation in PETSc is expensive, so the fact that we cannot know in advance how big this matrix is makes this a large computational expense.

At the moment, we deal with these issues by pre-computing all elements and only storing those that are needed in a Python dictionary. From the pre-computed version we can allocate memory and form the PETSc matrix. This has two issues: (1) this requires doing simple Python loops, which can be slow, and (2) the Python PETSc interface does not give as much flexibility when allocating memory for sparse matrices as the C interface. This means that significant improvements in speed and memory usage can be obtained by writing the matrix formation routines as a C extension. I believe that pre-computing the elements is still the best approach, which would mean that something like a linked list (as a replacement for the Python dictionary) is probably the best way to store the matrix elements prior to allocating memory and forming the PETSc matrix.
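A simplified sequential sketch of the dictionary-then-preallocate pattern described above (the sizes and entries are made up):

    from petsc4py import PETSc

    n = 4
    # precomputed nonzero entries, stored as {(row, col): value}
    entries = {(0, 0): 2.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 2.0,
               (2, 2): 2.0, (3, 3): 2.0}

    # count nonzeros per row so PETSc can preallocate memory in a single pass
    nnz = [0] * n
    for (i, _j) in entries:
        nnz[i] += 1

    A = PETSc.Mat().createAIJ([n, n], nnz=nnz, comm=PETSc.COMM_SELF)
    for (i, j), val in entries.items():
        A.setValue(i, j, val)
    A.assemble()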

Estimation

Finish implementation of estimation routines. Ideally, mimic behavior in the UQ platform where you can do either MLE or MCMC, and the MLE is used to initiate the MCMC and pick a good step size by inverting the Hessian. This will involve some (small) additional overhead to compute the necessary derivatives.

Model updating

If multiple observations are made over time, provide a way to update the posterior by providing additional measurements to a linear solver object. Related to #11, and should be easier to implement as it can (I think) use existing machinery to do the update.

Clarify which mean outputs are/should be scaled by the model discrepancy

It is not clear whether means computed in the data space are scaled by the discrepancy (I believe they are not, but I had to think about it, which means it is not clear). Getting the scaled mean is likely to be a common desired output, so the solver class should handle this for the user.

Error in log posterior gradient computation

The log posterior derivative is not computed correctly in all cases (for non-unity values of rho, in particular). It is missing a factor of rho: since rho is stored on a logarithmic scale, the chain rule means the derivative with respect to log(rho) picks up a factor of rho relative to the derivative with respect to rho.

Make parameters a class

Parameter values are opaque, on a log scale, etc. We have a similar problem in the UQ library, so a shared solution in a 3rd party library may be the right answer.
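An illustrative sketch of what such a class might look like; the field names and defaults are assumptions rather than an agreed design:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class StatFEMParameters:
        log_rho: float = 0.0     # model/data scaling factor, stored on a log scale
        log_sigma: float = 0.0   # model discrepancy covariance scale, log scale
        log_l: float = 0.0       # model discrepancy correlation length, log scale

        @property
        def rho(self) -> float:
            return float(np.exp(self.log_rho))

        def as_array(self) -> np.ndarray:
            # raw log-scale values as a flat array (ordering is illustrative)
            return np.array([self.log_rho, self.log_sigma, self.log_l])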

Document PETSc options better

More details should be provided to help users specify the PETSc options. At the very least, link to the Firedrake documentation on this.
