Giter Club home page Giter Club logo

neon's Introduction

Build Status Documentation Status Documentation License: MIT

neon

A non-linear finite element code. This project is still under development and is not considered stable.

Building

Before using the package, the code needs to be compiled and installed on the local system. There are two ways of accomplishing this task; using a docker image in a secure environment with the dependencies already handled, or compiling this on the host operating system.

Docker

Go to the top level directory in the git repository and enter

  • sudo docker build -t neon .

This will pull down the base image and then clone the git repository inside the docker container. The original hosting can be found at https://hub.docker.com/r/dbeurle/neon/

Once this is completed, enter the following commands to enter the docker container

  • $ sudo docker run -i -t neon /bin/bash

The instructions in the section below can now be followed with the copy of the repository inside the docker container without worrying about dependencies.

Linux

The external dependencies are:

  • Boost filesystem (to be replaced with std::filesystem)
  • Pastix and MUMPS for direct linear solvers
  • VTK for writing out simulation data to a ParaView compatible format
  • An OpenMP enabled C++17 compiler
  • Intel Thread Building Blocks TBB

Other dependencies are pulled in during build time with CMake and include

  • Eigen for linear algebra
  • range-v3 for range support
  • Termcolor for colour terminal support
  • Catch for unit testing

For best performance build with native optimisations on for the machine you are using. This will automatically trigger Eigen to use wider SIMD instruction when available on native architecture at the expense of portability.

The build instructions for debug are

  • $ mkdir build && cd build/
  • $ cmake ..
  • $ make all

and for release mode with optimisations

  • $ mkdir build && cd build/
  • $ cmake -DCMAKE_BUILD_TYPE=Release ..
  • $ make all -j<# cpus>
  • $ sudo make install

If the code is only used on a particular machine, then machine specification optimisations can be invoked by specifying the CMake symbol

  • -DENABLE_NATIVE=1

which enables the compiler to generate the best machine code for your platform.

If you have an NVIDIA graphics card, then you can use the CUDA enabled iterative solvers by specifying the CMake symbol

  • -DENABLE_CUDA=1

CUDA doesn't follow the latest GCC so a compatibility package is usually provided. CMake can be told to use a specific compiler for the host compilation through

  • -DCMAKE_CUDA_HOST_COMPILER=<compiler>

For checking the successful compilation of the program, invoke the test suite by executing

  • $ ctest

in the build directory.

Ubuntu 18.04

Install dependencies through the package manager:

$ sudo apt install cmake git mercurial zlib1g-dev libcurl4-openssl-dev libvtk6-dev libtbb-dev libboost-filesystem-dev libmumps-seq-dev libopenblas-dev libarpack2-dev libscotch-dev hwloc libhwloc-dev gfortran

Then clone the repository and add

$ git clone https://github.com/dbeurle/neon.git

and enter the repository

$ cd neon/docker && sh install_pastix.sh

After this compiles and fails to install, enter the commands to install and link the libraries

$ cd pastix_5.2.3/build && sudo make install && sudo ln -s /usr/local/lib/libpastix.so /usr/lib/libpastix.so

Provide the sudo password when prompted.

Now we are going to create the build directory and compile. Navigate back to the neon directory and create and enter the build directory

$ mkdir build && cd build

We can create the compilation environment and setup optimisations using CMake with

$ cmake -DCMAKE_BUILD_TYPE=Release ..

Finally compiling with

$ make all -jN

where N is the number of parallel build jobs you want to run.

You can then install using

$ sudo make install

and run the test suite with

$ ctest -jN

to ensure the program is behaving as expected.

Licensing

See the LICENSE.md file for the project license and the licenses of the included dependencies.

Contributions

Many thanks to the contributors of ideas, code and theoretical discussions:

  • Shadi Alameddin (code reviews, verification, bug reports, theoretical and code contributions, damage model)
  • Shannon Beurle (documentation, quality, beam section properties and tests)
  • Patricio Rodrigeuz (pyramid shape functions and quadrature)

If you are missing please open an issue and I'll happily add you to the list.

neon's People

Contributors

annierhea avatar dbeurle avatar shadisharba avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

neon's Issues

Lagrange multipliers for constraints

Lagrange multipliers can be used to enforce constraints on the displacements of the system to machine precision. The introduction of these constraints change the structure and increase of the sparse matrix and restrict the type of solver to be used to only direct linear solvers.

It should be possible to implement this partitioned structure for displacement boundary conditions for non-linear problems and for using periodic boundary conditions for example. Tie constraints also naturally fall out of this.

A constraint matrix can be formulated for each of these describing the derivative of the gap function used for the particular constraint and provided under a different class of boundary conditions.

  • Add interface class constraint
  • Support constraint dofs in fem::mesh and fem::matrix
  • Prohibit use of non-compatible linear solvers with zero-diagonal matrices

Plane strain formulation

Plane strain is probably an easy example to get working and experiment with. A finite strain example will allow debugging of the three-dimensional example and set up internal variable store etc.

Checkpoints and restarts

During long simulations it is useful to write out the state of the simulation such that it can be restarted from this checkpoint. The checkpoint should contain:

  1. History variables (accumulated scalars / tensors)
  2. Current displacements or nodal coordinates
  3. Current time step and step size.

Required changes

The classes which hold state will require new constructors or logic to populate accumulation fields from restart data. Whether the restart should handle that different data as input is debatable. If there is a checkpoint at which point a simulation diverges due to bad material properties or time expiry on a shared compute device, it would be sensible to allow a change however this would break the continuity of the analysis and should perhaps be avoided entirely.

Each constitutive model should specify the internal variables which are listed as the history variables and provide this set through a constant method. The fem_submesh that is responsible for the internal variables is also responsible for serialisation of the scalar and tensor variables. Due to the potentially large size of the internal variable set, the results should be in binary format with the option of an ASCII output.

Classes with restart information could provide this in json format so this could be written / appended to a restart file <simulation_name>.restart with a mapping to a binary object containing the history variables. This is best handled in the fem*matrix classes as the time / load incrementation occurs here.

Serialisation libraries

The cereal library is header only and allows for a binary representation of the matrices. It seems somewhat easy to implement:

#include <Eigen/Dense>

#include <cereal/archives/binary.hpp>
#include <cereal/cereal.hpp>

#include <fstream>
#include <type_traits>

namespace cereal
{
template <class Archive, typename MatrixType>
inline std::enable_if_t<traits::is_output_serializable<BinaryData<typename MatrixType::Scalar>, Archive>::value> save(
    Archive& archive, MatrixType const& m)
{
    typename MatrixType::Index rows = m.rows();
    typename MatrixType::Index cols = m.cols();
    archive(rows);
    archive(cols);
    archive(binary_data(m.data(), rows * cols * sizeof(typename MatrixType::Scalar)));
}

template <class Archive, typename MatrixType>
inline std::enable_if_t<traits::is_input_serializable<BinaryData<typename MatrixType::Scalar>, Archive>::value> load(
    Archive& archive, MatrixType& m)
{
    typename MatrixType::Index rows, cols;
    archive(rows);
    archive(cols);

    m.resize(rows, cols);

    archive(binary_data(m.data(),
                        static_cast<std::size_t>(rows * cols * sizeof(typename MatrixType::Scalar))));
}
}

int main()
{
    Eigen::MatrixXd test1 = Eigen::MatrixXd::Random(10, 3);
    Eigen::MatrixXd test2 = Eigen::MatrixXd::Random(3, 5);

    {
        std::ofstream out("eigen.cereal", std::ios::binary);
        cereal::BinaryOutputArchive archive_o(out);
        archive_o(test1);
        archive_o(test2);
    }

    std::cout << "test1:" << std::endl << test1 << std::endl;
    std::cout << "test2:" << std::endl << test2 << std::endl;

    Eigen::MatrixXd test1_loaded, test2_loaded;
    {
        std::ifstream in("eigen.cereal", std::ios::binary);
        cereal::BinaryInputArchive archive_i(in);
        archive_i(test1_loaded, test2_loaded);
    }
    std::cout << "Norm of test matrix 1: " << (test1 - test1_loaded).norm() << "\n";
    std::cout << "Norm of test matrix 2: " << (test2 - test2_loaded).norm() << "\n";
}

and also supports containers in the standard library and json output (https://uscilab.github.io/cereal/index.html).

Implementation steps

  • Have constitutive models populate a set of restart variables
  • Print out current time information
  • Create a checkpoint class to give to fem_submesh classes with file name information

Each fem_submesh owns and can print out their internal variables. This is because each submesh has a unique interpolation function and can printout based on this information alone.

Constitutive model

  • Add a serialise() method using the code snippet.

This function would need to include its parent (information about the element or an index) and a method to construct its data with restart information, using pseudo c++:

class internal_variables
{
public:
    /// Serialise data to file
    void serialise(std::string const& file_tag,
                      std::set<variable::scalar> const& scalars
                      std::set<variable::second> const& tensors);

    /// Deserialise from file to memory
    void deserialise(std::string const& file_tag);
};

Gradient recovery options

See 'A Framework for Finite Strain Elastoplasticity Based on Maximum Plastic Dissipation and the Multiplicative Decomposition, Part II'. This leads to a nice linear solve based on a diagonalised mass matrix rather than the average of the nodal values. Perhaps we can offer both methods?

Linear solver floating point error trapping

If there is a numerical issue, the linear solver could compute a NaN and no error is currently thrown. To fix this wrap the linear solvers in floating point environment exceptions similar to the internal variable update.

64 bit integers

I opened this issue is as a reminder to check.

There will be problems with integer overflow / undefined behaviour if there are more than roughly 2 billion elements or degrees of freedom. However the likelihood of such large meshes on a single node is low. We should check a large mesh and see if this will cause problems on a single node with high memory (~356 GB or there about) with iterative solvers.

An easy solution is to change everything 'scalable' to int64 and be done with it. This will require the linear solvers and associated ordering packages to be changed out to support 64 bit integers. The GPU solver will need changing because currently they are setup for 32 bit integers.

Distributed memory parallelisation

This issue tracks the progress towards inclusion of MPI based parallelisation. Before this is included there should be a set of test codes to ensure that the currently used libraries can be enabled with MPI. Once this is completed, the linear solver backends need to be changed and the io implementation will be swapped out to include parallel file io.

For a worthwhile implementation it makes sense to restrict the choice of solvers to a 'fully' distributed such that no particular process has a complete view of the stiffness matrix. This method scales better because no one process has a world view on the problem.

  1. Include external c++ wrappers for MPI (mpi-bind)
  2. Distributed PaStiX
  3. Distributed MUMPS (slight issues with this)
  4. CPU Iterative solver suite
  5. GPU Iterative solver suite
  6. Distributed VTK IO

Mixed finite elements

In order to avoid locking problems for incompressible models (such as the rubber models implemented), the weak form needs to be adjusted. This can be done through the Q1P0 family of elements, where the additional pressure variable is condensed out on the element level. However this is only valid for the discontinuous pressure field and higher order shape functions need to include the entire (u, p) function set and solve a partitioned system.

This is a tracking issue of the progress.

Fixed and adaptive time stepping

Provide an option for fixed and adaptive time/load stepping for static problems. Fixed time stepping has only a queue and the fixed times. Non-convergence will result in a failed simulation.

Coupled simulations

Including multiphysics will be essential for interactions between mechanical and diffusion processes. This requires a robust way of transferring data between the modules to avoid excessive memory copying.

Requirements

This feature requires new infrastructure in the preprocessor to schedule the simulations. The requirements of this structure are:

  • Propagation of results through multiple cases
  • Apriori error checking. This will require construction of objects and checking inputs
  • Arbitrary mixing of modules (mechanical followed by diffusion for example)
  • Efficient access to internal variables and solution vectors to avoid excessive memory copies of large structures

Proposed implementation

An algorithm for scheduling the simulations is required, as a particular module may have multiple steps, where the steps reuse the previous simulation state. The scheduling model must be generic enough to handle a combination of procedures.

Eigenvalue computation

We need eigenvalue solvers in the mix. Add in the unsupported ARPACK wrapper. This library is used in Octave and other numerical packages so it should be robust.

Unused variable warnings

Using structured bindings (decomposition bindings...?) and not using one of the variables results in lots of warnings. Using the [[maybe_unused]] specifier works in the latest clang and gcc. Once these new compilers hit the stable repos add the specifier to each call site where a variable is not used.

Killing off OpenMP

OpenMP looks like crap.

We should move towards c++11 standard threads or something a compatible with this library.

The main multithreaded portions of the program are inside the internal variables update and the matrix assembly. We can swap out those manual for loops for something like:

https://www.threadingbuildingblocks.org/tutorial-intel-tbb-generic-parallel-algorithms

Got some thoughts on this Shadi? We can still use an integer index which is handy when we have a lot of arrays to index into, and also means the code doesn't have to change too much.

Problem with the Tetrahedron4 and Triangle 3 element

There is a problem with the Tetrahedron4 element. The displacement field is correct but all the other fields like stress or damage are NAN. Tetrahedron10 works perfectly as you can see in the last picture.

linear_tet_displacement
linear_tet_stress
quadratic_tet

Python script for testing output

Complete integration tests would be useful in ensuring the results are actually correct. This could be done using a python script that checks against analytical solutions.

Linear solver performance optimisation

A trivial optimisation for the linear solver classes is to hold onto the state of the linear solver if the pattern hasn't changed. This was done for the LDLT PaStiX solver but it can be applied to all solvers.

This easy optimisation results in an impressive speed up of around 1/3 of the computation time on a unstructured grid with quadrature tetrahedrons, which is especially useful in non-linear solves where the sparse matrix pattern is unlikely to change.

Prism elements

Prism elements are the only missing common element from the library. We should add them in.

Implementation of cyclic loading

If we want to run fatigue computations, it will be easier to have a "cyclic" option that changes the interpolation of the boundary conditions to fit a sinusoidal wave. An input example may look like:

"Cyclic" : "True",
"Amplitude" : [0.12,0.10],
"Period" : [10,10],
"NumperOfCycles" : [2,1],
"Phase" : [0,0]

In this case "Times" and "Values" will be ignored and the respective values will be computed base on the given wave.

Finite strain plasticity

Finite strain plasticity in three dimensions is kaputt. The reference by Neto provides an implementation for finite plasticity (J2) that we can compare against.

  • Implement two dimensional theory and associated framework
  • Test against the reference implementation
  • Attempt the three dimensional theory again if it works

Type traits for mechanical problems

Fixed size numerical structures (vectors and matrices) are critical to extracting the maximum performance out of the code by providing as much information as possible to the optimiser. Depending on each simulation type, the matrices representing the material tangent, stress and deformation tensors are known. Some generic classes, for example the internal variables class, already take template parameters for the size of the rank two and rank four tensor. The dimension information (one, two or three) and the degrees of freedom can be encoded into these traits.

To deal with the quirks of each simulation type (axisymmetric, plate, shell, solid, beam, plane strain and plane stress) we should introduce a traits class that specifies this information for each simulation type. This generates a little additional work but should encapsulate the size information.

Firstly, we need to introduce a strongly-typed enum to provide a compile-time logic.

#include <type_traits>

enum class type : uint8_t {plane_stress, plane_strain, solid, plate, shell, axisymmetric};

template <type T, bool>
struct traits;

template <type::solid, bool is_symmetric = true>
struct traits
{
    static auto constexpr value_type = type::solid;
    static auto constexpr size = 3;

    using rank_two_tensor = matrix3;
    using rank_four_tensor = std::conditional<is_symmetric, matrix6, matrix9>::type;

    using internal_variables_type = internal_variables<rank_two_tensor, rank_four_tensor>;
    using constitutive_type = constitutive_model<type::solid, internal_variables_type>;
};

This should be able to handle the unsymmetrical 9x9 constitutive models for three-dimensional theory if required through the template parameter. In addition, if analysis specific post-processing steps are required, further template specialisations can be used for the parts which are not common.

A downside of this approach is the additional boiler plate and the fact that most of the implemented code will still be dependent on the type with what code can be reused, will be.

Follower loads

Pressure loads follow the deformation of the body. This introduces another part of the stiffness matrix called the 'load stiffness' which causes the system to lose its symmetry. Issue #11 allows the linear solver to select an appropriate solver if the system is unsymmetric.

  • Add follower_pressure class with external_stiffness and external_force interface

Better non-linear stopping criteria

For some exotic material models (i.e. this rubber ageing) there are some problems with defining a sensible tolerance. We should add the option in for using an absolute or relative tolerance from the input file.

Asynchronous file io

The routines where the mesh is gathered and written to VTK format have multiple distinct tasks. It should be possible to spawn asynchronous tasks (with std::async) to fill the mesh data structure, extrapolate nodal variables and add these into the VTK container with the appropriate locks. This would ensure that file io is loosely parallelised and should net a speedup for larger models.

This asynchronous approach could also be used in other parts of the code (preprocessing)

Displacement controlled loading

Displacement controlled loadings as they currently stand cause spurious plastic deformations in the elements connected to displacement nodes. This is because the load is applied such that high localised strains are computed at the quadrature points.

It should be that a global or local linear solution is computed first as an approximation to the deformation state.

The update of the internal variables

The update of the internal variables with a non-zero time step is done before invoking perform_equilibrium_iterations and it is done again inside perform_equilibrium_iterations with a zero time step. However, for visco-plastic materials, the update with a non-zero time step should be done in the first N-R iteration only.

Constitutive models with unsymmetric matrices

Currently this is not enforced and needs to occur at time of construction of the fem*Matrix. The mesh objects should be able to report if they will produce an unsymmetric matrix based on their constitutive model / boundary conditions.

Linear solver deduction

Currently the input file specifies the choice of linear solver as "ConjugateGradient" or "SparseLU", but this choice can conflict with the choice of constitutive model, constraint enforcement and line search methods. A better approach would be to specify the class of solvers and have the code automatically deduce the best approach based on:

  • Constitutive model
  • Nonlinear solver options
  • Constraint equations
  • Boundary conditions (follower loads)

The suggested supported solvers options could be

  • "PaStiX" (LU, LLT) direct solver
  • "MUMPS" (LU, LLT) direct solver
  • "Direct" (Eigen inbuilt LU, LLT)
  • "Iterative" (Eigen CG and BiCGStab (or GMRES))
  • "IterativeGPU" (CG and BiCGStab on the GPU)
  • "DirectGPU" (QR factorisation from CUDA library)

Changes required to be implemented are:

  • Mesh classes query their objects to determine symmetry status
  • femMatrix classes check their nonlinear options and meshes and deduce correct solver

The class structure of linear solvers should follow the corresponding hierarchy with two classes Iterative and Direct serving as base classes with the derived classes containing specialisations (iterative have tolerances and direct solvers with saving of the factorised matrices). For iterative solvers, there are options to specify the precondition to improve convergence properties. These could also be specified when possible in a solver agnostic way, with "Preconditioner" : "Diagonal" or "Preconditioner" : "IncompleteFactorisation", or with a similar method.

An implementation of these changes should increase the linear solver efficiency and avoid incorrect results.

Boundary std::variant framework

WIth c++17 a variant type (a type safe union) has been introduced into the standard library. Instead of using inheritance and polymorphic pointers, the boundary conditions can be modelled as a union of types with an .external_force() parameter and accessed through a visitor and a lambda.

For this, we need to split the boundary conditions into two types; those who contribute to the right hand side and those who contribute terms in the stiffness matrix and terms to the right hand side.

The variant type can be defined as

using boundary_type = std::variant<Traction, Pressure, BodyForce>;

A factory method can allocate the correct type to the variant

// Construct a list of boundary conditions
std::vector<boundary_type> boundary_conditions;
boundary_conditions.emplace_back(Pressure{});
boundary_conditions.emplace_back(Traction{});
boundary_conditions.emplace_back(BodyForce{});

Within the system matrix assembler, the original code requires a small modification

for (auto const& boundary_condition : boundary_conditions)
{
    std::visit([&](auto const& boundary) {
        // Assemble boundary condition contribution for the right hand side
        // Loop over elements
        for (auto element = 0; element < elements; ++element)
        {
        }
    }, boundary_condition);
}

From the input file, the traction boundary conditions are usually given as time history for each degree of freedom:

"x" : [1.0, 2.0, 3.0],
"y" : [0.0, 1.0, 2.0],
"z" : [0.0, 0.0, 0.0]

This format needs to be stored internally for operation. Currently, each traction boundary is held in a std::vector inside a container class. An advantage is that each degree of freedom for the traction is handled separately and does not contain a notion of what degree of freedom it represents. This abstracts all code and allows greater flexibility while simplifying the base class.

A downside is that this approach causes dynamic memory allocation. It would be easier to have a std::array<> and set a boolean flag for active or inactive DoFs.

For handling the boundary conditions the following framework is proposed. We are primarily concerned with a few common types for non-follower loads to begin with

  • Traction
  • Body force (gravity)
  • Pressure

Traction boundary condition are composed of three DoFs in the most general case. Between time/load steps, the traction value needs to be interpolated. This can be generalised to a Vector3 type.

Each boundary condition needs to have an encoding between the given DoF, "x" for example, and the corresponding numerical number. This data is specific to the physics being simulated. A solid mechanics simulation may have a different ordering to an axisymmetric model and so forth.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.