Introduction

Sparse matrices are a key component of the performance-critical computations in many numerical simulations. The desire to represent sparse matrices efficiently in memory and to optimise performance, in particular given the evolution of hardware architectures, has over the years led to the development of a plethora of sparse matrix storage formats. Each format is designed to exploit the particular strengths of an architecture or the specific sparsity pattern of a matrix. The choice of format can be crucial to achieving optimal performance. Being able to select storage formats dynamically at runtime is therefore highly desirable.

Morpheus is a library of sparse matrix storage formats designed for efficient and transparent format switching across architectures without introducing prohibitive overheads. At the same time, it enables the straightforward addition of new storage formats without major application code changes. The library has a functional design which separates the containers that implement the storage formats from the algorithms that implement operations such as multiplication. Each container is aware of the memory space it resides in, and the algorithms require knowledge of the execution space that they will be running in. This design allows Morpheus to target multiple heterogeneous architectures.

To enable efficient yet flexible dynamic switching, Morpheus uses std::variant, introduced in C++17, as a type-safe union; it therefore requires a C++17-compliant compiler. It offers a polymorphic variable called DynamicMatrix which abstracts the notion of sparse matrix storage formats away from users and can hold any of the supported formats internally.
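The following is a minimal sketch of the type-safe union idea, not Morpheus's actual implementation. All names in it (CooSketch, CsrSketch, DiaSketch, FormatSketch, DynamicMatrixSketch) are hypothetical stand-ins; it only assumes that a variant holds one storage format at a time and that operations dispatch on the active one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the real containers.
struct CooSketch {
  std::vector<int> rows, cols;
  std::vector<double> vals;
};
struct CsrSketch {
  std::vector<int> row_offsets, cols;
  std::vector<double> vals;
};
struct DiaSketch {
  std::vector<int> diag_offsets;
  std::vector<double> vals;
};

// Format ids follow the order of the alternatives in the variant below.
enum FormatSketch { COO_SKETCH = 0, CSR_SKETCH = 1, DIA_SKETCH = 2 };

class DynamicMatrixSketch {
  // Visitor with one overload per format; std::visit picks the active one.
  struct NameVisitor {
    std::string operator()(const CooSketch&) const { return "COO"; }
    std::string operator()(const CsrSketch&) const { return "CSR"; }
    std::string operator()(const DiaSketch&) const { return "DIA"; }
  };

  // A default-constructed variant holds its first alternative (COO here).
  std::variant<CooSketch, CsrSketch, DiaSketch> storage_;

 public:
  // Switch the active alternative; the new container starts out empty.
  void activate(FormatSketch f) {
    switch (f) {
      case COO_SKETCH: storage_.emplace<CooSketch>(); break;
      case CSR_SKETCH: storage_.emplace<CsrSketch>(); break;
      case DIA_SKETCH: storage_.emplace<DiaSketch>(); break;
    }
  }

  // Zero-based index of the alternative that is currently held.
  std::size_t active_index() const { return storage_.index(); }

  // Dispatch on the active format without the caller knowing which it is.
  std::string format_name() const { return std::visit(NameVisitor{}, storage_); }
};
```

In this sketch switching formats discards the stored data; a real library would presumably convert the existing entries instead.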

Installation

A very basic installation using CMake is done with:

$ cmake ${srcdir} \
   -DCMAKE_CXX_COMPILER=g++ \
   -DCMAKE_INSTALL_PREFIX=${my_install_folder} \
   -DKokkos_ROOT=${kokkos_install_prefix}
$ make
$ make install

which configures, builds and installs a default Morpheus. Note that Morpheus REQUIRES Kokkos v.3.5.0 to be installed, and that it inherits the enabled devices and compiler optimization flags from Kokkos.

Once Morpheus is installed, simply use the following in your CMakeLists.txt:

find_package(Morpheus REQUIRED)

Then for every executable or library in your project:

target_link_libraries(myTarget Morpheus::morpheus)

More information can be found in BUILD.md.

Documentation

Documentation can be found here.

Supported Formats

Format  Container   Serial  OpenMP  CUDA  HIP  Kokkos
Coo     CooMatrix   yes     yes     yes   yes  no
Csr     CsrMatrix   yes     yes     yes   yes  yes
Dia     DiaMatrix   yes     yes     yes   yes  yes

Specifying a container

To define a container we need to specify four template parameters:

  • ValueType: The type of the values the container will hold. Valid types must satisfy std::is_arithmetic, i.e. be an arithmetic type.
  • IndexType: The type of the indices the container will hold. Valid types must satisfy std::is_integral, i.e. be an integral type.
  • Layout: Orientation of data in memory. Valid layouts are either Kokkos::LayoutLeft (Column-Major) or Kokkos::LayoutRight (Row-Major).
  • Space: A memory or execution space supported by Morpheus.
    • Valid Memory Spaces are HostSpace, CudaSpace and HIPSpace.
    • Valid Execution Spaces are Serial, OpenMP, Cuda and HIP.
    • Spaces can be Generic or Custom by specifying the appropriate namespace, e.g. Morpheus::Generic::HostSpace and Morpheus::Custom::HostSpace represent the generic and custom Host Memory space respectively.
    • Note that specifying the execution space determines in which space each member function will be executed. If no space parameter is provided, Morpheus chooses a default one for you.

Note that only ValueType is mandatory. For the rest of the arguments, if not provided, sensible defaults will be selected.

#include <Morpheus_Core.hpp>

int main(){
    /* 
     * ValueType        : double
     * IndexType        : long long
     * Layout           : Kokkos::LayoutRight
     * MemorySpace      : Morpheus::HostSpace 
     */
    Morpheus::CooMatrix<double, long long, Kokkos::LayoutRight, Morpheus::HostSpace> A;

    /* 
     * ValueType        : double
     * IndexType        : int (Default)
     * Layout           : Kokkos::LayoutRight (Default), 
     * MemorySpace      : Morpheus::DefaultSpace (Default) 
     */
    Morpheus::CsrMatrix<double> B;
}
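The constraints and defaults described above can be expressed with standard type traits and default template arguments. The sketch below is an illustration under assumptions, not the library's definition: ContainerSketch and the three tag types are hypothetical, and the parameters are strictly positional here, which may not match how the real containers accept their arguments.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tags standing in for the layout and space types.
struct LayoutRightTag {};  // row-major
struct LayoutLeftTag {};   // column-major
struct HostSpaceTag {};

// Only ValueType is mandatory; the remaining parameters have defaults.
template <typename ValueType, typename IndexType = int,
          typename Layout = LayoutRightTag, typename Space = HostSpaceTag>
struct ContainerSketch {
  // Reject invalid types at compile time with a readable message.
  static_assert(std::is_arithmetic<ValueType>::value,
                "ValueType must be an arithmetic type");
  static_assert(std::is_integral<IndexType>::value,
                "IndexType must be an integral type");

  using value_type = ValueType;
  using index_type = IndexType;
  using layout     = Layout;
  using space      = Space;
};
```

With this sketch, ContainerSketch<double> picks int, LayoutRightTag and HostSpaceTag, while something like ContainerSketch<double, float> fails to compile because float is not an integral type.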

Using an Algorithm

For each algorithm the same interface is used across different formats. Algorithms are aware of the execution space they will be executed in and dispatch depends on that too. Currently we support the following algorithms for each of the supported storage formats:

  • Multiply (Sparse Matrix-Vector Multiplication)
  • Copy
  • Convert
  • Copy by key
  • Print
  • Update diagonal
  • Read/Write matrix market

#include <Morpheus_Core.hpp>

int main(){
    // [ 3.5   *   * ]
    // [  *   1.5  * ]
    Morpheus::CooMatrix<double, Kokkos::HostSpace> A(2, 3, 2);
    Morpheus::DenseVector<double, Kokkos::HostSpace> x(3, 0), y(2, 0);
    
    // Initializing A
    A.row_indices(0) = 0;
    A.column_indices(0) = 0;
    A.values(0) = 3.5;

    A.row_indices(1) = 1;
    A.column_indices(1) = 1;
    A.values(1) = 1.5;

    // Initializing x
    x(0) = 1; x(1) = 2; x(2) = 3;

    // y = A * x
    Morpheus::Multiply<Morpheus::Serial>(A, x, y);
}
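For reference, the arithmetic that a sparse matrix-vector product performs on COO data can be written in plain C++ without the library. This is a sketch of the operation only; CooData and multiply_coo are hypothetical names, and no claim is made about how Morpheus implements it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ COO storage: one (row, column, value) triple per entry.
struct CooData {
  std::size_t nrows, ncols;
  std::vector<std::size_t> row_indices, column_indices;
  std::vector<double> values;
};

// Returns y = A * x by scattering each non-zero into its row of y.
std::vector<double> multiply_coo(const CooData& A, const std::vector<double>& x) {
  assert(x.size() == A.ncols);
  std::vector<double> y(A.nrows, 0.0);
  for (std::size_t n = 0; n < A.values.size(); ++n) {
    y[A.row_indices[n]] += A.values[n] * x[A.column_indices[n]];
  }
  return y;
}
```

For the 2x3 matrix above and x = (1, 2, 3), this gives y = (3.5 * 1, 1.5 * 2) = (3.5, 3.0).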

Use of dynamic matrix

The dynamic matrix abstracts away the different supported formats and provides an efficient yet flexible mechanism for switching between them. It follows the same interface as the other containers, hence algorithms can be used in the same way.

#include <Morpheus_Core.hpp>

int main(){
    Morpheus::DynamicMatrix<double> A;  // Default format is COO

    A.activate(Morpheus::CSR_FORMAT);   // Active type now is CSR

    A.activate(Morpheus::DIA_FORMAT);   // Active type now is DIA

    A.activate(0);   // Active type now is the first in the DynamicMatrix
}

Building and Running the Tests and Examples

Building the tests and examples requires the GTest testing framework, which ships together with Morpheus.

To build and run the tests, add -DMorpheus_ENABLE_TESTS=On during the configuration stage. Note that tests for different devices are enabled based on how Kokkos was configured.

After configuration, to build and run the Serial tests do:

$ cd  ${srcdir}
$ make MorpheusCore_UnitTest_Serial
$ ${srcdir}/core/tests/MorpheusCore_UnitTest_Serial

The same process can be followed for OpenMP, Cuda and HIP.

To build and run the examples, add -DMorpheus_ENABLE_EXAMPLES=On during configuration. The process is similar to building the tests.

License

This software is licensed under the Apache License, Version 2.0. See the LICENSE file for details.

morpheus's Issues

Add GPU Support

  • Introduce Cuda memory and execution spaces using Kokkos
  • Introduce HIP memory and execution spaces using Kokkos

README update

Update README.md to include information on how to build Morpheus, its purpose and what it can do.

Add Level 1 Blas support

  • Unified interface for Level 1 Blas operations to be used by iterative solvers.
  • Would either choose our own implementation or the one provided by different libraries such as MKL, cuBlas etc.
  • Functionality will be added on demand from what is needed by different solvers.

Add support for complex numbers

  • Support complex numbers both on CPU and GPU containers and algorithms either using Kokkos or our own class.
  • Support also for IO

Support for AMD GPUs

Enable support for AMD GPUs:

  • Update Build process
  • Morpheus_HIPUtils.hpp
  • Unify WARP_SIZE (Cuda=32, Rocm=64)
  • Compile & Run tests for containers
  • Kernel Wrappers and Launchers
  • Update kernel to not use any Cuda specific commands & refactor to take into account also Rocm's warp size

create_mirror_container_copy()

Extend the interface in core/src/Morpheus_MirrorContainers.hpp to support Morpheus::create_mirror_container_copy() which essentially is a mirror creation and deep copy in one line.

Single value copy

Enable copying a single value across spaces with deep-copy semantics.

Custom Backend

  • Dispatch algorithms based on the execution space.
  • For each execution space the algorithm is explicitly written.
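One common way to get this kind of dispatch in C++ is tag dispatch: each execution space is an empty type, and overload resolution selects the implementation written for it. The sketch below assumes that approach; SerialTag, OpenMPTag, DotResult and dot are hypothetical names, and the "OpenMP" version is deliberately a plain loop so the example compiles without OpenMP.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical execution-space tags (stand-ins for Serial and OpenMP).
struct SerialTag {};
struct OpenMPTag {};

// The result plus the name of the backend that produced it.
struct DotResult {
  double value;
  std::string backend;
};

namespace impl {
// Explicitly written serial version.
inline DotResult dot(SerialTag, const std::vector<double>& a,
                     const std::vector<double>& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return {sum, "Serial"};
}

// Separately written version for the second backend. A real one would
// parallelise this loop with an OpenMP reduction; here it stays sequential.
inline DotResult dot(OpenMPTag, const std::vector<double>& a,
                     const std::vector<double>& b) {
  double sum = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
  return {sum, "OpenMP"};
}
}  // namespace impl

// Single user-facing interface; the tag type selects the overload.
template <typename ExecSpace>
DotResult dot(const std::vector<double>& a, const std::vector<double>& b) {
  assert(a.size() == b.size());
  return impl::dot(ExecSpace{}, a, b);
}
```

Calling dot<SerialTag>(a, b) and dot<OpenMPTag>(a, b) uses the same interface but runs different code, and an unsupported tag is rejected at compile time because no overload matches.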

Unit tests

  • Test core
  • Test Containers
    • Vector
    • DenseMatrix
    • CooMatrix
    • CsrMatrix
    • DiaMatrix
    • DynamicMatrix
  • Test Algorithms
    • Serial
    • OpenMP
  • Test IO

Update GPU Kernels

Update GPU Kernels to utilise the full warp (64 threads) for AMD HIP backend:

  • spmv_csr_vector_kernel()
  • __spmv_csr_vector()
  • spmv_coo_flat_kernel()
  • segreduce_warp() - removed as is not used anywhere

SpMV Kernels using Kokkos

  • SpMV Dia:
    • Check hierarchical parallelism version for correctness.
    • Enable shared/scratch memory.
  • SpMV Coo:
    • Segmented scan by key.
    • SpMV with segmented scan.
    • SpMV with Hierarchical Parallelism.

Convert to free function

bool exceeds_tolerance(const index_type num_rows,
                       const index_type num_entries,
                       const index_type num_diagonals) {
  const float max_fill   = 10.0;
  const float threshold  = 100e6;  // 100M entries
  const float size       = float(num_diagonals) * float(num_rows);
  const float fill_ratio = size / std::max(1.0f, float(num_entries));
  if (max_fill < fill_ratio && size > threshold) {
    return true;
  } else {
    return false;
  }
}

This is not really a member function of the diagonal matrix: it should either be moved outside the class and defined as a free function, or use the parameters of the instantiated DIA matrix instead of taking them as arguments.
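A free-function version could look like the sketch below. Making the index type a template parameter is an assumption made here so the function no longer depends on a class-level index_type; the thresholds and the logic are unchanged from the snippet above.

```cpp
#include <algorithm>
#include <cassert>

// Free-function sketch: true when storing the matrix in DIA format would
// need more than 100M slots AND more than 10x the number of actual entries.
template <typename IndexType>
bool exceeds_tolerance(const IndexType num_rows, const IndexType num_entries,
                       const IndexType num_diagonals) {
  const float max_fill   = 10.0f;
  const float threshold  = 100e6f;  // 100M entries
  const float size       = float(num_diagonals) * float(num_rows);
  const float fill_ratio = size / std::max(1.0f, float(num_entries));
  return max_fill < fill_ratio && size > threshold;
}
```

As a worked case, 1,000,000 rows with 200 diagonals and 1,000,000 entries gives size = 2e8 and fill_ratio = 200, so both conditions hold and the function returns true. With 50,000,000 entries instead, fill_ratio drops to 4 and the result is false.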

Fence and synchronization

  • Add Kokkos::fence() when necessary after each Kokkos' parallel dispatch.
  • Add cudaDeviceSynchronize() when necessary - it might not be needed as we are using only one stream at the moment.

Organise type traits

  • Organise and formalize type traits in core/src/Morpheus_TypeTraits.hpp.
  • Test traits.
  • Update traits in code.

Kokkos Backend

  • Use Kokkos' parallel dispatch mechanism to target different execution spaces and write only a single algorithm for each container.

Resizing during the copy operation

During copy the containers are resized to match the destination shape.

A correct approach would be to use the create_mirror...() operation first to create and allocate the correct container shape and then apply the copy operation.

A smaller source container should also be copyable, provided the copy restrictions are met.

Construct from raw pointers

Enable container construction from raw pointers:

  • Enabled Unmanaged Memory Trait in containers
  • Construction of containers from raw pointers is enabled only when the Memory Trait is set to Unmanaged
  • Morpheus::DenseVector
  • Morpheus::DenseMatrix
  • Morpheus::CooMatrix
  • Morpheus::CsrMatrix
  • Morpheus::DiaMatrix
  • Morpheus::DynamicMatrix - Need to check here how it behaves when switching to another format.

Invalid results for DIA and ELL on GPU backends with LayoutRight

When a DiaMatrix or EllMatrix is constructed with Kokkos::LayoutRight, the multiply() and update_diagonal() algorithms return invalid results on Morpheus::Cuda and Morpheus::HIP, due to improper construction of the linearized index in each of the following kernels:

  • update_dia_diagonal_kernel() in core/src/impl/Dia/Kernels/Morpheus_MatrixOperations_Impl.hpp
  • spmv_dia_kernel() in core/src/impl/Dia/Kernels/Morpheus_Multiply_Impl.hpp
  • update_ell_diagonal_kernel() in core/src/impl/Ell/Kernels/Morpheus_MatrixOperations_Impl.hpp
  • spmv_ell_kernel() in core/src/impl/Ell/Kernels/Morpheus_MatrixOperations_Impl.hpp
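To illustrate why the layout matters for a linearized index, the sketch below computes the flat offset of element (i, j) in a dense 2D array for both layouts, assuming contiguous storage without padding. The function names are hypothetical and this is not the code of the kernels listed above; it only shows that a kernel hard-coding one formula reads the wrong element when the data uses the other layout.

```cpp
#include <cassert>
#include <cstddef>

// Row-major (LayoutRight): elements of the same row are adjacent in memory.
constexpr std::size_t index_right(std::size_t i, std::size_t j,
                                  std::size_t /*nrows*/, std::size_t ncols) {
  return i * ncols + j;
}

// Column-major (LayoutLeft): elements of the same column are adjacent in memory.
constexpr std::size_t index_left(std::size_t i, std::size_t j,
                                 std::size_t nrows, std::size_t /*ncols*/) {
  return j * nrows + i;
}
```

For a 3x4 array, element (1, 2) sits at offset 1 * 4 + 2 = 6 in row-major order but at 2 * 3 + 1 = 7 in column-major order.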

Update Matrix Diagonal

Enable support for the user to be able to update matrix diagonal:

  • Each matrix format has a diagonal vector diag of size num_rows that contains the index of diagonal values in the values array.
  • values[diag[i]] = k updates the diagonal entry of row i
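The scheme can be sketched for a CSR-like layout as follows. CsrData, find_diagonal, update_diagonal and the no_diag sentinel are hypothetical names introduced for illustration; in particular, how rows without a stored diagonal entry should be handled is an assumption of this sketch (they are skipped).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical plain-C++ CSR storage.
struct CsrData {
  std::size_t nrows;
  std::vector<std::size_t> row_offsets, column_indices;
  std::vector<double> values;
};

// Sentinel for rows that store no diagonal entry.
constexpr std::size_t no_diag = static_cast<std::size_t>(-1);

// diag[i] is the position in A.values of entry (i, i), or no_diag.
std::vector<std::size_t> find_diagonal(const CsrData& A) {
  std::vector<std::size_t> diag(A.nrows, no_diag);
  for (std::size_t i = 0; i < A.nrows; ++i) {
    for (std::size_t n = A.row_offsets[i]; n < A.row_offsets[i + 1]; ++n) {
      if (A.column_indices[n] == i) diag[i] = n;
    }
  }
  return diag;
}

// values[diag[i]] = k[i] for every row that has a stored diagonal entry.
void update_diagonal(CsrData& A, const std::vector<std::size_t>& diag,
                     const std::vector<double>& k) {
  assert(diag.size() == A.nrows && k.size() == A.nrows);
  for (std::size_t i = 0; i < A.nrows; ++i) {
    if (diag[i] != no_diag) A.values[diag[i]] = k[i];
  }
}
```

Computing diag once and reusing it turns every later diagonal update into a direct write per row, with no search through the column indices.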

GitHub workflows

Include GitHub workflows to launch unit tests on every new push to development branches and to master.

  • on GitHub's runners (CPU)
  • on self-hosted runners (CPU+CUDA+HIP)
