cudecomp's Introduction

cuDecomp

An Adaptive Pencil Decomposition Library for NVIDIA GPUs

Introduction

cuDecomp is a library for managing 1D (slab) and 2D (pencil) parallel decompositions of 3D Cartesian spatial domains on NVIDIA GPUs, with routines to perform global transpositions and halo communications. The library is inspired by the 2DECOMP&FFT Fortran library, a popular decomposition library for numerical simulation codes, with a similar set of available transposition routines. While 2DECOMP&FFT and similar libraries in the past have been written to target CPU systems, this library is designed for GPU systems, leveraging CUDA-aware MPI and additional communication libraries optimized for GPUs, like the NVIDIA Collective Communication Library (NCCL) and NVIDIA OpenSHMEM Library (NVSHMEM).
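To make the slab and pencil terminology concrete, here is a minimal sketch of how the local index range of an X-pencil could be computed on a 2D process grid. This is not the cuDecomp API: the names `Extent`, `split_axis`, and `x_pencil` are hypothetical, and both the remainder-distribution rule and the mapping of Y to grid rows and Z to grid columns are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Half-open index range [start, start + size) owned by one rank along one axis.
// (Hypothetical helper type, not part of cuDecomp.)
struct Extent {
  std::int64_t start;
  std::int64_t size;
};

// Split n grid points over nranks ranks as evenly as possible.
// Assumption: the first (n % nranks) ranks each receive one extra point.
inline Extent split_axis(std::int64_t n, int nranks, int rank) {
  assert(nranks > 0 && rank >= 0 && rank < nranks);
  const std::int64_t base = n / nranks;
  const std::int64_t rem = n % nranks;
  const std::int64_t size = base + (rank < rem ? 1 : 0);
  const std::int64_t start = rank * base + (rank < rem ? rank : rem);
  return {start, size};
}

// Local box of an X-pencil on a pr x pc process grid: the full X axis stays
// local, Y is split over the pr grid rows and Z over the pc grid columns.
// Setting pr = 1 (or pc = 1) reduces this to a 1D slab decomposition.
inline std::array<Extent, 3> x_pencil(const std::array<std::int64_t, 3>& n,
                                      int pr, int pc, int row, int col) {
  return {Extent{0, n[0]}, split_axis(n[1], pr, row), split_axis(n[2], pc, col)};
}
```

For example, `x_pencil({64, 10, 8}, 3, 2, 1, 1)` describes the rank at row 1, column 1 of a 3 x 2 process grid on a 64 x 10 x 8 domain. A global transposition changes which axis is kept whole, so every rank's box changes and data must be exchanged between ranks.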

Please refer to the documentation for additional information on the library and usage details.

This library is currently in a research-oriented state and has been released as a companion to a paper presented at the PASC22 conference (link). We are making it available here because it can be useful in applications beyond this study, or as a benchmarking tool and usage example for performing transpose and halo communication with various GPU communication libraries.

Please contact us or open a GitHub issue if you are interested in using this library in your own solvers and have questions on usage and/or feature requests.

Build

Method 1: Makefile with Configuration file (deprecated)

To build the library, you must first create a configuration file that points the installer to the dependent library paths and enables or disables features. See the default nvhpcsdk.conf for an example of settings to build the library using the NVHPC SDK compilers and libraries. The configs/ directory also contains sample build configuration files for a number of GPU compute clusters, such as Perlmutter, Summit, and Marconi 100.

With this configuration file created, you can build the library using the command

$ make -j CONFIGFILE=<path to your configuration file>

The library will be compiled and installed in a newly created build/ directory. This build method is deprecated and will be removed in a future release.

Method 2: CMake (recommended)

We also enable builds using CMake. A CMake build of the library without additional examples/tests can be completed using the following commands

$ mkdir build
$ cd build
$ cmake ..
$ make -j

Several build variables are available to configure the CMake build; they are listed at the top of the project CMakeLists.txt file. For example, to configure the build to compile additional examples and enable NVSHMEM backends, you can run the following CMake command:

$ cmake -DCUDECOMP_BUILD_EXTRAS=1 -DCUDECOMP_ENABLE_NVSHMEM=1 ..

Dependencies

We strongly recommend building this library using NVHPC SDK compilers and libraries, as the SDK contains all required dependencies for this library and is the focus of our testing. Fortran features are only supported using NVHPC SDK compilers.

One exception is NVSHMEM, which uses a bootstrapping layer that depends on your MPI installation. The NVSHMEM library packaged within NVHPC supports OpenMPI only. If you require usage of a different MPI implementation (e.g. Spectrum MPI or Cray MPICH), you need to either build NVSHMEM against your desired MPI implementation, or build a custom MPI bootstrap layer. Please refer to this NVSHMEM documentation section for more details.

Additionally, this library utilizes CUDA-aware MPI and is only compatible with MPI libraries with these features enabled.

License

This library is released under a BSD 3-clause license, which can be found in LICENSE.

cudecomp's People

Contributors

p-costa, romerojosh

cudecomp's Issues

Missing MPI import in `cudecomp.h`

Following the removal of the MPI, cutensor, and NCCL imports at the top of cudecomp.h in this commit, there is now an undefined reference to MPI_Comm at this line of the include file:

cudecompResult_t cudecompInit(cudecompHandle_t* handle, MPI_Comm mpi_comm);

This is a bit awkward because it forces the user to include the MPI header manually, even in files that do not call the init function. Would it be possible to add that MPI import back?

Issue with CMake when cuDecomp is used as a subdirectory

First of all, thank you very much for adding a CMake build system in #15 .

I would like to point out that your CMakeLists.txt sets the compiler name (nvc++) as the CXX compiler. This works well when cuDecomp is the top-level project, but it fails for us when we add cuDecomp as a subdirectory, with this error:

"nvc++ " is not a full path and was not found in the PATH.

All you have to do is replace this code:

# Use NVHPC compilers by default
set(CMAKE_CXX_COMPILER "nvc++")
set(CMAKE_Fortran_COMPILER "nvfortran")

# Locate and use NVHPC CMake configuration
find_program(NVHPC_CXX_BIN "nvc++")

with this one:

# find NVHPC compilers
find_program(NVHPC_CXX_BIN "nvc++")
find_program(NVHPC_Fortran_BIN "nvfortran")
# Use NVHPC compilers by default
set(CMAKE_CXX_COMPILER ${NVHPC_CXX_BIN})
set(CMAKE_Fortran_COMPILER ${NVHPC_Fortran_BIN})

I would be happy to open a PR if you want.

Thank you.

Issue with linker when calling from python

Hello,

I am trying to port this library to Python and write bindings for it.
I was able to build it, link it to my library, and use it from that library (the project is here: JaxDecomp).

But when I call it from Python via pybind11, I get this weird link error

cuDecomp/build/lib/libcudecomp.so: undefined symbol: __sync_val_compare_and_swap_4

I have tried linking libstdc++, libatomic, and libgcc_s, but nothing changed.

Do you have any idea why I am getting this error?

Building on Snellius HPC

Hi, thanks for sharing this library. I'm trying to build it on the Dutch national HPC Snellius, but I have run into trouble with the compilation. The lib stage of the Makefile appears to complete without problems, but once it moves on to the tests stage, it prints out many errors about not being able to find the libraries.

From the example config files, I believe I am pointing all the necessary variables to the right places, but the tests seem unable to find any of the NVIDIA libraries such as nccl.

My config file is as below, and here is my log file from the make command. Please let me know if you can see an obvious fix or have any suggestions.

# Having run
# module load 2022
# module load foss/2022a
# module load NVHPC/22.7
# NVHPC_HOME=/opt/nvidia/hpc_sdk/Linux_x86_64/2022
NVHPC_HOME=${EBROOTNVHPC}/Linux_x86_64/22.7

# Required variables to define
# MPICXX=mpicxx
# MPIF90=mpifort
CUDA_HOME=${NVHPC_HOME}/cuda
MPI_HOME=${NVHPC_HOME}/comm_libs/hpcx/latest/ompi
MPICXX=${MPI_HOME}/bin/mpicxx
MPIF90=${MPI_HOME}/bin/mpifort
NCCL_HOME=${NVHPC_HOME}/comm_libs/nccl
CUFFT_HOME=${NVHPC_HOME}/math_libs
CUTENSOR_HOME=${NVHPC_HOME}/math_libs
CUDACXX_HOME=${CUDA_HOME}

# Optional variables
CUDA_CC_LIST=61
BUILD_FORTRAN=1
ENABLE_NVTX=1
ENABLE_NVSHMEM=1
NVSHMEM_HOME=${NVHPC_HOME}/comm_libs/nvshmem

[Installation error] Run setup.py, command execution error

error message:

CMake Error at CMakeLists.txt:5 (find_package):
By not providing "FindNVHPC.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "NVHPC", but
CMake did not find one.

Could not find a package configuration file provided by "NVHPC" with any of
the following names:

NVHPCConfig.cmake
nvhpc-config.cmake

Add the installation prefix of "NVHPC" to CMAKE_PREFIX_PATH or set
"NVHPC_DIR" to a directory containing one of the above files. If "NVHPC"
provides a separate development package or SDK, be sure it has been
installed.

-- Configuring incomplete, errors occurred!
