Welcome to PACXX!

The PACXX (Programming Accelerators with C++) project started in 2013 as a PhD thesis and is finally open source.

PACXX is a simple, lightweight, and still powerful programming model for accelerators in C++. PACXX was originally planned as a replacement for CUDA at a time without C++11/14 support. Over the past years, PACXX has not only advanced GPU programming to C++14 and beyond, but has also become portable across different hardware architectures.

Currently, PACXX supports Nvidia GPUs with Compute Capability 2.0 and above, as well as CPUs from different vendors (Intel, AMD, ARM); in a few weeks from now, PACXX will rock on ROCm-enabled GPUs from AMD as well.

Getting Started

First of all clone the source:

git clone --recursive https://github.com/pacxx/pacxx-llvm llvm

The main repository has all submodules set up that you need to get going, including the PACXX runtime and the modified Clang frontend.

Build PACXX:

mkdir build && cd build

cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON -DLLVM_ENABLE_RTTI=ON -DLLVM_ENABLE_CXX1Y=ON -DCMAKE_CXX_FLAGS_RELEASE="-O3"

make -j<number of cores>

Get some coffee, this can take some time.

Write your first PACXX Program

#include <PACXX.h>
#include <vector>
#include <algorithm>

using namespace pacxx::v2;

int main(int argc, char *argv[]) {

  Executor::Create<CUDARuntime>(0); // create an executor

  auto &exec = Executor::get(0);    // retrieve the default executor

  size_t size = 128;

  std::vector<int> a(size, 1);      // allocate some memory on the host
  std::vector<int> b(size, 2);
  std::vector<int> c(size, 0);
  std::vector<int> gold(size, 0);

  auto &da = exec.allocate<int>(a.size());  // allocate some memory on the device 
  auto &db = exec.allocate<int>(b.size());
  auto &dc = exec.allocate<int>(c.size());

  da.upload(a.data(), a.size());    // upload data to the device
  db.upload(b.data(), b.size());
  dc.upload(c.data(), c.size());

  auto pa = da.get();   // grab the raw pointer from the device address space
  auto pb = db.get();
  auto pc = dc.get();

  auto vadd = [=](range &config) {  // define the vector addition kernel
    auto i = config.get_global(0);  // get the global id (in x-dimension) for the thread  
    if (i < size)
      pc[i] = pa[i] + pb[i] + 2;
  };

  exec.launch(vadd, {{1}, {128}});  // launch the kernel with 128 threads in 1 block
  dc.download(c.data(), c.size());  // download the results from the device 

  std::transform(a.begin(), a.end(), b.begin(), gold.begin(), [](auto a, auto b) { return a + b + 2; }); 
  if (std::equal(c.begin(), c.end(), gold.begin())) // check the results
    return 0; // passed
  else
    return 1; // failed
}

To compile your code, the easiest way is to use CMake. The following script shows how you can integrate PACXX into your CMake build system:

cmake_minimum_required(VERSION 3.5)
project(vadd)

set(CMAKE_MODULE_PATH ${PACXX_DIR}/lib/cmake/pacxx)

find_package(PACXX REQUIRED)

include_directories(${PACXX_INCLUDE_DIRECTORY} ${PACXX_INCLUDE_DIRECTORY}/pacxx)
 
set(CMAKE_CXX_STANDARD 14)

set(SOURCE_FILES ${CMAKE_CURRENT_SOURCE_DIR}/${PROJECT_NAME}.cpp)

add_executable(${PROJECT_NAME} ${SOURCE_FILES})

add_pacxx_to_target(${PROJECT_NAME} ${CMAKE_CURRENT_BINARY_DIR} ${SOURCE_FILES})

Configure your CMake project using:

mkdir build && cd build

CC=<pacxx_install_prefix>/bin/clang CXX=<pacxx_install_prefix>/bin/clang++ ccmake .. -DPACXX_DIR=<pacxx_install_prefix>

make 

If everything was set up correctly, you should now get an executable linked against the PACXX runtime and ready to go.

Running the executable with the PACXX_LOG_LEVEL=2 environment variable set will give you verbose output from the runtime:

CUDARuntime.cpp:186: note: VERBOSE: CUDARuntime has found 1 CUDA devices
CUDARuntime.cpp:34: note: VERBOSE: Creating cudaCtx for device: 0 0 0x2055550
CUDARuntime.cpp:43: note: VERBOSE: Initializing PTXBackend for Tesla K20c (dev: 0) with compute capability 3.5
PTXBackend.cpp:52: note: VERBOSE: Intializing LLVM components for PTX generation!
CoreInitializer.cpp:32: note: VERBOSE: Core components initialized!
Executor.cpp:88: note: VERBOSE: Created new Executor with id: 0
MSPEngine.cpp:49: note: DEBUG: MSP Engine disabled!
CUDARuntime.cpp:93: note: VERBOSE: //
                                   // Generated by LLVM NVPTX Back-End
                                   //
                                   
                                   // ptx stripped for shortness                  
Timing.h:45: note: VERBOSE: CUDARuntime.cpp:71 compileAndLink timed: 4497us
Executor.h:191: note: VERBOSE: allocating memory: 512
Executor.h:191: note: VERBOSE: allocating memory: 512
Executor.h:191: note: VERBOSE: allocating memory: 512
CUDAKernel.cpp:43: note: VERBOSE: setting kernel arguments
CUDAKernel.cpp:51: note: DEBUG: Launching kernel: _ZN5pacxx2v213genericKernelIZL19test_vadd_low_leveliPPcE3$_0EEvT_
CUDAKernel.cpp:55: note: VERBOSE: Kernel configuration: 
                                  blocks(1,1,1)
                                  threads(128,1,1)
                                  shared_mem=0
Executor.h:99: note: VERBOSE: destroying executor 0

Upcoming Stuff

  • Support for AMD GPUs through the HIP stack on the ROCm infrastructure.

Known Issues

  • Nvidia's libdevice must be linked manually to get all math functions in device code. This will be fixed in a future update.
  • Atomic Operations are more or less a bad hack.
  • Missing support for constant memory regions on GPUs.
  • SLEEF fails to compile on AVX2 architectures due to missing intrinsics in LLVM 6.0 (currently under investigation).
  • Documentation. Well, yes: the only documentation currently available on the PACXX runtime and the programming model itself is the source code.

Want to contribute?

Contributions are always welcome. If you want to contribute to PACXX, just open a pull request.

Publications

Haidl M, Gorlatch S. 2014. ‘PACXX: Towards a Unified Programming Model for Programming Accelerators using C++14.’ Contributed to the LLVM Compiler Infrastructure in HPC Workshop at Supercomputing '14, New Orleans. doi: 10.1109/LLVM-HPC.2014.9.

Haidl M, Hagedorn B, Gorlatch S. 2016. ‘Programming GPUs with C++14 and Just-In-Time Compilation.’ Contributed to the Advances in Parallel Computing: On the Road to Exascale, ParCo2015, Edinburgh, Scotland. doi: 10.3233/978-1-61499-621-7-247.

Haidl M, Steuwer M, Humernbrum T, Gorlatch S. 2016. ‘Multi-Stage Programming for GPUs in Modern C++ using PACXX.’ Contributed to the 9th Annual Workshop on General Purpose Processing Using Graphics Processing Unit, GPGPU '16, Barcelona, Spain. doi: 10.1145/2884045.2884049.

Haidl M, Gorlatch S. 2017. ‘High-Level Programming for Many-Cores using C++14 and the STL.’ International Journal of Parallel Programming 2017. doi: 10.1007/s10766-017-0497-y.

Haidl M, Steuwer M, Dirks H, Humernbrum T, Gorlatch S. 2017. ‘Towards Composable GPU Programming: Programming GPUs with Eager Actions and Lazy Views.’ In Proceedings of the 8th International Workshop on Programming Models and Applications for Multicores and Manycores, edited by Chen Q, Huang Z, 58-67. New York, NY: ACM. doi: 10.1145/3026937.3026942.

Haidl M, Moll S, Klein L, Sun H, Hack S, Gorlatch S. 2017. ‘PACXXv2 + RV: An LLVM-based Portable High-Performance Programming Model.’ In Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC at Supercomputing '17, Denver. New York, NY: ACM. doi: 10.1145/3148173.3148185.
