
hemi's People

Contributors

gimperiale, harrism, jleni, mastbaum, tmjbradley

hemi's Issues

Visual Studio x64

Hi,

I'm trying to compile some of the examples as x64 under Visual Studio 2013. It compiles OK, but the constant HEMI_CUDA_COMPILER is not defined, so the device code is not executed. Defining the constant manually doesn't help either. Note that the 32-bit build works fine.

Thanks.

Hello World with hemi::Array doesn't work, but doesn't error either

I tried a minimal hello-world example that multiplies each array element by 2 and stores the result back, and nothing happens, as if the lazy HostToDevice and DeviceToHost copying in hemi::Array isn't being triggered. Code below.

// using GNU/6.3.0 and CUDA/9.0
// nvcc -o test test.cpp -std=c++14 -lcuda -Ihemi -x cu && ./test

#include <cstdio>
#include <numeric> // iota
#include "hemi/hemi.h"
#include "hemi/array.h"
#include "hemi/launch.h"
#include "hemi/device_api.h"
#include "hemi/execution_policy.h"

// strangely prints ex: "setting data[3]=0", "setting data[1]=0", etc.
HEMI_LAUNCHABLE void Kernel(int *data) {
    printf("setting data[%d]=%d\n", hemi::globalThreadIndex(), data[hemi::globalThreadIndex()]*2);
    data[hemi::globalThreadIndex()] *= 2;
    __syncthreads(); // no error here means this is definitely running on the device
}

int main() {
    const int THREADS(4);

    // declare and init with [0, 1, 2, 3]
    hemi::Array<int> d(THREADS, false);
    std::iota(d.writeOnlyHostPtr(), d.writeOnlyHostPtr()+THREADS, 0);

    // invoke device and sync
    hemi::ExecutionPolicy ep;
    ep.setGridSize(THREADS);
    ep.setBlockSize(1);
    hemi::cudaLaunch(ep, Kernel, d.writeOnlyDevicePtr());
    hemi::deviceSynchronize();

    cudaError_t error = cudaGetLastError();
    if(error != cudaSuccess)
    {
	    // print the CUDA error message and exit
	    printf("CUDA error: %s\n", cudaGetErrorString(error));
	    exit(-1);
    }

    // print all 4 results: expect [0, 2, 4, 6]
    printf("results: ");
    for (int i=0; i<THREADS; i++) {
       printf("%d ",d.readOnlyHostPtr()[i]); // produces uninitialized data [0, 0, 0, 0] why?
       //printf("%d ",d.writeOnlyHostPtr()[i]); // produces pre-kernel data [0, 1, 2, 3] why?
    }
    printf("\n");

    return 0;
}

host implementations of atomic functions?

I think it would be useful to offer simple implementations of functions like atomicAdd(), atomicInc(), etc., that are used when compiling hemi code for CPU execution. Currently I have to use #ifdef HEMI_DEV_CODE to hide uses of atomicAdd() from the host compiler.

If this sounds reasonable, I'm happy to do the implementation and issue a pull request.
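A minimal host-side fallback might look like the sketch below. These names mirror the CUDA intrinsics, but the signatures and guard macro usage here are assumptions, not existing hemi API; note this is only safe for a single-threaded host path (a multithreaded host would need std::atomic or compiler builtins).

```cpp
// Hypothetical host fallbacks, compiled only when device code is not active.
// Safe for a single-threaded host path only.
#ifndef HEMI_DEV_CODE
inline int atomicAdd(int *address, int val) {
    int old = *address;
    *address += val;   // not thread-safe; fine for a serial host fallback
    return old;
}

inline unsigned int atomicInc(unsigned int *address, unsigned int val) {
    unsigned int old = *address;
    // Matches CUDA atomicInc semantics: wrap to 0 once old >= val.
    *address = (old >= val) ? 0u : (old + 1u);
    return old;
}
#endif
```

The same pattern would extend to atomicMax, atomicCAS, and friends if the idea is accepted.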

Test framework of choice for project?

Continuing the conversation from #9.
Some questions:

Do we feel like we need to add tests for device code?
Do we want to include the test framework as a submodule (if possible)? Just list it as a dependency for the user to figure out installation on their own?

Use inline instead of __forceinline__

#define HEMI_DEV_CALLABLE_INLINE        __host__ __device__ __forceinline__

inline would be correct as well; do you really need __forceinline__? With the inline keyword, the compiler may choose to skip inlining for its own reasons (code size, compile time, etc.), whereas with __forceinline__ the compiler has no flexibility.
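One way to give users that flexibility is an overridable macro (a sketch; HEMI_INLINE is a hypothetical name, not an existing hemi macro, and the device-side qualifiers are omitted so it builds host-only):

```cpp
// Sketch: default to __forceinline__ under nvcc, plain inline otherwise,
// and let the user override from the command line (-DHEMI_INLINE=inline).
#ifndef HEMI_INLINE
  #if defined(__CUDACC__)
    #define HEMI_INLINE __forceinline__
  #else
    #define HEMI_INLINE inline
  #endif
#endif

HEMI_INLINE int twice(int x) { return x * 2; }
```

HEMI_DEV_CALLABLE_INLINE could then expand to `__host__ __device__ HEMI_INLINE` instead of hard-coding __forceinline__.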

[Request] Runtime configuration choice of CPU or GPU

Are there any plans to provide runtime configuration of a kernel launch?

hemi::launch appears to be configured at compile time to run on either the host or the device, but it would be nice if we could choose where something is supposed to run at runtime, so that a client lib (of the lib we are developing) could choose which resources to use.
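In the meantime, a thin wrapper can approximate this (a sketch; Device and launch_on are hypothetical names, and the GPU branch is stubbed to the serial loop so the example builds without CUDA):

```cpp
#include <cstddef>

enum class Device { CPU, GPU };

// Hypothetical runtime dispatcher: the client picks the target per call.
template <typename F>
void launch_on(Device dev, std::size_t n, F f) {
    if (dev == Device::GPU) {
        // With nvcc, this branch would forward to hemi::cudaLaunch(...)
        // instead; stubbed to the serial loop so the sketch compiles host-only.
        for (std::size_t i = 0; i < n; ++i) f(i);
    } else {
        for (std::size_t i = 0; i < n; ++i) f(i);
    }
}
```

The dispatch cost is one branch per launch, so the kernels themselves stay compile-time specialized while the caller decides the target at runtime.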

(feat req) OpenMP acceleration in parallel_for

It would be great if, when CUDA is disabled, hemi::parallel_for ran on all available CPUs.
I tried adding a simple #pragma omp for myself, but for some reason OpenMP refused to cooperate...
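For reference, a bare `#pragma omp for` only distributes work inside an enclosing parallel region; outside one it runs sequentially, which is a likely reason the attempt above appeared to do nothing. The combined construct is what the host path would need (a sketch; parallel_for_host is a hypothetical stand-in for hemi's host loop, and the pragma is simply ignored when OpenMP is not enabled):

```cpp
// Sketch: the combined "parallel for" both spawns the team and splits the
// iterations; a lone "omp for" without "omp parallel" does neither.
template <typename F>
void parallel_for_host(int first, int last, F f) {
#pragma omp parallel for
    for (int i = first; i < last; ++i)
        f(i);
}
```

The body must be free of cross-iteration dependencies for this to be safe, which hemi::parallel_for kernels generally already are.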

grid_stride_range and parallel_for want an explicit cast between int and size_t

The following code won't compile (nvcc 7.5 + g++ 4.9):

hemi::Array<int> a(10);
int *ad = a.writeOnlyPtr();
hemi::parallel_for(0, a.size(), [=] HEMI_LAMBDA(int i) {
    ad[i] = 1;
});

The problem is that 0 is an int and a.size() is a size_t.
grid_stride_range has the same problem.

Workaround: replace 0 with (size_t)0, which works but is ugly and counter-intuitive.
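One fix on the library side is to deduce the two bounds independently and converge on their common type (a sketch of the idea only; parallel_for_mixed is a hypothetical serial stand-in, not hemi's actual signature):

```cpp
#include <type_traits>

// Hypothetical overload: parallel_for_mixed(0, vec.size(), ...) deduces
// size_t as the common index type, so mixed int/size_t call sites compile
// without casts. Serial loop stands in for the real launch machinery.
template <typename T1, typename T2, typename F>
void parallel_for_mixed(T1 first, T2 last, F f) {
    using T = typename std::common_type<T1, T2>::type;
    for (T i = static_cast<T>(first); i < static_cast<T>(last); ++i)
        f(i);
}
```

The same two-parameter deduction would apply to grid_stride_range.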

Make HEMI_DEV_ALIGN macro more flexible for non CUDACC path

I would tweak this slightly for the case where the host compiler version is not supported. This tweak allows the user to provide the definition for the macro on the command line, rather than having to modify hemi.h:

#if !defined(HEMI_DEV_ALIGN) 

  #if defined(__GNUC__)
    #define HEMI_DEV_ALIGN(n) __attribute__((aligned(n)))
  #elif defined(_MSC_VER)
    #define HEMI_DEV_ALIGN(n) __declspec(align(n))
  #else
    #error "Please provide a definition for HEMI_DEV_ALIGN macro for your host compiler!"
  #endif

#endif

How to get the result type of a device lambda?

auto fun = [] __device__ (int i) -> float {
    return 3.0f;
};

fun is a CUDA lambda; how can I use type traits to get its result type and argument types?

#include <tuple>

template <typename T>
struct function_traits
	: public function_traits<decltype(&T::operator())>
{
	typedef double type;
};


template <typename _ClassType, typename _ReturnType, typename... _Args>
struct function_traits<_ReturnType(_ClassType::*)(_Args...) const> {
	enum { arguments_size = sizeof...(_Args) };

	typedef _ReturnType result_type;

	template <size_t _index>
	struct arguments
	{
		typedef typename std::tuple_element<_index, std::tuple<_Args...>>::type type;
		// the i-th argument is equivalent to the i-th tuple element of a tuple
		// composed of those arguments.
	};
};

int main() {

	//auto fun = [] __host__ __device__(int i)->float {
	//	return 3.0f;
	//}; compile OK!

	auto fun = [] __device__(int i)->float {
		return 3.0f;
	}; // compile Error!

	auto size = function_traits<decltype(fun)>::arguments_size;

}

Common math function support?

Is it desirable in general to have a short math header that contains common functions from <cmath> that are supported on the device? Maybe templated to allow selection of the fast functions on CUDA? I've noticed that the compiler can get confused and pick a template from <cmath> where instead you'd want to promote an int to a double and use the device function.

I use std::pow() and std::abs() in my own device code, so it would be useful for me. Something like being able to call hemi::pow(x, y) and get reasonable results on both the host and the device?
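A wrapper along these lines could work (a sketch, assuming a hemi-style namespace; the HEMI_DEV_CODE branch is illustrative only and is not exercised in a host build):

```cpp
#include <cmath>
#include <type_traits>

namespace hemi {
// Promote integral arguments to double before dispatching, so hemi::pow(2, 10)
// hits the floating-point overload instead of a surprising <cmath> template.
template <typename X, typename Y>
typename std::common_type<double, X, Y>::type pow(X x, Y y) {
#ifdef HEMI_DEV_CODE
    return ::pow(static_cast<double>(x), static_cast<double>(y)); // device math library
#else
    return std::pow(static_cast<double>(x), static_cast<double>(y));
#endif
}
} // namespace hemi
```

A fast-math variant could be a second template parameter selecting __powf-style intrinsics on the device.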

Problems with making HEMI_DEFINE_CONSTANT static on host

#define HEMI_DEFINE_CONSTANT(def, value) __constant__ def ## _devconst = value; static def ## _hostconst = value;

With separate compilation, an extern linkage constant variable defined in one translation unit may be referenced by another translation unit. Forcing the host version of the constant to have internal linkage (“static”) won’t work for this scenario. Also, “static” linkage variables cannot be template arguments, which would break the following code:

__constant__ int XXX = 200;

template <int *addr>
__device__  int getValue(void) { return *addr; }

__global__ void kern(int *ptr) {  *ptr = getValue<&XXX>(); }

The snippet above compiles fine, but changing the linkage of “XXX” to static and recompiling fails:

j1.cu(6): error: a template argument may not reference a non-external entity

Perhaps the macro should have an argument to specify the linkage?
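A possible shape for that (a sketch; the three-argument form is hypothetical, and only the host half is shown so it compiles without nvcc):

```cpp
// Sketch: caller supplies the linkage specifier (empty, static, or extern).
// Under nvcc the macro would also emit the device half:
//   linkage __constant__ def ## _devconst = value;
#define HEMI_DEFINE_CONSTANT(linkage, def, value) \
    linkage def ## _hostconst = value;

// Example use: expands to "static int kThreshold_hostconst = 42;"
HEMI_DEFINE_CONSTANT(static, int kThreshold, 42)
```

Callers who need the constant as a template argument would simply pass an empty or extern linkage instead of static.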

range.hpp in VS2013

File range.hpp won't compile in VS2013 (CUDA 7.5). The problem seems to be line 228. The following errors are reported by VS:

Warning 1 warning C4267: 'argument' : conversion from 'size_t' to 'int', possible loss of data c:...\visual studio 2013\projects\hemitest\hemitest\hemi\configure.h 34 1 HemiTest
Error 2 error C2144: syntax error : 'auto' should be preceded by ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 3 error C4430: missing type specifier - int assumed. Note: C++ does not support default-int c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 4 error C2238: unexpected token(s) preceding ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 5 error C2143: syntax error : missing ';' before '' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 6 error C2988: unrecognizable template declaration/definition c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 7 error C2059: syntax error : '' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 8 error C2238: unexpected token(s) preceding ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 233 1 HemiTest
Error 9 error C2144: syntax error : 'bool' should be preceded by ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 236 1 HemiTest
Error 10 error C4430: missing type specifier - int assumed. Note: C++ does not support default-int c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 236 1 HemiTest
