harrism / hemi
Simple utilities to enable code reuse and portability between CUDA C/C++ and standard C/C++.
License: BSD 3-Clause "New" or "Revised" License
Hi,
I'm trying to compile some of the examples as x64 under Visual Studio 2013. While it compiles OK, the constant HEMI_CUDA_COMPILER is not defined, and hence the device code is not executed. If I define the constant manually, the device code still doesn't execute. Note that the 32-bit build works fine.
Thanks.
I tried a minimal hello-world example that multiplies each input by 2 and stores it back into the array, and nothing happens, as if the lazy HostToDevice and DeviceToHost copying in hemi::Array isn't being called. Code below.
// using GNU/6.3.0 and CUDA/9.0
// nvcc -o test test.cpp -std=c++14 -lcuda -Ihemi -x cu && ./test
#include <cstdio>
#include <cstdlib>  // exit
#include <numeric>  // iota
#include "hemi/hemi.h"
#include "hemi/array.h"
#include "hemi/launch.h"
#include "hemi/device_api.h"
#include "hemi/execution_policy.h"

// strangely prints e.g. "setting data[3]=0", "setting data[1]=0", etc.
HEMI_LAUNCHABLE void Kernel(int *data) {
    printf("setting data[%d]=%d\n", hemi::globalThreadIndex(),
           data[hemi::globalThreadIndex()] * 2);
    data[hemi::globalThreadIndex()] *= 2;
    __syncthreads(); // no error here means this is definitely running on the device.
}

int main() {
    const int THREADS(4);

    // declare and init with [0, 1, 2, 3]
    hemi::Array<int> d(THREADS, false);
    std::iota(d.writeOnlyHostPtr(), d.writeOnlyHostPtr() + THREADS, 0);

    // invoke device and sync
    hemi::ExecutionPolicy ep;
    ep.setGridSize(THREADS);
    ep.setBlockSize(1);
    hemi::cudaLaunch(ep, Kernel, d.writeOnlyDevicePtr());
    hemi::deviceSynchronize();

    cudaError_t error = cudaGetLastError();
    if (error != cudaSuccess) {
        // print the CUDA error message and exit
        printf("CUDA error: %s\n", cudaGetErrorString(error));
        exit(-1);
    }

    // print all 4 results: expect [0, 2, 4, 6]
    printf("results: ");
    for (int i = 0; i < THREADS; i++) {
        printf("%d ", d.readOnlyHostPtr()[i]);    // produces uninitialized data [0, 0, 0, 0] why?
        //printf("%d ", d.writeOnlyHostPtr()[i]); // produces pre-kernel data [0, 1, 2, 3] why?
    }
    printf("\n");
    return 0;
}
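A likely culprit (an assumption worth checking against hemi's array.h, not a confirmed diagnosis): writeOnlyDevicePtr() tells hemi::Array that the device copy is about to be overwritten, so the lazy HostToDevice copy before the launch is skipped. Since the kernel also reads data, a read-write device pointer is needed, along the lines of:

```cpp
// Hypothetical fix, assuming hemi::Array exposes a read-write
// devicePtr() accessor that copies host -> device when the host
// copy is newer and marks the device copy as modified afterwards.
hemi::cudaLaunch(ep, Kernel, d.devicePtr());
```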
Hi Mark,
To take advantage of your tool within a CMake-driven project, I had to write this (rather trivial) file:
https://github.com/hegner/cmaketools/blob/master/modules/FindHemi.cmake
Maybe you want to make it available somewhere for others to use.
Benedikt
#define HEMI_DEV_CALLABLE_INLINE static inline
Same problem as Issue #2 with forcing static linkage when compiling in separate compilation mode.
I think it would be useful to offer simple implementations of functions like atomicAdd(), atomicInc(), etc., for when hemi code is compiled for CPU execution. Currently I have to use #ifdef HEMI_DEV_CODE to hide uses of atomicAdd() from the host compiler.
If this sounds reasonable, I'm happy to do the implementation and issue a pull request.
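For the host path, a serial fallback could be very small. The sketch below is one possible shape (an assumption, not hemi's API: it presumes hemi's CPU execution stays single-threaded, so plain read-modify-write is enough; a multithreaded host backend would need std::atomic instead):

```cpp
#include <cstddef>

// Host-side stand-ins mirroring CUDA's atomic semantics: each returns
// the value stored at *address before the update.
template <typename T>
inline T atomicAdd(T *address, T val) {
    T old = *address;
    *address += val;
    return old;
}

template <typename T>
inline T atomicInc(T *address, T limit) {
    T old = *address;
    // CUDA semantics: increment, wrapping to 0 once old >= limit
    *address = (old >= limit) ? T(0) : T(old + 1);
    return old;
}
```

In the real header these would sit under `#ifndef HEMI_DEV_CODE`, so the device build keeps the CUDA intrinsics.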
Continuing the conversation from #9.
Some questions:
Do we feel like we need to add tests for device code?
Do we want to include the test framework as a submodule (if possible)? Or just list it as a dependency for the user to install on their own?
#define HEMI_DEV_CALLABLE_INLINE __host__ __device__ __forceinline__
Plain inline would be correct as well; do you really need __forceinline__? With the inline keyword, the compiler may choose to skip inlining for some reason (code size, compile time, etc.); with __forceinline__, the compiler has no flexibility.
Are there any plans to provide runtime configuration of a kernel launch?
hemi::launch appears to be configured at compile time to run on either the host or the device, but it would be nice if we could choose where something is supposed to run at runtime, so that a client of the library we are developing could choose which resources to use.
It would be great if, when CUDA is disabled, hemi::parallel_for ran on all available CPUs.
I tried adding a simple #pragma omp for myself, but for some reason OpenMP refused to cooperate...
The following code won't compile (nvcc 7.5 + g++ 4.9):
hemi::Array<int> a(10);
int *ad = a.writeOnlyPtr();
hemi::parallel_for(0, a.size(), [=] HEMI_LAMBDA (int i) {
    ad[i] = 1;
});
The problem is that 0 is an int and a.size() is a size_t.
grid_stride_range has the same problem.
Workaround: replace 0 with (size_t)0, which, however, is ugly and counter-intuitive.
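One possible library-side fix (a sketch, not hemi's actual code; the serial loop below stands in for the real host/device launch) is to deduce a common index type from the two arguments, so mixed int/size_t bounds compile without a cast at the call site:

```cpp
#include <cstddef>
#include <type_traits>

// Sketch: accept mixed index types (e.g. int and size_t) and promote
// both to their common type, so parallel_for(0, a.size(), ...) works.
template <typename T1, typename T2, typename F>
void parallel_for(T1 first, T2 last, F body) {
    using T = typename std::common_type<T1, T2>::type;
    for (T i = static_cast<T>(first); i < static_cast<T>(last); ++i)
        body(i);  // serial stand-in for the device/host launch
}
```

grid_stride_range could take the same treatment.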
Hi,
I'm trying to have host-side C++11 methods return HEMI_LAMBDA functions; would appreciate your opinion if it is even possible: http://stackoverflow.com/questions/36095952/use-lambdas-to-get-polymorphism-recursion-and-true-oo-style-in-cuda?noredirect=1#comment59847779_36095952
I would tweak this slightly for the case where the host compiler version is not supported. This tweak allows the user to provide the definition for the macro on the command line, rather than having to modify hemi.h:
#if !defined(HEMI_DEV_ALIGN)
#if defined(__GNUC__)
#define HEMI_DEV_ALIGN(n) __attribute__((aligned(n)))
#elif defined(_MSC_VER)
#define HEMI_DEV_ALIGN(n) __declspec(align(n))
#else
#error "Please provide a definition for HEMI_DEV_ALIGN macro for your host compiler!"
#endif
#endif
auto fun = [] __device__ (int i) -> float {
    return 3.0f;
};
fun is a CUDA lambda; how can I use type traits to get its result type and argument types?
#include <tuple>

template <typename T>
struct function_traits
    : public function_traits<decltype(&T::operator())>
{
    typedef double type;
};

template <typename _ClassType, typename _ReturnType, typename... _Args>
struct function_traits<_ReturnType (_ClassType::*)(_Args...) const> {
    enum { arguments_size = sizeof...(_Args) };
    typedef _ReturnType result_type;

    template <size_t _index>
    struct arguments
    {
        // the i-th argument is equivalent to the i-th tuple element of a
        // tuple composed of those arguments.
        typedef typename std::tuple_element<_index, std::tuple<_Args...>>::type type;
    };
};

int main() {
    //auto fun = [] __host__ __device__ (int i) -> float {
    //    return 3.0f;
    //}; // compiles OK!
    auto fun = [] __device__ (int i) -> float {
        return 3.0f;
    }; // compile error!
    auto size = function_traits<decltype(fun)>::arguments_size;
}
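For reference, the traits machinery itself appears sound for ordinary host lambdas; as I understand nvcc's extended-lambda restrictions, the failure is specific to taking the address of a __device__ lambda's operator() from host code. A host-only version (plain C++, no CUDA) shows the traits working:

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>

// Same traits as above, minus CUDA: works for any host functor/lambda.
template <typename T>
struct function_traits : function_traits<decltype(&T::operator())> {};

template <typename C, typename R, typename... Args>
struct function_traits<R (C::*)(Args...) const> {
    enum { arguments_size = sizeof...(Args) };
    typedef R result_type;
    template <std::size_t I>
    using argument = typename std::tuple_element<I, std::tuple<Args...>>::type;
};
```

With a plain `[](int) -> float` lambda, `arguments_size` is 1, `result_type` is float, and `argument<0>` is int.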
Is it desirable in general to have a short math header that contains the common functions from <cmath> that are supported on the device? Maybe templated to allow selecting the fast functions on CUDA? I've noticed that the compiler can get confused and pick a <cmath> template where instead you'd want to promote an int to a double and use the device function.
I use std::pow() and std::abs() in my own device code, so it's useful for me. Something like being able to call hemi::pow(x,y) and get reasonable results on both the host and device?
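A sketch of what such a wrapper could look like (a hypothetical hemi::pow, host path only; a real version would presumably add HEMI_DEV_CALLABLE_INLINE and dispatch to the CUDA intrinsics under HEMI_DEV_CODE):

```cpp
#include <cmath>
#include <type_traits>

namespace hemi {
// Promote integral arguments to (at least) double before calling
// std::pow, so hemi::pow(2, 10) behaves like pow(2.0, 10.0) instead of
// tripping an unexpected <cmath> overload.
template <typename X, typename Y>
inline typename std::common_type<double, X, Y>::type pow(X x, Y y) {
    using R = typename std::common_type<double, X, Y>::type;
    return std::pow(static_cast<R>(x), static_cast<R>(y));
}
} // namespace hemi
```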
#define HEMI_DEFINE_CONSTANT(def, value) __constant__ def ## _devconst = value; static def ## _hostconst = value;
With separate compilation, an extern linkage constant variable defined in one translation unit may be referenced by another translation unit. Forcing the host version of the constant to have internal linkage (“static”) won’t work for this scenario. Also, “static” linkage variables cannot be template arguments, which would break the following code:
__constant__ int XXX = 200;
template <int *addr>
__device__ int getValue(void) { return *addr; }
__global__ void kern(int *ptr) { *ptr = getValue<&XXX>(); }
The snippet above compiles fine, but changing the linkage of “XXX” to static and recompiling fails:
j1.cu(6): error: a template argument may not reference a non-external entity
Perhaps the macro should have an argument to specify the linkage?
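One possibility along those lines (a sketch only, with a hypothetical macro name, not a proposed final API):

```cpp
// Hypothetical variant of HEMI_DEFINE_CONSTANT that takes the
// host-side linkage as an argument: pass "static" to keep today's
// behavior, or leave it empty for external linkage so the constant can
// be referenced from other translation units and used as a template
// argument under separate compilation.
#define HEMI_DEFINE_CONSTANT_L(linkage, def, value) \
    __constant__ def ## _devconst = value;          \
    linkage def ## _hostconst = value;
```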
File range.hpp won't compile in VS2013 (CUDA 7.5). The problem seems to be line 228. The following errors are reported by VS:
Warning 1 warning C4267: 'argument' : conversion from 'size_t' to 'int', possible loss of data c:...\visual studio 2013\projects\hemitest\hemitest\hemi\configure.h 34 1 HemiTest
Error 2 error C2144: syntax error : 'auto' should be preceded by ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 3 error C4430: missing type specifier - int assumed. Note: C++ does not support default-int c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 4 error C2238: unexpected token(s) preceding ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 5 error C2143: syntax error : missing ';' before '' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 6 error C2988: unrecognizable template declaration/definition c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 7 error C2059: syntax error : '' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 8 error C2238: unexpected token(s) preceding ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 233 1 HemiTest
Error 9 error C2144: syntax error : 'bool' should be preceded by ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 236 1 HemiTest
Error 10 error C4430: missing type specifier - int assumed. Note: C++ does not support default-int c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 236 1 HemiTest