
hemi's People

Contributors

gimperiale, harrism, jleni, mastbaum, tmjbradley

hemi's Issues

Visual Studio x64

Hi,

I'm trying to compile some of the examples as x64 under Visual Studio 2013. It compiles OK, but the constant HEMI_CUDA_COMPILER is not defined, so the device code is not executed. Defining the constant manually doesn't help either. Note that the 32-bit build works fine.

Thanks.

Hello World with hemi::Array doesn't work, but doesn't error either

I tried a minimal hello-world example that multiplies each array element by 2 and stores the result back, and nothing happens, as if the lazy HostToDevice and DeviceToHost copying in hemi::Array isn't being triggered. Code below.

// using GNU/6.3.0 and CUDA/9.0
// nvcc -o test test.cpp -std=c++14 -lcuda -Ihemi -x cu && ./test

#include <cstdio>
#include <numeric> // iota
#include "hemi/hemi.h"
#include "hemi/array.h"
#include "hemi/launch.h"
#include "hemi/device_api.h"
#include "hemi/execution_policy.h"

// strangely prints ex: "setting data[3]=0", "setting data[1]=0", etc.
HEMI_LAUNCHABLE void Kernel(int *data) {
    printf("setting data[%d]=%d\n", hemi::globalThreadIndex(), data[hemi::globalThreadIndex()]*2);
    data[hemi::globalThreadIndex()] *= 2;
    __syncthreads(); // no error here means this is definitely running on the device
}

int main() {
    const int THREADS(4);

    // declare and init with [0, 1, 2, 3]
    hemi::Array<int> d(THREADS, false);
    std::iota(d.writeOnlyHostPtr(), d.writeOnlyHostPtr()+THREADS, 0);

    // invoke device and sync
    hemi::ExecutionPolicy ep;
    ep.setGridSize(THREADS);
    ep.setBlockSize(1);
    hemi::cudaLaunch(ep, Kernel, d.writeOnlyDevicePtr());
    hemi::deviceSynchronize();

    cudaError_t error = cudaGetLastError();
    if(error != cudaSuccess)
    {
	    // print the CUDA error message and exit
	    printf("CUDA error: %s\n", cudaGetErrorString(error));
	    exit(-1);
    }

    // print all 4 results: expect [0, 2, 4, 6]
    printf("results: ");
    for (int i=0; i<THREADS; i++) {
       printf("%d ",d.readOnlyHostPtr()[i]); // produces uninitialized data [0, 0, 0, 0] why?
       //printf("%d ",d.writeOnlyHostPtr()[i]); // produces pre-kernel data [0, 1, 2, 3] why?
    }
    printf("\n");

    return 0;
}

host implementations of atomic functions?

I think it would be useful to offer simple implementations of functions like atomicAdd(), atomicInc(), etc., that are used when compiling hemi code for CPU execution. Currently I have to use #ifdef HEMI_DEV_CODE to hide uses of atomicAdd() from the host compiler.

If this sounds reasonable, I'm happy to do the implementation and issue a pull request.
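A minimal host-side fallback might look like the sketch below. These names mirror the CUDA intrinsics, but the signatures and guard macro usage here are assumptions, not existing hemi API; note this is only safe for a single-threaded host path (a multithreaded host would need std::atomic or compiler builtins).

```cpp
// Hypothetical host fallbacks, compiled only when device code is not active.
// Safe for a single-threaded host path only.
#ifndef HEMI_DEV_CODE
inline int atomicAdd(int *address, int val) {
    int old = *address;
    *address += val;   // not thread-safe; fine for a serial host fallback
    return old;
}

inline unsigned int atomicInc(unsigned int *address, unsigned int val) {
    unsigned int old = *address;
    // Matches CUDA atomicInc semantics: wrap to 0 once old >= val.
    *address = (old >= val) ? 0u : (old + 1u);
    return old;
}
#endif
```

The same pattern would extend to atomicMax, atomicCAS, and friends if the idea is accepted.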

Test framework of choice for project?

Continuing the conversation from #9.
Some questions:

Do we feel like we need to add tests for device code?
Do we want to include the test framework as a submodule (if possible)? Just list it as a dependency for the user to figure out installation on their own?

Use inline instead of __forceinline__

#define HEMI_DEV_CALLABLE_INLINE        __host__ __device__ __forceinline__

inline would be correct as well; do you really need __forceinline__? With the inline keyword, the compiler may choose to skip inlining for its own reasons (code size, compile time, etc.), whereas with __forceinline__ the compiler has no flexibility.
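One way to give users that flexibility is an overridable macro (a sketch; HEMI_INLINE is a hypothetical name, not an existing hemi macro, and the device-side qualifiers are omitted so it builds host-only):

```cpp
// Sketch: default to __forceinline__ under nvcc, plain inline otherwise,
// and let the user override from the command line (-DHEMI_INLINE=inline).
#ifndef HEMI_INLINE
  #if defined(__CUDACC__)
    #define HEMI_INLINE __forceinline__
  #else
    #define HEMI_INLINE inline
  #endif
#endif

HEMI_INLINE int twice(int x) { return x * 2; }
```

HEMI_DEV_CALLABLE_INLINE could then expand to `__host__ __device__ HEMI_INLINE` instead of hard-coding __forceinline__.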

[Request] Runtime configuration choice of CPU or GPU

Are there any plans to provide runtime configuration of a kernel launch?

hemi::launch appears to be configured at compile time to run on either the host or the device, but it would be nice if we could choose where something is supposed to run at runtime, so that a client lib (of the lib we are developing) could choose which resources to use.
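In the meantime, a thin wrapper can approximate this (a sketch; Device and launch_on are hypothetical names, and the GPU branch is stubbed to the serial loop so the example builds without CUDA):

```cpp
#include <cstddef>

enum class Device { CPU, GPU };

// Hypothetical runtime dispatcher: the client picks the target per call.
template <typename F>
void launch_on(Device dev, std::size_t n, F f) {
    if (dev == Device::GPU) {
        // With nvcc, this branch would forward to hemi::cudaLaunch(...)
        // instead; stubbed to the serial loop so the sketch compiles host-only.
        for (std::size_t i = 0; i < n; ++i) f(i);
    } else {
        for (std::size_t i = 0; i < n; ++i) f(i);
    }
}
```

The dispatch cost is one branch per launch, so the kernels themselves stay compile-time specialized while the caller decides the target at runtime.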

(feat req) OpenMP acceleration in parallel_for

It would be great if, when CUDA is disabled, hemi::parallel_for ran on all available CPUs.
I tried adding a simple #pragma omp for myself, but for some reason OpenMP refused to cooperate...
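For reference, a bare `#pragma omp for` only distributes work inside an enclosing parallel region; outside one it runs sequentially, which is a likely reason the attempt above appeared to do nothing. The combined construct is what the host path would need (a sketch; parallel_for_host is a hypothetical stand-in for hemi's host loop, and the pragma is simply ignored when OpenMP is not enabled):

```cpp
// Sketch: the combined "parallel for" both spawns the team and splits the
// iterations; a lone "omp for" without "omp parallel" does neither.
template <typename F>
void parallel_for_host(int first, int last, F f) {
#pragma omp parallel for
    for (int i = first; i < last; ++i)
        f(i);
}
```

The body must be free of cross-iteration dependencies for this to be safe, which hemi::parallel_for kernels generally already are.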

grid_stride_range and parallel_for want an explicit cast between int and size_t

The following code won't compile (nvcc 7.5 + g++ 4.9):

hemi::Array<int> a(10);
int *ad = a.writeOnlyPtr();
hemi::parallel_for(0, a.size(), [=] HEMI_LAMBDA(int i) {
    ad[i] = 1;
});

The problem is that 0 is an int and a.size() is a size_t.
grid_stride_range has the same problem.

Workaround: replace 0 with (size_t)0, which works but is ugly and counter-intuitive.
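One fix on the library side is to deduce the two bounds independently and converge on their common type (a sketch of the idea only; parallel_for_mixed is a hypothetical serial stand-in, not hemi's actual signature):

```cpp
#include <type_traits>

// Hypothetical overload: parallel_for_mixed(0, vec.size(), ...) deduces
// size_t as the common index type, so mixed int/size_t call sites compile
// without casts. Serial loop stands in for the real launch machinery.
template <typename T1, typename T2, typename F>
void parallel_for_mixed(T1 first, T2 last, F f) {
    using T = typename std::common_type<T1, T2>::type;
    for (T i = static_cast<T>(first); i < static_cast<T>(last); ++i)
        f(i);
}
```

The same two-parameter deduction would apply to grid_stride_range.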

Make HEMI_DEV_ALIGN macro more flexible for non CUDACC path

I would tweak this slightly for the case where the host compiler version is not supported. This tweak allows the user to provide the definition for the macro on the command line, rather than having to modify hemi.h:

#if !defined(HEMI_DEV_ALIGN) 

  #if defined(__GNUC__)
    #define HEMI_DEV_ALIGN(n) __attribute__((aligned(n)))
  #elif defined(_MSC_VER)
    #define HEMI_DEV_ALIGN(n) __declspec(align(n))
  #else
    #error "Please provide a definition for HEMI_DEV_ALIGN macro for your host compiler!"
  #endif

#endif

How to get the result type of a device lambda?

auto fun = [] __device__ (int i) -> float {
    return 3.0f;
};

fun is a CUDA lambda; how can I use type traits to get its result type and argument types?

#include <tuple>

template <typename T>
struct function_traits
	: public function_traits<decltype(&T::operator())>
{
	typedef double type;
};


template <typename _ClassType, typename _ReturnType, typename... _Args>
struct function_traits<_ReturnType(_ClassType::*)(_Args...) const> {
	enum { arguments_size = sizeof...(_Args) };

	typedef _ReturnType result_type;

	template <size_t _index>
	struct arguments
	{
		typedef typename std::tuple_element<_index, std::tuple<_Args...>>::type type;
		// the i-th argument is equivalent to the i-th tuple element of a tuple
		// composed of those arguments.
	};
};

int main() {

	//auto fun = [] __host__ __device__(int i)->float {
	//	return 3.0f;
	//}; compile OK!

	auto fun = [] __device__(int i)->float {
		return 3.0f;
	}; // compile Error!

	auto size = function_traits<decltype(fun)>::arguments_size;

}

Common math function support?

Is it desirable in general to have a short math header that contains common functions from <cmath> that are supported on the device? Maybe templated to allow selection of the fast functions on CUDA? I've noticed that the compiler can get confused and pick a template from <cmath> where instead you'd want to promote an int to a double and use the device function.

I use std::pow() and std::abs() in my own device code, so it would be useful for me. Something like being able to call hemi::pow(x, y) and get reasonable results on both the host and the device?
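A wrapper along these lines could work (a sketch, assuming a hemi-style namespace; the HEMI_DEV_CODE branch is illustrative only and is not exercised in a host build):

```cpp
#include <cmath>
#include <type_traits>

namespace hemi {
// Promote integral arguments to double before dispatching, so hemi::pow(2, 10)
// hits the floating-point overload instead of a surprising <cmath> template.
template <typename X, typename Y>
typename std::common_type<double, X, Y>::type pow(X x, Y y) {
#ifdef HEMI_DEV_CODE
    return ::pow(static_cast<double>(x), static_cast<double>(y)); // device math library
#else
    return std::pow(static_cast<double>(x), static_cast<double>(y));
#endif
}
} // namespace hemi
```

A fast-math variant could be a second template parameter selecting __powf-style intrinsics on the device.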

Problems with making HEMI_DEFINE_CONSTANT static on host

#define HEMI_DEFINE_CONSTANT(def, value) __constant__ def ## _devconst = value; static def ## _hostconst = value;

With separate compilation, an extern linkage constant variable defined in one translation unit may be referenced by another translation unit. Forcing the host version of the constant to have internal linkage (“static”) won’t work for this scenario. Also, “static” linkage variables cannot be template arguments, which would break the following code:

__constant__ int XXX = 200;

template <int *addr>
__device__  int getValue(void) { return *addr; }

__global__ void kern(int *ptr) {  *ptr = getValue<&XXX>(); }

The snippet above compiles fine, but changing the linkage of “XXX” to static and recompiling fails:

j1.cu(6): error: a template argument may not reference a non-external entity

Perhaps the macro should have an argument to specify the linkage?
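A possible shape for that (a sketch; the three-argument form is hypothetical, and only the host half is shown so it compiles without nvcc):

```cpp
// Sketch: caller supplies the linkage specifier (empty, static, or extern).
// Under nvcc the macro would also emit the device half:
//   linkage __constant__ def ## _devconst = value;
#define HEMI_DEFINE_CONSTANT(linkage, def, value) \
    linkage def ## _hostconst = value;

// Example use: expands to "static int kThreshold_hostconst = 42;"
HEMI_DEFINE_CONSTANT(static, int kThreshold, 42)
```

Callers who need the constant as a template argument would simply pass an empty or extern linkage instead of static.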

range.hpp in VS2013

File range.hpp won't compile in VS2013 (CUDA 7.5). The problem seems to be line 228. The following errors are reported by VS:

Warning 1 warning C4267: 'argument' : conversion from 'size_t' to 'int', possible loss of data c:...\visual studio 2013\projects\hemitest\hemitest\hemi\configure.h 34 1 HemiTest
Error 2 error C2144: syntax error : 'auto' should be preceded by ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 3 error C4430: missing type specifier - int assumed. Note: C++ does not support default-int c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 4 error C2238: unexpected token(s) preceding ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 5 error C2143: syntax error : missing ';' before '' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 6 error C2988: unrecognizable template declaration/definition c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 7 error C2059: syntax error : '' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 228 1 HemiTest
Error 8 error C2238: unexpected token(s) preceding ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 233 1 HemiTest
Error 9 error C2144: syntax error : 'bool' should be preceded by ';' c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 236 1 HemiTest
Error 10 error C4430: missing type specifier - int assumed. Note: C++ does not support default-int c:...\visual studio 2013\projects\hemitest\hemitest\hemi\range\range.hpp 236 1 HemiTest
