cl-cuda's People

Contributors

drsplinter, fare, ghollisjr, gos-k, melisgl, takagi

cl-cuda's Issues

Any way to run on Windows?

Setting up cl-cuda seems to hook into gcc to create the FFI. GCC is well and good thanks to MSYS2/MinGW64, but apparently the CUDA toolkit and MinGW don't play nice together. Is there any way to set up cl-cuda to use the Windows CUDA toolchain?

support unsigned long long type

Support unsigned long long type which is used in curand library.

See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.

no class named CFFI-GROVEL::PROCESS-OP

Trying to load cl-cuda in sbcl, I get this error:

* (ql:quickload :cl-cuda)

debugger invoked on a LOAD-SYSTEM-DEFINITION-ERROR in thread #<THREAD "main thread" RUNNING {1002A8B383}>: Error while trying to load definition for system cl-cuda from pathname /home/dev/quicklisp/local-projects/cl-cuda/cl-cuda.asd: There is no class named CFFI-GROVEL::PROCESS-OP.

What am I doing wrong?

grovel size_t type

Grovel size_t type which is environment-dependent.

Question:

  • where to place a grovel specification file?

See melisgl/cl-cuda@d6e6dd94a5ca7a8243f23f7eddecbbd56aa51ceb in #4

PROGN statements and brace blocks "{ ... }" in CUDA C

Currently, the compiler emits brace blocks { ... } when compiling the following statements:

  • IF
  • LET
  • SYMBOL-MACROLET
  • DO
  • WITH-SHARED-MEMORY

On the other hand, it does not emit a brace block when compiling a PROGN statement.

Should PROGN statements correspond to brace blocks in CUDA C?

If yes, what should LET statements be compiled into?

{
  int x = 0;
  return x;
}

or

{
  int x = 0;
  {
    return x;
  }
}

I would like to adopt the former.

use _v2 of CUDA functions when available

Use _v2 of CUDA functions when available:

  • cuCtxCreate_v2
  • cuCtxDestroy_v2
  • cuMemAlloc_v2
  • cuMemFree_v2
  • cuMemcpyHtoD_v2
  • cuMemcpyDtoH_v2
  • cuEventDestroy_v2

Question:

  • are there any other functions having _v2?

See melisgl/cl-cuda@db464369fa42f7090fa6ec6b3ee216d0279ee320 in #4

improve compiling cl-cuda type to CUDA C type

Improve how cl-cuda types are compiled to CUDA C types.

  • int -> "int" : OK
  • curand-state-xorwow -> "curandStateXORWOW" : NG
  • curand-state-xorwow -> "curandStateXORWOW_t" : OK

Currently, a cl-cuda type is simply translated to a string.

See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.
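
The NG/OK cases above suggest that some types need an explicit name rather than one derived from the symbol. A minimal sketch of such a lookup follows; the table *cuda-c-type-names* and the function cuda-c-type-name are hypothetical names chosen for illustration, not cl-cuda's actual API.

```lisp
;; Hypothetical sketch, not cl-cuda's actual implementation.
(defparameter *cuda-c-type-names*
  '((int . "int")
    (float . "float")
    (curand-state-xorwow . "curandStateXORWOW_t"))
  "Assumed mapping from cl-cuda type symbols to CUDA C type names.")

(defun cuda-c-type-name (type)
  "Return the CUDA C name for TYPE, signaling an error for unknown types."
  (let ((entry (assoc type *cuda-c-type-names*)))
    (if entry
        (cdr entry)
        (error "Unknown cl-cuda type: ~S" type))))
```

With an explicit table, a type whose C name carries a suffix such as _t does not depend on any rule for converting the symbol's name.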

Add initializer syntax for CUDA vector types.

Add initializer syntax for CUDA vector types, compiled as:

__device__ float4 foo = { 0.0, 1.0, 2.0, 3.0 };

This is needed because __device__, __constant__ and __shared__ variables do not allow dynamic initialization, so the following is invalid:

__device__ float4 foo = make_float4( 0.0, 1.0, 2.0, 3.0 );

warnings caused by forward references

The variables below are forward-referenced and cause warnings:

  • +built-in-functions+
  • +built-in-macros+
  • kernel-manager

See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4 .

warnings caused by an unused argument

The unused argument type in definition of defkernelconst macro causes a warning.

See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4 .

support curand XORWOW

Support curand XORWOW:

  • curand_init
  • curand_uniform
  • curandStateXORWOW_t

Depends on #15, #19, #21 and #22.

See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.

Failing vector-add test (Linux amd64 CUDA 5)

I'm failing this test on FC17

uname -r
3.9.10-100.fc17.x86_64
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Fri_Sep_21_17:28:58_PDT_2012
Cuda compilation tools, release 5.0, V0.2.1221
sbcl --version
SBCL 1.0.57-1.fc17

The error I receive is: CUDA_ERROR_LAUNCH_FAILED, which is, afaik, a generic error if "something" went wrong.

WARNING: This may not be a bug, in fact, this may be a misconfiguration on my side, however, I'd appreciate if you could tell me what else to check.

This is the output from the test:

VECTOR-ADD> (main)
CU-INIT succeeded.
CU-DEVICE-GET succeeded.
CU-CTX-CREATE succeeded.
CU-MEM-ALLOC succeeded.
CU-MEM-ALLOC succeeded.
CU-MEM-ALLOC succeeded.
CU-MEMCPY-HOST-TO-DEVICE succeeded.
CU-MEMCPY-HOST-TO-DEVICE succeeded.
nvcc -arch=sm_11 -I /home/wvxvw/quicklisp/local-projects/cl-cuda/include -ptx -o /tmp/cl-cuda-sBBXlw.ptx /tmp/cl-cuda-sBBXlw.cu
CU-MODULE-LOAD succeeded.
CU-MODULE-GET-FUNCTION succeeded.
CU-LAUNCH-KERNEL succeeded.
; Evaluation aborted on #<SIMPLE-ERROR "~A failed with driver API error No. ~A.~%~A" {1003FEF573}>.

Cannot find CUDA SDK

Hello,
Running latest CCL with Version 1.11-r16635 on OS X 10.10.5
Could load and compile cl-cuda without problem.
I have a hard time referencing my version of CUDA which is NVIDIA-CUDA-7.5
When I run any cuda example, I get an error message:
e.g. (cl-cuda-examples.diffuse0:main)

Error: CUDA SDK not found.
While executing: CL-CUDA.DRIVER-API:CU-INIT, in process Listener(4).
How do I configure cl-cuda to reference the right framework?
Should I recompile ?

Maybe a silly question - My first time using this library.

Appropriately compile single and double precision float values.

Compile single float values to be explicitly typed to avoid being compiled as double float values.

before

0.0

after

0.0f

Additionally, compile double float values, which are currently emitted as (double)0.0, to the plain double float literal 0.0.

before

(double)0.0

after

0.0
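
The two before/after rules above could be sketched as a single dispatch on the Lisp float type. The function name compile-float-literal is an assumption made for this example, and the sketch relies on the standard FORMAT directive ~F printing a float in fixed-point notation without an exponent marker.

```lisp
;; Hypothetical helper; the name COMPILE-FLOAT-LITERAL is an assumption.
(defun compile-float-literal (value)
  "Return VALUE as a CUDA C literal string.
Single floats get an f suffix; double floats are left unsuffixed."
  (etypecase value
    (single-float (format nil "~Ff" value))
    (double-float (format nil "~F" value))))
```

For example, (compile-float-literal 0.0) would yield the single float form and (compile-float-literal 0.0d0) the double float form, assuming the default *read-default-float-format*.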

Misleading information when failed to load libcuda.

Fact

  • Loading cl-cuda completes even if loading libcuda fails.
  • Afterwards, when a foreign function in libcuda is called, CL-CUDA.DRIVER-API::SDK-NOT-FOUND-ERROR is signaled with the message "CUDA SDK not found."

Problem

  • We cannot tell whether the condition was signaled because the CUDA SDK was not found, or because it was found but failed to load.

See also
#42

don't fail if cuda library cannot be loaded

Don't fail if cuda library cannot be loaded.

Question:

  • Why ignore errors when the CUDA library cannot be loaded? Without the CUDA library, cl-cuda makes no sense.

See melisgl/cl-cuda@ca0bde3fe89db1192f89bf2a702990900e996c61 in #4

grovel CUdeviceptr type

Grovel CUdeviceptr type from cuda_kernel.h.

Question:

  • also grovel other CUDA driver API types?
  • also grovel other CUDA driver API functions, structures and enumerations?
  • where to place a grovel specification file?

See melisgl/cl-cuda@d6e6dd94a5ca7a8243f23f7eddecbbd56aa51ceb in #4

add double float support

Add double float support:

  • double
  • double3
  • double4

See melisgl/cl-cuda@ea8cf60e9c74e878973d85338f1ab727b76b68b3 and melisgl/cl-cuda@67f96a0b530808e70af7c495f7735d5ad9b29034 in #4.

The latter mentions the -arch=sm_13 NVCC option needed for double floats.

Infer global's type from its initial value.

Infer global's type from its initial value, removing type argument from DEFGLOBAL macro.

before

(defglobal x int 1)

after

(defglobal x 1)    ; type of x is inferred as int.
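
For literal initial values, the inference could be as simple as a dispatch on the Lisp type of the value. This is only a sketch: infer-global-type is a hypothetical name, and a real implementation would also have to handle initial values that are expressions rather than literals.

```lisp
;; Hypothetical sketch; INFER-GLOBAL-TYPE is not part of cl-cuda.
(defun infer-global-type (initial-value)
  "Guess a cl-cuda type symbol from a literal initial value."
  (etypecase initial-value
    (integer 'int)
    (single-float 'float)
    (double-float 'double)))
```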

Add selector macros for CUDA vector types' CL counterparts.

Add selector macros for CUDA vector types' CL counterparts: float3, float4, double3 and double4.

(defmacro with-float4 ((x y z w) value &body body)
  (once-only (value)
    `(let ((,x (float4-x ,value))
           (,y (float4-y ,value))
           (,z (float4-z ,value))
           (,w (float4-w ,value)))
       (declare (ignorable ,x ,y ,z ,w))
       ,@body)))

add some math functions

Add math functions:

  • exp
  • log
  • __double2int_rn

Depends on #15.

See melisgl/cl-cuda@1713af4a7a6d8cdbb3048d8f4f21ac99f6010d21 in #4.

can't define a __device__ kernel function that returns void type

Currently, a function's specifier is determined by its return type: __global__ for void and __device__ for any non-void type.

For example,

(defkernel foo (void ())
  (return))

is compiled into:

__global__ void foo () {
  return;
}

Because of this rule, a __device__ kernel function that returns void type can't be defined.

To solve this problem, any of the following syntaxes could be adopted:

(defdevicekernel foo (void ()) ...
(defkernel (foo :device) (void ()) ...
(defkernel foo :device (void ()) ...
(defkernel foo ((void :device) ()) ...
(defkernel foo (void :device ()) ...

I am inclined to choose the second one. The function specifier can be omitted, in which case the current rule applies.

:global is specified:

(defkernel (foo :global) (void ())
  (return))
;; compiled into: __global__ void foo () { ... }

:device is specified:

(defkernel (bar :device) (void ())
  (return))
;; compiled into: __device__ void bar () { ... }

__global__ is inferred because the return type is void:

(defkernel foofoo (void ())
  (return))
;; compiled into: __global__ void foofoo () { ... }

__device__ is inferred because the return type is int:

(defkernel baz (int ())
  (return 1))
;; compiled into: __device__ int baz () { ... }
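
The rule illustrated by these four cases could be sketched as follows. The function parse-kernel-name is a hypothetical helper written for this example; it only shows how an explicit specifier would override the return-type default, and is not how defkernel is actually implemented.

```lisp
;; Hypothetical sketch of the proposed rule; names are assumptions.
(defun parse-kernel-name (name-spec return-type)
  "Return the function name and its CUDA C specifier as two values.
NAME-SPEC is either a symbol or a list (NAME SPECIFIER)."
  (destructuring-bind (name &optional specifier)
      (if (listp name-spec) name-spec (list name-spec))
    (values name
            (ecase (or specifier
                       ;; Fall back to the current rule when omitted.
                       (if (eq return-type 'void) :global :device))
              (:global "__global__")
              (:device "__device__")))))
```

Calling (parse-kernel-name '(bar :device) 'void) would give __device__ as the second value, while a bare name falls back to the return-type rule.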

passing structure type references to ALLOC-GL-ARRAY's TYPE argument

In the definition of cl-cuda-interop:alloc-memory-block, since the type argument of the alloc-gl-array function accepts only symbols, structure type references must be passed in bare style, which is deprecated in CFFI. For example, foo must be passed instead of (:struct foo).

NG: (alloc-gl-array '(:struct foo) count)
OK: (alloc-gl-array 'foo count)

As a workaround, I define a bare-cffi-type function that converts structure type references from the form (:struct foo) to foo, and pass its return value to alloc-gl-array.

(alloc-gl-array (bare-cffi-type type) count)
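
A helper like the one described could look roughly like this. It is a guess based on the description above, not the actual definition in cl-cuda, and it passes any type that is not of the form (:struct foo) through unchanged.

```lisp
;; A guess at the helper described above, not the actual definition.
(defun bare-cffi-type (type)
  "Convert a type of the form (:struct foo) to the bare symbol foo.
Any other type is returned unchanged."
  (if (and (consp type) (eq (first type) :struct))
      (second type)
      type))
```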

This problem is already reported on cl-opengl's issue tracker #41.

to be accepted in Quicklisp distribution

Currently, cl-cuda is not available in the Quicklisp distribution because of its testing policy (see #514 in quicklisp-projects).

It may be accepted if it compiles without signaling a condition in an environment where the CUDA SDK is not installed, even though it would not work there.

Approach:

  1. Try to compile the grovel files, which include cuda.h, before evaluating the defsystem form in cl-cuda.asd.
  2. A condition will be signaled if the CUDA SDK is not installed.
  3. Handle the condition and push a flag onto *features* indicating that the CUDA SDK is not installed.
  4. In the defsystem form, skip the cffi-grovel:grovel-file form by checking *features*.
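
Steps 1 to 3 of this approach could be sketched as below. Everything here is an assumption made for illustration: try-grovel stands in for whatever function would actually compile the grovel file, and the feature name :cuda-sdk-not-found is invented for this example.

```lisp
;; Hypothetical sketch of steps 1-3; TRY-GROVEL stands in for whatever
;; actually compiles the grovel file, and the feature name is an assumption.
(defun call-noting-missing-sdk (try-grovel)
  "Call TRY-GROVEL; on error, record that the CUDA SDK is unavailable.
Returns T on success and NIL if an error was handled."
  (handler-case (progn (funcall try-grovel) t)
    (error ()
      (pushnew :cuda-sdk-not-found *features*)
      nil)))
```

For step 4, a form in cl-cuda.asd could then be guarded with a reader conditional such as #-cuda-sdk-not-found, assuming the flag is pushed before that form is read.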

Questions:

  • Will there be warnings about missing symbols if cffi-grovel:grovel-file is skipped?

warnings caused by specifying cffi structure type

Specifying cffi structure type without :struct keyword causes warnings. For example, float3 structure type should be specified as '(:struct float3), not 'float3, to avoid warnings.

See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4 .

Support cuModuleGetGlobal driver API.

Support the cuModuleGetGlobal driver API. It is useful for parameters that are determined dynamically by a program but do not change across kernel launches.

  • Support cuModuleGetGlobal driver API.
  • Introduce a cl-cuda API to define CUDA C global.

support pointers and integers when launching kernels

Only MEMORY-BLOCKs were supported previously, which is fine as long as
one uses ALLOC-MEMORY-BLOCK. With this change, CU-DEVICE-PTRs obtained
directly from CU-MEM-ALLOC can be used.

See melisgl/cl-cuda@67f96a0b530808e70af7c495f7735d5ad9b29034 in #4.

Can't compile on Ubuntu 14.4 / CUDA 6.5

Hi there, I get the following error trying to quickload cl-cuda. The error message at the end is in German, it says "fatal error: cuda.h: File or directory not found":

* (ql:quickload :cl-cuda)
To load "cl-cuda":
  Load 1 ASDF system:
    cl-cuda
; Loading "cl-cuda"
........; cc -m64 -I/home/mwoehrle/quicklisp/dists/quicklisp/software/cffi_0.14.0/ -o /home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel /home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel.c

debugger invoked on a CFFI-GROVEL:GROVEL-ERROR in thread #<THREAD "main thread" RUNNING {1002A8AF53}>: External process exited with code 1.
Command was: "cc" "-m64" "-I/home/mwoehrle/quicklisp/dists/quicklisp/software/cffi_0.14.0/" "-o" "/home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel" "/home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel.c"
Output was:
/home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel.c:6:18: fatal error: cuda.h: Datei oder Verzeichnis nicht gefunden
#include <cuda.h>
^
compilation terminated.

Simple Example not working

After trying the code on the main page...

(defun main ()
  (let* ((dev-id 0)
         (n 1024)
         (threads-per-block 256)
         (blocks-per-grid (/ n threads-per-block)))
    (with-cuda (dev-id)
      (with-memory-blocks ((a 'float n)
                           (b 'float n)
                           (c 'float n))
        (random-init a n)
        (random-init b n)
        (sync-memory-block a :host-to-device)
        (sync-memory-block b :host-to-device)
        (vec-add-kernel a b c n
                        :grid-dim (list blocks-per-grid 1 1)
                        :block-dim (list threads-per-block 1 1))
        (sync-memory-block c :device-to-host)
        (verify-result a b c n)))))

I got this error

nvcc exits with code: 127
/usr/bin/env: nvcc: No such file or directory
[Condition of type SIMPLE-ERROR]

OpenGL interoperability's performance loss after cleaning up

After the clean-up, the N-body example seems to perform worse with OpenGL interoperability than it did before the clean-up. If my memory is correct, OpenGL interoperability used to give a small performance gain.

  • before the clean-up: about 5 percent performance gain with OpenGL interoperability
  • after the clean-up: about 40 percent performance loss with OpenGL interoperability

It is a disappointing result if OpenGL interoperability causes a performance loss.
