takagi / cl-cuda

Cl-cuda is a library to use NVIDIA CUDA in Common Lisp programs.

License: MIT License

Common Lisp 94.66% C 5.34%

cl-cuda's Issues

Rewrite sph example with cuModuleGetGlobal.

Rewrite sph example with cuModuleGetGlobal after #46 .

Generating temporary CUDA C file name fails on Windows.

OSICAT does not support MKTEMP on Windows, so we cannot use it to generate a temporary CUDA C file name.
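One possible direction, sketched here under assumptions, is to build the name with standard Common Lisp only. The names `random-suffix` and `temporary-cu-pathname` are hypothetical and not part of cl-cuda or OSICAT, and the caller is assumed to supply the temporary directory.

```lisp
;; Hypothetical helpers, not part of cl-cuda or OSICAT. They use only
;; standard Common Lisp, so they do not depend on MKTEMP.
(defvar *suffix-random-state* (make-random-state t)
  "Freshly seeded random state, so names differ between Lisp sessions.")

(defun random-suffix (length)
  "Return a string of LENGTH random alphanumeric characters."
  (let ((alphabet "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
        (result (make-string length)))
    (dotimes (i length result)
      (setf (char result i)
            (char alphabet (random (length alphabet) *suffix-random-state*))))))

(defun temporary-cu-pathname (directory &key (prefix "cl-cuda-") (attempts 100))
  "Return a pathname under DIRECTORY that does not name an existing file.
DIRECTORY must be a directory pathname, e.g. #p\"/tmp/\"."
  (dotimes (i attempts (error "No unused file name found in ~A." directory))
    (declare (ignorable i))
    (let ((candidate (merge-pathnames
                      (make-pathname
                       :name (concatenate 'string prefix (random-suffix 6))
                       :type "cu")
                      directory)))
      ;; PROBE-FILE returns NIL when no such file exists yet.
      (unless (probe-file candidate)
        (return candidate)))))
```

Note that this only picks an unused name. Another process could create the same file between the check and the actual write, which MKTEMP-style functions are designed to avoid.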

Any way to run on Windows?

Setting up cl-cuda seems to hook into gcc to create the FFI. GCC is well and good thanks to MSYS2/MinGW64, but apparently the CUDA toolkit and MinGW don't play nice together. Is there any way to set up cl-cuda to use the Windows CUDA toolchain?

support unsigned long long type

Support unsigned long long type which is used in curand library.

See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.

no class named CFFI-GROVEL::PROCESS-OP

Trying to load cl-cuda in sbcl, I get this error:

* (ql:quickload :cl-cuda)

debugger invoked on a LOAD-SYSTEM-DEFINITION-ERROR in thread #<THREAD "main thread" RUNNING {1002A8B383}>: Error while trying to load definition for system cl-cuda from pathname /home/dev/quicklisp/local-projects/cl-cuda/cl-cuda.asd: There is no class named CFFI-GROVEL::PROCESS-OP.

What am I doing wrong?

Support long type.

grovel size_t type

Grovel size_t type which is environment-dependent.

Question:

where to place a grovel specification file?

See melisgl/cl-cuda@d6e6dd94a5ca7a8243f23f7eddecbbd56aa51ceb in #4

test which rely on the memory available on the GPU

The test in t/test-cl-cuda.lisp relies on the amount of memory available on the GPU.

(is-error (cl-cuda::alloc-memory-block 'int  (* 1024 1024 256)) simple-error)

See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4 .

update Installation section in README

Update Installation section in README about quicklisp availability.

I asked Quicklisp to add cl-cuda to its distribution, but the request was rejected because of its testing policy.

quicklisp/quicklisp-projects#514

PROGN statements and brace blocks "{ ... }" in CUDA C

Currently, the compiler emits brace blocks { ... } when compiling the following statements:

IF
LET
SYMBOL-MACROLET
DO
WITH-SHARED-MEMORY

On the other hand, it does not emit a brace block when compiling a PROGN statement.

Should PROGN statements correspond to brace blocks in CUDA C?

If yes, what should LET statements be compiled into?

{
  int x = 0;
  return x;
}

{
  int x = 0;
  {
    return x;
  }
}

I want to adopt the former compiled code.
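The preferred behaviour can be sketched with a toy statement compiler. This is an illustration under assumptions, not cl-cuda's actual compiler: `compile-statement` is a hypothetical name, only RETURN, PROGN and LET are handled, and every LET variable is assumed to be an int.

```lisp
;; Toy statement compiler for illustration only; cl-cuda's real compiler differs.
;; Supported forms: (return EXPR), (progn STMT...), (let ((VAR VALUE)...) STMT...).
(defun compile-statement (form)
  "Compile FORM into a fresh list of CUDA C source lines."
  (ecase (first form)
    ;; Clause for RETURN forms.
    (return (list (format nil "return ~(~A~);" (second form))))
    ;; Clause for PROGN forms: statements in sequence, no braces of their own.
    (progn (mapcan #'compile-statement (rest form)))
    ;; Clause for LET forms: open a brace block, declare variables, emit body.
    (let (append (list "{")
                 (loop for (var value) in (second form)
                       collect (format nil "  int ~(~A~) = ~A;" var value))
                 (mapcar (lambda (line) (concatenate 'string "  " line))
                         (mapcan #'compile-statement (cddr form)))
                 (list "}")))))
```

With this rule, `(compile-statement '(let ((x 0)) (progn (return x))))` yields the lines of the former output, because the inner PROGN adds no braces. If PROGN emitted its own braces, the same form would produce the latter, nested output.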

use _v2 of CUDA functions when available

Use _v2 of CUDA functions when available:

cuCtxCreate_v2
cuCtxDestroy_v2
cuMemAlloc_v2
cuMemFree_v2
cuMemcpyHtoD_v2
cuMemcpyDtoH_v2
cuEventDestroy_v2
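A minimal sketch of how the choice of symbol name could be centralized follows. The table contains only the functions listed above, and both `*v2-functions*` and `driver-symbol-name` are hypothetical names, not part of cl-cuda.

```lisp
;; Hypothetical table of driver API functions known to have a _v2 variant.
;; It contains only the functions named in this issue.
(defparameter *v2-functions*
  '("cuCtxCreate" "cuCtxDestroy" "cuMemAlloc" "cuMemFree"
    "cuMemcpyHtoD" "cuMemcpyDtoH" "cuEventDestroy"))

(defun driver-symbol-name (name)
  "Return the foreign symbol name to bind for the driver API function NAME."
  (if (member name *v2-functions* :test #'string=)
      (concatenate 'string name "_v2")
      name))
```

For example, `(driver-symbol-name "cuMemAlloc")` returns `"cuMemAlloc_v2"`, while a name that is not in the table, such as `"cuInit"`, is returned unchanged.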

Question:

are there any other functions having _v2?

See melisgl/cl-cuda@db464369fa42f7090fa6ec6b3ee216d0279ee320 in #4

improve compiling cl-cuda type to CUDA C type

Improve how cl-cuda types are compiled to CUDA C types.

int -> "int" : OK
curand-state-xorwow -> "curandStateXORWOW" : NG
curand-state-xorwow -> "curandStateXORWOW_t" : OK

Currently, a cl-cuda type is simply translated to a string.
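One way to handle names that do not follow a mechanical rule is an explicit lookup table. The sketch below assumes a hypothetical table `*cuda-c-type-names*` and function `compile-type`; it is not how cl-cuda is known to implement this.

```lisp
;; Hypothetical lookup table from cl-cuda type symbols to CUDA C type names.
(defparameter *cuda-c-type-names*
  '((int . "int")
    (float . "float")
    (curand-state-xorwow . "curandStateXORWOW_t")))

(defun compile-type (type)
  "Return the CUDA C type name for the cl-cuda type TYPE."
  (or (cdr (assoc type *cuda-c-type-names*))
      (error "Unknown cl-cuda type: ~S" type)))
```

With a table, `(compile-type 'curand-state-xorwow)` returns `"curandStateXORWOW_t"` regardless of how the symbol is spelled, and an unknown type signals an error instead of producing an invalid C identifier.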

See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.

Add initializer syntax for CUDA vector types.

Add initializer syntax for CUDA vector types as compiled:

__device__ float4 foo = { 0.0, 1.0, 2.0, 3.0 };

This is needed because __device__, __constant__ and __shared__ variables do not allow dynamic initialization, so the following is invalid:

__device__ float4 foo = make_float4( 0.0, 1.0, 2.0, 3.0 );
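A minimal sketch of emitting the brace initializer from Lisp data is shown below. `compile-vector-global` is a hypothetical name, and the element values are assumed to be floats of ordinary magnitude so that `~F` prints them in plain decimal notation.

```lisp
;; Hypothetical emitter for a global of a CUDA vector type with a static
;; (brace) initializer, which is allowed for __device__ variables.
(defun compile-vector-global (qualifier type name elements)
  "Return a CUDA C definition of NAME with a brace initializer."
  (format nil "~A ~(~A~) ~(~A~) = { ~{~F~^, ~} };"
          qualifier type name elements))
```

For example, `(compile-vector-global "__device__" 'float4 'foo '(0.0 1.0 2.0 3.0))` produces the valid form shown above rather than a call to make_float4.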

COMPILE-DOUBLE is not compiled with expected precision.

COMPILE-DOUBLE is not compiled with expected precision and its test fails.

  × basic case 2 
    "(double)1.2345679" is expected to be "(double)1.23456789012345"
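The issue does not state the cause. One way such a mismatch can arise in general is that the value passes through a single float before it is printed. The sketch below only illustrates that effect; `emit-double`, `*full*` and `*narrowed*` are hypothetical names.

```lisp
;; Illustration of precision loss, not a diagnosis of the failing test.
(defun emit-double (value)
  "Print VALUE as a CUDA C expression cast to double."
  (format nil "(double)~F" value))

;; A double float literal keeps all of its digits.
(defparameter *full* 1.23456789012345d0)

;; Coercing to SINGLE-FLOAT discards the digits a single float cannot hold.
(defparameter *narrowed* (coerce *full* 'single-float))
```

Comparing `(emit-double *full*)` with `(emit-double *narrowed*)` shows the shorter digit string for the narrowed value, which is the same kind of difference the test reports.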

warnings caused by forward references

The variables below are forward-referenced and cause warnings:

+built-in-functions+
+built-in-macros+
kernel-manager

See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4 .

error when loading

Component ASDF/USER::CFFI-GROVEL not found

modify to defparameter nvcc-options variable

Should the *nvcc-options* variable be defined with defparameter rather than defvar?
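The practical difference shows up when a definition is evaluated a second time, for example when a file is reloaded. The variable names in this sketch are hypothetical stand-ins, not cl-cuda's own variables.

```lisp
;; DEFVAR assigns the initial value only if the variable is still unbound,
;; so the second form below leaves the first value in place.
(defvar *options-via-defvar* '("-arch=sm_11"))
(defvar *options-via-defvar* '("-arch=sm_20"))

;; DEFPARAMETER assigns the value every time it is evaluated,
;; so the second form below replaces the first value.
(defparameter *options-via-defparameter* '("-arch=sm_11"))
(defparameter *options-via-defparameter* '("-arch=sm_20"))
```

After evaluating these forms, the defvar variable still holds `("-arch=sm_11")` and the defparameter variable holds `("-arch=sm_20")`. Which behaviour is preferable depends on whether user customizations should survive a reload of the library.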

See melisgl/cl-cuda@ea8cf60e9c74e878973d85338f1ab727b76b68b3 in #4.

size_t should be defined using cffi-grovel

Since the value of size_t depends on its environment, it should be defined using cffi-grovel in cl-cuda.

warnings caused by an unused argument

The unused argument type in the definition of the defkernelconst macro causes a warning.

See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4 .

support curand XORWOW

Support curand XORWOW:

curand_init
curand_uniform
curandStateXORWOW_t

Depends on #15, #19, #21 and #22.

See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.

Failing vector-add test (Linux amd64 CUDA 5)

I'm failing this test on FC17

uname -r
3.9.10-100.fc17.x86_64

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Fri_Sep_21_17:28:58_PDT_2012
Cuda compilation tools, release 5.0, V0.2.1221

sbcl --version
SBCL 1.0.57-1.fc17

The error I receive is: CUDA_ERROR_LAUNCH_FAILED, which is, afaik, a generic error if "something" went wrong.

WARNING: This may not be a bug, in fact, this may be a misconfiguration on my side, however, I'd appreciate if you could tell me what else to check.

This is the output from the test:

VECTOR-ADD> (main)
CU-INIT succeeded.
CU-DEVICE-GET succeeded.
CU-CTX-CREATE succeeded.
CU-MEM-ALLOC succeeded.
CU-MEM-ALLOC succeeded.
CU-MEM-ALLOC succeeded.
CU-MEMCPY-HOST-TO-DEVICE succeeded.
CU-MEMCPY-HOST-TO-DEVICE succeeded.
nvcc -arch=sm_11 -I /home/wvxvw/quicklisp/local-projects/cl-cuda/include -ptx -o /tmp/cl-cuda-sBBXlw.ptx /tmp/cl-cuda-sBBXlw.cu
CU-MODULE-LOAD succeeded.
CU-MODULE-GET-FUNCTION succeeded.
CU-LAUNCH-KERNEL succeeded.
; Evaluation aborted on #<SIMPLE-ERROR "~A failed with driver API error No. ~A.~%~A" {1003FEF573}>.

Cannot find CUDA SDK

Hello,
Running latest CCL with Version 1.11-r16635 on OS X 10.10.5
Could load and compile cl-cuda without problem.
I have a hard time referencing my version of CUDA which is NVIDIA-CUDA-7.5
When I run any cuda example, I get an error message:
e.g. (cl-cuda-examples.diffuse0:main)

Error: CUDA SDK not found.
While executing: CL-CUDA.DRIVER-API:CU-INIT, in process Listener(4).
How do I configure cl-cuda to reference the right framework ?
Should I recompile ?

Maybe a silly question - My first time using this library.

get size of pointer to cl-cuda's array type

Get the size of a pointer to cl-cuda's array type using the cffi:foreign-type-size function.

See melisgl/cl-cuda@ea8cf60e9c74e878973d85338f1ab727b76b68b3 in #4.

Appropriately compile single and double precision float values.

Compile single float values to be explicitly typed to avoid being compiled as double float values.

before

0.0

after

0.0f

Additionally, compile double float values, which are currently emitted as (double)0.0, to the plain double float literal 0.0.

before

(double)0.0

after

0.0
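The two rules can be sketched as a single dispatch on the Lisp float type. `compile-float-literal` is a hypothetical name, and the sketch assumes values of ordinary magnitude for which `~F` prints plain decimal notation.

```lisp
;; Hypothetical literal emitter: single floats get an explicit "f" suffix,
;; double floats are emitted as plain C double literals.
(defun compile-float-literal (value)
  "Return the CUDA C literal for the float VALUE."
  (etypecase value
    (single-float (format nil "~Ff" value))
    (double-float (format nil "~F" value))))
```

Here `(compile-float-literal 0.0)` returns `"0.0f"` and `(compile-float-literal 0.0d0)` returns `"0.0"`, matching the "after" forms above.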

Misleading information when failed to load libcuda.

Fact

When cl-cuda is loaded, loading completes even if libcuda failed to load.
After that, when foreign functions in libcuda are called, CL-CUDA.DRIVER-API::SDK-NOT-FOUND-ERROR is raised and reports "CUDA SDK not found."

Problem

We cannot tell whether the condition was raised because the CUDA SDK was not found, or because it was found but failed to load.

See also
#42

modify to defparameter +built-in-functions+

Modify to defparameter +built-in-functions+.

Questions:

also defparameter +built-in-macros+?
relate with #5?

See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.

Use prove instead of cl-test-more.

Cl-test-more, a unit testing library, is now named "prove".

runtime api interoperability (cublas)

Using the driver api precludes the usage of the runtime api in the same application ([1]). Unfortunately cublas, cufft, etc. are all based on the runtime api. Is it technically possible to migrate cl-cuda to the runtime api? If possible, is it perhaps undesirable for some reason?

[1] http://stackoverflow.com/questions/242894/cuda-driver-api-vs-cuda-runtime

Support Multi-GPU programming.

define CL-CUDA-MISC package in convert-error-string.lisp itself

Currently, cl-cuda-misc package is defined in misc/package.lisp, separated from misc/convert-error-string.lisp.

To follow one-package-per-file style, merge its definition into misc/convert-error-string.lisp.

Automatically set nvcc's "-arch" option from cuDeviceComputeCapability.

A device's compute capability can be obtained from the cuDeviceComputeCapability driver API. We can use it to set nvcc's -arch option.
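Assuming the major and minor version numbers have already been obtained from the driver, building the option string is a small formatting step. `arch-option` is a hypothetical name, not an existing cl-cuda function.

```lisp
;; Hypothetical helper: build nvcc's -arch option from a compute capability.
;; MAJOR and MINOR are assumed to come from cuDeviceComputeCapability.
(defun arch-option (major minor)
  "Return the nvcc -arch option for compute capability MAJOR.MINOR."
  (format nil "-arch=sm_~D~D" major minor))
```

For a device of compute capability 1.3, `(arch-option 1 3)` returns `"-arch=sm_13"`.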

don't fail if cuda library cannot be loaded

Don't fail if cuda library cannot be loaded.

Question:

why ignore errors when the cuda library cannot be loaded? Without the cuda library, cl-cuda makes no sense.

See melisgl/cl-cuda@ca0bde3fe89db1192f89bf2a702990900e996c61 in #4

grovel CUdeviceptr type

Grovel CUdeviceptr type from cuda_kernel.h.

Question:

also grovel other CUDA driver API types?
also grovel other CUDA driver API functions, structures and enumerations?
where to place a grovel specification file?

See melisgl/cl-cuda@d6e6dd94a5ca7a8243f23f7eddecbbd56aa51ceb in #4

add double float support

Add double float support:

double
double3
double4

See melisgl/cl-cuda@ea8cf60e9c74e878973d85338f1ab727b76b68b3 and melisgl/cl-cuda@67f96a0b530808e70af7c495f7735d5ad9b29034 in #4.

The latter mentions the -arch=sm_13 nvcc option needed for double floats.

Infer global's type from its initial value.

Infer global's type from its initial value, removing type argument from DEFGLOBAL macro.

before

(defglobal x int 1)

after

(defglobal x 1)    ; type of x is inferred as int.
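The inference step could look like the following sketch. `infer-global-type` is a hypothetical name, and the mapping covers only integers and floats; how vector or structure types would be inferred is not addressed by the issue.

```lisp
;; Hypothetical inference from an initial value to a cl-cuda type symbol.
(defun infer-global-type (initial-value)
  "Return the cl-cuda type symbol inferred from INITIAL-VALUE."
  (etypecase initial-value
    (integer 'int)
    (single-float 'float)
    (double-float 'double)))
```

With this rule, `(infer-global-type 1)` returns `int`, so `(defglobal x 1)` would need no explicit type argument.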

Add selector macros for CUDA vector types' CL counterparts.

Add selector macros for CUDA vector types' CL counterparts: float3, float4, double3 and double4.

;; ONCE-ONLY is presumably Alexandria's macro; it evaluates VALUE only once.
(defmacro with-float4 ((x y z w) value &body body)
  (once-only (value)
    `(let ((,x (float4-x ,value))
           (,y (float4-y ,value))
           (,z (float4-z ,value))
           (,w (float4-w ,value)))
       (declare (ignorable ,x ,y ,z ,w))
       ,@body)))

Write CPU version of sph example again.

Write CPU version of sph example again after #49.

add some math functions

Add math functions:

exp
log
__double2int_rn

Depends on #15.

See melisgl/cl-cuda@1713af4a7a6d8cdbb3048d8f4f21ac99f6010d21 in #4.

Add tests for stream.

Add tests for stream, introduced in 4eb8f9f.

can't define a device kernel function that returns void type

Currently, a function's specifier is determined by its return type: __global__ for void and __device__ for any other type.

For example,

(defkernel foo (void ())
  (return))

is compiled into:

__global__ void foo () {
  return;
}

Because of this rule, a __device__ kernel function that returns void type can't be defined.

To solve this problem, one of the following syntaxes could be adopted:

(defdevicekernel foo (void ()) ...
(defkernel (foo :device) (void ()) ...
(defkernel foo :device (void ()) ...
(defkernel foo ((void :device) ()) ...
(defkernel foo (void :device ()) ...

I am inclined to choose the second one. The function specifier can be omitted, in which case the current rule applies.

:global is specified:

(defkernel (foo :global) (void ())
  (return))
;; compiled into: __global__ void foo () { ... }

:device is specified:

(defkernel (bar :device) (void ())
  (return))
;; compiled into: __device__ void bar () { ... }

__global__ is supplied automatically because the return type is void:

(defkernel foofoo (void ())
  (return))
;; compiled into: __global__ void foofoo () { ... }

__device__ is supplied automatically because the return type is int:

(defkernel baz (int ())
  (return 1))
;; compiled into: __device__ int baz () { ... }
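The proposed rule can be sketched as a small function over the name part of the defkernel form and the return type. `function-specifier` is a hypothetical name, not part of cl-cuda.

```lisp
;; Hypothetical resolution of the function specifier under the proposed syntax.
;; NAME-SPEC is either NAME or (NAME SPECIFIER), with SPECIFIER being
;; :GLOBAL or :DEVICE.
(defun function-specifier (name-spec return-type)
  "Return the CUDA C function specifier as a string."
  (let ((explicit (and (consp name-spec) (second name-spec))))
    (ecase (or explicit
               ;; No explicit specifier: fall back to the current rule.
               (if (eq return-type 'void) :global :device))
      (:global "__global__")
      (:device "__device__"))))
```

For the four cases above, `(function-specifier '(bar :device) 'void)` returns `"__device__"`, while `(function-specifier 'foofoo 'void)` falls back to `"__global__"`.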

Output definitions in defined order.

Output definitions in defined order, currently reversed.
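A reversed order is what PUSH-based bookkeeping typically produces, since PUSH adds to the front of a list. Whether cl-cuda stores definitions this way is an assumption; the names below are hypothetical.

```lisp
;; Hypothetical registry illustrating why output can come out reversed.
(defvar *definitions* '()
  "Registered definition names, most recently defined first.")

(defun register-definition (name)
  "Record NAME; PUSH places it at the front of the list."
  (push name *definitions*)
  name)

(defun definitions-in-defined-order ()
  "Return the registered names, oldest first."
  (reverse *definitions*))
```

Iterating over `*definitions*` directly visits the newest definition first, whereas `(definitions-in-defined-order)` restores the order in which they were defined.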

passing structure type references to ALLOC-GL-ARRAY's TYPE argument

In the definition of cl-cuda-interop:alloc-memory-block, since the alloc-gl-array function's type argument accepts only symbols, structure type references must be passed in bare style, which is actually deprecated in CFFI. For example, foo must be passed instead of (:struct foo).

NG: (alloc-gl-array '(:struct foo) count)
OK: (alloc-gl-array 'foo count)

As a workaround, I define a bare-cffi-type function which converts structure type references from the form (:struct foo) to foo, and pass its return value to the alloc-gl-array function.

(alloc-gl-array (bare-cffi-type type) count)
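The issue names bare-cffi-type but does not show its body. A minimal sketch of a function with the described behaviour, which may differ from the actual definition, is:

```lisp
;; Sketch of the described conversion; the real definition may differ.
(defun bare-cffi-type (type)
  "Convert (:struct FOO) to FOO; return any other TYPE unchanged."
  (if (and (consp type) (eq (first type) :struct))
      (second type)
      type))
```

Under this sketch, `(bare-cffi-type '(:struct foo))` returns `foo`, and a plain symbol such as `int` passes through unchanged.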

This problem is already reported on cl-opengl's issue tracker #41.

to be accepted in Quicklisp distribution

Currently, cl-cuda is not available in the Quicklisp distribution because of its testing policy (see #514 in quicklisp-projects).

It might be accepted if it at least compiles without signaling a condition in an environment where the CUDA SDK is not installed, even though it would not work there.

Approach:

try to compile the grovel files, which include cuda.h, before evaluating the defsystem form in cl-cuda.asd
a condition will be signaled if the CUDA SDK is not installed
handle the condition and push a flag onto *features* indicating that the CUDA SDK is not installed
in the defsystem form, check *features* and skip the cffi-grovel:grovel-file form when the flag is present
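The condition handling and feature flag in this approach can be sketched as follows. `probe-cuda-sdk` and the feature name `:cuda-sdk-not-found` are hypothetical, and the grovel attempt is passed in as a function so the sketch stays independent of CFFI.

```lisp
;; Hypothetical probe: GROVEL-THUNK stands in for the attempt to compile
;; a grovel file that includes cuda.h.
(defun probe-cuda-sdk (grovel-thunk)
  "Call GROVEL-THUNK; on error, flag the missing SDK in *FEATURES*.
Returns T when the thunk succeeds and NIL otherwise."
  (handler-case (progn (funcall grovel-thunk) t)
    (error ()
      (pushnew :cuda-sdk-not-found *features*)
      nil)))

;; In the system definition, a reader conditional could then skip the
;; grovel component, for example:
;;   #-cuda-sdk-not-found (cffi-grovel:grovel-file "type-grovel")
```

The reader conditional is evaluated when the .asd file is read, so the probe would have to run before the defsystem form is read.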

Questions:

will there be warnings about symbols not being found if cffi-grovel:grovel-file is skipped?

warnings caused by specifying cffi structure type

Specifying cffi structure type without :struct keyword causes warnings. For example, float3 structure type should be specified as '(:struct float3), not 'float3, to avoid warnings.

See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4 .

Support cuModuleGetGlobal driver API.

Support the cuModuleGetGlobal driver API. It is useful for parameters that are determined dynamically in a program but do not change across kernel launches.

Support cuModuleGetGlobal driver API.
Introduce a cl-cuda API to define CUDA C global.

support pointers and integers when launching kernels

Only MEMORY-BLOCKs were supported previously, which is fine as long as
one uses ALLOC-MEMORY-BLOCK. With this change, CU-DEVICE-PTRs obtained
directly from CU-MEM-ALLOC can be used.

See melisgl/cl-cuda@67f96a0b530808e70af7c495f7735d5ad9b29034 in #4.

Can't compile on Ubuntu 14.4 / CUDA 6.5

Hi there, I get the following error trying to quickload cl-cuda. The error message at the end is in German, it says "fatal error: cuda.h: File or directory not found":

(ql:quickload :cl-cuda)
To load "cl-cuda":
Load 1 ASDF system:
cl-cuda
; Loading "cl-cuda"
........; cc -m64 -I/home/mwoehrle/quicklisp/dists/quicklisp/software/cffi_0.14.0/ -o /home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel /home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel.c

debugger invoked on a CFFI-GROVEL:GROVEL-ERROR in thread #<THREAD "main thread" RUNNING {1002A8AF53}>: External process exited with code 1.
Command was: "cc" "-m64" "-I/home/mwoehrle/quicklisp/dists/quicklisp/software/cffi_0.14.0/" "-o" "/home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel" "/home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel.c"
Output was:
/home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel.c:6:18: fatal error: cuda.h: Datei oder Verzeichnis nicht gefunden
#include <cuda.h>
^
compilation terminated.

get the size of basic types using CFFI:FOREIGN-TYPE-SIZE function

The size of basic types can be obtained with the cffi:foreign-type-size function.

See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.

Simple Example not working

After trying the code on the main page...

(defun main ()
  (let* ((dev-id 0)
         (n 1024)
         (threads-per-block 256)
         (blocks-per-grid (/ n threads-per-block)))
    (with-cuda (dev-id)
      (with-memory-blocks ((a 'float n)
                           (b 'float n)
                           (c 'float n))
        (random-init a n)
        (random-init b n)
        (sync-memory-block a :host-to-device)
        (sync-memory-block b :host-to-device)
        (vec-add-kernel a b c n
                        :grid-dim (list blocks-per-grid 1 1)
                        :block-dim (list threads-per-block 1 1))
        (sync-memory-block c :device-to-host)
        (verify-result a b c n)))))

I got this error

nvcc exits with code: 127
/usr/bin/env: nvcc: No such file or directory
[Condition of type SIMPLE-ERROR]

OpenGL interoperability's performance loss after cleaning-up

After cleaning-up, the performance of the N-body example with OpenGL interoperability seems worse than before. If my memory is correct, OpenGL interoperability used to give a small performance gain.

before cleaning-up, about 5 percent performance gain with OpenGL interoperability
after cleaning-up, about 40 percent performance loss with OpenGL interoperability

It is a disappointing result if OpenGL interoperability causes a performance loss.
