takagi / cl-cuda
Cl-cuda is a library to use NVIDIA CUDA in Common Lisp programs.
License: MIT License
Rewrite the sph example with cuModuleGetGlobal after #46.
OSICAT does not support MKTEMP on Windows, so we can't get a temporary CUDA C file name with it.
Setting up cl-cuda seems to hook into gcc to create the FFI. GCC is well and good thanks to MSYS2/MinGW64, but apparently the CUDA toolkit and MinGW don't play nice together. Is there any way to set up cl-cuda to use the Windows CUDA toolchain?
Support the unsigned long long type, which is used in the curand library.
See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.
Trying to load cl-cuda in sbcl, I get this error:
* (ql:quickload :cl-cuda)
debugger invoked on a LOAD-SYSTEM-DEFINITION-ERROR in thread #<THREAD "main thread" RUNNING {1002A8B383}>: Error while trying to load definition for system cl-cuda from pathname /home/dev/quicklisp/local-projects/cl-cuda/cl-cuda.asd: There is no class named CFFI-GROVEL::PROCESS-OP.
What am I doing wrong?
Support the long type.
Grovel the size_t type, which is environment-dependent.
Question:
See melisgl/cl-cuda@d6e6dd94a5ca7a8243f23f7eddecbbd56aa51ceb in #4
The tests in t/test-cl-cuda.lisp rely on the amount of memory available on the GPU.
(is-error (cl-cuda::alloc-memory-block 'int (* 1024 1024 256)) simple-error)
See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4.
Update Installation section in README about quicklisp availability.
I requested that Quicklisp add cl-cuda to its distribution, but it was rejected because of Quicklisp's testing policy.
Currently, the compiler makes brace blocks { ... } when compiling the following statements:
On the other hand, it does not make brace blocks when compiling PROGN statements.
Should PROGN statements correspond to brace blocks in CUDA C?
If yes, what should LET statements be compiled into?
{
  int x = 0;
  return x;
}
or
{
  int x = 0;
  {
    return x;
  }
}
I want to adopt the former compiled code.
Use _v2 of CUDA functions when available:
Question:
See melisgl/cl-cuda@db464369fa42f7090fa6ec6b3ee216d0279ee320 in #4
Improve the way cl-cuda types are compiled to CUDA C types.
int -> "int" : OK
curand-state-xorwow -> "curandStateXORWOW" : NG
curand-state-xorwow -> "curandStateXORWOW_t" : OK
Currently, cl-cuda types are simply translated to strings.
See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.
Add initializer syntax for CUDA vector types, compiled as:
__device__ float4 foo = { 0.0, 1.0, 2.0, 3.0 };
This is because __device__, __constant__ and __shared__ variables are not allowed to have dynamic initialization, so the following is invalid:
__device__ float4 foo = make_float4( 0.0, 1.0, 2.0, 3.0 );
COMPILE-DOUBLE does not compile double floats with the expected precision and its test fails.
✗ basic case 2
"(double)1.2345679" is expected to be "(double)1.23456789012345"
The variables below are forward-referenced and cause warnings:
See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4.
Component ASDF/USER::CFFI-GROVEL not found
Should the *nvcc-options* variable be defparameter'ed rather than defvar'ed?
See melisgl/cl-cuda@ea8cf60e9c74e878973d85338f1ab727b76b68b3 in #4.
Since the value of size_t depends on its environment, it should be defined using cffi-grovel in cl-cuda.
The unused type argument in the definition of the defkernelconst macro causes a warning.
See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4.
I'm failing this test on FC17
uname -r
3.9.10-100.fc17.x86_64
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2012 NVIDIA Corporation
Built on Fri_Sep_21_17:28:58_PDT_2012
Cuda compilation tools, release 5.0, V0.2.1221
sbcl --version
SBCL 1.0.57-1.fc17
The error I receive is: CUDA_ERROR_LAUNCH_FAILED, which is, afaik, a generic error if "something" went wrong.
WARNING: This may not be a bug, in fact, this may be a misconfiguration on my side, however, I'd appreciate if you could tell me what else to check.
This is the output from the test:
VECTOR-ADD> (main)
CU-INIT succeeded.
CU-DEVICE-GET succeeded.
CU-CTX-CREATE succeeded.
CU-MEM-ALLOC succeeded.
CU-MEM-ALLOC succeeded.
CU-MEM-ALLOC succeeded.
CU-MEMCPY-HOST-TO-DEVICE succeeded.
CU-MEMCPY-HOST-TO-DEVICE succeeded.
nvcc -arch=sm_11 -I /home/wvxvw/quicklisp/local-projects/cl-cuda/include -ptx -o /tmp/cl-cuda-sBBXlw.ptx /tmp/cl-cuda-sBBXlw.cu
CU-MODULE-LOAD succeeded.
CU-MODULE-GET-FUNCTION succeeded.
CU-LAUNCH-KERNEL succeeded.
; Evaluation aborted on #<SIMPLE-ERROR "~A failed with driver API error No. ~A.~%~A" {1003FEF573}>.
Hello,
Running latest CCL with Version 1.11-r16635 on OS X 10.10.5
I could load and compile cl-cuda without problems.
I have a hard time referencing my version of CUDA, which is NVIDIA-CUDA-7.5.
When I run any cuda example, I get an error message:
e.g. (cl-cuda-examples.diffuse0:main)
Error: CUDA SDK not found.
While executing: CL-CUDA.DRIVER-API:CU-INIT, in process Listener(4).
How do I configure cl-cuda to reference the right framework?
Should I recompile?
Maybe a silly question - My first time using this library.
Get the size of a pointer to cl-cuda's array type using the cffi:foreign-type-size function.
See melisgl/cl-cuda@ea8cf60e9c74e878973d85338f1ab727b76b68b3 in #4.
Compile single float values to be explicitly typed, to avoid their being compiled as double float values.
before
0.0
after
0.0f
Additionally, fix double float values, which are now compiled as (double)0.0, to the plain double float literal 0.0.
before
(double)0.0
after
0.0
See also #42
Cl-test-more, a unit testing library, is now named "prove".
Using the driver API precludes the use of the runtime API in the same application ([1]). Unfortunately, cublas, cufft, etc. are all based on the runtime API. Is it technically possible to migrate cl-cuda to the runtime API? If possible, is it perhaps undesirable for some reason?
[1] http://stackoverflow.com/questions/242894/cuda-driver-api-vs-cuda-runtime
Support Multi-GPU programming.
Currently, the cl-cuda-misc package is defined in misc/package.lisp, separated from misc/convert-error-string.lisp.
To follow the one-package-per-file style, merge its definition into misc/convert-error-string.lisp.
A device's compute capability can be obtained from the cuDeviceComputeCapability driver API. We can use it to pass nvcc's -arch option.
Don't fail if the CUDA library cannot be loaded.
Question: should we ignore-errors if the CUDA library cannot be loaded? Without the CUDA library, cl-cuda makes no sense.
See melisgl/cl-cuda@ca0bde3fe89db1192f89bf2a702990900e996c61 in #4
Grovel the CUdeviceptr type from cuda_kernel.h.
Question:
See melisgl/cl-cuda@d6e6dd94a5ca7a8243f23f7eddecbbd56aa51ceb in #4
Add double float support:
See melisgl/cl-cuda@ea8cf60e9c74e878973d85338f1ab727b76b68b3 and melisgl/cl-cuda@67f96a0b530808e70af7c495f7735d5ad9b29034 in #4.
The latter mentions the -arch=sm_13 NVCC option needed for double floats.
Infer global's type from its initial value, removing type argument from DEFGLOBAL macro.
before
(defglobal x int 1)
after
(defglobal x 1) ; type of x is inferred as int.
Add selector macros for CUDA vector types' CL counterparts: float3, float4, double3 and double4.
(defmacro with-float4 ((x y z w) value &body body)
  (once-only (value)
    `(let ((,x (float4-x ,value))
           (,y (float4-y ,value))
           (,z (float4-z ,value))
           (,w (float4-w ,value)))
       (declare (ignorable ,x ,y ,z ,w))
       ,@body)))
Write CPU version of sph example again after #49.
Add tests for stream, introduced in 4eb8f9f.
Currently, a function specifier is determined by its return type: __global__ for the void type and __device__ for non-void types.
For example,
(defkernel foo (void ())
  (return))
is compiled into:
__global__ void foo () {
  return;
}
Because of this rule, a __device__ kernel function that returns the void type can't be defined.
To solve this problem, the following syntaxes may be considered:
(defdevicekernel foo (void ()) ...
(defkernel (foo :device) (void ()) ...
(defkernel foo :device (void ()) ...
(defkernel foo ((void :device) ()) ...
(defkernel foo (void :device ()) ...
I think of choosing the second one. Function specifiers can be omitted, and the current rule is applied in such a case.
When :global is specified:
(defkernel (foo :global) (void ())
  (return))
;; compiled into: __global__ void foo () { ... }
When :device is specified:
(defkernel (bar :device) (void ())
  (return))
;; compiled into: __device__ void bar () { ... }
__global__ is complemented because the return type is void:
(defkernel foofoo (void ())
  (return))
;; compiled into: __global__ void foofoo () { ... }
__device__ is complemented because the return type is int:
(defkernel baz (int ())
  (return 1))
;; compiled into: __device__ int baz () { ... }
Output definitions in defined order, currently reversed.
In the definition of cl-cuda-interop:alloc-memory-block, since the alloc-gl-array function's type argument accepts only symbols, structure type references must be passed to type in bare style, which is actually deprecated in CFFI. For example, foo must be passed instead of (:struct foo).
NG: (alloc-gl-array '(:struct foo) count)
OK: (alloc-gl-array 'foo count)
As a workaround for this problem, I define a bare-cffi-type function which converts structure type references from the form (:struct foo) to foo, and pass its return value to the alloc-gl-array function.
(alloc-gl-array (bare-cffi-type type) count)
This problem is already reported on cl-opengl's issue tracker #41.
Currently, cl-cuda is not available in the Quicklisp distribution because of its testing policy (see #514 in quicklisp-projects).
It may be accepted if it just compiles without signaling a condition on an environment where the CUDA SDK is not installed, even though it would not work there.
Approach:
- Check for cuda.h before evaluating the defsystem form in cl-cuda.asd.
- Push a keyword onto *features* which indicates that the CUDA SDK is not installed.
- In the defsystem form, avoid evaluating the cffi-grovel:grovel-file form by looking at *features*.
Questions:
- How to conditionally skip cffi-grovel:grovel-file?
Specifying a cffi structure type without the :struct keyword causes warnings. For example, the float3 structure type should be specified as '(:struct float3), not 'float3, to avoid warnings.
See melisgl/cl-cuda@97ea6cf7bdfc7450c033152b7d6b3d555bb5efd2 in issue #4.
Support the cuModuleGetGlobal driver API. It is useful when using parameters which are determined dynamically in a program but do not change across kernel launches.
Only MEMORY-BLOCKs were supported previously, which is fine as long as one uses ALLOC-MEMORY-BLOCK. With this change, CU-DEVICE-PTRs obtained directly from CU-MEM-ALLOC can be used.
See melisgl/cl-cuda@67f96a0b530808e70af7c495f7735d5ad9b29034 in #4.
Hi there, I get the following error trying to quickload cl-cuda. The error message at the end is in German; it says "fatal error: cuda.h: File or directory not found":
debugger invoked on a CFFI-GROVEL:GROVEL-ERROR in thread #<THREAD "main thread" RUNNING {1002A8AF53}>: External process exited with code 1.
Command was: "cc" "-m64" "-I/home/mwoehrle/quicklisp/dists/quicklisp/software/cffi_0.14.0/" "-o" "/home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel" "/home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel.c"
Output was:
/home/mwoehrle/.cache/common-lisp/sbcl-1.1.14.debian-linux-x64/home/mwoehrle/quicklisp/local-projects/local-projects/cl-cuda/src/driver-api/type-grovel.c:6:18: fatal error: cuda.h: Datei oder Verzeichnis nicht gefunden
#include <cuda.h>
^
compilation terminated.
The size of basic types can be obtained with the cffi:foreign-type-size function.
See melisgl/cl-cuda@85c27a967e00edf6ef57ddebfacf2d4f30d76682 in #4.
After trying the code on the main page...
(defun main ()
  (let* ((dev-id 0)
         (n 1024)
         (threads-per-block 256)
         (blocks-per-grid (/ n threads-per-block)))
    (with-cuda (dev-id)
      (with-memory-blocks ((a 'float n)
                           (b 'float n)
                           (c 'float n))
        (random-init a n)
        (random-init b n)
        (sync-memory-block a :host-to-device)
        (sync-memory-block b :host-to-device)
        (vec-add-kernel a b c n
                        :grid-dim (list blocks-per-grid 1 1)
                        :block-dim (list threads-per-block 1 1))
        (sync-memory-block c :device-to-host)
        (verify-result a b c n)))))
I got this error
nvcc exits with code: 127
/usr/bin/env: nvcc: No such file or directory
[Condition of type SIMPLE-ERROR]
After the cleanup, the performance of the N-body example with OpenGL interoperability seems worse than before the cleanup. If my memory is correct, OpenGL interoperability used to give a small performance gain.
It would be a disappointing result if OpenGL interoperability now causes a performance loss.