Comments (15)

chaoming0625 commented on June 15, 2024

Thank you @kmaehashi @leofang. Currently, I am using the pointer from RawKernel.kernel.ptr, just as @leofang pointed out. However, I also agree that @kmaehashi's suggestion is right.

The motivation for my question is to use CuPy as a compiler for custom CUDA extensions in JAX. JAX's jit system needs to register an XLA custom call when using customized CUDA kernels. Usually, we need to write the CUDA code, pre-compile it, bind it to Python, and register the kernels in XLA. To remove such a complex process, we can directly compile the source code (written as a Python string) at the Python level, get the compiled kernel, throw it into the custom call, and everything remains compatible with JAX's jit system, with minimal effort (only the Python string needs to be written).
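
For reference, a minimal sketch of the CuPy side of this workflow might look like the code below. The kernel source and its name are made up for illustration, and the JAX/XLA registration step is omitted; as discussed later in this thread, .kernel.ptr is not a public API.

    import cupy as cp

    # A hypothetical kernel, written as a plain Python string.
    source = r'''
    extern "C" __global__ void scale(const float* x, float* y, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            y[i] = a * x[i];
        }
    }
    '''

    kern = cp.RawKernel(source, 'scale')  # NVRTC compiles the string lazily on first use
    ptr = kern.kernel.ptr                 # raw CUfunction handle (non-public API)
    # `ptr` is the value that would be handed to the XLA custom-call machinery;
    # that JAX-side step is not shown here.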

Currently, we are working on this functionality.

kmaehashi commented on June 15, 2024

Are there any specific reasons to use CuPy for that purpose? If your C++ application needs to compile CUDA code on the fly, you can just call NVRTC to get cubin/ptx.

takagi commented on June 15, 2024

You cannot launch a kernel defined by cupy.RawKernel; however, an option may be cupy.RawModule, which can be used to load a .cubin or .ptx file. Does it fit?
https://docs.cupy.dev/en/stable/reference/generated/cupy.RawModule.html
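
For example, a minimal sketch of loading a pre-built module might look like this (the file name and kernel name are hypothetical):

    import cupy as cp

    # Load a module built ahead of time; a .ptx file works the same way.
    mod = cp.RawModule(path='kernels.cubin')
    func = mod.get_function('my_kernel')  # returns a launchable kernel object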

chaoming0625 commented on June 15, 2024

What a great answer! So the key is to use RawModule to generate a .cubin or .ptx file, and then load the generated file under the C++ backend to run it. Am I right?

chaoming0625 commented on June 15, 2024

Moreover, can kernels compiled by cupyx.jit.rawkernel be saved into a .ptx file?

takagi commented on June 15, 2024

What I meant was the opposite: you write the kernel in C++ (a .cu file) and compile it into .cubin or .ptx files, and then you can use them from RawModule. cupyx.jit.rawkernel doesn't have a feature to save a .ptx file in a way that is easily usable from external programs.
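
A sketch of that workflow, assuming a file kernels.cu that defines an extern "C" kernel add_one and a GPU of compute capability 8.0 (file names, kernel name, and architecture are examples):

    # Build the binary ahead of time, outside Python, e.g.:
    #   nvcc -cubin -arch=sm_80 kernels.cu -o kernels.cubin
    import cupy as cp

    mod = cp.RawModule(path='kernels.cubin')
    add_one = mod.get_function('add_one')  # the kernel must be declared extern "C"

    x = cp.zeros(1024, dtype=cp.float32)
    add_one((4,), (256,), (x, cp.int32(x.size)))  # grid, block, kernel arguments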

chaoming0625 commented on June 15, 2024

Thanks for the explanation. I am wondering how to get the compiled binary when using cupy.RawKernel?

takagi commented on June 15, 2024

Please supply the path to the compiled binary via the path argument of RawModule (not RawKernel). https://docs.cupy.dev/en/stable/reference/generated/cupy.RawModule.html

chaoming0625 commented on June 15, 2024

I am trying to use CuPy to compile the CUDA code and get its compiled kernel, rather than providing the path of a compiled CUDA binary (*.cubin) or a PTX file. So I am wondering: how can I provide the CUDA source code and then get the path of the binary file compiled by CuPy?

chaoming0625 commented on June 15, 2024

Or, how can I get the kernel under the $HOME/.cupy/kernel_cache/ directory? The names of the .cubin files seem to follow no pattern.

leofang commented on June 15, 2024

In theory, for a given RawKernel (whether you use it directly or get it via RawModule.get_function), you can retrieve the CUfunction pointer via RawKernel.kernel.ptr, but

  1. This is not public API
  2. This is untested

It's also unclear to me why you'd need this; @chaoming0625, could you elaborate?
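
For reference, with the above caveats in mind, a sketch of that access pattern might look like this (the trivial kernel is only for illustration):

    import cupy as cp

    mod = cp.RawModule(code=r'''
    extern "C" __global__ void noop() { }
    ''')
    k = mod.get_function('noop')  # a RawKernel
    ptr = k.kernel.ptr            # CUfunction handle; non-public and untested, as noted above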

chaoming0625 commented on June 15, 2024

Moreover, can I get the pointer of the function after compiling through cupyx.jit.rawkernel?

leofang commented on June 15, 2024

Thanks for sharing your use case @chaoming0625, this is very interesting!

Would you be able to show us how you use this capability to make CuPy and JAX interoperable at the kernel level? I would love to see how it allows you to avoid writing complex boilerplate code. Eventually, I would like to learn how to craft a small interop demo like the one we showed for PyTorch-CuPy:
https://docs.cupy.dev/en/stable/user_guide/interoperability.html#using-custom-kernels-in-pytorch
If you already have a small demo that we can copy/paste into the document, that's even better! 😄

Moreover, can I get the pointer of the function after compiling through cupyx.jit.rawkernel?

Right now it's not public API either, but according to the internal implementation (subject to change)

kern, enable_cg = self._cache.get((in_types, device_id), (None, None))

it is possible to get the Function object from jit.rawkernel._cache once the kernel has been instantiated (it is stored as part of the cached value). Then, you can get the CUfunction pointer via Function.ptr as before.

If you show us your workflow as I ask above, it'll help us stabilize the interface and expose these features properly. Thanks!
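
Putting that together, a sketch of this (non-public, subject-to-change) path might look like the following; the kernel is essentially the elementwise copy example from the CuPy documentation:

    import cupy
    from cupyx import jit

    @jit.rawkernel()
    def elementwise_copy(x, y, size):
        tid = jit.blockIdx.x * jit.blockDim.x + jit.threadIdx.x
        ntid = jit.gridDim.x * jit.blockDim.x
        for i in range(tid, size, ntid):
            y[i] = x[i]

    size = cupy.uint32(2 ** 20)
    x = cupy.random.normal(size=(size,), dtype=cupy.float32)
    y = cupy.empty((size,), dtype=cupy.float32)
    elementwise_copy((128,), (1024,), (x, y, size))  # the first call compiles and fills _cache

    # Internal detail, subject to change: each cached value is a (kern, enable_cg) tuple.
    kern, _ = next(iter(elementwise_copy._cache.values()))
    ptr = kern.ptr  # CUfunction pointer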
