
Comments (19)

wang-y-z commented on August 24, 2024

Hi @ptillet, I was wondering whether Triton will support CSR/COO-format SpMM (or other sparse CUDA kernels) in the future, without Tensor Cores. Thanks.

from triton.

ptillet commented on August 24, 2024

Yep. triton.ops.blocksparse.matmul supports the S = D*D, D = S*D, and D = D*S modes, all with Tensor Cores.

YukeWang96 commented on August 24, 2024

Is there a code example I can follow?
I checked the official documentation, but could not find anything that describes this.

Thanks!

YukeWang96 commented on August 24, 2024

By the way, what are the type and format of S and D?
Are they scipy COO/CSR matrices?
Also, does this API support large sparse matrices in .mtx format?

ptillet commented on August 24, 2024

Oh, I think I may have misunderstood you. Triton only supports block sparsity at the moment. An example of how it can be used for block-sparse attention can be seen here: https://github.com/ptillet/triton/blob/master/python/test/test_blocksparse.py#L150-L159

The format consists of (1) a layout tensor of ones and zeros, marking which blocks are non-zero, and (2) the raw data, stored as the concatenation of all the flattened blocks.
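To make the described format concrete, here is a minimal numpy sketch of packing a dense matrix into a (layout, data) pair of this shape. The function name and the use of numpy are illustrative assumptions; this is not Triton's actual packing code.

```python
import numpy as np

def dense_to_blocksparse(x, block):
    """Illustrative sketch: pack a dense 2-D matrix into a layout of
    ones/zeros plus the concatenation of all non-zero blocks, row-major.
    Not Triton's actual implementation."""
    h, w = x.shape
    assert h % block == 0 and w % block == 0
    # View the matrix as a grid of (block x block) tiles:
    # axes become (row_block, col_block, row_in_block, col_in_block)
    grid = x.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    # A block is "present" if it contains any non-zero element
    layout = (np.abs(grid).sum(axis=(2, 3)) != 0).astype(np.int64)
    # Keep only the present blocks, in row-major block order
    data = grid[layout.astype(bool)]   # shape: (nnz_blocks, block, block)
    return layout, data

x = np.zeros((4, 4), dtype=np.float32)
x[0:2, 2:4] = 1.0                      # exactly one non-zero 2x2 block
layout, data = dense_to_blocksparse(x, block=2)
# layout marks only the top-right block; data holds that one block
```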

YukeWang96 commented on August 24, 2024

Hi,

For block-sparse computation, is it similar to what is described in this post from NVIDIA?

Thanks!

ptillet commented on August 24, 2024

Yep

YukeWang96 commented on August 24, 2024

Hi,

Is the underlying implementation of Triton's block-sparse API for Tensor Core acceleration based on cuSPARSE Block-SpMM?

Thanks!

ptillet commented on August 24, 2024

Nope, it's based on custom kernels written in the Triton programming language, which are much more concise. I didn't get a chance to compare the performance, since there's no Python API for cuSPARSE Block-SpMM, but I would expect ours to be on par.

YukeWang96 commented on August 24, 2024

I have several follow-up questions.

  1. In order to use Tensor Cores, should I set dtype=float16?
  2. What is the difference between the different modes, such as sdd, dsd, and dds?
  3. To compute a GEMM of size MxNxK, is this setting for test_matmul correct?
     test_matmul('sdd', True, False, 64, 'float16', Z=1, H=1, M=512, N=384, K=256)
  4. From a performance point of view, how should we set the BLOCK value?

Thanks a lot!

from triton.

ptillet avatar ptillet commented on August 24, 2024

1 - Yes. The FP32 version uses CUDA cores.
2 - There are three modes:
SDD: sparse = dense x dense, a.k.a. sampled dense-dense matrix multiplication
DSD: dense = sparse x dense; the LHS is sparse
DDS: dense = dense x sparse; the RHS is sparse
3 - I believe so.
4 - Supported block sizes are [16, 32, 64, 128]. At an equal number of FLOPs, bigger blocks will perform better. 128 should have GPU efficiency roughly on the order of a dense matmul; anything below may be quite a bit worse, especially 16.
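The semantics of the three modes can be written down as a dense numpy reference, masking whole blocks with the layout. This is only a sketch of what each mode computes; Triton's kernels never materialize the dense zeros, and the helper name here is made up.

```python
import numpy as np

BLOCK = 2
layout = np.array([[1, 0],
                   [0, 1]])            # which 2x2 blocks are kept

def expand(layout, block):
    # Blow each layout entry up into a (block x block) tile of 0s or 1s
    return np.kron(layout, np.ones((block, block)))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
mask = expand(layout, BLOCK)

# Dense reference semantics of the three modes:
sdd = (A @ B) * mask        # SDD: output sampled at the layout's blocks
dsd = (A * mask) @ B        # DSD: the left operand is sparse
dds = A @ (B * mask)        # DDS: the right operand is sparse
```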

YukeWang96 commented on August 24, 2024

Is there any limit on the sparse matrix size?
For example, for an n x n sparse matrix, should I estimate its memory size as n x n x sizeof(float)?

ptillet commented on August 24, 2024

The memory footprint of our block-sparse matrices is equal to nnz * sizeof(dtype), where nnz is the number of non-zero elements.
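As a quick worked example of that nnz * sizeof(dtype) estimate, here is the footprint of a hypothetical 4096 x 4096 fp16 matrix that keeps one block in sixteen (the sizes are made up for illustration):

```python
import numpy as np

n = 4096                                # hypothetical n x n matrix
block = 64
layout = np.zeros((n // block, n // block), dtype=np.int64)
layout[::4, ::4] = 1                    # keep 1 block out of every 16

# nnz is the number of stored *elements*: blocks x block^2
nnz = int(layout.sum()) * block * block
bytes_fp16 = nnz * 2                    # sizeof(float16) == 2

dense_bytes = n * n * 2                 # what a dense fp16 matrix would cost
```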

YukeWang96 commented on August 24, 2024

Is there any example of loading a sparse matrix from a .mtx file?
I found that the example only shows the initialization of a sparse matrix, without loading an external dataset.

ptillet commented on August 24, 2024

I have never tried loading .mtx data. I can give more information on the internal format used by Triton, though. It is essentially COO, and consists of (1) a block-sparse layout tensor, and (2) a data tensor that is just the concatenation of all flattened blocks, row-major.

YukeWang96 commented on August 24, 2024

OK, essentially .mtx is similar to COO. Do you have an example of how I can initialize a sparse tensor from COO (an edge list)?
For example, the edge list is

COO = [
[1, 2, 3],
[2, 3, 1]
]

which represents non-zero elements at [1, 2], [2, 3], and [3, 1]. How should I initialize a sparse tensor from this COO?
Thanks!

ptillet commented on August 24, 2024

I don't have such an example, but looking at triton.testing.sparsify_tensor could be pretty useful: https://github.com/ptillet/triton/blob/master/python/triton/testing.py#L12-L16 . It essentially converts a dense tensor + layout into a sparse tensor, and you can see the format there :)
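Putting the two answers together, one possible sketch of the missing step looks like this: mark the block containing each COO edge in a layout tensor, then gather the non-zero blocks in the spirit of sparsify_tensor. The matrix size, block size, and helper name are assumptions for illustration; numpy stands in for torch to keep the sketch self-contained.

```python
import numpy as np

# The COO edge list from the question above
rows = [1, 2, 3]
cols = [2, 3, 1]
BLOCK = 2
n = 8                                   # hypothetical matrix size

# Step 1: mark the block that holds each (row, col) non-zero
layout = np.zeros((n // BLOCK, n // BLOCK), dtype=np.int64)
for r, c in zip(rows, cols):
    layout[r // BLOCK, c // BLOCK] = 1

# Step 2: gather each non-zero block, row-major, into one data tensor
# (illustrative numpy analogue of sparsify_tensor's dense + layout input)
def sparsify(x, layout, block):
    blocks = [x[i * block:(i + 1) * block, j * block:(j + 1) * block]
              for i, j in np.argwhere(layout)]
    return np.stack(blocks)             # (nnz_blocks, block, block)

x = np.arange(n * n, dtype=np.float32).reshape(n, n)
data = sparsify(x, layout, BLOCK)
```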

lhl2017 commented on August 24, 2024

Is there a way for CUDA programmers to see how the generated kernel works? Is checking the C-like code the best way? @ptillet

ptillet commented on August 24, 2024

Unfortunately, not at the moment. This would be particularly hard to do, since the Triton front-end directly generates LLVM-like SSA code with explicit branches. The best Triton will be able to do for the foreseeable future is to display the PTX.
