Comments (5)
Hello,
You can use the bench-blas executable included in the package. If CMake detects other BLAS implementations on your machine (clBLAS, OpenBLAS, cuBLAS...), it will benchmark ISAAC against those as well.
USAGE: ${BUILD_DIR}/bench/bench-blas gemm
Alternatively, you can link any executable you want against ISAAC instead of clBLAS. This works for BLAS1, GEMV and GEMM.
from triton.
Quick questions:
(1) Is there an option to specify single or double precision for gemm, like sgemm or dgemm?
(2) Can I run just one "N" instance, for example a 5000 x 5000 square?
(3) I see that ISAAC is outperforming clBLAS by 395 vs. 123 below. Am I interpreting that right?
(4) What is the high-level difference compared to ViennaCL?
Thanks for the helpful answers!
./bench-blas 0 gemm
#Benchmark : BLAS
#----------------
#gemm (GFLOPS)
"N" "ISAAC" "clBLAS"
"square896" 395 123
"square2560" 390 117
"conv1" 251 59
"conv2" 326 124
"conv3" 225 109
"conv4" 286 103
"conv5" 214 93
"ica32" 83 13
"ica256" 337 64
"32rank1-4096" 275 111
"32rank1-3456" 270 116
"32rank1-896" 179 97
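To compare the columns more directly, the output above can be turned into per-shape speedup ratios. The following is a minimal sketch, assuming the output keeps the format shown (banner lines starting with `#`, then a quoted header row, then one row per shape). The names `parse_bench` and `speedups` are hypothetical helpers written for this illustration and are not part of ISAAC; the sample embeds only a few rows copied from the output above.

```python
import shlex

# A few rows copied from the bench-blas output above.
SAMPLE = '''\
#Benchmark : BLAS
#----------------
#gemm (GFLOPS)
"N" "ISAAC" "clBLAS"
"square896" 395 123
"square2560" 390 117
"ica32" 83 13
'''


def parse_bench(text):
    """Return (library names, {shape: [GFLOPS per library]}).

    Hypothetical helper: assumes the layout shown in this thread.
    """
    header, rows = None, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '#'-prefixed banner
        fields = shlex.split(line)  # shlex strips the double quotes
        if header is None:
            header = fields[1:]  # first data line names the libraries
        else:
            rows[fields[0]] = [float(v) for v in fields[1:]]
    return header, rows


def speedups(header, rows, ours="ISAAC", baseline="clBLAS"):
    """Ratio of the `ours` column to the `baseline` column, per shape."""
    i, j = header.index(ours), header.index(baseline)
    return {shape: vals[i] / vals[j] for shape, vals in rows.items()}


if __name__ == "__main__":
    header, rows = parse_bench(SAMPLE)
    for shape, ratio in speedups(header, rows).items():
        print(f"{shape}: {ratio:.2f}x")
```

Since both columns are in GFLOPS, a ratio above 1 means ISAAC was faster on that shape in this run.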
from triton.
(1) DGEMM is supported, but I have not had time to run the auto-tuner for double precision on all existing architectures. I will make it more easily benchmarkable once everything is included.
(2) For now, you can edit bench/blas.cpp to add the shapes that you want. Ideally, I should provide a config file that makes it easier to benchmark ISAAC on arbitrary shapes. The shapes included in this benchmark are: square matrices; those found in the 5 convolution layers of AlexNet (if you do im2col); shapes found in covariance/ICA computation; and shapes found in SVD (32 rank-1 updates on square matrices of size 896, 3456 and 4096).
(3) Yes, that's correct! 395 GFLOPS vs. 123 GFLOPS.
(4) Oh, many things! I wrote the BLAS3 kernels for ViennaCL, actually. The most notable difference is that ViennaCL is tuned for square matrices, while ISAAC uses a machine learning model so that it can be tuned for any input shape. ViennaCL's GEMM also has less efficient bounds checking (ISAAC uses tricky pointer arithmetic, ViennaCL uses cleanup kernels).
Ultimately, the device databases of ViennaCL, clBLAS, etc. are data structures that map compute devices to kernel parameters, while ISAAC's maps compute devices to a model that predicts kernel parameters given the input shapes.
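The distinction in that last paragraph can be sketched in a few lines. This is only an illustration under stated assumptions: `KernelParams`, the tile fields, the device name `example-gpu`, and the threshold rules are all hypothetical, and the hand-written rules merely stand in for a learned model. None of it reflects ISAAC's or ViennaCL's actual data structures or tuning parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelParams:
    # Hypothetical tuning knobs; real GEMM kernels expose many more.
    tile_m: int
    tile_n: int
    tile_k: int


# Approach A: a static device database (device -> kernel parameters).
STATIC_DB = {
    "example-gpu": KernelParams(tile_m=64, tile_n=64, tile_k=8),
}


def params_from_database(device, m, n, k):
    """The GEMM shape (m, n, k) is ignored: one parameter set per device."""
    return STATIC_DB[device]


# Approach B: the database maps each device to a predictor
# (device -> model, then model(shape) -> kernel parameters).
def example_gpu_model(m, n, k):
    """Toy stand-in for a learned model: fixed rules, not a trained predictor."""
    tile_m = 64 if m >= 256 else 16
    tile_n = 64 if n >= 256 else 16
    tile_k = 16 if k >= 1024 else 8
    return KernelParams(tile_m, tile_n, tile_k)


MODEL_DB = {"example-gpu": example_gpu_model}


def params_from_model(device, m, n, k):
    """The shape is an input, so different shapes can get different kernels."""
    return MODEL_DB[device](m, n, k)


if __name__ == "__main__":
    for shape in [(2560, 2560, 2560), (4096, 16, 4096)]:
        print(shape,
              params_from_database("example-gpu", *shape),
              params_from_model("example-gpu", *shape))
```

With the static table, the square and the skinny shape receive the same parameters; with the per-device predictor they can differ, which is the property the answer attributes to ISAAC.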
from triton.
Great. Thank you for the information!
(Sent by email in reply to ptillet's message of Wed, Sep 14, 2016 at 4:59 PM.)
from triton.
You're welcome :)
from triton.