Comments (4)
Hi,
Interesting, the OpenCL driver reports gfx803 instead of Fiji. Which GPU and driver version are you using?
from triton.
Also, can you retry now with the latest version of origin/master?
Hi,
The updated version works much better now:
$ ./bench-blas
Devices available:
------------------
[x] - gfx803 on AMD Accelerated Parallel Processing
------------------
BENCH M N K AT BT ISAAC
Deep 1760 16 1760 N N 0.30
Deep 1760 32 1760 N N 0.59
Deep 1760 64 1760 N N 0.70
Deep 1760 128 1760 N N 0.74
Deep 1760 7000 1760 N N 1.27
Deep 2048 16 2048 N N 0.34
Deep 2048 32 2048 N N 0.66
Deep 2048 64 2048 N N 0.80
Deep 2048 128 2048 N N 0.81
Deep 2048 7000 2048 N N 1.02
Deep 2560 16 2560 N N 0.54
Deep 2560 32 2560 N N 0.56
Deep 2560 64 2560 N N 0.62
Deep 2560 128 2560 N N 0.81
Deep 2560 7000 2560 N N 1.09
Deep 1760 16 1760 T N 0.28
Deep 1760 32 1760 T N 0.55
Deep 1760 64 1760 T N 1.01
Deep 1760 128 1760 T N 1.13
Deep 1760 7000 1760 T N 1.58
Deep 2048 16 2048 T N 0.07
Deep 2048 32 2048 T N 0.11
Deep 2048 64 2048 T N 0.38
Deep 2048 128 2048 T N 0.28
Deep 2048 7000 2048 T N 0.50
Deep 2560 16 2560 T N 0.25
Deep 2560 32 2560 T N 0.48
Deep 2560 64 2560 T N 0.53
Deep 2560 128 2560 T N 0.98
Deep 2560 7000 2560 T N 0.83
Deep 1760 7133 1760 N T 0.98
Memory access fault by GPU node-1 on address 0x916cf2000. Reason: Page not present or supervisor privilege.
Aborted (core dumped)
I don't think this is a benchmark error: the clBLAS tests throw the same error, so it is probably on the driver side.
I have an RX 480 installed:
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 67df (rev c7)
I'm also confused by this "gfx803". I use the driver from ROCm.
$ uname -r
4.6.0-kfd-compute-rocm-rel-1.4-16
$ /opt/rocm/opencl/bin/x86_64/clinfo
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.0 AMD-APP (2300.5)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: Device 67df
Device Topology: PCI[ B#1, D#0, F#0 ]
Max compute units: 36
...
Platform ID: 0x7f32f2e9f198
Name: gfx803
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 1.1 (HSA,LC)
Profile: FULL_PROFILE
Version: OpenCL 1.2
...
Glad that it solved your problem.
So gfx803 is actually Polaris? The performance is very poor (1.6 TFLOPS vs. 5.0 TFLOPS peak). Is it an R480m or the full desktop version? I get 70-75% of peak on the R9 Fury, so that's a little odd.
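For reference, the comparison above can be sketched in a few lines of Python. This is only an illustration: it assumes the ISAAC column in the benchmark output is in TFLOPS (which is how the numbers are read above) and that the benchmark uses the conventional 2*M*N*K operation count for GEMM. The function names `gemm_flops` and `utilization` are made up for this example and are not part of the benchmark.

```python
def gemm_flops(m, n, k):
    """Floating-point operations for an M x K by K x N matrix product,
    counting each multiply-add as 2 operations (assumed convention)."""
    return 2 * m * n * k


def utilization(achieved_tflops, peak_tflops):
    """Fraction of the theoretical peak that was actually reached."""
    return achieved_tflops / peak_tflops


# Best result in the log above: M=1760, N=7000, K=1760, AT=T, BT=N.
best_tflops = 1.58
# Peak figure quoted in the reply above.
peak_tflops = 5.0

print(f"{utilization(best_tflops, peak_tflops):.1%} of peak")

# Kernel time implied by that throughput, in milliseconds.
seconds = gemm_flops(1760, 7000, 1760) / (best_tflops * 1e12)
print(f"{seconds * 1e3:.1f} ms per GEMM call")
```

Under those assumptions the best RX 480 result sits at roughly a third of peak, against the 70-75% seen on the R9 Fury.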