Comments (17)
I've also noticed that HSAKMT environment variables don't work with AMDGPU.jl. We don't do any stderr capture to my knowledge. Do note that those variables apply to HCC, HIP, and MIOpen, none of which we use in any significant capacity (except for HIP, for device sync, which is not done automatically).
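For reference, explicit device synchronization looks like this (a minimal sketch; `AMDGPU.synchronize()` is the current API, while 0.2.x-era releases instead synchronized by calling `wait` on the event returned by `@roc`):

```julia
using AMDGPU

a = ROCArray(rand(Float32, 1024))
b = a .+ 1f0            # broadcast executes asynchronously on the device
AMDGPU.synchronize()    # explicit device sync; AMDGPU.jl does not do this automatically
```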
Running on master still has failing tests, but way fewer:
```
Test Summary: | Pass Error Broken Total
AMDGPU | 1040 2 79 1121
  Core | 1 1
  HSA | 16 16
  Codegen | 3 3
  Device Functions | 179 75 254
  ROCArray | 744 2 3 749
    GPUArrays test suite | 744 2 746
      math | 8 8
      indexing scalar | 249 249
      input output | 5 5
      value constructors | 36 36
      indexing multidimensional | 32 2 34
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        empty array | 15 15
        GPU source | 2 1 3
        CPU source | 2 1 3
        JuliaGPU/CUDA.jl#461: sliced setindex | 1 1
      interface | 7 7
      conversions | 72 72
      constructors | 335 335
    ROCm External Libraries | 3 3
  External Packages | 97 97
ERROR: LoadError: Some tests did not pass: 1040 passed, 0 failed, 2 errored, 79 broken.
in expression starting at /home/stefan/.julia/packages/AMDGPU/TAdgr/test/runtests.jl:27
ERROR: Package AMDGPU errored during testing
```
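For anyone reproducing these runs, the summaries come from the standard package test entry point:

```julia
using Pkg
Pkg.test("AMDGPU")   # equivalent to `] test AMDGPU` in the Pkg REPL
```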
The matrix multiplication still crashes:
```
julia> using AMDGPU; using LinearAlgebra

julia> N = 100; m = rand(Float64, N, N); a = rand(Float64, N); b = rand(Float64, N); m_g = ROCArray(m); a_g = ROCArray(a); b_g = ROCArray(b);

julia> mul!(b_g, m_g, a_g)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
Memory access fault by GPU node-1 (Agent handle: 0x2177160) on address 0x640000. Reason: Page not present or supervisor privilege.

signal (6): Aborted
in expression starting at REPL[3]:1
Allocations: 31842465 (Pool: 31831316; Big: 11149); GC: 37
fish: “~/localcompiles/julia-1.6.0-bet…” terminated by signal SIGABRT (Abort)
```
Here is the manifest:
```
pkg> st --manifest
Status `~/Documents/ScratchSpace/julia_gpu/Manifest.toml`
[21141c5a] AMDGPU v0.2.2 `https://github.com/JuliaGPU/AMDGPU.jl.git#master`
[621f4979] AbstractFFTs v0.5.0
[79e6a3ab] Adapt v3.1.1
[b99e7846] BinaryProvider v0.5.10
[fa961155] CEnum v0.4.1
[34da2185] Compat v3.25.0
[187b0558] ConstructionBase v1.0.0
[864edb3b] DataStructures v0.18.9
[0c68f7d7] GPUArrays v6.2.0
[61eb1bfa] GPUCompiler v0.9.2
[929cbde3] LLVM v3.6.0
[1914dd2f] MacroTools v0.5.6
[bac558e1] OrderedCollections v1.3.3
[ae029012] Requires v1.1.2
[6c6a2e73] Scratch v1.0.3
[efcf1570] Setfield v0.7.0
[a759f4b9] TimerOutputs v0.5.7
[0dad84c5] ArgTools
[56f22d72] Artifacts
[2a0f44e3] Base64
[ade2ca70] Dates
[8bb1440f] DelimitedFiles
[8ba89e20] Distributed
[f43a241f] Downloads
[9fa8497b] Future
[b77e0a4c] InteractiveUtils
[b27032c2] LibCURL
[76f85450] LibGit2
[8f399da3] Libdl
[37e2e46d] LinearAlgebra
[56ddb016] Logging
[d6f4376e] Markdown
[a63ad114] Mmap
[ca575930] NetworkOptions
[44cfe95a] Pkg
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[1a1011a3] SharedArrays
[6462fe0b] Sockets
[2f01184e] SparseArrays
[10745b16] Statistics
[fa267f1f] TOML
[a4e569a6] Tar
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
[deac9b47] LibCURL_jll
[29816b5a] LibSSH2_jll
[c8ffd9c3] MbedTLS_jll
[14a3606d] MozillaCACerts_jll
[83775a58] Zlib_jll
[8e850ede] nghttp2_jll
```
Seems like it might be a crash in rocBLAS, but I'm not sure since I don't regularly run AMDGPU with it enabled (because it sucks to build). Do you have rocBLAS installed?
I do not think so. I checked with `apt-get`, and `rocblas` was not installed. Then, just to check, I also ran `sudo apt-get install rocblas`, which reported a successful (and brand new) install. However, the problem persists even after installing `rocblas`, so I think it is something independent of it.
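For what it's worth, one can also check from the Julia side whether a rocBLAS shared library is visible on the loader path (a plain Libdl lookup, not an AMDGPU.jl API):

```julia
using Libdl

# Returns the library name if librocblas is found on the default search
# paths, or "" if it is not.
Libdl.find_library(["librocblas"])
```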
I checked a couple of times with and without rocblas (by running `sudo apt-get install/purge rocblas` and then `] build AMDGPU`), but the crash in the matrix multiplication persists.
I attempted various debug and serialization flags, as suggested in ROCm/tensorflow-upstream#302 and in https://rocmdocs.amd.com/en/latest/Other_Solutions/Other-Solutions.html, but I did not get any debug info out to stderr!? Is AMDGPU.jl capturing and redirecting stderr? Any other suggestions for tracking down what exactly causes the memory fault?
Here is my attempt with the entirety of its console output:
```
$> export HCC_SERIALIZE_KERNEL=0x3; export HCC_SERIALIZE_COPY=0x3;
$> export HIP_TRACE_API=0x2; export MIOPEN_ENABLE_LOGGING_CMD=1;
$> ~/localcompiles/julia-1.6.0-beta1/bin/julia --project=.
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.6.0-beta1 (2021-01-08)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using AMDGPU; using LinearAlgebra

julia> N = 100; m = rand(Float64, N, N); a = rand(Float64, N); b = rand(Float64, N); m_g = ROCArray(m); a_g = ROCArray(a); b_g = ROCArray(b);

julia> mul!(b_g, m_g, a_g)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
'+fp64-fp16-denormals' is not a recognized feature for this target (ignoring feature)
'-fp32-denormals' is not a recognized feature for this target (ignoring feature)
Memory access fault by GPU node-1 (Agent handle: 0x13812e0) on address 0x640000. Reason: Page not present or supervisor privilege.
```
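One note on such variables: they must be in the process environment before the ROCm libraries are loaded, so from Julia they can also be set ahead of `using AMDGPU` (a sketch; `HSAKMT_DEBUG_LEVEL` and `HIP_TRACE_API` are ROCm-stack variables, not AMDGPU.jl options):

```julia
# Must be set before the ROCm runtime libraries load.
ENV["HSAKMT_DEBUG_LEVEL"] = "7"   # ROCT (HSAKMT) debug verbosity, per the comment above
ENV["HIP_TRACE_API"] = "0x2"      # HIP API tracing, as in the shell session above
using AMDGPU
```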
All of this was on ROCm 4. I also tried installing `tensorflow-rocm`, but that additionally required `apt install rocm-libs rccl`. TensorFlow seemed to work fine, but after adding these extra libraries AMDGPU.jl stopped building!? `] build AMDGPU` started reporting this error: ROCm/ROCm#1269
I ended up downgrading to ROCm 3.5.1. Now AMDGPU.jl seems to work. TensorFlow 2.4 does not work anymore, but I can downgrade TensorFlow too.
There are test failures for the current release of AMDGPU:
```
Test Summary: | Pass Fail Error Broken Total
AMDGPU | 1198 12 15 90 1315
  Core | 1 1
  HSA | 16 6 22
    HSA Status Error | 1 1
    Agent | 5 5
    Memory | 10 6 16
      Pointer-based | 3 3
      Array-based | 2 2
      Type-based | 1 1
      Pointer information | 1 1
      Page-locked memory (OS allocations) | 5 5
      Exceptions | 3 3
      Mutable structs | 1 1
  Codegen | 3 3
  Device Functions | 175 77 252
  ROCArray | 1003 12 9 12 1036
    GPUArrays test suite | 737 9 746
      math | 8 8
      indexing scalar | 249 249
      input output | 5 5
      value constructors | 36 36
      indexing multidimensional | 25 9 34
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        empty array | 8 7 15
          1D | 1 1 2
          2D with other index Colon() | 2 2 4
          2D with other index 1:5 | 2 2 4
          2D with other index 5 | 2 2 4
        GPU source | 2 1 3
        CPU source | 2 1 3
        JuliaGPU/CUDA.jl#461: sliced setindex | 1 1
      interface | 7 7
      conversions | 72 72
      constructors | 335 335
    ROCm External Libraries | 266 12 12 290
      BLAS | 17 17
      FFT | 106 12 12 130
        T = ComplexF64 | 33 4 37
          1D | 3 3
          1D inplace | 2 2
          2D | 3 3
          2D inplace | 2 2
          Batch 1D | 6 6
          3D | 3 3
          3D inplace | 2 2
          Batch 2D (in 3D) | 5 2 7
          Batch 2D (in 4D) | 7 2 9
        T = ComplexF32 | 33 4 37
          1D | 3 3
          1D inplace | 2 2
          2D | 3 3
          2D inplace | 2 2
          Batch 1D | 6 6
          3D | 3 3
          3D inplace | 2 2
          Batch 2D (in 3D) | 5 2 7
          Batch 2D (in 4D) | 7 2 9
        T = Float32 | 20 2 6 28
          1D | 4 4
          2D | 4 4
          Batch 1D | 4 2 6
          3D | 4 4
          Batch 2D (in 3D) | 1 3 4
          Batch 2D (in 4D) | 3 3 6
        T = Float64 | 20 2 6 28
          1D | 4 4
          2D | 4 4
          Batch 1D | 4 2 6
          3D | 4 4
          Batch 2D (in 3D) | 1 3 4
          Batch 2D (in 4D) | 3 3 6
      rand | 143 143
ERROR: LoadError: Some tests did not pass: 1198 passed, 12 failed, 15 errored, 90 broken.
in expression starting at /home/stefan/.julia/packages/AMDGPU/UpYiP/test/runtests.jl:29
ERROR: Package AMDGPU errored during testing
```
And here are the tests on the current master branch, which does a bit better but still has errors:
```
Test Summary: | Pass Fail Error Broken Total
AMDGPU | 1306 12 2 88 1408
  Core | 1 1
  HSA | 16 16
  Codegen | 3 3
  Device Functions | 179 75 254
  ROCArray | 1010 12 2 12 1036
    GPUArrays test suite | 744 2 746
      math | 8 8
      indexing scalar | 249 249
      input output | 5 5
      value constructors | 36 36
      indexing multidimensional | 32 2 34
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        sliced setindex, CPU source | 1 1
        empty array | 15 15
        GPU source | 2 1 3
        CPU source | 2 1 3
        JuliaGPU/CUDA.jl#461: sliced setindex | 1 1
      interface | 7 7
      conversions | 72 72
      constructors | 335 335
    ROCm External Libraries | 266 12 12 290
      BLAS | 17 17
      FFT | 106 12 12 130
        T = ComplexF64 | 33 4 37
          1D | 3 3
          1D inplace | 2 2
          2D | 3 3
          2D inplace | 2 2
          Batch 1D | 6 6
          3D | 3 3
          3D inplace | 2 2
          Batch 2D (in 3D) | 5 2 7
          Batch 2D (in 4D) | 7 2 9
        T = ComplexF32 | 33 4 37
          1D | 3 3
          1D inplace | 2 2
          2D | 3 3
          2D inplace | 2 2
          Batch 1D | 6 6
          3D | 3 3
          3D inplace | 2 2
          Batch 2D (in 3D) | 5 2 7
          Batch 2D (in 4D) | 7 2 9
        T = Float32 | 20 2 6 28
          1D | 4 4
          2D | 4 4
          Batch 1D | 4 2 6
          3D | 4 4
          Batch 2D (in 3D) | 1 3 4
          Batch 2D (in 4D) | 3 3 6
        T = Float64 | 20 2 6 28
          1D | 4 4
          2D | 4 4
          Batch 1D | 4 2 6
          3D | 4 4
          Batch 2D (in 3D) | 1 3 4
          Batch 2D (in 4D) | 3 3 6
      rand | 143 143
  External Packages | 97 97
ERROR: LoadError: Some tests did not pass: 1306 passed, 12 failed, 2 errored, 88 broken.
in expression starting at /home/stefan/.julia/packages/AMDGPU/AKLQk/test/runtests.jl:27
ERROR: Package AMDGPU errored during testing
```
Am I correct in assuming that if I want to use the RX 580 with AMDGPU.jl, I have to freeze ROCm at version 3.5.1 and just hope for "best effort", without any guarantees, given that the device seems to be going out of support in ROCm?
Should I freeze the AMDGPU.jl version too? Should I expect future versions of AMDGPU.jl to lower the level of support for the 580?
Is there a more "official" support table, listing which hardware versions, ROCm versions, and AMDGPU.jl versions are tested/supported?
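If freezing versions turns out to be the answer, standard Pkg pinning can hold AMDGPU.jl at a known-working release (ordinary Pkg functionality, nothing AMDGPU-specific):

```julia
using Pkg

# Hold AMDGPU.jl at a known-working release until the support story settles.
Pkg.pin(PackageSpec(name = "AMDGPU", version = v"0.2.2"))
```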
Sigh... now there is a separate problem (on ROCm 3.5.1 and AMDGPU#master): matrix multiplication simply gives wrong answers (no crash, just incorrect results):
```
julia> N = 10; T = Float64; a,b,c = cpus = [rand(T, N, N) for i in 1:3]; ag,bg,cg = [ROCArray(i) for i in cpus];

julia> mul!(ag,bg,cg)
10×10 ROCMatrix{Float64}:
 0.169517  0.666133   0.787853   0.952216  0.52438    0.226889  0.895567  0.563802   0.603744  0.0810141
 0.774994  0.0350809  0.705357   0.544661  0.775764   0.966118  0.965179  0.351198   0.25837   0.0632102
 0.947915  0.0939128  0.711592   0.964582  0.484883   0.503159  0.618847  0.199      0.598743  0.913767
 0.166383  0.24303    0.0343327  0.954652  0.952374   0.911542  0.216517  0.144033   0.601291  0.205171
 0.349153  0.223039   0.129581   0.442686  0.766986   0.551424  0.292206  0.0795419  0.43372   0.655484
 0.173297  0.241994   0.915943   0.191715  0.202254   0.305148  0.221799  0.78068    0.75416   0.900042
 0.137884  0.25165    0.342389   0.159862  0.355102   0.836764  0.989629  0.935794   0.526686  0.762097
 0.116692  0.244034   0.724202   0.794337  0.168172   0.497086  0.937436  0.592061   0.813417  0.351207
 0.33148   0.346618   0.96186    0.436207  0.430171   0.623167  0.823441  0.63495    0.477421  0.497221
 0.411855  0.231901   0.578217   0.623853  0.0970518  0.633137  0.945868  0.616912   0.731479  0.731409

julia> mul!(a,b,c)
10×10 Matrix{Float64}:
 2.87753  2.07307  3.1106   3.34475  2.74262  3.61348  3.07164  2.82941  2.95761  1.97778
 3.03134  1.83036  3.24821  3.72434  3.07734  3.92818  3.27126  3.92044  3.94197  2.58042
 2.89656  2.19109  2.64611  3.02358  2.89144  3.69149  2.87703  3.7068   3.77624  2.60822
 1.46729  1.08032  1.38129  1.41364  1.54596  1.69974  1.36592  1.88397  1.82102  0.655919
 1.90435  1.22412  1.73246  1.82557  1.94339  2.39507  1.9207   1.88028  2.36406  1.84471
 1.96599  1.63776  2.00905  2.08006  1.7166   2.25551  1.6797   1.89926  1.67244  1.063
 2.09604  1.65736  1.91219  2.21415  1.86685  2.60482  2.10144  2.70042  2.52462  1.43959
 2.56943  1.31602  1.94323  2.61937  3.13482  2.81117  2.08695  2.95018  2.91306  1.97668
 2.74569  2.02152  3.04165  3.21203  2.86864  3.48828  2.51794  2.95315  2.98953  2.64708
 3.01357  2.14793  2.52376  2.93145  3.03869  3.55187  3.10702  3.39474  3.41577  2.14719
```
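A compact way to confirm the mismatch programmatically, comparing the device result against the CPU reference after copying it back (a sketch of the same experiment as above):

```julia
using AMDGPU, LinearAlgebra

N = 10
b, c = rand(N, N), rand(N, N)
bg, cg = ROCArray(b), ROCArray(c)

reference = b * c                          # CPU result
result = Array(mul!(similar(bg), bg, cg))  # GPU result, copied back to host

# On a healthy setup this prints true; here it would print false.
@show isapprox(reference, result)
```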
If you guys have any suggestions where to look for the source of these issues (or whether I should downgrade/upgrade to other versions), let me know. Either way, thanks for your effort in putting this library together!
Some community-sourced table of "this hardware ran successfully for me" would be really useful.
I tested this on my Vega system, and I also get a memory access fault. I'll run this under my newly-working debugger in the next day or two.
Btw, our CI was running on an RX480 for the longest time, but I had to remove the card because HIP started killing the build process due to not being able to find code for the GPU (stupid problem, I should reproduce it and patch it upstream). I'll probably put the RX480 in another machine and add it to the CI queue so that we ensure that we still have working support.
Is there a way to donate to the CI effort? Money or compute time, especially if I can get my 580 to do CI for you; I am a competent enough sysadmin to run a Docker container on this computer that is accessible to your CI jobs. It is in my selfish interest to get a 580 with a configuration similar to mine (Ubuntu with the same drivers and ROCm version) into CI ;)
By the way, as a new user I was definitely very confused about which ROCm version I should be using. What version of ROCm is used by the CI?
We currently use Buildkite to host CI, which runs under docker-compose, so it's pretty nicely isolated. I'll talk to the JuliaGPU devs and see what they think.
Also, the ROCm config is not fixed to a particular version, which is something I would like to fix by providing ROCm libraries as JLLs, but that's complicated by such a config not working on my musl system 😄 It's on the roadmap, though.
While I wait for a response on the CI question: I found that the issue does not turn into a regular device error when running with `-g2 --check-bounds=yes` on AMDGPU master (`-g2` outputs a full device stacktrace on error), which indicates to me that this is either a miscompile, or a bug somewhere that calls `unsafe_load`/`unsafe_store!` manually (since array accesses are bounds-checked).
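To illustrate why bounds checking rules out one class of bug but not the other (plain host-side Julia, nothing AMDGPU-specific): indexed accesses raise a catchable `BoundsError` under `--check-bounds=yes`, while `unsafe_load` bypasses the check entirely.

```julia
A = zeros(4)

# Indexed access is bounds-checked (unconditionally so with --check-bounds=yes),
# so an out-of-range index raises a catchable BoundsError.
try
    A[5]
catch err
    @show typeof(err)   # BoundsError
end

# unsafe_load bypasses bounds checking: an out-of-range index silently reads
# whatever memory happens to be there (or faults), with no Julia-level error.
p = pointer(A)
v = GC.@preserve A unsafe_load(p, 4)   # in range: fine
# unsafe_load(p, 10^9)                 # out of range: undefined behavior
```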
Regarding CI: because adding Buildkite agents requires sharing our global secret key with the agent's owner, we can't reasonably accept outside CI. However, I plan to set up an RX480 runner and ensure that we run it for all PRs, so that older cards keep working as much as possible. We'll also potentially be getting access to a lot of newer (but still Vega-arch) AMD GPUs soon, so hopefully we can use some of them for CI.
In terms of donations from the community, I would appreciate any bug reports, code contributions, or ideas for improvements you and others might have. That's more valuable to me than CI by a long shot 🙂
Sounds great! If this starts working I will certainly be active in giving feedback. I do have a bunch of projects that would use bitwise operations on integer types, so hopefully I will be able to stress-test that side of the project.
I'm closing this in favor of #103, since the failing tests you reported are known to fail (see #91) or are just generally unreliable (in my experience).