exanauts / CUDSS.jl
License: MIT License
Is there any reason this isn't allowed?
I should also add an example where the ordering is computed outside CUDSS (like AMD or SCOTCH).
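A minimal sketch of what such an example could look like, assuming AMD.jl provides the fill-reducing ordering and that the wrapper exposes the cuDSS "user_perm" parameter (both are assumptions here, not verified API):

using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra
using AMD  # assumption: AMD.jl supplies the amd ordering

n = 100
A_cpu = sprand(Float64, n, n, 0.05) + I
A_cpu = A_cpu + A_cpu'
b_cpu = rand(Float64, n)

A_gpu = CuSparseMatrixCSR(A_cpu |> tril)
x_gpu = CuVector(zeros(Float64, n))
b_gpu = CuVector(b_cpu)

solver = CudssSolver(A_gpu, "S", 'L')

# Compute the ordering on the CPU and hand it to cuDSS before the analysis.
# "user_perm" is assumed to be the parameter name exposed by the wrapper.
perm = Cint.(amd(A_cpu))
cudss_set(solver, "user_perm", perm)

cudss("analysis", solver, x_gpu, b_gpu)
cudss("factorization", solver, x_gpu, b_gpu)
cudss("solve", solver, x_gpu, b_gpu)
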
using CUDA, CUDA.CUSPARSE
using CUDSS
using LinearAlgebra
using SparseArrays
using Test
println(CUDSS.CUDSS_INSTALLATION)
function cudss_generic()
    n = 100
    p = 5
    @testset "precision = $T" for T in (Float32, Float64, ComplexF32, ComplexF64)
        R = real(T)
        @testset "Unsymmetric -- Non-Hermitian" begin
            A_cpu = sprand(T, n, n, 0.02) + I
            b_cpu = rand(T, n)
            A_gpu = CuSparseMatrixCSR(A_cpu)
            b_gpu = CuVector(b_cpu)

            @testset "ldiv!" begin
                x_cpu = zeros(T, n)
                x_gpu = CuVector(x_cpu)
                solver = lu(A_gpu)
                ldiv!(x_gpu, solver, b_gpu)
                r_gpu = b_gpu - A_gpu * x_gpu
                @test norm(r_gpu) ≤ √eps(R)

                A_gpu2 = rand(T) * A_gpu
                lu!(solver, A_gpu2)
                x_gpu .= b_gpu
                ldiv!(solver, x_gpu)
                r_gpu2 = b_gpu - A_gpu2 * x_gpu
                @test norm(r_gpu2) ≤ √eps(R)
            end

            @testset "\\" begin
                solver = lu(A_gpu)
                x_gpu = solver \ b_gpu
                r_gpu = b_gpu - A_gpu * x_gpu
                @test norm(r_gpu) ≤ √eps(R)

                A_gpu2 = rand(T) * A_gpu
                lu!(solver, A_gpu2)
                x_gpu = solver \ b_gpu
                r_gpu2 = b_gpu - A_gpu2 * x_gpu
                @test norm(r_gpu2) ≤ √eps(R)
            end
        end

        @testset "Symmetric -- Hermitian" begin
            @testset "view = $view" for view in ('F', 'L', 'U')
                A_cpu = sprand(T, n, n, 0.01) + I
                A_cpu = A_cpu + A_cpu'
                B_cpu = rand(T, n, p)
                (view == 'L') && (A_gpu = CuSparseMatrixCSR(A_cpu |> tril))
                (view == 'U') && (A_gpu = CuSparseMatrixCSR(A_cpu |> triu))
                (view == 'F') && (A_gpu = CuSparseMatrixCSR(A_cpu))
                B_gpu = CuMatrix(B_cpu)

                @testset "ldiv!" begin
                    X_cpu = zeros(T, n, p)
                    X_gpu = CuMatrix(X_cpu)
                    solver = ldlt(A_gpu; view)
                    ldiv!(X_gpu, solver, B_gpu)
                    # The residual is computed with the full matrix because
                    # A_gpu only stores one triangle when view is 'L' or 'U'.
                    R_gpu = B_gpu - CuSparseMatrixCSR(A_cpu) * X_gpu
                    @test norm(R_gpu) ≤ √eps(R)

                    c = rand(R)
                    A_cpu2 = c * A_cpu
                    A_gpu2 = c * A_gpu
                    ldlt!(solver, A_gpu2)
                    X_gpu .= B_gpu
                    ldiv!(solver, X_gpu)
                    R_gpu2 = B_gpu - CuSparseMatrixCSR(A_cpu2) * X_gpu
                    @test norm(R_gpu2) ≤ √eps(R)
                end

                @testset "\\" begin
                    solver = ldlt(A_gpu; view)
                    X_gpu = solver \ B_gpu
                    R_gpu = B_gpu - CuSparseMatrixCSR(A_cpu) * X_gpu
                    @test norm(R_gpu) ≤ √eps(R)

                    c = rand(R)
                    A_cpu2 = c * A_cpu
                    A_gpu2 = c * A_gpu
                    ldlt!(solver, A_gpu2)
                    X_gpu = solver \ B_gpu
                    R_gpu2 = B_gpu - CuSparseMatrixCSR(A_cpu2) * X_gpu
                    @test norm(R_gpu2) ≤ √eps(R)
                end
            end
        end

        @testset "SPD -- HPD" begin
            @testset "view = $view" for view in ('F', 'L', 'U')
                A_cpu = sprand(T, n, n, 0.01)
                A_cpu = A_cpu * A_cpu' + I
                B_cpu = rand(T, n, p)
                (view == 'L') && (A_gpu = CuSparseMatrixCSR(A_cpu |> tril))
                (view == 'U') && (A_gpu = CuSparseMatrixCSR(A_cpu |> triu))
                (view == 'F') && (A_gpu = CuSparseMatrixCSR(A_cpu))
                B_gpu = CuMatrix(B_cpu)

                @testset "ldiv!" begin
                    X_cpu = zeros(T, n, p)
                    X_gpu = CuMatrix(X_cpu)
                    solver = cholesky(A_gpu; view)
                    ldiv!(X_gpu, solver, B_gpu)
                    R_gpu = B_gpu - CuSparseMatrixCSR(A_cpu) * X_gpu
                    @test norm(R_gpu) ≤ √eps(R)

                    c = rand(R)
                    A_cpu2 = c * A_cpu
                    A_gpu2 = c * A_gpu
                    cholesky!(solver, A_gpu2)
                    X_gpu .= B_gpu
                    ldiv!(solver, X_gpu)
                    R_gpu2 = B_gpu - CuSparseMatrixCSR(A_cpu2) * X_gpu
                    @test norm(R_gpu2) ≤ √eps(R)
                end

                @testset "\\" begin
                    solver = cholesky(A_gpu; view)
                    X_gpu = solver \ B_gpu
                    R_gpu = B_gpu - CuSparseMatrixCSR(A_cpu) * X_gpu
                    @test norm(R_gpu) ≤ √eps(R)

                    c = rand(R)
                    A_cpu2 = c * A_cpu
                    A_gpu2 = c * A_gpu
                    cholesky!(solver, A_gpu2)
                    X_gpu = solver \ B_gpu
                    R_gpu2 = B_gpu - CuSparseMatrixCSR(A_cpu2) * X_gpu
                    @test norm(R_gpu2) ≤ √eps(R)
                end
            end
        end
    end
end

cudss_generic()
We need to set things up with Moonshot.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
There are a few issues with the package given its separation from CUDA.jl:
lu doesn't "just work" on CUDA.jl's sparse matrices; this package is what fixes it to behave more like a regular sparse array. I would think users would be better served if there weren't a special package to add on the side that supplies the missing linear algebra functions. So, since this is just the sparse factorization of a sparse matrix type defined in CUDA.jl, it's very odd that this one function is not in CUDA.jl.
CC @maleadt
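For context, this is what "just working" looks like once CUDSS.jl is loaded (a sketch reusing the generic API exercised in the tests above; without CUDSS.jl, lu(A_gpu) has no applicable method):

using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

n = 100
A_cpu = sprand(Float64, n, n, 0.02) + I
b_cpu = rand(Float64, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
b_gpu = CuVector(b_cpu)

# Loading CUDSS.jl extends LinearAlgebra.lu for CuSparseMatrixCSR,
# so the usual factorize-then-solve idiom carries over to the GPU.
solver = lu(A_gpu)
x_gpu = solver \ b_gpu
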
I found that cudss can't reproduce the same result:
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra
using Random
Random.seed!(666)
n = 1000
T = Float64
A_cpu = sprand(T, n, n, 0.05) + 0.01*I
x_cpu = zeros(T, n)
b_cpu = rand(T, n)
A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CuVector(x_cpu)
b_gpu = CuVector(b_cpu)
solver = CudssSolver(A_gpu, "G", 'F')
cudss("analysis", solver, x_gpu, b_gpu)
cudss("factorization", solver, x_gpu, b_gpu)
cudss("solve", solver, x_gpu, b_gpu)
x1 = deepcopy(x_gpu)
ldiv!(x_gpu, solver, b_gpu)
x2 = deepcopy(x_gpu)
println("error is ", norm(x1-x2))
This just solves the same linear system twice, but the values of x_gpu are different, i.e. norm(x1 - x2) is nonzero.
Is this a bug, or an intrinsic property of the method underlying cudss?
I have the following error message when loading CUDSS.jl with CUDA.jl 5.4:
ERROR: LoadError: MethodError: no method matching CUDA.APIUtils.HandleCache{CUDA.CuContext, Ptr{CUDSS.cudssContext}}()
Closest candidates are:
CUDA.APIUtils.HandleCache{K, V}(::Any, ::Any; max_entries) where {K, V}
@ CUDA ~/.julia/packages/CUDA/75aiI/lib/utils/cache.jl:17
Stacktrace:
[1] top-level scope
@ ~/.julia/packages/CUDSS/ACgsr/src/management.jl:20
[2] include(mod::Module, _path::String)
@ Base ./Base.jl:495
[3] include(x::String)
@ CUDSS ~/.julia/packages/CUDSS/ACgsr/src/CUDSS.jl:1
[4] top-level scope
@ ~/.julia/packages/CUDSS/ACgsr/src/CUDSS.jl:24
It looks like the culprit is:
https://github.com/exanauts/CUDSS.jl/blob/main/src/management.jl#L20
JuliaGPU/CUDA.jl#2262
We should replace the @ccall macro with the @gcsafe_ccall macro.
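The syntax of @gcsafe_ccall mirrors @ccall, so the change is mechanical at each call site. A sketch, assuming the macro is defined in CUDA.APIUtils (the module referenced in the stacktrace above), using a harmless libc call for illustration:

using CUDA
using CUDA.APIUtils: @gcsafe_ccall  # assumption: the macro lives here

# Same call syntax as @ccall; the foreign call is wrapped so other Julia
# tasks can run garbage collection while this thread blocks in C.
t = @gcsafe_ccall time(C_NULL::Ptr{Cvoid})::Clong
println(t)

In management.jl, each @ccall into libcudss would then become @gcsafe_ccall with an unchanged signature.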
I should also update the function check like Tim did in this commit: JuliaGPU/CUDA.jl@41dcc90
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra
using Random
Random.seed!(666)
function example1(T::DataType, n::Int)
    A_cpu = sprand(T, n, n, 1.0) + I
    x_cpu = zeros(T, n)
    b_cpu = rand(T, n)

    A_gpu = CuSparseMatrixCSR(A_cpu)
    x_gpu = CuVector(x_cpu)
    b_gpu = CuVector(b_cpu)

    solver = CudssSolver(A_gpu, "G", 'F')
    cudss("analysis", solver, x_gpu, b_gpu)
    cudss("factorization", solver, x_gpu, b_gpu)
    cudss("solve", solver, x_gpu, b_gpu)

    r_gpu = b_gpu - A_gpu * x_gpu
    norm(r_gpu)
end

function example2(T::DataType, n::Int)
    A_cpu = sprand(T, n, n, 1.0) + I
    A_cpu = A_cpu + A_cpu' + I
    x_cpu = zeros(T, n)
    b_cpu = rand(T, n)

    A_gpu = CuSparseMatrixCSR(A_cpu |> tril)
    x_gpu = CuVector(x_cpu)
    b_gpu = CuVector(b_cpu)

    structure = T <: Real ? "S" : "H"
    solver = CudssSolver(A_gpu, structure, 'L')
    cudss("analysis", solver, x_gpu, b_gpu)
    cudss("factorization", solver, x_gpu, b_gpu)
    cudss("solve", solver, x_gpu, b_gpu)

    r_gpu = b_gpu - CuSparseMatrixCSR(A_cpu) * x_gpu
    norm(r_gpu)
end

function example3(T::DataType, n::Int)
    A_cpu = sprand(T, n, n, 1.0)
    A_cpu = A_cpu * A_cpu' + I
    x_cpu = zeros(T, n)
    b_cpu = rand(T, n)

    A_gpu = CuSparseMatrixCSR(A_cpu |> triu)
    x_gpu = CuVector(x_cpu)
    b_gpu = CuVector(b_cpu)

    structure = T <: Real ? "SPD" : "HPD"
    solver = CudssSolver(A_gpu, structure, 'U')
    cudss("analysis", solver, x_gpu, b_gpu)
    cudss("factorization", solver, x_gpu, b_gpu)
    cudss("solve", solver, x_gpu, b_gpu)

    r_gpu = b_gpu - CuSparseMatrixCSR(A_cpu) * x_gpu
    norm(r_gpu)
end

for n in (10, 5, 2, 1)
    println("Size of the linear systems: $n")
    for T in (Float32, Float64, ComplexF32, ComplexF64)
        println("Precision: $T")
        r1 = example1(T, n)
        println("Residual norm for example1: $r1")
        r2 = example2(T, n)
        println("Residual norm for example2: $r2")
        r3 = example3(T, n)
        println("Residual norm for example3: $r3")
        println()
    end
end
ERROR: CUDSSError: an internal operation failed (code 7, CUDSS_STATUS_INTERNAL_ERROR)
using Test
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays
using LinearAlgebra
using Random
Random.seed!(1)
n = 100
p = 5
T = ComplexF64
R = real(T)
A_cpu = sprand(T, n, n, 0.01) + I
A_cpu = A_cpu + A_cpu'
X_cpu = zeros(T, n, p)
B_cpu = rand(T, n, p)
A_gpu = CuSparseMatrixCSR(A_cpu)
X_gpu = CuMatrix(X_cpu)
B_gpu = CuMatrix(B_cpu)
matrix = CudssMatrix(A_gpu, "H", 'F', index='O')
config = CudssConfig()
data = CudssData()
solver = CudssSolver(matrix, config, data)
cudss("analysis", solver, X_gpu, B_gpu)
cudss("factorization", solver, X_gpu, B_gpu)
nnz_decomposition = cudss_get(solver, "lu_nnz")
inertia = cudss_get(solver, "inertia")
cudss("solve", solver, X_gpu, B_gpu)
R_gpu = B_gpu - CuSparseMatrixCSR(A_cpu) * X_gpu
@test norm(R_gpu) ≤ √eps(R)
Y_cpu = zeros(T, n, n)
Y_gpu = CuMatrix(Y_cpu)
C_cpu = Matrix{T}(I, n, n)
C_gpu = CuMatrix(C_cpu)
cudss("solve", solver, Y_gpu, C_gpu)
RR_gpu = C_gpu - CuSparseMatrixCSR(A_cpu) * Y_gpu
@test norm(RR_gpu) ≤ √eps(R)
using MatrixMarket
MatrixMarket.mmwrite("cudss_A.mtx", A_gpu |> SparseMatrixCSC)
MatrixMarket.mmwrite("cudss_B.mtx", B_gpu |> Matrix |> sparse)
MatrixMarket.mmwrite("cudss_X.mtx", X_gpu |> Matrix |> sparse)
MatrixMarket.mmwrite("cudss_Y.mtx", Y_gpu |> Matrix |> sparse)
Hi,
I tried to change (and extract) the settings of a cudss solver beyond the defaults. This seems to be handled by the functions cudss_set and cudss_get. However, Julia crashes as soon as I try to change any parameter via cudss_set. Is this currently an issue with the Julia wrapper?
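For reference, here is the pattern I would expect to work (a minimal sketch reusing only parameters exercised elsewhere on this page: "ir_n_steps" before the analysis, "lu_nnz" after the factorization):

using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

n = 100
A_cpu = sprand(Float64, n, n, 0.05) + I
A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CuVector(zeros(Float64, n))
b_gpu = CuVector(rand(Float64, n))

solver = CudssSolver(A_gpu, "G", 'F')

# Configuration parameters are set before the analysis phase.
cudss_set(solver, "ir_n_steps", 1)

cudss("analysis", solver, x_gpu, b_gpu)
cudss("factorization", solver, x_gpu, b_gpu)

# Data parameters are queried after the factorization.
nnz_lu = cudss_get(solver, "lu_nnz")

cudss("solve", solver, x_gpu, b_gpu)
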
We must call an analysis phase before the CudssData can be destroyed.
using CUDSS

data = CudssData()
# Destroying a CudssData without a prior analysis phase crashes.
CUDSS.cudssDataDestroy(data.handle, data)
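By contrast, a teardown after an analysis phase should be fine. A sketch, assuming the CudssSolver stores its CudssData in a data field (field name unverified):

using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra

n = 10
A_gpu = CuSparseMatrixCSR(sprand(Float64, n, n, 0.3) + I)
x_gpu = CuVector(zeros(Float64, n))
b_gpu = CuVector(rand(Float64, n))

solver = CudssSolver(A_gpu, "G", 'F')
cudss("analysis", solver, x_gpu, b_gpu)

# After the analysis, the internal data can be destroyed without crashing.
data = solver.data  # assumed field name
CUDSS.cudssDataDestroy(data.handle, data)
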
using CUDA, CUDA.CUSPARSE
using CUDSS
using SparseArrays, LinearAlgebra
using Random
Random.seed!(666)
function example1(T::DataType, n::Int, ir::Int)
    A_cpu = sprand(T, n, n, 0.05) + I
    x_cpu = zeros(T, n)
    b_cpu = rand(T, n)

    A_gpu = CuSparseMatrixCSR(A_cpu)
    x_gpu = CuVector(x_cpu)
    b_gpu = CuVector(b_cpu)

    solver = CudssSolver(A_gpu, "G", 'F')
    cudss_set(solver, "ir_n_steps", ir)
    cudss("analysis", solver, x_gpu, b_gpu)
    cudss("factorization", solver, x_gpu, b_gpu)
    cudss("solve", solver, x_gpu, b_gpu)

    r_gpu = b_gpu - A_gpu * x_gpu
    norm(r_gpu)
end

function example2(T::DataType, n::Int, ir::Int)
    p = 5
    A_cpu = sprand(T, n, n, 0.05) + I
    A_cpu = A_cpu + A_cpu'
    X_cpu = zeros(T, n, p)
    B_cpu = rand(T, n, p)

    A_gpu = CuSparseMatrixCSR(A_cpu |> tril)
    X_gpu = CuMatrix(X_cpu)
    B_gpu = CuMatrix(B_cpu)

    structure = T <: Real ? "S" : "H"
    solver = CudssSolver(A_gpu, structure, 'L')
    cudss_set(solver, "ir_n_steps", ir)
    # Required with CUDSS 0.1.0
    (T <: Complex) && cudss_set(solver, "pivot_type", 'N')
    cudss("analysis", solver, X_gpu, B_gpu)
    cudss("factorization", solver, X_gpu, B_gpu)
    cudss("solve", solver, X_gpu, B_gpu)

    R_gpu = B_gpu - CuSparseMatrixCSR(A_cpu) * X_gpu
    norm(R_gpu)
end

function example3(T::DataType, n::Int, ir::Int)
    p = 5
    A_cpu = sprand(T, n, n, 0.01)
    A_cpu = A_cpu * A_cpu' + I
    X_cpu = zeros(T, n, p)
    B_cpu = rand(T, n, p)

    A_gpu = CuSparseMatrixCSR(A_cpu |> triu)
    X_gpu = CuMatrix(X_cpu)
    B_gpu = CuMatrix(B_cpu)

    structure = T <: Real ? "SPD" : "HPD"
    solver = CudssSolver(A_gpu, structure, 'U')
    cudss_set(solver, "ir_n_steps", ir)
    cudss("analysis", solver, X_gpu, B_gpu)
    cudss("factorization", solver, X_gpu, B_gpu)
    cudss("solve", solver, X_gpu, B_gpu)

    R_gpu = B_gpu - CuSparseMatrixCSR(A_cpu) * X_gpu
    norm(R_gpu)
end

# Size of the random linear systems
n = 100
for ir in (0, 1, 2)
    println("Number of iterations of iterative refinement: $ir")
    for T in (Float32, Float64, ComplexF32, ComplexF64)
        println("Precision: $T")
        r1 = example1(T, n, ir)
        println("Residual norm for example1: $r1")
        r2 = example2(T, n, ir)
        println("Residual norm for example2: $r2")
        r3 = example3(T, n, ir)
        println("Residual norm for example3: $r3")
    end
    println()
end
Number of iterations of iterative refinement: 0
Precision: Float32
Residual norm for example1: 3.692087e-5
Residual norm for example2: 0.0002999944
Residual norm for example3: 8.5040307e-7
Precision: Float64
Residual norm for example1: 3.999599835861191e-13
Residual norm for example2: 1.8113843282606318e-13
Residual norm for example3: 1.438189339051409e-15
Precision: ComplexF32
Residual norm for example1: 7.7133656e-5
Residual norm for example2: 0.0015232839
Residual norm for example3: 1.3667711e-6
Precision: ComplexF64
Residual norm for example1: 5.983645231861647e-14
Residual norm for example2: 3.5220521611201605e-12
Residual norm for example3: 2.6205255057235815e-15
Number of iterations of iterative refinement: 1
Precision: Float32
Residual norm for example1: 689.0252
Residual norm for example2: 35.675114
Residual norm for example3: 19.295319
Precision: Float64
Residual norm for example1: 17.012903059690192
Residual norm for example2: 128.73250852726014
Residual norm for example3: 19.50544156059542
Precision: ComplexF32
Residual norm for example1: 91.10225
Residual norm for example2: 169.85542
Residual norm for example3: 37.091965
Precision: ComplexF64
Residual norm for example1: 66.7120050629793
Residual norm for example2: 178.9033657822368
Residual norm for example3: 33.54613616599457
Number of iterations of iterative refinement: 2
Precision: Float32
Residual norm for example1: 398.21548
Residual norm for example2: 144.65608
Residual norm for example3: 53.373383
Precision: Float64
Residual norm for example1: 351.5959719021192
Residual norm for example2: 785.4054754458682
Residual norm for example3: 52.362171222079056
Precision: ComplexF32
Residual norm for example1: 69.65438
Residual norm for example2: 869.5622
Residual norm for example3: 134.29175
Precision: ComplexF64
Residual norm for example1: 4401.508407855525
Residual norm for example2: 2843.681478289806
Residual norm for example3: 117.71330055583913