Incorrect result when using CUDSS #52

Open
ovanvincq opened this issue Jul 12, 2024 · 5 comments
Labels
bug Something isn't working


ovanvincq commented Jul 12, 2024

I tried using CUDSS.jl with a sparse matrix coming from an electromagnetism problem solved with the finite element method (FEM).

The sparse matrix is stored in a JLD2 file that can be downloaded here: https://github.com/ovanvincq/test_cudss/blob/main/matrix.jld2

CUDSS.jl (v0.3.1) fails to solve a linear system with this matrix.

When using the GPU (RTX3060Ti or GV100):

using JLD2, CUDSS, CUDA, CUDA.CUSPARSE, LinearAlgebra

A_cpu = load("matrix.jld2", "A")
T = eltype(A_cpu)
n = size(A_cpu, 1)
x_cpu = zeros(T, n)
b_cpu = rand(T, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CuVector(x_cpu)
b_gpu = CuVector(b_cpu)

F_gpu = lu(A_gpu)              # sparse LU through cuDSS
x_gpu = F_gpu \ b_gpu
r_gpu = b_gpu - A_gpu * x_gpu
norm(r_gpu)                    # norm requires LinearAlgebra

returns norm(r_gpu) = 7e17.

When using the CPU:

F=lu(A_cpu)
x_cpu = F \ b_cpu
r_cpu = b_cpu - A_cpu * x_cpu
norm(r_cpu)

returns norm(r_cpu) = 9e-8.

Am I doing something wrong?

amontoison commented Jul 12, 2024

@ovanvincq I was unable to load the matrix when I tried to reproduce the issue.
Can you save and upload the matrix in the MatrixMarket format?

julia> A_cpu = load("./matrix.jld2","A")
Error encountered while load FileIO.File{FileIO.DataFormat{:JLD2}, String}("./matrix.jld2").

Fatal error:
ERROR: JLD2.InvalidDataException("Did not find a Superblock.")
Stacktrace:
  [1] find_superblock(f::JLD2.JLDFile{JLD2.MmapIO})
    @ JLD2 ~/.julia/packages/JLD2/MYcfT/src/superblock.jl:102
  [2] load_file_metadata!(f::JLD2.JLDFile{JLD2.MmapIO})
    @ JLD2 ~/.julia/packages/JLD2/MYcfT/src/JLD2.jl:403
  [3] jldopen(fname::String, wr::Bool, create::Bool, truncate::Bool, iotype::Type{…}; fallback::Type{…}, compress::Bool, mmaparrays::Bool, typemap::Dict{…}, parallel_read::Bool)
    @ JLD2 ~/.julia/packages/JLD2/MYcfT/src/JLD2.jl:392
  [4] jldopen
    @ ~/.julia/packages/JLD2/MYcfT/src/JLD2.jl:312 [inlined]
  [5] #jldopen#14
    @ ~/.julia/packages/JLD2/MYcfT/src/JLD2.jl:455 [inlined]
  [6] jldopen
    @ ~/.julia/packages/JLD2/MYcfT/src/JLD2.jl:449 [inlined]
  [7] fileio_load(f::FileIO.File{FileIO.DataFormat{:JLD2}, String}, varname::String; kwargs::@Kwargs{})
    @ JLD2 ~/.julia/packages/JLD2/MYcfT/src/fileio.jl:54
  [8] fileio_load(f::FileIO.File{FileIO.DataFormat{:JLD2}, String}, varname::String)
    @ JLD2 ~/.julia/packages/JLD2/MYcfT/src/fileio.jl:53
  [9] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [10] invokelatest
    @ ./essentials.jl:889 [inlined]
 [11] action(call::Symbol, libraries::Vector{Union{…}}, file::FileIO.Formatted, args::String; options::@Kwargs{})
    @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:219
 [12] action
    @ ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:196 [inlined]
 [13] action
    @ ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:185 [inlined]
 [14] load(file::String, args::String; options::@Kwargs{})
    @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:113
 [15] load(file::String, args::String)
    @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:109
 [16] top-level scope
    @ REPL[23]:1
Stacktrace:
 [1] handle_error(e::JLD2.InvalidDataException, q::Base.PkgId, bt::Vector{Union{Ptr{Nothing}, Base.InterpreterIP}})
   @ FileIO ~/.julia/packages/FileIO/xOKyx/src/error_handling.jl:61
 [2] handle_exceptions(exceptions::Vector{Tuple{Any, Union{Base.PkgId, Module}, Vector}}, action::String)
   @ FileIO ~/.julia/packages/FileIO/xOKyx/src/error_handling.jl:56
 [3] action(call::Symbol, libraries::Vector{Union{…}}, file::FileIO.Formatted, args::String; options::@Kwargs{})
   @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:228
 [4] action
   @ ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:196 [inlined]
 [5] action
   @ ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:185 [inlined]
 [6] load(file::String, args::String; options::@Kwargs{})
   @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:113
 [7] load(file::String, args::String)
   @ FileIO ~/.julia/packages/FileIO/xOKyx/src/loadsave.jl:109
 [8] top-level scope
   @ REPL[23]:1
Some type information was truncated. Use `show(err)` to see complete types.
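As an aside, saving and reloading a sparse matrix in the MatrixMarket text format can be done with the MatrixMarket.jl package; a minimal self-contained sketch (the small random matrix here is just an illustration, not the issue's matrix):

```julia
using SparseArrays, LinearAlgebra, MatrixMarket

# Round-trip a small sparse matrix through the MatrixMarket text format.
A = sprand(5, 5, 0.4) + I            # SparseMatrixCSC{Float64, Int}
MatrixMarket.mmwrite("matrix.mtx", A)
B = MatrixMarket.mmread("matrix.mtx")
size(B) == size(A)
```

The resulting `.mtx` file is plain text, so it can be read by most sparse linear algebra tools, not just Julia.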

ovanvincq (Author) commented

@amontoison Sorry, the link above was the link to the file description and not to the raw JLD2 file.

The correct link is: https://github.com/ovanvincq/test_cudss/raw/main/matrix.jld2

I also uploaded the matrix in the MatrixMarket format: https://github.com/ovanvincq/test_cudss/raw/main/matrix.mtx

amontoison (Member) commented

@ovanvincq I confirm that I can reproduce the issue.
I will report this problem to the CUDSS developers and keep you updated.

amontoison added the bug label on Jul 13, 2024
amontoison (Member) commented

@ovanvincq I got some feedback from NVIDIA: the LU factorization is not robust enough for your problem.
The only way to solve it with cuDSS is to scale A on the CPU before factorizing it on the GPU.
This kind of preprocessing is done automatically by mature linear solvers on the CPU, but not by cuDSS.
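One possible form of such CPU-side preprocessing is a simple row equilibration before transferring A to the GPU. This is only a sketch (the exact scaling NVIDIA has in mind may differ, and `row_equilibrate` is a hypothetical helper, not part of CUDSS.jl):

```julia
using LinearAlgebra, SparseArrays

# Scale each row of A so its largest entry has magnitude 1, then solve
# (Dr * A) x = Dr * b instead of A x = b. Assumes A has no empty rows.
function row_equilibrate(A::SparseMatrixCSC)
    r = zeros(real(eltype(A)), size(A, 1))
    vals = nonzeros(A)
    rows = rowvals(A)
    for j in 1:size(A, 2), k in nzrange(A, j)
        r[rows[k]] = max(r[rows[k]], abs(vals[k]))
    end
    Dr = Diagonal(1 ./ r)
    return Dr * A, Dr
end

A = sparse([2.0 0.0 1e6; 0.0 3.0 0.0; 1.0 0.0 4.0])
As, Dr = row_equilibrate(A)   # every row of As now has max |entry| == 1
```

After scaling, the right-hand side must be scaled the same way (`Dr * b`); the solution x itself is unchanged.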

ovanvincq (Author) commented

@amontoison Thanks for this feedback.
However, I rescaled the matrix with the row-scaling vector returned by LinearAlgebra.lu, so the matrix should be correctly scaled, and it still doesn't work:

using JLD2, CUDSS, CUDA, CUDA.CUSPARSE, LinearAlgebra, SparseArrays

A_cpu = load("matrix.jld2", "A")

### Matrix rescaling with the row scaling Rs computed by UMFPACK
F = lu(A_cpu)
A_cpu = sparse(Diagonal(F.Rs)) * A_cpu
###

T = eltype(A_cpu)
n = size(A_cpu, 1)
x_cpu = zeros(T, n)
b_cpu = rand(T, n)

A_gpu = CuSparseMatrixCSR(A_cpu)
x_gpu = CuVector(x_cpu)
b_gpu = CuVector(b_cpu)

F_gpu = lu(A_gpu)
x_gpu = F_gpu \ b_gpu
r_gpu = b_gpu - A_gpu * x_gpu
norm(r_gpu)

gives norm(r_gpu) = 4.703611480548198e8.
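For context, the `F.Rs` vector used above is UMFPACK's row scaling; with UMFPACK's default ("sum") scaling, `Diagonal(F.Rs) * A` should be row-equilibrated in the 1-norm. A small self-contained check of that behavior (on a random matrix, not the issue's matrix):

```julia
using LinearAlgebra, SparseArrays

# lu on a SparseMatrixCSC dispatches to UMFPACK; F.Rs holds the row
# scaling factors, and Diagonal(F.Rs) * A should have rows whose
# absolute values sum to (approximately) 1.
A = sprand(50, 50, 0.1) + 10I     # shifted diagonal ensures no empty rows
F = lu(A)
As = sparse(Diagonal(F.Rs)) * A
row_sums = [sum(abs, As[i, :]) for i in 1:size(As, 1)]
extrema(row_sums)
```

So the matrix handed to cuDSS above really is row-scaled; the residual of 4.7e8 suggests row scaling alone is not enough for this problem.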
