sparse matrix * sparse vector is slow when the vector is relatively dense #58
Hello Augustin! Out of curiosity, what is your application? The extreme aspect ratio matrix with which you benchmark (…). First, a less extreme, square case:

```julia
julia> using BenchmarkTools

julia> A = sprand(10^3, 10^3, 10.0^(-2)); # square, about ten entries per row/column

julia> x = sprand(10^3, 10.0^(-2)); # about ten entries

julia> @benchmark $A*$x
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  23.84 kb
  allocs estimate:  4
  minimum time:     2.37 μs (0.00% GC)
  median time:      3.29 μs (0.00% GC)
  mean time:        6.84 μs (31.31% GC)
  maximum time:     369.23 μs (96.68% GC)

julia> @benchmark my_mul($A, $x)
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  15.38 kb
  allocs estimate:  25
  minimum time:     21.09 μs (0.00% GC)
  median time:      21.94 μs (0.00% GC)
  mean time:        23.96 μs (2.99% GC)
  maximum time:     1.84 ms (96.69% GC)
```

To check whether the existing code has room for (simple) improvement, I sketched a set of naive gather-scatter sparse-sparse matvec methods from scratch (code below, not performance tuned, and could probably avoid the …):

```julia
julia> @benchmark spmatspvec($A, $x, spmatspvec_alloc($A, $x))
BenchmarkTools.Trial:
  samples:          10000
  evals/sample:     9
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  10.41 kb
  allocs estimate:  4
  minimum time:     2.18 μs (0.00% GC)
  median time:      2.72 μs (0.00% GC)
  mean time:        4.11 μs (19.71% GC)
  maximum time:     302.21 μs (97.62% GC)
```

The story is similar for the case you benchmarked, though with a far more modest allocation reduction (the scatter-gather workspace allocation dominates):

```julia
julia> @benchmark $A*$x
BenchmarkTools.Trial:
  samples:          11
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  778.20 mb
  allocs estimate:  7
  minimum time:     421.87 ms (2.97% GC)
  median time:      466.40 ms (11.56% GC)
  mean time:        470.39 ms (12.19% GC)
  maximum time:     551.85 ms (24.54% GC)

julia> @benchmark my_mul($A, $x)
BenchmarkTools.Trial:
  samples:          89
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  21.23 mb
  allocs estimate:  52
  minimum time:     52.13 ms (0.00% GC)
  median time:      56.15 ms (3.34% GC)
  mean time:        56.29 ms (3.29% GC)
  maximum time:     62.77 ms (3.10% GC)

julia> @benchmark spmatspvec($A, $x, spmatspvec_alloc($A, $x))
BenchmarkTools.Trial:
  samples:          11
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  767.42 mb
  allocs estimate:  7
  minimum time:     428.40 ms (3.12% GC)
  median time:      460.44 ms (11.58% GC)
  mean time:        466.54 ms (12.11% GC)
  maximum time:     554.29 ms (23.98% GC)
```

But with the scatter-gather workspace `w` preallocated:

```julia
julia> @benchmark spmatspvec($A, $x, $w)
BenchmarkTools.Trial:
  samples:          47
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  4.48 mb
  allocs estimate:  5
  minimum time:     101.96 ms (0.00% GC)
  median time:      106.70 ms (0.00% GC)
  mean time:        107.08 ms (0.35% GC)
  maximum time:     113.97 ms (0.00% GC)
```

Much closer to the specialized `my_mul`. The code:

```julia
_checkshapecompat(A::SparseMatrixCSC, x::SparseVector) =
    size(A, 2) == length(x) || throw(DimensionMismatch(string("the number of ",
        "columns of the sparse matrix must match the sparse vector's length")))
_checkshapecompat(A::SparseMatrixCSC, sgw::Vector) =
    size(A, 1) <= length(sgw) || throw(DimensionMismatch(string("the scatter-gather ",
        "workspace vector's length must be at least the number of rows of the sparse matrix")))

"""
Given sparse matrix `A` and sparse vector `x` that you would like to multiply (or
surrogates with the shape of `A` and `x`), checks shape compatibility of `A` and `x`
and allocates an appropriately sized scatter-gather workspace.
"""
spmatspvec_alloc(A::SparseMatrixCSC, x::SparseVector) =
    (_checkshapecompat(A, x); zeros(promote_type(eltype(A), eltype(x)), size(A, 1)))

"""
Given sparse matrix `A` and sparse vector `x` that you would like to multiply (or
surrogates with the shape of `A` and `x`), and also a scatter-gather workspace `sgw`,
checks shape compatibility of `A`, `x`, and `sgw`, and then zeros `sgw`.
"""
function spmatspvec_init!(A::SparseMatrixCSC, x::SparseVector, sgw::Vector)
    _checkshapecompat(A, x)
    _checkshapecompat(A, sgw)
    fill!(sgw, 0)
end

"""
Returns `A*x` using scatter-gather workspace `sgw`. Assumes shape compatibility of
`A`, `x`, and `sgw`. Also assumes `sgw` begins zeroed, and returns with `sgw` zeroed.
"""
function spmatspvec(A::SparseMatrixCSC, x::SparseVector, sgw::Vector)
    # Calculate the result's entries, scattering into sgw
    resnnz::Int = 0
    @inbounds for xk in 1:nnz(x)
        xv = x.nzval[xk]
        iszero(xv) && continue
        xi = x.nzind[xk]
        # for Ak in nzrange(A, xi)
        for Ak in A.colptr[xi]:(A.colptr[xi + 1] - 1)
            Ai = A.rowval[Ak]
            Av = A.nzval[Ak]
            sgwv = sgw[Ai]
            iszero(sgwv) && (resnnz += 1)
            sgw[Ai] = sgwv + Av * xv
        end
    end
    # Gather the result's entries, zeroing sgw
    resnzind = Vector{eltype(x.nzind)}(resnnz)
    resnzval = Vector{eltype(sgw)}(resnnz)
    resk::Int = 1
    # @inbounds for (sgwi, sgwv) in enumerate(sgw)
    @inbounds for sgwi in 1:length(sgw)
        sgwv = sgw[sgwi]
        if !iszero(sgwv)
            resnzind[resk] = sgwi
            resnzval[resk] = sgwv
            resk += 1
            sgw[sgwi] = 0
        end
    end
    return SparseVector(size(A, 1), resnzind, resnzval)
end

using Base.Test
let N = 100, M = 101, p = 0.1
    A = sprand(N, M, p)
    x = sprand(M, p)
    # test allocation
    @test_throws DimensionMismatch spmatspvec_alloc(spzeros(N, M), spzeros(N))
    @test isa(spmatspvec_alloc(spzeros(Float64, N, M), spzeros(Float32, M)), Vector{Float64})
    @test length(spmatspvec_alloc(spzeros(N, M), spzeros(M))) == N
    @test iszero(spmatspvec_alloc(A, x))
    # test initialization
    w = spmatspvec_alloc(A, x)
    @test_throws DimensionMismatch spmatspvec_init!(A, spzeros(N), w)
    @test_throws DimensionMismatch spmatspvec_init!(A, x, zeros(1))
    @test (rand!(w); spmatspvec_init!(A, x, w); iszero(w))
    # test kernel
    @test spmatspvec(A, x, w) == A*x
    @test iszero(w)
end
```
Hi, thanks for your response. My application is pricing/managing/reporting reinsurance contracts. We essentially have a simulation with a very large number of events which can potentially take losses. My matrix has a row for each event and a column for each contract. In practice a relatively small number of events actually take losses. Happy to talk more, but perhaps this is not the right forum. The actual length of the sparse vector has not been an issue up to now, because all the operations I have used scaled with the nonzeros; I don't actually store the full vector anywhere. Unfortunately, `A*x` allocates memory to store the full vector, which cannot fit in memory. I can see the merit of the scatter-gather method, but it doesn't help in my 'pathological' case. As a matter of principle, sparse operations should scale with nnz and never have to allocate the full vector/matrix, which something like `my_mul` does. Pragmatically this might not always be possible or necessary, especially if there are no use cases which would benefit. I am happy to have this issue closed. Thanks again, Gus
Couldn't this operation use a polyalgorithm that looks at the actual size/shape to decide the best approach? As long as the result type is predictable, that should be fine.
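One shape-based polyalgorithm could, for instance, switch between a dense accumulator (cheap when `size(A, 1)` is modest) and a `Dict`-based accumulator (memory proportional to the rows actually touched when the result is hypersparse). A rough sketch on Julia 1.x, not the library's actual implementation; the kernel names and the `dense_threshold` cutoff are made up for illustration:

```julia
using SparseArrays

# Dense-accumulator kernel: O(size(A, 1)) workspace, like the stdlib approach.
function mul_dense_acc(A::SparseMatrixCSC, x::SparseVector)
    w = zeros(promote_type(eltype(A), eltype(x)), size(A, 1))
    rv, nzv = rowvals(A), nonzeros(A)
    for (xi, xv) in zip(x.nzind, x.nzval)
        for k in nzrange(A, xi)
            w[rv[k]] += nzv[k] * xv
        end
    end
    return sparse(w)  # prune zeros back down to a SparseVector
end

# Dict-accumulator kernel: memory scales with the number of rows touched.
function mul_dict_acc(A::SparseMatrixCSC, x::SparseVector)
    T = promote_type(eltype(A), eltype(x))
    acc = Dict{Int,T}()
    rv, nzv = rowvals(A), nonzeros(A)
    for (xi, xv) in zip(x.nzind, x.nzval)
        for k in nzrange(A, xi)
            acc[rv[k]] = get(acc, rv[k], zero(T)) + nzv[k] * xv
        end
    end
    is = sort!(collect(keys(acc)))
    return sparsevec(is, T[acc[i] for i in is], size(A, 1))
end

# Polyalgorithm: pick the kernel from the input shape; the cutoff is illustrative.
poly_mul(A::SparseMatrixCSC, x::SparseVector; dense_threshold = 2^22) =
    size(A, 1) <= dense_threshold ? mul_dense_acc(A, x) : mul_dict_acc(A, x)
```

Both kernels compute the same result, so the dispatch choice only affects time/memory, which is what makes the result type predictable.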
Chewed a bit more on this, and here is a better solution for my specific use case: I pre-calculate a condensed matrix … .
A common approach to efficient sparse-sparse matvecs in cases like yours: for each nonzero value in `x`, … . The approach you take above to exploiting the additional information that …:

```julia
julia> using BenchmarkTools

julia> m, n, p = 10^8, 10^3, 10.0^(-5);

julia> A = sprand(m, n, p);

julia> x = sprand(n, 0.3);

julia> Ac, nzrows, Am = pre_mul(A);

julia> @benchmark y = c_mul($Ac, $nzrows, $Am, $x)
BenchmarkTools.Trial:
  samples:          381
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  15.19 mb
  allocs estimate:  7
  minimum time:     10.77 ms (0.00% GC)
  median time:      13.07 ms (13.23% GC)
  mean time:        13.14 ms (12.22% GC)
  maximum time:     17.13 ms (17.60% GC)

julia> aux = spmatspvec_init(A, promote_type(eltype(A), eltype(x)));

julia> @benchmark y = spmatspvec($A, $x, $aux)
BenchmarkTools.Trial:
  samples:          581
  evals/sample:     1
  time tolerance:   5.00%
  memory tolerance: 1.00%
  memory estimate:  4.30 mb
  allocs estimate:  5
  minimum time:     7.05 ms (0.00% GC)
  median time:      8.10 ms (0.00% GC)
  mean time:        8.61 ms (5.09% GC)
  maximum time:     11.94 ms (23.38% GC)
```

You could avoid the remaining scatter-gather workspace scan by using e.g. a singly linked list to track which entries you populate in the scatter-gather workspace. Chances are that approach would only reduce runtime for significantly less populated … . The code:

```julia
using Base.SparseArrays: indtype

struct SpSpMatvecAux{Tv,Ti<:Integer}
    condrowval::Vector{Ti}
    nonzerorows::Vector{Ti}
    sgw::Vector{Tv}
end

function spmatspvec_init{reseltype}(A::SparseMatrixCSC, ::Type{reseltype})
    TiA = indtype(A)
    nzrows = sort!(unique(A.rowval[1:nnz(A)]))
    indexmap = Dict{TiA,TiA}(zip(nzrows, 1:length(nzrows)))
    condrowval = TiA[ indexmap[A.rowval[Ak]] for Ak in 1:nnz(A) ]
    sgworkspace = zeros(reseltype, length(nzrows))
    return SpSpMatvecAux(condrowval, nzrows, sgworkspace)
end

"""
Returns `A*x` using auxiliary `aux`. Assumes shape compatibility of
`A` and `x`, and assumes `aux` was generated for `A` and `x` and not
modified outside this routine.
"""
function spmatspvec(A::SparseMatrixCSC, x::SparseVector, aux::SpSpMatvecAux)
    # Calculate the result's entries, scattering into aux.sgw
    resnnz::Int = 0
    @inbounds for xk in 1:nnz(x)
        xv = x.nzval[xk]
        iszero(xv) && continue
        xi = x.nzind[xk]
        # for Ak in nzrange(A, xi)
        for Ak in A.colptr[xi]:(A.colptr[xi + 1] - 1)
            Ai = aux.condrowval[Ak]
            Av = A.nzval[Ak]
            sgwv = aux.sgw[Ai]
            iszero(sgwv) && (resnnz += 1)
            aux.sgw[Ai] = sgwv + Av * xv
        end
    end
    # Gather the result's entries, zeroing aux.sgw, and mapping back to actual row indices
    resnzind = Vector{indtype(x)}(resnnz)
    resnzval = Vector{eltype(aux.sgw)}(resnnz)
    resk::Int = 1
    sgwi, sgwimax = 1, length(aux.sgw)
    @inbounds while sgwi <= sgwimax && resk <= resnnz
        sgwi += 1
        sgwv = aux.sgw[sgwi]
        if !iszero(sgwv)
            resnzind[resk] = aux.nonzerorows[sgwi]
            resnzval[resk] = sgwv
            resk += 1
            aux.sgw[sgwi] = 0
        end
    end
    return SparseVector(size(A, 1), resnzind, resnzval)
end
```

Best!
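As a concrete variant of the "track which entries you populate" idea, here is a sketch that records touched workspace slots in a plain index stack (rather than a linked list), so the gather pass visits only those slots instead of scanning the whole workspace. Written against Julia 1.x stdlib `SparseArrays`, not performance tuned, and not the code from the comment above:

```julia
using SparseArrays

function spmatspvec_tracked(A::SparseMatrixCSC, x::SparseVector)
    T = promote_type(eltype(A), eltype(x))
    sgw = zeros(T, size(A, 1))
    touched = Int[]  # workspace slots written to, in first-touch order
    rv, nzv = rowvals(A), nonzeros(A)
    for (xi, xv) in zip(x.nzind, x.nzval)
        for k in nzrange(A, xi)
            i = rv[k]
            # A slot whose partial sum cancels back to zero may be pushed
            # twice; the sort!/unique! below makes that harmless.
            iszero(sgw[i]) && push!(touched, i)
            sgw[i] += nzv[k] * xv
        end
    end
    sort!(touched)
    unique!(touched)
    resnzind = Int[]
    resnzval = T[]
    for i in touched
        v = sgw[i]
        iszero(v) || (push!(resnzind, i); push!(resnzval, v))
        sgw[i] = 0
    end
    return SparseVector(size(A, 1), resnzind, resnzval)
end
```

The sort over the touched slots is what restores ascending index order, so this pays off only when far fewer slots are touched than `size(A, 1)`.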
Hi @Sacha0, I think it should be: …

Good catch! Looks like either: (1) …
Hi @Sacha0, … gives … .
Also a good catch! There are a number of ways you can fix that issue without adding another sweep (…):

```julia
aux.sgw[Ai] = newval = sgwv + Av * xv
if iszero(sgwv) && !iszero(newval)
    resnnz += 1
elseif !iszero(sgwv) && iszero(newval)
    resnnz -= 1
end
```

should do the trick. The second approach should be more efficient than the first in some cases. Best!
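To make the miscount concrete: with the original `iszero(sgwv) && (resnnz += 1)` counting, a `+1` and a `-1` landing in the same row leave the count at 1 even though the accumulated value is zero. A small self-contained check of the corrected counting (a sketch over stdlib `SparseArrays`, not the thread's exact code):

```julia
using SparseArrays

# Count the result's structural nonzeros with the corrected update, which
# decrements the count when an accumulated entry cancels back to exactly zero.
function count_result_nnz(A::SparseMatrixCSC, x::SparseVector)
    sgw = zeros(promote_type(eltype(A), eltype(x)), size(A, 1))
    resnnz = 0
    rv, nzv = rowvals(A), nonzeros(A)
    for (xi, xv) in zip(x.nzind, x.nzval)
        for k in nzrange(A, xi)
            i = rv[k]
            oldval = sgw[i]
            sgw[i] = newval = oldval + nzv[k] * xv
            if iszero(oldval) && !iszero(newval)
                resnnz += 1
            elseif !iszero(oldval) && iszero(newval)
                resnnz -= 1
            end
        end
    end
    return resnnz
end
```

On `A = sparse([1, 1], [1, 2], [1.0, -1.0], 1, 2)` and `x = sparsevec([1, 2], [1.0, 1.0], 2)` the corrected count is 0, where the original counting would report 1.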
Was this actually resolved? I got bitten by this today (on 0.6), and ended up hacking my own together (https://gist.github.com/simonbyrne/43a4041c76e18f79837b06dec459b820).

No official fix.

Okay, in that case I think we should reopen it. Sounds like some sort of polyalgorithm is the way forward, as @StefanKarpinski suggested.

Would it make sense to have a "HyperSparse" matrix type (probably in a package) for situations like this?
I’m not sure what you mean: care to expand?
Hypersparse typically means sparse in both rows and columns, i.e. efficient even in the case when nnz << min(nrows, ncols). There are a few data structures that are typically used for this kind of thing. @ViralBShah would know more about it. |
Ah, thanks. Well, that didn't apply in my case: I had nrows = ncols = O(1e6), and nnz = O(1e8).
The sparse data structure we use (CSC) is a dense vector of sparse columns. It is good for linear algebra type stuff and the traditional finite element style problems, but not good for hypersparse problems that have less than 1 nonzero per row/column on average (if you use CSC for these, you are just spinning your wheels, iterating through empty columns 90% of the time). You can do significantly better, as discussed here. I think auto-detecting all this is a bit too magical. Perhaps a …
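For reference, one standard structure for this regime is doubly compressed CSC (DCSC), which also compresses away the empty columns so iteration costs O(#nonempty columns + nnz) rather than O(n + nnz). A minimal illustrative conversion on Julia 1.x; the `DCSC` name and fields are made up for this sketch, not an existing package API:

```julia
using SparseArrays

# Doubly compressed CSC: store only the nonempty columns.
struct DCSC{Tv,Ti<:Integer}
    m::Int
    n::Int
    cols::Vector{Ti}    # indices of nonempty columns, ascending
    colptr::Vector{Ti}  # ranges into rowval/nzval; length == length(cols) + 1
    rowval::Vector{Ti}
    nzval::Vector{Tv}
end

function DCSC(A::SparseMatrixCSC{Tv,Ti}) where {Tv,Ti}
    cols = Ti[j for j in 1:size(A, 2) if !isempty(nzrange(A, j))]
    colptr = Ti[1]
    rowval = Ti[]
    nzval = Tv[]
    rv, nzv = rowvals(A), nonzeros(A)
    for j in cols
        append!(rowval, rv[nzrange(A, j)])
        append!(nzval, nzv[nzrange(A, j)])
        push!(colptr, Ti(length(rowval) + 1))
    end
    return DCSC{Tv,Ti}(size(A, 1), size(A, 2), cols, colptr, rowval, nzval)
end
```

A matvec over this structure would loop over `cols` directly, never visiting the empty columns that make plain CSC wasteful in the hypersparse case.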
Ah, seeing @simonbyrne's comment above, maybe we have general inefficiencies in here. His case is certainly not hypersparse.
Since we have a sparse vector now, we should stop converting this one to dense in the sparse-sparse matvec.
I have tried many things to address this issue and must conclude that the current solution (writing the results of the multiplication into a dense vector, then pruning it) is probably the best general-purpose solution. The problem is that the number of nonzero values in the result of a multiplication … . The suggested solutions work well for certain cases where …:

```julia
A = sprandn(10_000, 10_000, 0.01)
v = sprand(10_000, 0.05)
```

Another idea was to adopt the algorithm for sparse matrix times sparse matrix multiplication. A quick benchmark (using the … above) gives: …

So it consumes slightly less memory than the current solution, but it is also slower... If someone knows an algorithm that is both fast and does not blow up the memory usage for general cases, I'm happy to try it!
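One way to read the "adopt the sparse matrix-matrix algorithm" idea is a two-pass (symbolic, then numeric) kernel: a first sweep computes only the result's sparsity pattern with a cheap `Bool` workspace, and a second sweep accumulates values directly into an exactly sized buffer, locating each entry by binary search in the sorted pattern. A sketch on Julia 1.x (an assumption-laden illustration, not the benchmarked code; note it still needs an O(m) `Bool` scratch, just not one of the value eltype):

```julia
using SparseArrays

# Two-pass sparse matvec: symbolic pass marks rows receiving contributions,
# numeric pass fills an exactly sized value buffer.
function two_pass_mul(A::SparseMatrixCSC, x::SparseVector)
    m = size(A, 1)
    rv, nzv = rowvals(A), nonzeros(A)
    # Symbolic pass: mark every row that receives a contribution.
    seen = falses(m)
    for xi in x.nzind, k in nzrange(A, xi)
        seen[rv[k]] = true
    end
    resnzind = findall(seen)
    # Numeric pass: accumulate in place, binary-searching the sorted pattern.
    resnzval = zeros(promote_type(eltype(A), eltype(x)), length(resnzind))
    for (xi, xv) in zip(x.nzind, x.nzval)
        for k in nzrange(A, xi)
            p = searchsortedfirst(resnzind, rv[k])
            resnzval[p] += nzv[k] * xv
        end
    end
    return SparseVector(m, resnzind, resnzval)
end
```

Like pattern-based sparse matrix-matrix multiplication, this can store explicit zeros when contributions cancel, and the binary search adds a log factor to the numeric pass, which is consistent with the "slightly less memory, but slower" observation above.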
Hi,

The current implementation seems to produce a dense result vector which is then converted to sparse before returning. I ran into memory scaling issues. Perhaps something along the lines of `my_mul` below would be worth having?