Extending inner products #334
I think that this would be useful to have in base and I've thought about adding it several times. One of the benefits is that you can completely avoid a temporary array. |
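(For concreteness, a minimal sketch of the fused computation being described; the name `fused_inner` is hypothetical, not from this thread:)

```julia
# Hypothetical sketch: compute x' * A * y without materializing the
# temporary A * y that dot(x, A * y) would allocate.
function fused_inner(x::AbstractVector, A::AbstractMatrix, y::AbstractVector)
    (length(x), length(y)) == size(A) || throw(DimensionMismatch())
    s = zero(promote_type(eltype(x), eltype(A), eltype(y)))
    @inbounds for j in axes(A, 2)
        t = zero(s)
        for i in axes(A, 1)
            t += conj(x[i]) * A[i, j]   # accumulate x' * A[:, j]
        end
        s += t * y[j]
    end
    return s
end
```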
yes - that was exactly the origin of this - I should have said, sorry. |
Would be nice to get some benefit from BLAS here, though it's only a BLAS2 operation. @xianyi, do you have a suggestion on the most efficient way to use BLAS for this? One option would be to break the computation into blocks so that only a block-sized temporary is needed. |
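(One way to read that blocking suggestion, in my interpretation and post-0.7 notation: since `x' * A * y == dot(A' * x, y)`, the adjoint mat-vec can be done a block of columns at a time with gemv, so the temporary stays block-sized. A sketch, with `blocked_inner` and `bs` my own names:)

```julia
using LinearAlgebra

# Sketch (assumed, not from the thread): process A in column blocks so the
# gemv temporary holds bs elements instead of size(A, 2).
function blocked_inner(x::Vector{Float64}, A::Matrix{Float64},
                       y::Vector{Float64}; bs::Int = 64)
    size(A, 1) == length(x) && size(A, 2) == length(y) ||
        throw(DimensionMismatch())
    s = 0.0
    tmp = Vector{Float64}(undef, bs)
    for j in 1:bs:size(A, 2)
        J = j:min(j + bs - 1, size(A, 2))
        t = view(tmp, 1:length(J))
        mul!(t, adjoint(view(A, :, J)), x)   # BLAS gemv on the block: A[:, J]' * x
        s += dot(t, view(y, J))
    end
    return s
end
```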
Ideally you'd also want a specialized method for the common case where `x == y`. And tridiagonal "mass" matrices are also common, as well as general sparse matrices. |
I was mostly thinking of sparse cases for my own use. These should be relatively easy to implement in Julia? |
For the matvec, it seems that multithreading is almost the only speedup you'll get from BLAS. For a 500x500 problem and a Julia double loop with `@inbounds`:

```julia
julia> blas_set_num_threads(1);

julia> @time for i = 1:10000; Ac_mul_B!(y, A, x); end # OpenBLAS 1 thread
  0.484342 seconds

julia> blas_set_num_threads(4);

julia> @time for i = 1:10000; Ac_mul_B!(y, A, x); end # OpenBLAS 4 threads
  0.184681 seconds

julia> @time for i = 1:10000; mymul(y, A, x); end # Julia
  0.509110 seconds
```

so if/when the Julia loop can be multithreaded, the BLAS advantage should mostly disappear. |
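(`mymul` itself isn't shown in the thread; a plausible reconstruction of such a double loop, noting that `Ac_mul_B!(y, A, x)` is `mul!(y, A', x)` in modern notation:)

```julia
# Hypothetical reconstruction of mymul: y = A' * x as a plain double loop.
function mymul(y::AbstractVector, A::AbstractMatrix, x::AbstractVector)
    size(A, 1) == length(x) && size(A, 2) == length(y) ||
        throw(DimensionMismatch())
    @inbounds for j in axes(A, 2)
        t = zero(eltype(y))
        @simd for i in axes(A, 1)
            t += conj(A[i, j]) * x[i]   # column j of A against x
        end
        y[j] = t
    end
    return y
end
```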
I'm surprised there isn't some SIMD speedup. |
Probably also worthwhile to special case symmetric/Hermitian `A`. |
@stevengj I don't understand. Both OpenBLAS and Julia use SIMD, so where would you have expected to see the speedup? |
In OpenBLAS the SIMD is hard-coded (I thought), whereas in Julia we rely on the compiler to do its magic, which doesn't always work as well. Maybe this is one of the cases where the compiler does a good job in exploiting SIMD. |
In my experiments with mat-vec and mat-mat, Julia has done a good job exploiting SIMD. For mat-vec my implementation was asymptotically as fast as OpenBLAS, although they use a better algorithm for mat-mat, so theirs was faster in that case (about 3x IIRC). |
To follow up on this: when everything is sparse we can also do much better by interleaving the computations, e.g. as I did in a recent project. |
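(The project code isn't quoted above; a sketch of the kind of interleaving meant, in my reading, with `x`, `A`, and `y` all sparse: visit only the columns where `y` has structural nonzeros and intersect each column's stored rows with the nonzeros of `x` by a two-pointer merge. The name `sparse_inner` is mine, and `nonzeroinds` is an unexported SparseArrays internal:)

```julia
using SparseArrays

# Hypothetical sketch of the interleaved sparse inner product x' * A * y.
function sparse_inner(x::SparseVector, A::SparseMatrixCSC, y::SparseVector)
    length(x) == size(A, 1) && size(A, 2) == length(y) ||
        throw(DimensionMismatch())
    T = promote_type(eltype(x), eltype(A), eltype(y))
    xi, xv = SparseArrays.nonzeroinds(x), nonzeros(x)
    rv, nz = rowvals(A), nonzeros(A)
    s = zero(T)
    for (j, yj) in zip(SparseArrays.nonzeroinds(y), nonzeros(y))
        t = zero(T)
        p = 1                        # pointer into the nonzeros of x
        for k in nzrange(A, j)       # stored rows of column j, sorted
            r = rv[k]
            while p <= length(xi) && xi[p] < r
                p += 1
            end
            p > length(xi) && break
            if xi[p] == r
                t += conj(xv[p]) * nz[k]
            end
        end
        s += t * yj
    end
    return s
end
```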
Once we have lazy transposes, we could overload the ternary multiplication operator `*(x', A, y)`. |
As pointed out by @andyferris, we can now overload ternary `*`. This gives a 2x speedup on my machine. |
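(The method that produced that speedup isn't quoted above; a dense sketch of such an overload, reusing the hypothetical `fused_inner` from earlier:)

```julia
using LinearAlgebra   # for Adjoint
import Base: *

# Hypothetical sketch (type piracy, for illustration only): x' * A * y
# parses as the single 3-argument call *(x', A, y), so this specific
# method beats Base's generic fold *(a, b, c, xs...) = ((a*b)*c)*...
*(x::Adjoint{<:Any,<:AbstractVector}, A::AbstractMatrix, y::AbstractVector) =
    fused_inner(parent(x), A, y)
```

No parser changes are needed for this to fire; current Julia's LinearAlgebra in fact ships such a method, forwarding to the three-argument `dot`.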
Credit to @ViralBShah also I think? (Note we need to make the mechanics of this lowering official.) |
I still think it would be nice to have `dot(x, A, y)` for this. |
@andyferris Why would we need to change `*`? |
I just mean the lowering. You want it to come out as `*(x', A, y)`. |
Overloading the ternary `*` seems to do that already. |
Yeah, as long as you use an explicit `'`. |
This would be fixed if we settle on a definition of (and implement) a matrix transpose, because then all the `Ac_mul_B`-style methods could become ordinary `*` methods dispatching on lazy transpose wrappers. |
Have there been any regrets about implementing a vector transpose? |
I think that people are generally happy with the vector transpose. There have been a few comments where people have been surprised/confused/annoyed but I think that is to be expected given the length of #42. |
Yes, I have to say the experience with RowVectors (which behave as row matrices except for linear algebraic operations) has been lovely. They generally just do what you want. |
The generalized inner product would be nice to have without creating a temporary. |
What is the current thinking on bringing this to Julia? This type of quadratic form `x'Qy` (often with `x == y` and `Q` symmetric) is ubiquitous in mathematical optimization (I'm surprised there is no BLAS function for it). I see there is some implementation in PDMats.jl for the case where `Q` is PSD (which is restrictive). Brief related discussion: https://discourse.julialang.org/t/matrix-multiplication-a-b-a |
@chriscoey I suspect this might be an issue waiting for someone to contribute a pull request to Base. |
So, we do have a ternary `*` function:

```julia
function *(x::Adjoint{<:Any,<:AbstractVector}, A::SparseMatrixCSC, y::AbstractVector)
    length(x) == A.m || throw(DimensionMismatch())
    A.n == length(y) || throw(DimensionMismatch())
    T = promote_type(eltype(x), eltype(A), eltype(y))
    out = T(0)
    rvals = rowvals(A)
    nzvals = nonzeros(A)
    @inbounds for col = 1:A.n
        temp = T(0)
        for k in nzrange(A, col)
            temp += x[rvals[k]] * nzvals[k]   # x' against the stored entries of column col
        end
        out += temp * y[col]                  # then scale by y[col]
    end
    return out
end
```

Would we insert this into SparseArrays? |
I would have the ternary `*` method call a generic `dot(x, A, y)` function, so that the fused inner product can be overloaded for other matrix types. |
Two related questions:

1. I don't want to hard-code this behavior; I'd prefer an ability to overload it. Is this functionality sufficiently general (I think it is) that it should be in Base by default for the standard arrays?
2. EDIT: the second question is deleted; it needs more thinking.
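(To illustrate the overloading question with a sketch, all names here hypothetical: a toy structured type plus the O(n) specialization a package could add on top of a generic O(n^2) fallback. Current Julia ships a three-argument `LinearAlgebra.dot(x, A, y)` that plays exactly this role:)

```julia
using LinearAlgebra
import LinearAlgebra: dot

# Hypothetical toy diagonal type, just to show the specialization story.
struct MyDiagonal{T} <: AbstractMatrix{T}
    d::Vector{T}
end
Base.size(D::MyDiagonal) = (length(D.d), length(D.d))
Base.getindex(D::MyDiagonal, i::Int, j::Int) = i == j ? D.d[i] : zero(eltype(D.d))

# O(n) specialization of the generalized inner product for this type:
dot(x::AbstractVector, D::MyDiagonal, y::AbstractVector) =
    sum(conj(x[i]) * D.d[i] * y[i] for i in eachindex(D.d))
```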