Regression in sparse-dense multiplication #1037
An initial analysis below. Firstly, I used PProf.

v1.9.3:

```julia
_spmatmul!(::Vector{Float64}, ::SparseMatrixCSC{Float64, Int64}, ::Vector{Float64}, ::Bool, ::Bool)
/home/ark/exp/52137.jl
Total: 52298 53841 (flat, cum) 99.86%
```
Observations
Assembly
Assembly Observations
More later!
Could maybe be an LLVM upgrade where the vectorizer now does a worse job?
LLVM Code
Comparing the LLVM code, generated using the command
Hence, at this point, my best guess is that the 4x unrolling in the LLVM code causes register spills and reloads (2x of 64 bits each) when mapped to my Intel CPU (i7-1165G7), leading to the increased cycles observed in Line 15 of the above post.

Any suggestions to confirm this guess are welcome!

NB: It could be that the generated native code itself is suboptimal. But, based on my initial reading, the code structure seems pretty similar between the two versions, apart from minor differences and the 4x unrolling.
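One way to look into a guess like this is to capture both the LLVM IR and the native code for the same method on each Julia version and diff them. The snippet below is only a sketch: `scatter_kernel!` is a hypothetical stand-in written for illustration, not the `_spmatmul!` method from SparseArrays.jl, and this is not necessarily the command used in the comparison above. `code_llvm` and `code_native` are the functions behind the `@code_llvm` and `@code_native` macros in `InteractiveUtils`.

```julia
using InteractiveUtils, SparseArrays

# Hypothetical stand-in kernel (assumption: NOT the SparseArrays.jl method).
# It accumulates A * x into y, column by column, for a CSC matrix.
function scatter_kernel!(y::Vector{Float64}, A::SparseMatrixCSC{Float64,Int}, x::Vector{Float64})
    rv = rowvals(A)
    nzv = nonzeros(A)
    for col in 1:size(A, 2)
        xc = x[col]
        for k in nzrange(A, col)
            y[rv[k]] += nzv[k] * xc
        end
    end
    return y
end

argtypes = Tuple{Vector{Float64}, SparseMatrixCSC{Float64,Int}, Vector{Float64}}

# Capture the optimized LLVM IR and the native assembly as strings,
# so they can be written to files and diffed across Julia versions.
llvm_ir = sprint(io -> code_llvm(io, scatter_kernel!, argtypes; debuginfo=:none))
native  = sprint(io -> code_native(io, scatter_kernel!, argtypes; debuginfo=:none))

println(length(split(llvm_ir, '\n')), " lines of LLVM IR")
println(length(split(native, '\n')), " lines of native code")
```

In the native listing, loads and stores relative to the stack pointer inside the hot loop would be the place to look for spills and reloads. Whether they appear depends on the Julia version and the CPU target.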
See also #1044
On M-series Macs, I see

On x64, the gap is larger:
I bisected this to dbd82a4dbab0582a345679eb83b2d99d40c0356a (JuliaLang/julia#49747). It's a bit funny, because the PR shows sparse matmul as having gotten the biggest improvement from this change.

Edit: I also noticed that the benchmark case uses a diagonal sparse matrix, which isn't very representative... Each
Is this still an issue with Julia 1.12-dev? Just asking in case updates to our codegen/LLVM versions have addressed this.
Running this piece of code, I get
Multiplication dispatch changed in the 1.9 to 1.10 transition, but this is nothing but the bare-bones multiplication code that we have had "ever since", so without character processing and all that. And since it does not call any high-level functions, the issue must be outside of SparseArrays.jl, AFAIU.
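For readers unfamiliar with what such bare-bones code looks like, here is a minimal sketch of a CSC sparse-matrix times dense-vector kernel. It is an illustration under assumptions, not the code that was run above: `csc_matvec!` is a hypothetical name, and only the public accessors `rowvals`, `nonzeros`, and `nzrange` are used.

```julia
using SparseArrays, LinearAlgebra

# Hypothetical helper (assumption: not a SparseArrays.jl function).
# Computes y = A * x by walking the stored entries of each column.
function csc_matvec!(y::AbstractVector, A::SparseMatrixCSC, x::AbstractVector)
    size(A, 2) == length(x) || throw(DimensionMismatch("length(x) must equal size(A, 2)"))
    size(A, 1) == length(y) || throw(DimensionMismatch("length(y) must equal size(A, 1)"))
    rv = rowvals(A)     # row index of each stored entry
    nzv = nonzeros(A)   # value of each stored entry
    fill!(y, zero(eltype(y)))
    @inbounds for col in 1:size(A, 2)
        xc = x[col]
        for k in nzrange(A, col)   # stored entries of column `col`
            y[rv[k]] += nzv[k] * xc
        end
    end
    return y
end

A = sprand(1_000, 1_000, 0.01)
x = rand(1_000)
y = similar(x)
csc_matvec!(y, A, x)
@assert y ≈ A * x   # agrees with the library multiplication
```

Because a loop like this bypasses the multiplication dispatch entirely, timing it on both Julia versions separates a regression in code generation from one in the SparseArrays.jl wrappers.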
x-ref JuliaSparse/SparseArrays.jl#469