-
Notifications
You must be signed in to change notification settings - Fork 182
Optimize IntMap.Bin #995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize IntMap.Bin #995
Conversation
This is most impressive. I will do my best to review it promptly. In the mean time, I have a big favor to ask: I was struggling the other day to merge the branch for version 0.7 into master. Do you think you could try to figure out how to do that right? |
(prefix + mask) is just a part of the key with an additional mask bit set, could this be abused for performance? With a key and a compatible prefix,
So the two comparison patterns could be reduced to the following: -- Single-key operation with eager failure
trimatch p k =
let o = xor p k
in case () of
() | o .&. p == 0 = goRight
| unsafeShiftR o 1 .&. p == 0 = goLeft
| otherwise = goFail
-- Merge
quadbranch p1 p2 =
let o = xor p k `unsafeShiftR` 1
in case () of
() | p1 == p2 = goEqual
| o .&. p1 == 0 = goBelowP1
| o .&. p2 == 0 = goBelowP2
| otherwise = goDiffer |
@treeowl I assume you mean #994. I don't know much about the CI setup but I'll take a look.
@BurningWitness it is unclear to me how that would work. For instance, |
Oh, true, |
Ah well if the prefix already matches, the branch into left or right is done with one comparison so it's hard to imagine something faster than that. |
I believe Previous: Correct me if I'm wrong, I know little about things at this low a level. Now, if only |
I doubt the comparison costs significantly over the test, but benchmarking is the way to know for sure. |
Okay I found a better way to branch that avoids @treeowl would you like to review? I think it's in a good shape now. |
There are barely any usages of these now. It also becomes tempting to use these if they exist, rather than thinking of a more efficient way.
It would be nice to wrap Also this should mean precise lookups can be expressed as trimatch p k =
if k < p
then if k >= lower p
then goLeft
else goFail
else if k <= upper p
then goRight
else goFail (I think this is faster than the algorithm listed in the paper, four operations in any direction instead of four for failure plus one for picking a side) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to do a more thorough review of this on Sunday, but I've left a few comments for now. I'm exited about your work here!
If that becomes the predominant way of viewing
I like that, since it uses the same view as |
Indeed that is the case, the failure branch needs to be let-bound with a |
I made a local benchmark on a |
The prefix is already checked so just check the mask bit for the children's keys. Use a more straightforward prefix comparison instead of bitwise ops since efficiency is not a major concern here.
Hi @treeowl, please review when you get the time. |
-- * pw .&. (pw-1) sets the mask bit and every bit below it to 0. This is the | ||
-- smallest key the Bin can have. | ||
-- | ||
-- First, we compare the prefixes to each other. Then we compare a prefix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do we learn by comparing the prefixes to each other? That's not exactly an obvious thing to do, and needs documentation. Here's my understanding so far:
p1
is the smallest permissible value in the right child of its Bin
. If pw1 < pw2
, then (according to left
) p1
is to the left of the split created by p2
's Bin
.
How exactly does that help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how to explain this in a way that conveys more information than the code.
If
pw1 < pw2
, then (according toleft
)p1
is to the left of the split created byp2
'sBin
.
That's correct.
How exactly does that help?
It helps because it narrows down the possibilities from 6 to 3. For instance, if p1
's Bin
is in the right child of p2
's Bin
it implies that pw1 > pw2
, since pw2
is the smallest permissible value of the right child of p2
's Bin
. Which is ruled out if pw1 < pw2
is true.
I'm ready to merge, but I want to make sure the introduction of the helpers in your last commit didn't hurt performance. Did you check the performance effect of that, or (alternatively) verify that the Core, unfolding, and unfolding guidance are the same? |
Awesome 🎉 |
Ok I think this is good to go. Updated the benchmarks. I found some odd increases in a few I also added the Let me know if things look good and I'll squash it into a few commits (benchmark flag, tests, main IntMap changes). |
Why would you set |
|
Let's wait with that one for now. I'd like to hear opinions from others on a separate PR. |
Sounds good. Removed the flag and re-ran the benchmarks. |
Aaaaand we're merged! Fantastic work! |
Thanks for reviewing! Next up... |
Replaces the separate
Prefix
andMask
fields in theBin
constructor with a single field that contains both.This reduces the memory required by a
Bin
from 5 to 4 words, at the cost of more computations (which are cheap bitwise ops) being necessary for certain operations.Implements #991 for
IntMap
only. The same can be done forIntSet
if this is accepted.Memory
Concretely, this reduces the memory required by an
IntMap
by ~12.5% (including the keys and values).Calculations: For a map with
n
key-values, there aren
Tip
s costing 3 words each andn-1
Bin
s costing 5 words each before this change and 4 words after. So we save about 1 out of 8 words per key-value.Compatibility
This PR, in its current state, makes breaking changes to IntMap internals which is reflected in the exports of the internal module
Data.IntMap.Internal
. There is no change in the exports of any other module.Benchmarks
Benchmarks done with GHC 9.6.3.
Last updated: 7e2ca1c
The output is from a hacky script I put together that compares tasty-bench's csv files.
Benchmark command:
cabal run <target> -- --csv <csv> +RTS -T
intmap-benchmarks
set-operations-intmap
Allocations have gone down in every benchmark, which is expected but quite pleasant to see.
Runtime has gone down in most benchmarks and gone up in a few.
Notable time regressions
The only significant regressions (>10%) I see are in the$O(W)$ , due to one the left map being
union-disj
andintersection-disj
set operations. However I don't consider this a problem because these operations on the specific disjoint maps used in the tests wrap up in[1..n]
and the right being[n+1...n+x]
. This is measuring the cost of one set of new bitwise ops vs old bitwise ops, which is unsurprisingly a little slower.Edit (58cbc6a):
union-disj
is better now.Edit (4b708ba):
intersection-disj
is comparable now.Let me know if anything else stands out as troublesome.