-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce TreapList #12
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty cool! I like it :)
I read only TreapListNode
which seems like the heart of it all, and I'm not sure I'll really read the rest... unless you insist :)
else -> index | ||
} | ||
|
||
private inline fun <T> traverseToIndex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think a better name is something like stepTowardsIndex
(or something like that) because we only do one step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that's better
} | ||
} | ||
|
||
private fun computeHashCode(initial: Int): Int { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably can be done with forEachNode, and IMHO a bit more readable:
private fun computeHashCode(initial: Int): Int {
var code = 0
forEachNode { e ->
code = 31 * code + e.hashCode()
}
return code
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using recursive descent here to avoid allocating the closure for the lamba in the forEachNode
call. Will add a note.
|
||
override fun listIterator(index: Int): ListIterator<E> = object : ListIterator<E> { | ||
/** stores the path back to the original root of the tree. */ | ||
val stack = ArrayDeque<TreapListNode<E>>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any point in setting the size of the stack to the max depth? we probably don't have it, so maybe to 2 * log2(size)
or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could estimate the max depth by checking the depth down a single path. I'll try that.
else -> this.with(right = this.right append that) | ||
} | ||
|
||
private fun split(index: Int): Pair<TreapListNode<E>?, TreapListNode<E>?> = traverseToIndex( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a comment that the index itself will be in the second treap in the result.
Maybe for efficiency add the check if the index is 0, or higher than the size (not sure if size-1 or size).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a comment that the index itself will be in the second treap in the result.
Will do
Maybe for efficiency add the check if the index is 0, or higher than the size (not sure if size-1 or size).
All of the callers already do those checks
when { | ||
index == size -> addLast(element) | ||
index == 0 -> addFirst(element) | ||
else -> split(index).let { (l, r) -> l!! append TreapListNode(element) append r!! } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably practically meaningless, but the first append
can be replaced with addLast
for better efficiency (very slight). Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea!
@@ -0,0 +1,66 @@ | |||
package com.certora.collect | |||
|
|||
internal class EmptyTreapList<E> private constructor() : TreapList<E>, java.io.Serializable, AbstractList<E>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's an interesting way of doing this...
I know the standard approach - that there is the recursive data structure, which if empty is null, but it's not really exposed, and always wrapped in an outer box.
This approach saves the memory needed for the outer object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It saves the additional GC heap churn from allocating the outer object every time we generate a new list. It also lets us optimize specifically for operations on empty lists (though who knows how significant that is?).
right?.forEachNode(action) | ||
} | ||
|
||
private fun forEachNodeIndexed(thisIndex: Int, action: (Int, E) -> Unit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be removed, and the public function written as:
fun forEachElementIndexed(action : (Int, E) -> Unit) {
var i = 0
forEachNode {
action(i++, it)
}
}
Also, I believe this would also be much clearer and just as good as the current implementation:
override fun indexOf(element: E): Int? {
forEachElementIndexed { i, e ->
if (element == e) return i
}
return null
}
for the reverse case, maybe it's best to just implement reverse traversal? it's much simpler code-wise than these calculations. At least to me....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like elsewhere, the recursive descent here avoids allocating a closure. It probably doesn't make a significant difference for large lists, but for small ones this way should be noticeably faster.
override fun indexOf(element: E): Int = indexOfFirstNode(rootIndex(), element) ?: -1 | ||
override fun lastIndexOf(element: E): Int = indexOfLastNode(rootIndex(), element) ?: -1 | ||
|
||
override fun contains(element: E): Boolean = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho using forEachElement
here would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same reasoning about recursion vs. closure allocation here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah, I see what you mean (though I am sure what the costs are).
I guess Ideally, we can write a forEachElement
which is an inline function, but that would mean we need to write it non-recursively - I guess much like the iterator code.
Of course then it can't be public?
anyway, probably not worth the effort - though I do find it much more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, the benchmarking is so extreme - as perhaps the most extreme example: how come setting a value in an ArrayList
is so slow? yours is 40 times faster for the longest lists there, and the immutable version is even faster. That's super weird...
I'm comparing operations that generate a new list as their result. For example, For in-place mutation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great!
TreapList
is a Treap-basedPersistentList
implementation which significantly outperforms bothArrayList
and the referencePersistentList
implementation fromkotlinx.collections.immutable
for most operations. Nearly everyPersistentList
operation has a log-time implementation, including things like inserting one list in the middle of another list.TreapList
is notably a rather efficient persistent Deque, with O(log(N)) insertion and removal at both ends of the list.Design
TreapList
is organized as a Treap, with left/right being determined by the order of the elements in the list, and with randomly-assigned node priorities to balance the tree. This is different from how we assign priorities inTreapMap
/TreapSet
, where we hash the element values to get the priority; becauseTreapList
can contain multiple elements with the same value, we must randomly assign priorities to ensure that, say, a list of 1,000,000 entries with the same value won't just end up being a linked list. If we randomly assigned priorities in the set/map implementations, it would complicate unions, intersections, merges, etc. - but those operations do not apply to Lists, so random priorities are fine here.Also unlike the set/map implementations, each
TreapList
node tracks the size of the sublist represented by that node, enabling log-time indexing into the list.TreapList
is a little like a probabilistically balanced Rope, but as an arbitrary list.Performance
For most operations,
TreapList
outperformsArrayList
(when used as an immutable list), as well as the referencePersistentList
implementation fromkotlinx.collections.immutable
. The most notable exception is theget
operation, which forArrayList
is a simple array indexing operation:TreapList
generally matches or beatsArrayList
when adding a single element to the end of the list (producing a new list), but is beaten bykotlinx.collections.immutable
:However, for most other list operations,
TreapList
wins handily. For example, insertion of a single item at the front of the list:Replacing the value at a given index:
Or appending one list to another list:
More benchmarks are available; you can run
gradlew listBenchmark
to run them all.Memory Usage
One disadvantage of
TreapList
vs the alternatives is that a singleTreapList
of a given size is quite a lot larger than the equivalentArrayList
orkotlinx.collections.immutable
list:However,
TreapList
is better able to re-use allocations between "versions" of a list. For example, consider the scenario where we add one item at a time, in the middle of the list, retaining all intermediate results, and adding up the total size:We can see that in this extreme case,
TreapList
uses much less memory than the other alternatives.Real heap usage will of course depend on how much the specific use case is able to take advantage of the increased allocation sharing.