Introduce TreapList #12

ericeil · 2024-02-05T17:09:58Z

TreapList is a Treap-based PersistentList implementation which significantly outperforms both ArrayList and the reference PersistentList implementation from kotlinx.collections.immutable for most operations. Nearly every PersistentList operation has a log-time implementation, including things like inserting one list in the middle of another list.

TreapList is notably a rather efficient persistent Deque, with O(log(N)) insertion and removal at both ends of the list.

Design

TreapList is organized as a Treap, with left/right being determined by the order of the elements in the list, and with randomly-assigned node priorities to balance the tree. This is different from how we assign priorities in TreapMap/TreapSet, where we hash the element values to get the priority; because TreapList can contain multiple elements with the same value, we must randomly assign priorities to ensure that, say, a list of 1,000,000 entries with the same value won't just end up being a linked list. If we randomly assigned priorities in the set/map implementations, it would complicate unions, intersections, merges, etc. - but those operations do not apply to Lists, so random priorities are fine here.

Also unlike the set/map implementations, each TreapList node tracks the size of the sublist represented by that node, enabling log-time indexing into the list.

TreapList is a little like a probabilistically balanced Rope, but as an arbitrary list.

Performance

For most operations, TreapList outperforms ArrayList (when used as an immutable list), as well as the reference PersistentList implementation from kotlinx.collections.immutable. The most notable exception is the get operation, which for ArrayList is a simple array indexing operation:

TreapList generally matches or beats ArrayList when adding a single element to the end of the list (producing a new list), but is beaten by kotlinx.collections.immutable:

However, for most other list operations, TreapList wins handily. For example, insertion of a single item at the front of the list:

Replacing the value at a given index:

Or appending one list to another list:

More benchmarks are available; you can run gradlew listBenchmark to run them all.

Memory Usage

One disadvantage of TreapList vs the alternatives is that a single TreapList of a given size is quite a lot larger than the equivalent ArrayList or kotlinx.collections.immutable list:

However, TreapList is better able to re-use allocations between "versions" of a list. For example, consider the scenario where we add one item at a time, in the middle of the list, retaining all intermediate results, and adding up the total size:

We can see that in this extreme case, TreapList uses much less memory than the other alternatives.

Real heap usage will of course depend on how much the specific use case is able to take advantage of the increased allocation sharing.

yoav-rodeh

Pretty cool! I like it :)

I read only TreapListNode which seems like the heart of it all, and I'm not sure I'll really read the rest... unless you insist :)

yoav-rodeh · 2024-02-08T00:11:02Z

collect/src/main/kotlin/com/certora/collect/TreapListNode.kt

+        else -> index
+    }
+
+    private inline fun <T> traverseToIndex(


i think a better name is something like stepTowardsIndex (or something like that) because we only do one step.

Yep, that's better

yoav-rodeh · 2024-02-08T00:27:05Z

collect/src/main/kotlin/com/certora/collect/TreapListNode.kt

+        }
+    }
+
+    private fun computeHashCode(initial: Int): Int {


probably can be done with forEachNode, and IMHO a bit more readable:

private fun computeHashCode(initial: Int): Int { var code = 0 forEachNode { e -> code = 31 * code + e.hashCode() } return code }

I'm using recursive descent here to avoid allocating the closure for the lamba in the forEachNode call. Will add a note.

yoav-rodeh · 2024-02-08T00:30:21Z

collect/src/main/kotlin/com/certora/collect/TreapListNode.kt

+
+    override fun listIterator(index: Int): ListIterator<E> = object : ListIterator<E> {
+        /** stores the path back to the original root of the tree. */
+        val stack = ArrayDeque<TreapListNode<E>>()


any point in setting the size of the stack to the max depth? we probably don't have it, so maybe to 2 * log2(size) or something.

We could estimate the max depth by checking the depth down a single path. I'll try that.

yoav-rodeh · 2024-02-08T00:45:43Z

collect/src/main/kotlin/com/certora/collect/TreapListNode.kt

+        else -> this.with(right = this.right append that)
+    }
+
+    private fun split(index: Int): Pair<TreapListNode<E>?, TreapListNode<E>?> = traverseToIndex(


maybe add a comment that the index itself will be in the second treap in the result.

Maybe for efficiency add the check if the index is 0, or higher than the size (not sure if size-1 or size).

maybe add a comment that the index itself will be in the second treap in the result.

Will do

Maybe for efficiency add the check if the index is 0, or higher than the size (not sure if size-1 or size).

All of the callers already do those checks

yoav-rodeh · 2024-02-08T00:59:38Z

collect/src/main/kotlin/com/certora/collect/TreapListNode.kt

+        when {
+            index == size -> addLast(element)
+            index == 0 -> addFirst(element)
+            else -> split(index).let { (l, r) -> l!! append TreapListNode(element) append r!! }


probably practically meaningless, but the first append can be replaced with addLast for better efficiency (very slight). Right?

yoav-rodeh · 2024-02-08T01:06:17Z

collect/src/main/kotlin/com/certora/collect/EmptyTreapList.kt

@@ -0,0 +1,66 @@
+package com.certora.collect
+
+internal class EmptyTreapList<E> private constructor() : TreapList<E>, java.io.Serializable, AbstractList<E>() {


that's an interesting way of doing this...
I know the standard approach - that there is the recursive data structure, which if empty is null, but it's not really exposed, and always wrapped in an outer box.

This approach saves the memory needed for the outer object?

It saves the additional GC heap churn from allocating the outer object every time we generate a new list. It also lets us optimize specifically for operations on empty lists (though who knows how significant that is?).

yoav-rodeh · 2024-02-08T01:45:22Z

collect/src/main/kotlin/com/certora/collect/TreapListNode.kt

+        right?.forEachNode(action)
+    }
+
+    private fun forEachNodeIndexed(thisIndex: Int, action: (Int, E) -> Unit) {


I think this can be removed, and the public function written as:

fun forEachElementIndexed(action : (Int, E) -> Unit) { var i = 0 forEachNode { action(i++, it) } }

Also, I believe this would also be much clearer and just as good as the current implementation:

override fun indexOf(element: E): Int? { forEachElementIndexed { i, e -> if (element == e) return i } return null }

for the reverse case, maybe it's best to just implement reverse traversal? it's much simpler code-wise than these calculations. At least to me....

Like elsewhere, the recursive descent here avoids allocating a closure. It probably doesn't make a significant difference for large lists, but for small ones this way should be noticeably faster.

yoav-rodeh · 2024-02-08T01:50:32Z

collect/src/main/kotlin/com/certora/collect/TreapListNode.kt

+    override fun indexOf(element: E): Int = indexOfFirstNode(rootIndex(), element) ?: -1
+    override fun lastIndexOf(element: E): Int = indexOfLastNode(rootIndex(), element) ?: -1
+
+    override fun contains(element: E): Boolean =


imho using forEachElement here would be nice.

Same reasoning about recursion vs. closure allocation here.

yah, I see what you mean (though I am sure what the costs are).
I guess Ideally, we can write a forEachElement which is an inline function, but that would mean we need to write it non-recursively - I guess much like the iterator code.

Of course then it can't be public?

anyway, probably not worth the effort - though I do find it much more readable.

yoav-rodeh

btw, the benchmarking is so extreme - as perhaps the most extreme example: how come setting a value in an ArrayList is so slow? yours is 40 times faster for the longest lists there, and the immutable version is even faster. That's super weird...

ericeil · 2024-02-08T14:47:34Z

btw, the benchmarking is so extreme - as perhaps the most extreme example: how come setting a value in an ArrayList is so slow? yours is 40 times faster for the longest lists there, and the immutable version is even faster. That's super weird...

I'm comparing operations that generate a new list as their result. For example, list + elem generates a new list containing the elements of list and the additional element. For ArrayList this involves copying all of the existing elements into a new list, which is why it's so slow for large lists.

For in-place mutation, ArrayList would of course do much better; but the point of TreapList is to make "functional" updates more efficient.

yoav-rodeh

looks great!

TreapList

db81422

ericeil changed the title ~~TreapList~~ Introduce TreapList Feb 5, 2024

Add some comments

5db44dc

ericeil marked this pull request as ready for review February 5, 2024 22:06

ericeil requested review from jtoman and yoav-rodeh February 5, 2024 22:06

yoav-rodeh reviewed Feb 8, 2024

View reviewed changes

yoav-rodeh approved these changes Feb 8, 2024

View reviewed changes

Address @yoav-rodeh's comments

f107d17

ericeil merged commit 315e3db into Certora:main Feb 8, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce TreapList #12

Introduce TreapList #12

ericeil commented Feb 5, 2024 •

edited

Loading

yoav-rodeh left a comment

yoav-rodeh Feb 8, 2024

ericeil Feb 8, 2024

yoav-rodeh Feb 8, 2024

ericeil Feb 8, 2024

yoav-rodeh Feb 8, 2024

ericeil Feb 8, 2024

yoav-rodeh Feb 8, 2024

ericeil Feb 8, 2024

yoav-rodeh Feb 8, 2024

ericeil Feb 8, 2024

yoav-rodeh Feb 8, 2024

ericeil Feb 8, 2024

yoav-rodeh Feb 8, 2024

ericeil Feb 8, 2024

yoav-rodeh Feb 8, 2024

ericeil Feb 8, 2024

yoav-rodeh Feb 8, 2024

yoav-rodeh left a comment

ericeil commented Feb 8, 2024

yoav-rodeh left a comment

		@@ -0,0 +1,66 @@
		package com.certora.collect

		internal class EmptyTreapList<E> private constructor() : TreapList<E>, java.io.Serializable, AbstractList<E>() {

Introduce TreapList #12

Introduce TreapList #12

Conversation

ericeil commented Feb 5, 2024 • edited Loading

Design

Performance

Memory Usage

yoav-rodeh left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yoav-rodeh left a comment

Choose a reason for hiding this comment

ericeil commented Feb 8, 2024

yoav-rodeh left a comment

Choose a reason for hiding this comment

ericeil commented Feb 5, 2024 •

edited

Loading