[SPARK-51955] Adding release() to ReadStateStore interface and reusing ReadStore for Streaming Aggregations #50742
Conversation
Force-pushed from 14565a8 to c4c6c07.
loadStateStore(version, uniqueId, readOnly = false)
}

override def getWriteStore(
nit: rename to getWriteStoreFromReadStore?
    readStore: ReadStateStore,
    version: Long,
    uniqueId: Option[String] = None): StateStore = {
  assert(version == readStore.version)
Can you leave a comment or more informative error msg here?
+1
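For illustration, a sketch of the kind of message being asked for here; the wording is an assumption, not the PR's actual change:

```scala
// Hypothetical message; assumes the surrounding getWriteStore(readStore, version, ...) context.
assert(version == readStore.version,
  s"Cannot convert a read store loaded at version ${readStore.version} into a write store " +
  s"for version $version; the write store must target the same version the read store was loaded at.")
```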
@@ -565,6 +582,11 @@ trait StateStoreProvider {
    version: Long,
    stateStoreCkptId: Option[String] = None): StateStore

def getWriteStore(
add docs comment?
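For illustration, one possible shape for that doc comment; a sketch only, the wording is an assumption rather than the PR's final text:

```scala
/**
 * Converts an already-loaded read-only store into a writable store for the same
 * version, so the state loaded for the read path can be reused instead of being
 * aborted and reloaded.
 *
 * @param readStore the ReadStateStore previously loaded for this version
 * @param version   the version to write against; expected to match readStore.version
 * @param uniqueId  optional unique identifier for checkpointing
 * @return a writable StateStore that can be updated and committed
 */
def getWriteStore(
    readStore: ReadStateStore,
    version: Long,
    uniqueId: Option[String] = None): StateStore
```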
if (version < 0) {
  throw QueryExecutionErrors.unexpectedStateStoreVersion(version)
}
hadoopConf.set(StreamExecution.RUN_ID_KEY, storeProviderId.queryRunId.toString)
Why are we setting this twice? Can you add more comments about what is going on here?
Yeah, prob don't need to set it multiple times.
partitionStores.put(partitionId, (store, false))

// Register a cleanup callback to be executed when the task completes
ctxt.addTaskCompletionListener[Unit](_ => {
Why are we adding the listeners here? Is it different from the one in mapPartitionsWithReadStateStore?
@ericm-db - I guess we did not register any listener before in the ReadStoreRDD path?
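For context, a minimal sketch of the cleanup pattern under discussion, assuming a ReadStateStore named `store` obtained earlier in compute(); the exact bookkeeping in the PR may differ:

```scala
// Register a task-completion listener so the store is always cleaned up when the
// task ends, even if the operator never gets to commit or release it explicitly.
ctxt.addTaskCompletionListener[Unit] { _ =>
  // release() (introduced in this PR) closes the read handle without discarding
  // the loaded state, unlike abort(), so a later write store can still reuse it.
  store.release()
}
```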
@ericm-db - in the PR description, please update it to say what the new usage paradigm will look like as well.
@ericm-db Can you update the PR description to be more specific about the inefficiency we are addressing here? Basically, in the current impl we always abort the read store, triggering an unnecessary reload of the state store.
@anishshri-db @liviazhu-db Sure, yeah, sounds good.
* @param uniqueId Optional unique identifier for checkpointing
* @return A writable StateStore instance that can be used to update and commit changes
*/
def getWriteStoreFromReadStore(
nit: should we rename as upgradeReadStoreToWriteStore?
@@ -34,7 +59,7 @@ abstract class BaseStateStoreRDD[T: ClassTag, U: ClassTag](
    operatorId: Long,
    sessionState: SessionState,
    @transient private val storeCoordinator: Option[StateStoreCoordinatorRef],
-   extraOptions: Map[String, String] = Map.empty) extends RDD[U](dataRDD) {
+   extraOptions: Map[String, String] = Map.empty) extends RDD[U](dataRDD) with Logging {
Intentional?
val inputIter = dataRDD.iterator(partition, ctxt)
val store = StateStore.getReadOnly(
  storeProviderId, keySchema, valueSchema, keyStateEncoderSpec, storeVersion,
- stateStoreCkptIds.map(_.apply(partition.index).head),
+ stateStoreCkptIds.map(_.apply(partitionId).head),
Was this a bug before?
No, these are equivalent though. I'll change it back.
- TaskContext.get().addTaskCompletionListener[Unit](_ => {
-   store.abort()
+ val taskContext = TaskContext.get()
+ taskContext.addTaskCompletionListener[Unit](_ => {
Why do we need it again here?
cc @cloud-fan
cc @cloud-fan - could you PTAL too? Especially around the RDD interactions? Thx
Force-pushed from 0c913e2 to f7d0e70.
Looks good! Could you add a test in StateStoreRDDSuite to check that the ThreadLocal logic correctly passes the read store to the write store too?
Yup, working on that rn!
* This allows a ReadStateStore to be reused by a subsequent StateStore operation.
*/
object StateStoreThreadLocalTracker {
  private val readStore: ThreadLocal[ReadStateStore] = new ThreadLocal[ReadStateStore]
Can we combine these into a single thread local? Just make them members of a case class?
Sure yeah
*/
object StateStoreThreadLocalTracker {
  /** Case class to hold both the store and its usage state */
  case class StoreInfo(store: ReadStateStore, usedForWriteStore: Boolean = false)
nit: maybe move members to a new line each?
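Putting the review suggestions together, one possible shape for the tracker; this is only a sketch, and the accessor names (setStore, markUsedForWriteStore, clear) are assumptions rather than the PR's final API:

```scala
object StateStoreThreadLocalTracker {

  /** Holds both the tracked read store and whether it was handed to a write store. */
  case class StoreInfo(
      store: ReadStateStore,
      usedForWriteStore: Boolean = false)

  // Single ThreadLocal carrying both pieces of state, per the review suggestion.
  private val storeInfo: ThreadLocal[StoreInfo] = new ThreadLocal[StoreInfo]

  def setStore(store: ReadStateStore): Unit =
    storeInfo.set(StoreInfo(store))

  def getStore: Option[ReadStateStore] =
    Option(storeInfo.get()).map(_.store)

  def markUsedForWriteStore(): Unit =
    Option(storeInfo.get()).foreach(info => storeInfo.set(info.copy(usedForWriteStore = true)))

  def isUsedForWriteStore: Boolean =
    Option(storeInfo.get()).exists(_.usedForWriteStore)

  def clear(): Unit =
    storeInfo.remove()
}
```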
@@ -194,6 +196,8 @@ private[sql] class HDFSBackedStateStoreProvider extends StateStoreProvider with
    log"for ${MDC(LogKeys.STATE_STORE_PROVIDER, this)}")
}

override def release(): Unit = {}
Can you add a new state and update the state here?
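As an illustration of what is being asked for, a self-contained sketch; the state names and the guard are assumptions, not the provider's actual code:

```scala
// Hypothetical lifecycle states, including a new Released value that release()
// transitions into so later misuse of the handle can be detected.
object SketchState extends Enumeration {
  val Updating, Committed, Aborted, Released = Value
}

class ReadStoreSketch {
  @volatile private var state: SketchState.Value = SketchState.Updating

  def release(): Unit = {
    // Unlike abort(), release() only marks this handle as finished; the loaded
    // state stays reusable by a follow-up write store.
    if (state == SketchState.Updating) {
      state = SketchState.Released
    }
  }

  def isReleased: Boolean = state == SketchState.Released
}
```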
What changes were proposed in this pull request?
- Adding a release() method to the ReadStateStore interface to properly close read stores without aborting them
- Implementing a getWriteStore() method that allows converting a read-only store to a writable store
- Creating a StateStoreRDDProvider interface for tracking state stores by partition ID
- Enhancing StateStoreRDD to find and reuse existing state stores through RDD lineage
- Improving task completion handling with proper cleanup listeners
Why are the changes needed?
Currently, stateful operations like aggregations follow a pattern where both read and write stores are opened simultaneously:
readStore.acquire()
writeStore.acquire()
writeStore.commit()
readStore.abort()
This pattern creates inefficiency because:
- The abort() call on the read store unnecessarily invalidates the store's state, causing subsequent operations to reload the entire state store from scratch
- Having two stores open simultaneously increases memory usage and can create contention issues
- The upcoming lock hardening changes will only allow one state store to be open at a time, making this pattern incompatible
With the new approach, the usage paradigm becomes:
readStore = getReadStore()
writeStore = getWriteStore(readStore)
writeStore.commit()
This new paradigm allows us to reuse an existing read store by converting it to a write store using getWriteStore(), and properly clean up resources using release() instead of abort() when operations complete successfully. This avoids the unnecessary reloading of state data and improves performance while being compatible with future lock hardening changes.
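To make the new paradigm concrete, a hedged sketch of how an operator-side caller might use it; getReadStore and getWriteStore mirror the names used above, while provider, version, key, and updatedValue are placeholders for illustration:

```scala
// Read phase: load (or reuse) the store read-only.
val readStore: ReadStateStore = provider.getReadStore(version)
val previous = readStore.get(key)   // state loaded here stays available below

// Write phase: upgrade the same loaded state into a writable store instead of
// aborting the read store and reloading everything from scratch.
val writeStore: StateStore = provider.getWriteStore(readStore, version)
writeStore.put(key, updatedValue)
writeStore.commit()

// If the read store is never upgraded (e.g. nothing to write), release() is the
// cleanup call, so the loaded state is not invalidated the way abort() would be.
```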
Does this PR introduce any user-facing change?
No
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
No