[SPARK-51947] Spark connect model cache offloading #50752

WeichenXu123 · 2025-04-29T11:13:52Z

What changes were proposed in this pull request?

Support offloading the Spark Connect model cache to the Spark driver's local disk.

Why are the changes needed?

Motivation: offloading cached models to disk reduces memory pressure on the Spark driver.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually.

Was this patch authored or co-authored using generative AI tooling?

No.

Signed-off-by: Weichen Xu <[email protected]>

xi-db · 2025-05-02T09:52:16Z

sql/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLCache.scala

+        .build[String, CacheItem]()
+        .asMap()
+    } else {
+      new ConcurrentHashMap[String, CacheItem]()


Do you think it is risky that MLCache has no memory limit when offloading is disabled? It probably makes sense to keep the old behaviour in that case: MLCache would still have a memory limit and a retention time when offloading is disabled, but once a model is evicted, any later access to it throws a CACHE_INVALID error. WDYT?
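
To make the suggestion concrete, here is a minimal sketch of a memory-bounded cache whose lookups fail once an entry has been evicted. It is not the code in MLCache.scala: the excerpt above appears to build its map through a cache builder, whereas this sketch swaps in a plain access-ordered `java.util.LinkedHashMap` so it runs without extra dependencies. `BoundedModelCache`, `CacheInvalidException`, the `sizeBytes` field, and this `CacheItem` definition are all hypothetical names for illustration; only the CACHE_INVALID error name comes from the comment. The retention time is left out for brevity.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}

// Hypothetical stand-in for the real CacheItem; sizeBytes is an assumed field.
final case class CacheItem(name: String, sizeBytes: Long)

// Hypothetical exception; the real CACHE_INVALID error is raised by Spark's own error framework.
final class CacheInvalidException(key: String)
  extends RuntimeException(s"CACHE_INVALID: model '$key' is not in the cache (possibly evicted)")

// Size-bounded LRU cache: the third constructor argument (true) selects access order,
// so iteration starts at the least recently used entry.
final class BoundedModelCache(maxBytes: Long) {
  private var usedBytes = 0L
  private val entries = new JLinkedHashMap[String, CacheItem](16, 0.75f, true)

  def put(key: String, item: CacheItem): Unit = synchronized {
    val previous = entries.put(key, item)
    if (previous != null) usedBytes -= previous.sizeBytes
    usedBytes += item.sizeBytes
    // Evict least recently used entries until the byte budget is met.
    // An item larger than maxBytes ends up evicting itself.
    val it = entries.entrySet().iterator()
    while (usedBytes > maxBytes && it.hasNext) {
      val eldest = it.next()
      usedBytes -= eldest.getValue.sizeBytes
      it.remove()
    }
  }

  // Fails loudly instead of returning null when the model is gone.
  def get(key: String): CacheItem = synchronized {
    val item = entries.get(key)
    if (item == null) throw new CacheInvalidException(key)
    item
  }

  def size: Int = synchronized(entries.size())
}
```

With a 100-byte budget, putting two 60-byte items evicts the first one, and a later `get` on the first key throws `CacheInvalidException`. The trade-off under discussion is exactly this: memory use stays bounded, but the client has to handle the error, for example by refitting or reloading the model.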

init

9b30983

Signed-off-by: Weichen Xu <[email protected]>

github-actions bot added SQL ML CONNECT labels Apr 29, 2025

WeichenXu123 added 8 commits April 29, 2025 19:26

update

5f115e1

Signed-off-by: Weichen Xu <[email protected]>

Merge branch 'master' into mlcache-offload

3e33bbd

update

0697e41

Signed-off-by: Weichen Xu <[email protected]>

update

2d5ae26

Signed-off-by: Weichen Xu <[email protected]>

update

ad97210

Signed-off-by: Weichen Xu <[email protected]>

update

0736b37

Signed-off-by: Weichen Xu <[email protected]>

format

281cb9b

Signed-off-by: Weichen Xu <[email protected]>

update

fbad3b5

Signed-off-by: Weichen Xu <[email protected]>

github-actions bot added the PYTHON label Apr 30, 2025

WeichenXu123 added 3 commits May 1, 2025 13:51

update

8f9e7a4

Signed-off-by: Weichen Xu <[email protected]>

format

4b0f238

Signed-off-by: Weichen Xu <[email protected]>

update

060844f

Signed-off-by: Weichen Xu <[email protected]>

WeichenXu123 changed the title ~~[WIP][SPARK-51947] Spark connect model cache offloading~~ [SPARK-51947] Spark connect model cache offloading May 1, 2025

WeichenXu123 requested a review from zhengruifeng May 1, 2025 06:36

WeichenXu123 added 7 commits May 1, 2025 14:48

refine err

64c18e5

Signed-off-by: Weichen Xu <[email protected]>

update

6e3902f

Signed-off-by: Weichen Xu <[email protected]>

format

83fc19f

Signed-off-by: Weichen Xu <[email protected]>

format

cf015b5

Signed-off-by: Weichen Xu <[email protected]>

Merge branch 'master' into mlcache-offload

642509b

update

a1577be

Signed-off-by: Weichen Xu <[email protected]>

Merge branch 'master' into mlcache-offload

3a58af0

xi-db reviewed May 2, 2025
