Investigate and potentially add support for spark connect #284

Open
22 of 24 tasks
razvan opened this issue Sep 18, 2023 · 3 comments
Comments

razvan (Member) commented Sep 18, 2023

Spark Connect

Spark Connect, introduced in Spark 3.4 and extended in Spark 3.5, is a client-server architecture for talking to a remote Spark cluster.

The use case seems to be thin clients that connect to a running Spark driver.

This probably means that the operator needs to be able to start Spark Connect servers independently of Spark applications and publish a Service for Connect clients.
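
To make the thin-client idea concrete, here is a minimal Python sketch of how a client might reach such a service. It assumes the server listens on Spark Connect's default port 15002 and that the cluster uses the default `cluster.local` DNS domain; the service name, namespace, and both helper functions are hypothetical and not something the operator provides.

```python
# Sketch only: service name, namespace, and helper names are hypothetical.
DEFAULT_CONNECT_PORT = 15002  # default port of the Spark Connect gRPC server


def connect_url(service: str, namespace: str, port: int = DEFAULT_CONNECT_PORT) -> str:
    """Build an sc:// URL for a Spark Connect server behind a Kubernetes Service."""
    # Assumes the default Kubernetes cluster domain (cluster.local).
    return f"sc://{service}.{namespace}.svc.cluster.local:{port}"


def open_session(url: str):
    """Open a thin-client session against a running Spark Connect server.

    Requires the PySpark Connect client, e.g. `pip install "pyspark[connect]"`.
    """
    # Imported lazily so that connect_url() works without PySpark installed.
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(url).getOrCreate()
```

With a server running, `open_session(connect_url("spark-connect-server", "default"))` would return a session whose DataFrame calls are executed on the remote driver rather than in the client process.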

Roadmap

Rough roadmap to GA:

  • POC: can set up a Spark Connect server with Kubernetes as the resource manager, plus a basic integration test
  • minimal CRD: drop the stateful set, minimum configuration for the server (JVM props, logging)
    • server
      • deployment with one replica
      • JVM arg overrides
      • config overrides
      • env overrides
      • log configuration and aggregation with vector
      • pod overrides
      • resource requests
      • status and transition events
      • reconciliation operation (paused, stopped, etc.)
    • executor
      • JVM arg overrides
      • config overrides
      • env overrides
      • log configuration and aggregation
      • resource requests
      • pod affinity
  • add preliminary documentation
  • expose Prometheus metrics
  • integrate with the history server (see: "doc: comment on spark history integration" #559)
  • integrate with the listener op
  • create a new demo
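
As a rough illustration of the "deployment with one replica" item plus the published service, the following Python sketch builds the two Kubernetes objects as plain dictionaries. It is not the operator's actual output: the function name, label key choice, container name, and image are assumptions, the container command is deliberately left out, and only the default Spark Connect port 15002 is taken as given.

```python
import json

CONNECT_PORT = 15002  # default Spark Connect gRPC port


def connect_server_manifests(name: str, image: str, namespace: str = "default") -> list:
    """Return [Deployment, Service] dicts for a single-replica Connect server.

    Hypothetical sketch: the real CRD and generated resources may differ.
    """
    labels = {"app.kubernetes.io/name": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "replicas": 1,  # one server, as in the roadmap item
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "connect-server",  # hypothetical container name
                            "image": image,  # command/args intentionally omitted
                            "ports": [{"name": "grpc", "containerPort": CONNECT_PORT}],
                        }
                    ]
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace, "labels": labels},
        "spec": {
            "selector": labels,  # routes to the pods created by the Deployment
            "ports": [{"name": "grpc", "port": CONNECT_PORT, "targetPort": "grpc"}],
        },
    }
    return [deployment, service]


def to_json(manifests: list) -> str:
    """Serialize the manifests as a JSON list, which kubectl also accepts per object."""
    return json.dumps(manifests, indent=2)
```

Keeping the Service selector identical to the pod template labels is the only coupling between the two objects; executor pods would be created by the Spark driver itself and are not modeled here.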

Related PRs

razvan changed the title from "Investigate and potentially ad support for spark connect" to "Investigate and potentially add support for spark connect" on Sep 18, 2023
adwk67 self-assigned this on Dec 20, 2023
adwk67 removed their assignment on Aug 30, 2024
@timrobertson100

We have started exploring Spark Connect at GBIF.org. Our primary use case is a long-running Spark cluster that holds a cached in-memory table, so that apps can do filtered data egress with minimal startup cost.

@timrobertson100

@razvan - thank you for your work. When you are ready, we will be interested in helping to test.

razvan (Member, Author) commented Apr 11, 2025

@timrobertson100 - we merged preliminary support for Spark Connect deployments into the main branch. Looking forward to your feedback!
