[FEA] Qual tool should recommend spark.executor.cores based on best TCO value from internal benchmark #1334

Open
viadea opened this issue Sep 6, 2024 · 0 comments
Labels: core_tools (Scope the core module (scala)), feature request (New feature or request)

viadea (Collaborator) commented Sep 6, 2024

Is your feature request related to a problem? Please describe.
The Qual tool should recommend spark.executor.cores based on the best-TCO value from internal benchmarks instead of inheriting the value from the CPU run.
For example, in an extreme case where the CPU run uses spark.executor.cores=1, the Qual tool currently recommends spark.executor.cores=1 for the GPU run as well, which does not seem right from a TCO point of view.

Imagine the CPU run uses 160 executors with spark.executor.cores=1. Should the GPU run use:
160 x (spark.executor.cores=1, 1 GPU): the current Qual tool recommendation
or
10 x (spark.executor.cores=16, 1 GPU): the proposal here?
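A quick sketch of the arithmetic behind the two options, assuming one GPU per executor as in the layouts above and that the GPU run keeps the CPU run's total core count: both options provide 160 cores, but the inherited layout needs 16 times as many GPUs. The helper name gpu_layout is hypothetical and not part of the Qual tool.

```python
def gpu_layout(total_cores: int, cores_per_executor: int, gpus_per_executor: int = 1):
    """Return (executors, total_gpus) for a fixed total core count.

    Hypothetical helper for illustration only; assumes total_cores is an
    exact multiple of cores_per_executor.
    """
    if total_cores % cores_per_executor != 0:
        raise ValueError("total_cores must be a multiple of cores_per_executor")
    executors = total_cores // cores_per_executor
    return executors, executors * gpus_per_executor


cpu_total_cores = 160 * 1  # 160 executors x spark.executor.cores=1

# Current recommendation: inherit spark.executor.cores=1 from the CPU run.
print(gpu_layout(cpu_total_cores, 1))   # (160, 160)
# Proposal: spark.executor.cores=16.
print(gpu_layout(cpu_total_cores, 16))  # (10, 10)
```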

Describe the solution you'd like
My proposal:
For on-prem clusters at least, set the recommended spark.executor.cores for the GPU run to 16 or 8 instead of inheriting the CPU run's value.
The exact value (8, 16, or something else) should come from internal benchmark results showing that the setting achieves the best TCO.
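A minimal sketch of the proposed rule, under stated assumptions: the function name, the "onprem" platform string, and the default of 16 (one of the values suggested above) are hypothetical placeholders, not the tool's actual API or a measured benchmark result. For on-prem clusters it replaces the inherited value with the benchmark-derived one and sizes the executor count so the GPU run covers at least the CPU run's total cores; for other platforms it keeps the current behavior of inheriting the CPU value.

```python
from dataclasses import dataclass


@dataclass
class ExecutorRecommendation:
    executor_cores: int  # recommended spark.executor.cores for the GPU run
    num_executors: int   # executors (assumed one GPU each) covering the CPU run's cores


# Hypothetical placeholder: per the proposal, the real value should come
# from internal benchmark results that show the best TCO (e.g. 8 or 16).
BENCHMARK_BEST_EXECUTOR_CORES = 16


def recommend_gpu_executors(cpu_executor_cores: int,
                            cpu_num_executors: int,
                            platform: str,
                            best_cores: int = BENCHMARK_BEST_EXECUTOR_CORES) -> ExecutorRecommendation:
    total_cores = cpu_executor_cores * cpu_num_executors
    if platform == "onprem":  # hypothetical platform identifier
        cores = best_cores
    else:
        cores = cpu_executor_cores  # current behavior: inherit from the CPU run
    # Integer ceiling division so the GPU run has at least as many cores as the CPU run.
    executors = (total_cores + cores - 1) // cores
    return ExecutorRecommendation(cores, executors)


print(recommend_gpu_executors(1, 160, "onprem"))
# ExecutorRecommendation(executor_cores=16, num_executors=10)
```

Keeping the benchmark value as a parameter rather than hard-coding it leaves room to plug in whatever the internal benchmarks eventually show.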

viadea added the "? - Needs Triage" and "feature request" labels on Sep 6, 2024
amahussein added the "core_tools" label and removed the "? - Needs Triage" label on Sep 11, 2024