Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split AutoTuner for Profiling and Qualification and Override BATCH_SIZE_BYTES #1471

Merged
merged 4 commits into from
Dec 31, 2024

Conversation

parthosa
Copy link
Collaborator

@parthosa parthosa commented Dec 18, 2024

Fixes #1470.
Fixes #1399.

This PR refactors the AutoTuner into different classes for the Qualification and Profiling tools. This separation allows us to provide recommendations based on the specific tool being used.

For example, with this change, the AutoTuner for the Qualification tool will now recommend BATCH_SIZE_BYTES as 1GB, while the AutoTuner for Profiling tool will recommend 2GB.

Changes:

  1. Refactored object AutoTuner into trait AutoTunerConfigsProvider:

    • Introduced concrete objects QualificationAutoTunerConfigsProvider and ProfilingAutoTunerConfigsProvider.
    • Overridden BATCH_SIZE_BYTES in QualificationAutoTunerConfigsProvider.
  2. Extended class AutoTuner to QualificationAutoTuner:

    • This can be to add logic specific to the Qualification AutoTuner in the future.
  3. Refactored AutoTunerSuite into BaseAutoTunerSuite:

    • Created test suites ProfilingAutoTunerSuite and QualificationAutoTunerSuite.
    • Added unit test for batch size in Qualification AutoTuner.
  4. Renamed the existing class QualificationAutoTuner to QualificationAutoTunerRunner:

    • This class was not an implementation of the Qualification AutoTuner but a wrapper for running the AutoTuner with the Qualification tool.

@parthosa parthosa added feature request New feature or request core_tools Scope the core module (scala) labels Dec 18, 2024
@parthosa parthosa self-assigned this Dec 18, 2024
@parthosa parthosa marked this pull request as ready for review December 18, 2024 17:29
amahussein
amahussein previously approved these changes Dec 20, 2024
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa
LGTME!

nartal1
nartal1 previously approved these changes Dec 30, 2024
Copy link
Collaborator

@nartal1 nartal1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @parthosa !

amahussein
amahussein previously approved these changes Dec 30, 2024
Copy link
Collaborator

@amahussein amahussein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTME
Thanks @parthosa

cindyyuanjiang
cindyyuanjiang previously approved these changes Dec 30, 2024
Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa! Very minor nits.

val minOverhead = executorMemOverhead + (MIN_PINNED_MEMORY_MB + MIN_SPILL_MEMORY_MB)
val minOverhead = executorMemOverhead + (
autoTunerConfigsProvider.MIN_PINNED_MEMORY_MB + autoTunerConfigsProvider.MIN_SPILL_MEMORY_MB
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QQ: is there a reason we perform (autoTunerConfigsProvider.MIN_PINNED_MEMORY_MB + autoTunerConfigsProvider.MIN_SPILL_MEMORY_MB) first?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the braces were added for readability, to distinguish that MIN_PINNED_MEMORY_MB and MIN_SPILL_MEMORY_MB are constants, while executorMemOverhead is a calculated value.

Signed-off-by: Partho Sarthi <[email protected]>
@parthosa parthosa dismissed stale reviews from cindyyuanjiang, amahussein, and nartal1 via 026e735 December 30, 2024 22:32
Copy link
Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @parthosa! LGTM.

@parthosa parthosa merged commit 5380342 into NVIDIA:dev Dec 31, 2024
15 checks passed
@parthosa parthosa deleted the spark-rapids-tools-1399 branch December 31, 2024 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core_tools Scope the core module (scala) feature request New feature or request
Projects
None yet
4 participants