[Infra] [Spark] Reduce delta-spark CI test runtime by 33 mins (1h46m to 1h13m) #3712

Open: wants to merge 7 commits into master from improved_delta_spark_test_assignment
scottsand-db (Collaborator)

@scottsand-db scottsand-db commented Sep 23, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (Infra)

Description

This PR reduces delta-spark CI test runtime by 33 mins. Previously the max shard duration was 1h 46 mins, and now it is 1h 13 mins.

It does so as follows:

  1. We add an extra shard.
  2. I used #3694 ([Infra] [WIP] Add test report listener to delta-spark and bucket test suites by estimated runtime) to collect metrics on delta-spark test suite runtimes.
  3. From those metrics I identified (a) the 50 slowest test suites and (b) the average duration of the remaining suites, excluding the top 50 (0.71 minutes).
  4. I used this information to update TestParallelization to assign test suites more intelligently. The logic is as follows:
    • The 50 slowest test suites are assigned deterministically: taking them in descending order of duration, each suite goes to the shard + group (a group is a thread) with the lowest total duration so far.
    • Each remaining test suite (not in the top 50) is assigned to a random shard and, within that shard, to the group with the lowest total duration so far.
  5. We also change the hash function to MurmurHash3, which is known to produce balanced assignments when the input strings (test names) share similar prefixes or patterns.
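
For readers who want to see the assignment logic in step 4 concretely, here is a minimal sketch of it. It is not the actual TestParallelization code: `ShardAssignmentSketch`, `assignSuites`, `leastLoaded`, and every parameter name are hypothetical. It assumes the "random shard" for non-top-50 suites is derived from the MurmurHash3 hash of the suite name, and that each such suite is charged the 0.71-minute average when tracking group load.

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// Illustrative sketch only: all names are hypothetical and this is not the
// actual TestParallelization code from this PR.
object ShardAssignmentSketch {
  // Assumed runtime (minutes) charged to any suite outside the slow list:
  // the 0.71-minute average quoted in the description.
  val DefaultDurationMins = 0.71

  // First slot with the smallest accumulated load; ties keep the earliest slot.
  private def leastLoaded(slots: Seq[(Int, Int)], load: Array[Array[Double]]): (Int, Int) =
    slots.reduceLeft { (best, cur) =>
      if (load(cur._1)(cur._2) < load(best._1)(best._2)) cur else best
    }

  /** Maps each suite name to a (shard, group) pair. */
  def assignSuites(
      allSuites: Seq[String],
      slowSuiteMins: Map[String, Double],
      numShards: Int,
      numGroups: Int): Map[String, (Int, Int)] = {
    // load(shard)(group) = estimated minutes assigned to that slot so far
    val load = Array.fill(numShards, numGroups)(0.0)
    val allSlots = for (s <- 0 until numShards; g <- 0 until numGroups) yield (s, g)
    val result = mutable.Map.empty[String, (Int, Int)]

    def place(name: String, slot: (Int, Int), mins: Double): Unit = {
      load(slot._1)(slot._2) += mins
      result(name) = slot
    }

    val (slow, rest) = allSuites.distinct.partition(slowSuiteMins.contains)

    // Slow suites: longest first (name breaks ties), each into the globally
    // least-loaded (shard, group) slot.
    slow.sortWith { (a, b) =>
      val (da, db) = (slowSuiteMins(a), slowSuiteMins(b))
      if (da != db) da > db else a < b
    }.foreach { name => place(name, leastLoaded(allSlots, load), slowSuiteMins(name)) }

    // Remaining suites: shard chosen by hashing the name, then the
    // least-loaded group within that shard.
    rest.foreach { name =>
      val shard = java.lang.Math.floorMod(MurmurHash3.stringHash(name), numShards)
      val slotsInShard = (0 until numGroups).map(g => (shard, g))
      place(name, leastLoaded(slotsInShard, load), DefaultDurationMins)
    }
    result.toMap
  }
}
```

As a worked example with 2 shards and 2 groups and made-up slow-suite durations A=10, B=8, C=6, D=4, E=3 minutes, the first four suites each land in an empty slot, and E joins D in slot (1, 1) because that slot has the lowest total (4 minutes) at that point.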

Note that merely adding another shard and using a better hash function does NOT, on its own, yield better results. That was attempted in #3715.

How was this patch tested?

GitHub CI tests.

https://github.com/delta-io/delta/actions/runs/11004181545?pr=3712

Does this PR introduce any user-facing changes?

No.

@scottsand-db scottsand-db force-pushed the improved_delta_spark_test_assignment branch from c01294c to f3833ac Compare September 23, 2024 18:57
@scottsand-db scottsand-db force-pushed the improved_delta_spark_test_assignment branch from f3833ac to 5a3e12c Compare September 23, 2024 19:14
@scottsand-db scottsand-db changed the title from "[Infra] [Spark] Improve delta-spark test assignment / distribution among CI Shards / Groups" to "[Infra] [Spark] Improve delta-spark test assignment / distribution among CI Shards / Groups [Attempt 1]" Sep 24, 2024
@scottsand-db scottsand-db changed the title from "[Infra] [Spark] Improve delta-spark test assignment / distribution among CI Shards / Groups [Attempt 1]" to "[Infra] [Spark] Reduce delta-spark CI test runtime by 25 mins" Sep 24, 2024
@scottsand-db scottsand-db self-assigned this Sep 24, 2024
@scottsand-db scottsand-db changed the title from "[Infra] [Spark] Reduce delta-spark CI test runtime by 25 mins" to "[Infra] [Spark] Reduce delta-spark CI test runtime by 25 mins (1h46m to 1h21m)" Sep 24, 2024
@scottsand-db scottsand-db changed the title from "[Infra] [Spark] Reduce delta-spark CI test runtime by 25 mins (1h46m to 1h21m)" to "[Infra] [Spark] Reduce delta-spark CI test runtime by 33 mins (1h46m to 1h13m)" Sep 25, 2024
2 participants