[FLINK-36270][Connectors/DynamoDB] Assign only children splits of a split when it is marked as finished #164

Merged on Sep 13, 2024 (1 commit)

@gguptp gguptp commented Sep 12, 2024

Purpose of the change

When a split is marked as finished, we currently iterate over all splits and assign the available ones to readers. On a stream with a large number of shards, a lot of time is spent handling finished splits, because each one triggers a call to getSplitsForAssignment, which is O(n) in the number of splits. This leads to performance issues. This change makes the assignment O(1) by assigning only the children of the split that was marked as finished.
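The idea can be sketched roughly as follows. This is a minimal illustration and not the connector's actual code: `ChildSplitIndex`, `addSplit`, and `onSplitFinished` are hypothetical names, and the sketch assumes each split has at most one parent. A map from parent split ID to child split IDs is maintained as splits are discovered, so finishing a split costs one hash lookup plus the number of its children, rather than a scan over every known split.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index of child splits, keyed by parent split ID (illustration only). */
class ChildSplitIndex {
    // parent split ID -> IDs of its direct children, in discovery order
    private final Map<String, List<String>> childrenByParent = new HashMap<>();

    /** Registers a discovered split; parentId is null when the split has no known parent. */
    void addSplit(String splitId, String parentId) {
        if (parentId != null) {
            childrenByParent
                    .computeIfAbsent(parentId, k -> new ArrayList<>())
                    .add(splitId);
        }
    }

    /**
     * Returns the children that become assignable once the given split finishes,
     * and drops the entry so the same children are not handed out twice.
     */
    List<String> onSplitFinished(String finishedSplitId) {
        List<String> children = childrenByParent.remove(finishedSplitId);
        return children == null ? Collections.emptyList() : children;
    }
}
```

With this shape, the work done per finished split no longer depends on the total number of shards in the stream, only on how many children that one split has.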

Before the fix:

  • Checkpoints generally take a long time to complete
    image

  • The checkpoint start delay is very long
    image

  • The app stays stuck like this for a long time
    image

After the fix:

  • Checkpoint duration is now reduced to 1 second
    image
  • Checkpoints no longer fail because of long checkpoint durations

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing

(Please pick either of the following options)

  • This change is already covered by existing UTs

Significant changes

(Please check any boxes [x] if the answer is "yes". You can first publish the PR and check them afterwards, for convenience.)

  • Dependencies have been added or upgraded
  • Public API has been changed (Public API is any class annotated with @Public(Evolving))
  • Serializers have been changed
  • New feature has been introduced
    • If yes, how is this documented? (not applicable / docs / JavaDocs / not documented)

@hlteoh37 hlteoh37 left a comment

Thanks for the contribution @gguptp ! Left some comments

gguptp commented Sep 12, 2024

Thanks for the review @hlteoh37! The review comments have been addressed now

hlteoh37 commented Sep 13, 2024

Add some pictures for the performance bottleneck!

We see the checkpoints taking a long time to complete

image

We see that the time is spent on "Start Delay"

image

We also see new checkpoints being stuck in an "unacknowledged" state

image

gguptp commented Sep 13, 2024

Updated the PR overview

@hlteoh37 hlteoh37 left a comment

LGTM! Thanks for the contribution @gguptp

@hlteoh37 hlteoh37 merged commit 9b0d983 into apache:main Sep 13, 2024
9 checks passed