-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-36270][Connectors/DynamoDB] Assign only children splits of a split when it is marked as finished #164
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution @gguptp ! Left some comments
...a/org/apache/flink/connector/dynamodb/source/enumerator/DynamoDbStreamsSourceEnumerator.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/flink/connector/dynamodb/source/enumerator/tracker/SplitTracker.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/flink/connector/dynamodb/source/enumerator/tracker/SplitTracker.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/flink/connector/dynamodb/source/enumerator/tracker/SplitTracker.java
Outdated
Show resolved
Hide resolved
...rc/main/java/org/apache/flink/connector/dynamodb/source/enumerator/tracker/SplitTracker.java
Outdated
Show resolved
Hide resolved
a709e8e
to
2a5cf8b
Compare
Thanks for the review @hlteoh37! The review comments have been addressed now |
...rc/main/java/org/apache/flink/connector/dynamodb/source/enumerator/tracker/SplitTracker.java
Outdated
Show resolved
Hide resolved
…plit when it is marked as finished
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the contribution @gguptp
Purpose of the change
When a split is marked as finished, we are iterating over all splits and assigning available splits to readers. On a large enough stream with enough shards, we'd be spending a lot of time in finishedSplits because we are always calling
getSplitsForAssignment
function which is O(n). This leads to performance issues. Making this an O(1) algo to solve the issuesBefore the fix:
Generally we see long checkpoint duration
We see that the start delay is very long
App is stuck like this for long
After the fix:
Verifying this change
Please make sure both new and modified tests in this PR follows the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
(Please pick either of the following options)
Significant changes
(Please check any boxes [x] if the answer is "yes". You can first publish the PR and check them afterwards, for convenience.)
@Public(Evolving)
)