-
Notifications
You must be signed in to change notification settings - Fork 920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[flink] Remove Flink state from write-only writers #4185
Conversation
@@ -52,37 +55,54 @@ public TableWriteOperator( | |||
this.initialCommitUser = initialCommitUser; | |||
} | |||
|
|||
private boolean needState() { | |||
if (table.coreOptions().writeOnly()) { | |||
// Commit user for writers are used to avoid conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But users may use the commit user to determine if a table's snapshot is generated by the other job. See : #3474
So, I think you can add a config for users to decide whether they need the commitUser state (default true). And explain in the doc that in bounded stream scenarios, if this configuration is true, there will be side effects(when chaining source and write operator together, if some source parallelism have finished, checkpoint cannot be performed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commitUser
is only for writers. What you have mentioned is the commitUser
for comitters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
+1 |
Closing because write-only writers should also wait for snapshot. They need to read latest |
Purpose
Currently Paimon writer operators in Flink have states, which record the
commitUser
(for all writers) and the list of active buckets (for lookup or full-compaction changelog producer). These states prevent a writer object to be closed too early and cause conflicts.However for write-only writers, because they only create new files, there is no conflict. So for these writers we can remove Flink state. This optimization is also useful for bounded stream jobs, because when chaining source and write operator together, if some source parallelism have finished, checkpoint cannot be performed if there are union list states in the operators which are still running.
Tests
Existing tests should cover this change.
API and Format
No format changes.
Documentation
No new feature.