forked from apache/kafka
KAFKA-16077: Streams with state updater fails to close task upon fencing (apache#15117)

* KAFKA-16077: Streams fails to close task after restoration when input partitions are updated

There is a race condition in the state updater that can cause the following:

1. We have an active task in the state updater.
2. We get fenced. We recreate the producer, so its transactions are now uninitialized. We ask the state updater to give back the task and add a pending action to close the task clean once it's handed back.
3. We get a new assignment with updated input partitions. The task is still owned by the state updater, so we ask the state updater again to hand it back and add a pending action to update its input partitions.
4. The task is handed back by the state updater. We update its input partitions but forget to close it clean, because the pending action from step 2 was overwritten.
5. Now the task is in an initialized state, but the underlying producer does not have transactions initialized.

This can cause an IllegalStateException: `Invalid transition attempted from state UNINITIALIZED to state IN_TRANSACTION` when running in EOSv2.

To fix this, we introduce a new pending action, CloseReviveAndUpdateInputPartitions, that is added when we handle a new assignment with updated input partitions but still need to close the task before reopening it.

We should not remove the task twice. Otherwise, we end up in this situation:

1. We have an active task in the state updater.
2. We get fenced. We recreate the producer, so its transactions are now uninitialized. We ask the state updater to give back the task and add a pending action to close the task clean once it's handed back.
3. The state updater moves the task from the updating tasks to the removed tasks.
4. We get a new assignment with updated input partitions. The task is still owned by the state updater, so we ask the state updater again to hand it back (adding a "task+remove" action to the task and action queue) and add a pending action to close, revive and update input partitions.
5. The task is handed back by the state updater. We close, revive and update input partitions, and add the task back to the state updater.
6. The state updater executes the "task+remove" action that is still in its task and action queue, and immediately hands the task back to the main thread.
7. The main thread discovers a removed task that was not restored and has no pending action attached to it, which results in an IllegalStateException.

Reviewers: Bruno Cadonna <[email protected]>
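The two scenarios above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the code changed by this commit: the class `PendingActionSketch`, its methods and its fields are hypothetical names, and the state updater is reduced to a plain queue of remove requests. It assumes each task can hold at most one pending action, which is why the later action replaces the earlier one in the buggy path.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Simplified, hypothetical model of the bookkeeping described in the commit message.
// None of these names are taken from the Kafka Streams code base.
class PendingActionSketch {
    enum Action { CLOSE_CLEAN, UPDATE_INPUT_PARTITIONS, CLOSE_REVIVE_AND_UPDATE_INPUT_PARTITIONS }

    // Assumption: at most one pending action per task, so a second put() replaces the first.
    final Map<String, Action> pendingActions = new HashMap<>();
    // Stand-in for the state updater's queue of "remove this task" requests.
    final Queue<String> removeRequests = new ArrayDeque<>();

    // Fencing: ask for the task back and remember to close it clean.
    void onFenced(String taskId) {
        removeRequests.add(taskId);
        pendingActions.put(taskId, Action.CLOSE_CLEAN);
    }

    // Buggy handling of a new assignment: overwrites CLOSE_CLEAN and
    // enqueues a second remove request for the same task.
    void onNewAssignmentBuggy(String taskId) {
        removeRequests.add(taskId);
        pendingActions.put(taskId, Action.UPDATE_INPUT_PARTITIONS);
    }

    // Fixed handling: if a close is already pending, upgrade it to the combined
    // action and do not request removal again.
    void onNewAssignmentFixed(String taskId) {
        if (pendingActions.get(taskId) == Action.CLOSE_CLEAN) {
            pendingActions.put(taskId, Action.CLOSE_REVIVE_AND_UPDATE_INPUT_PARTITIONS);
        } else {
            removeRequests.add(taskId);
            pendingActions.put(taskId, Action.UPDATE_INPUT_PARTITIONS);
        }
    }

    public static void main(String[] args) {
        PendingActionSketch buggy = new PendingActionSketch();
        buggy.onFenced("0_0");
        buggy.onNewAssignmentBuggy("0_0");
        System.out.println("buggy: " + buggy.pendingActions.get("0_0")
                + ", remove requests = " + buggy.removeRequests.size());

        PendingActionSketch fixed = new PendingActionSketch();
        fixed.onFenced("0_0");
        fixed.onNewAssignmentFixed("0_0");
        System.out.println("fixed: " + fixed.pendingActions.get("0_0")
                + ", remove requests = " + fixed.removeRequests.size());
    }
}
```

In the buggy path the close action is lost and two remove requests are queued, matching the first and second scenarios respectively. In the fixed path a single combined action remains and only one remove request is outstanding.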
Showing 6 changed files with 227 additions and 30 deletions.