-
Notifications
You must be signed in to change notification settings - Fork 312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fix for switch_controller when using controller chaining #1591
base: master
Are you sure you want to change the base?
Bug fix for switch_controller when using controller chaining #1591
Conversation
…check from the beginning whenever there is a change in the (de)activate request (but use goto)
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1591 +/- ##
==========================================
+ Coverage 87.30% 87.35% +0.04%
==========================================
Files 108 108
Lines 9866 9929 +63
Branches 890 893 +3
==========================================
+ Hits 8614 8673 +59
- Misses 929 932 +3
- Partials 323 324 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
This pull request is in conflict. Could you fix it @TakashiSato? |
I have fixed the conflict. Due to the changes in the recently merged #1021, BUG3 no longer manifests. |
This PR is separated to address only the changes related to bug fix #1568 , fixing a bug that occurs under specific conditions when handling chainable controllers in the controller_manager's switch_controller.
Note: To focus on meaningful changes, it is recommended to compare the code using "hide whitespace" mode as follows.
https://github.com/ros-controls/ros2_control/compare/master...TakashiSato:ros2_control:feature/fix_switch_controller_bug?diff=split&w=1
Explanation of the Bugs
Using the typical model for controller chaining, which is also used in this test, the following explanations introduce three situations where bugs occur.
Note that these bugs occur only when the strictness value is set to BEST_EFFORT. When the strictness value is set to STRICT, switch_controller fails, and these bugs do not arise.
Additionally, the test cases to detect these bugs are implemented in this commit.
BUG1
pid_left
(not_chained),pid_right
(not_chained)position_tracking
,diff_drive
diff_drive
pid_right
pid_left
(in_chained)position_tracking
,diff_drive
,pid_right
In this situation, the start request for
diff_drive
should be rejected, and only the stop command forpid_right
should be accepted. While this result is achieved, there is an issue wherepid_left
incorrectly enters in_chained mode.The cause of this bug lies in this section.
When checking the activate request for
diff_drive
, the first following controller,pid_left
, is checked for any inconsistencies along with the request. Since no issues are detected,pid_left
is added to theto_chained_mode_request
.However, when the check is performed with the next controller,
pid_right
, inconsistencies are detected becausepid_right
is targeted for deactivation, leading to the rejection of thediff_drive
activate request.Since the
to_chained_mode_request
is not re-evaluated in subsequent processes, this request remains and results in the actual transition to chained mode during manage_switch.BUG2
position_tracking
,diff_drive
,pid_left
,pid_right
diff_drive
,pid_left
pid_left
(in_chained)position_tracking
,diff_drive
,pid_right
This situation is similar to BUG1, where the start request for diff_drive is rejected, and only
pid_left
is activated. However,pid_left
still incorrectly enters in_chained mode.The cause of this bug is almost the same as BUG1. During the activate check for
diff_drive
, since the following controllerpid_left
is also a target for activation,pid_left
is added to theto_chained_mode_request
.Then, during the subsequent check with
pid_right
, sincepid_right
is not a target for activation, the activate request fordiff_drive
is rejected, leaving theto_chained_mode_request
forpid_left
.Incidentally, in this situation, the checks are performed in the order of
pid_left
followed bypid_right
. Therefore, ifpid_right
andpid_left
in the switch requests for BUG1 and BUG2 are swapped, the checks will function correctly, and these issues will not occur.BUG3
position_tracking
(not_chained),diff_drive
(in_chained),pid_left
(in_chained),pid_right
(in_chained)diff_drive
In this situation, since it is impossible to deactivate only
diff_drive
, the request should be rejected and no changes should occur.However, the following switch requests are internally generated, and since these switches cannot be handled properly, a deadlock occurs.
pid_left
,pid_right
pid_left
,pid_right
The sequence in which such switch requests are generated is as follows:
diff_drive
is a deactivate target, inpropagate_deactivation_of_chained_mode
, its following controllers,pid_left
andpid_right
, are added to thefrom_chained_mode_request
.check_following_controllers_for_activate
does nothing. (Note: The process to erase from from_chained_mode_request exists only within this function.)check_preceding_controllers_for_deactivate
, the preceding controller ofdiff_drive
,position_tracking
, is active and not present in the deactivate_request, resulting in the rejection of thediff_drive
deactivate request.pid_left
andpid_right
are included in thefrom_chained_mode_request
and meet the conditions, they are added to the(de)activate_request
as restart targets due to this process.Bug Fix Proposal
In this part of the check process,
propagate_deactivation_of_chained_mode
,check_following_controllers_for_activate
, andcheck_preceding_controllers_for_deactivate
perform judgments based on the content of the(de)activate_request
. Therefore, if the content of the (de)activate_request changes, the process should be retried from the beginning. However, during this process, it is also possible that the content offrom/to_chained_mode_request
has changed, so it is necessary to properly clear these as well.This commit demonstrates the simplest change to fix the bug based on this idea.
However, since the above commit uses
goto
, which is generally not recommended, the final implementation in this PR has been rewritten using lambda functions and a while loop for the retry process (The corresponding commit).