Disabled the setting `reboot.host.and.alert.management.on.heartbeat.timeout` by default #10111
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## 4.19 #10111 +/- ##
============================================
- Coverage 15.13% 15.12% -0.01%
+ Complexity 11268 11262 -6
============================================
Files 5408 5408
Lines 473867 473867
Branches 57778 57778
============================================
- Hits 71700 71684 -16
- Misses 394165 394185 +20
+ Partials 8002 7998 -4
Flags with carried forward coverage won't be shown.
@slavkap, have you tested this with HA enabled?
@slavkap this changes the current behaviour.
`reboot.host.and.alert.management.on.heartbeat.timeout` has to be disabled. Even when high availability isn't enabled, CloudStack will reboot the host if there is a storage issue.
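To illustrate what a default change means for a host that never set this value explicitly, here is a minimal Python sketch. It assumes the setting is read as a boolean from a simple `key=value` properties text. The `read_flag` helper, the sample inputs, and the parsing logic are hypothetical and are not CloudStack's implementation.

```python
SETTING = "reboot.host.and.alert.management.on.heartbeat.timeout"


def read_flag(properties_text, name, default):
    """Return the boolean value of `name`, or `default` if it is absent.

    Hypothetical helper: handles plain `key=value` lines and `#` comments only.
    """
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().lower() == "true"
    return default


# A host that never set the value follows whatever the default is.
# (`some.other.setting` is a made-up placeholder key.)
unset = "# storage settings\nsome.other.setting=1\n"
# A host that explicitly opts in keeps the reboot-on-timeout behaviour.
opted_in = SETTING + "=true\n"

print(read_flag(unset, SETTING, default=True))      # behaviour with the old default
print(read_flag(unset, SETTING, default=False))     # behaviour with the default from this PR
print(read_flag(opted_in, SETTING, default=False))  # explicit value wins over the default
```

Under these assumptions, only hosts that rely on the default are affected by the change; a host that sets the value explicitly behaves the same before and after.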
79a5f78 to 78180ff
This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.
@DaanHoogland, I've tested this with and without HA
do-not-reboot-host-on-heartbeat-timeout
to not reboot a host on heartbeat timeout
`reboot.host.and.alert.management.on.heartbeat.timeout` by default
@slavkap, I changed the title. Hope you don't mind. It was a bit confusing to me.
@DaanHoogland, I don't mind the change, thanks!
moved forward
@DaanHoogland, I rebased it on main, since @weizhouapache suggested possibly merging it in a major release.
We were hit by this issue: it caused cascading reboots of all our hosts, even though the NFS server had no running VM. It was an operational nightmare that resulted in approximately 45 minutes of downtime. Changing the default value to false offers us more gain than loss. We adjusted it in our settings; thank you, Wei. This was simply catastrophic!
As someone who works with VMware products, I have never seen a host reboot because a datastore was inaccessible. I believe changing the CloudStack default to "false" is a great move.
@blueorangutan package
@sureshanaparti a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✖️ el8 ✖️ el9 ✖️ debian ✖️ suse15. SL-JID 13621
Packaging result [SF]: ✖️ el8 ✖️ el9 ✔️ debian ✖️ suse15. SL-JID 13671
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 13677
@blueorangutan test
@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests
code lgtm
@sureshanaparti, I think we can merge this one, pending smoke tests. But it merits a note in the release notes page for the next version.
[SF] Trillian test result (tid-13502)
Description
This PR disables the setting `reboot.host.and.alert.management.on.heartbeat.timeout`. When there is a storage issue, CloudStack will reboot the host even if high availability isn't enabled.
Types of changes
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?