This repository has been archived by the owner on Sep 17, 2024. It is now read-only.
While testing mysqlfailover with a simple master-slave replication setup on two separate Rackspace VMs, I intentionally forced the master server to power off using the poweroff --force command, hoping that the mysqlfailover daemon would see that the master was unavailable and fail over to the slave.
I expected the mysqlfailover utility to handle a hard shutdown or an immediate/forced loss of power. Instead, the utility, running as a daemon, appeared to do nothing while the master was powered off. It only reacted once that server was brought back online without MySQL running (LUKS LVM volume for mysql not opened or mounted, mysql service in error/failed state). At that point, the daemon recognized that MySQL wasn't running on the 'failed' master and failed over to the slave normally.
Until then, the last entry in the failover log was a Health Status INFO message timestamped a few seconds before I executed the poweroff command on the master server.
I expected that setting connection-timeout=5 for the daemon would cause connections to the master to time out after 5 seconds, overriding the Connector/Python library's own setting, whose default I believed to be 10 seconds. I'm assuming the existing connections were killed when the master was forced to power off, but I don't know if that is true.
I also assumed that setting master-fail-retry=10 would make the daemon run the failover check again after that delay expired.
Is there another option I can try? Is this an unhandled edge case? Is there a way to see more information about the failover daemon or the connections it makes? Is there a MySQL setting that could have kept the connections alive or retrying for much longer than I expected?
MySQL server version: 5.7.22-0ubuntu0.16.04.1-log (Ubuntu)
MySQL Utilities mysqlfailover version 1.6.5
Command: mysqlfailover --master=/<redacted>/.my.cnf[clientprimary] --slaves=/<redacted>/.my.cnf[clientsecondary] --log=/var/log/mysql_failover.log --verbose --interval=5 --ping=1 --connection-timeout=5 --master-fail-retry=10 --exec-after=/<redacted>/call_failover_py.sh --daemon=start --force
2018-08-13 19:09:23 PM INFO host: <redacted>, port: 3306, role: MASTER, state: UP, gtid_mode: ON, health: OK, version: 5.7.22-0ubuntu0.16.04.1-log, master_log_file: mysql-bin.000001, master_log_pos: 102517, IO_Thread: , SQL_Thread: , Secs_Behind: , Remaining_Delay: , IO_Error_Num: , IO_Error: , SQL_Error_Num: , SQL_Error: , Trans_Behind:
2018-08-13 19:09:23 PM INFO host: <redacted>, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.7.22-0ubuntu0.16.04.1-log, master_log_file: mysql-bin.000001, master_log_pos: 108539, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0
2018-08-13 19:11:44 PM INFO Master may be down. Waiting for 3 seconds.
2018-08-13 19:11:59 PM INFO Failed to reconnect to the master after 3 attempts.
2018-08-13 19:11:59 PM CRITICAL Master is confirmed to be down or unreachable.
2018-08-13 19:11:59 PM INFO Failover starting in 'auto' mode...
2018-08-13 19:11:59 PM INFO Checking eligibility of slave <redacted>:3306 for candidate.
2018-08-13 19:11:59 PM INFO GTID_MODE=ON ... Ok
2018-08-13 19:11:59 PM INFO Replication user exists ... Ok
2018-08-13 19:11:59 PM INFO Candidate slave <redacted>:3306 will become the new master.
2018-08-13 19:11:59 PM INFO Checking slaves status (before failover).
2018-08-13 19:11:59 PM INFO Preparing candidate for failover.
2018-08-13 19:11:59 PM INFO Reading events in relay log for slave <redacted>:3306
2018-08-13 19:11:59 PM INFO Creating replication user if it does not exist.
2018-08-13 19:11:59 PM INFO Stopping slaves.
2018-08-13 19:11:59 PM INFO Performing STOP on all slaves.
2018-08-13 19:11:59 PM WARNING Executing stop on slave <redacted>:3306 WARN - slave is not configured with this master
2018-08-13 19:12:00 PM INFO Executing stop on slave <redacted>:3306 Ok
2018-08-13 19:12:00 PM INFO Switching slaves to new master.
2018-08-13 19:12:00 PM INFO Disconnecting new master as slave.
2018-08-13 19:12:00 PM INFO Execute on <redacted>:3306: RESET SLAVE ALL
2018-08-13 19:12:00 PM INFO Starting slaves.
2018-08-13 19:12:00 PM INFO Performing START on all slaves.
2018-08-13 19:12:00 PM INFO Spawning external script.
- (my --exec-after script output)
2018-08-13 19:12:00 PM INFO Executing failover.py
2018-08-13 19:12:00 PM INFO roles master updated to clientsecondary
2018-08-13 19:12:00 PM INFO roles slave updated to clientprimary
2018-08-13 19:12:00 PM INFO Updating Hosts Files
-
2018-08-13 19:12:01 PM INFO Script completed Ok.
2018-08-13 19:12:01 PM INFO Checking slaves for errors.
2018-08-13 19:12:01 PM INFO Failover complete.
I checked the source code and saw that the connection function does not properly apply the connection timeout option. Also, Connector/Python's default timeout is not 10 seconds: with the server down, a connection attempt takes more than 2 minutes to fail with the default settings.
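To illustrate why a powered-off master and a running host with MySQL stopped can look so different to a client, here is a minimal sketch using only the Python standard library. It is not taken from mysqlfailover or Connector/Python, and `probe_tcp` is a hypothetical helper name. The assumption is that a host that is up but has no mysqld listening rejects the connection immediately, while a powered-off host sends no reply at all, so the attempt blocks until some timeout expires. If no explicit timeout reaches the socket, the operating system's own connect timeout applies, which may explain a delay of around two minutes.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Classify one TCP connection attempt to host:port.

    Returns one of:
      "open"        - something accepted the connection
      "refused"     - host is up, but nothing listens on the port
      "timeout"     - no reply within `timeout` seconds
                      (typical for a powered-off or unreachable host)
      "unreachable" - any other network or name-resolution error
    """
    try:
        # The explicit timeout bounds how long the connect may block.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        return "unreachable"


if __name__ == "__main__":
    # Hypothetical target; replace with the master's address.
    print(probe_tcp("127.0.0.1", 3306, timeout=5.0))
```

Running a probe like this from the host where the daemon runs, while the master is powered off, would show whether a 5-second timeout is honored at the TCP level independently of mysqlfailover.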