This repository has been archived by the owner on Sep 17, 2024. It is now read-only.

mysqlfailover daemon not doing anything during forced poweroff of master server #38

Open
harrisonsp opened this issue Aug 13, 2018 · 4 comments

Comments

@harrisonsp

While testing mysqlfailover with a simple master-slave replication setup on two separate Rackspace VMs, I intentionally forced the master server to power off with the poweroff --force command, hoping that the mysqlfailover daemon would detect that the master was unavailable and fail over to the slave server.

I thought that a hard shutdown or immediate/forced loss of power would be handled by the mysqlfailover utility, but the utility, running as a daemon, seemed to do nothing while the master server was powered off until that server was brought back online without MySQL running (LUKS LVM volume for mysql not open and mounted, mysql service in error/failed state). At that point, the daemon recognized that the MySQL server wasn't running on the 'failed' master and failed over to the slave normally.

The last entry in the failover log was a Health Status INFO message timestamped a few seconds before I executed the poweroff command on the master server.

I expected that setting the connection-timeout=5 option for the daemon would cause the connections to the master to time out after 5 seconds and override any setting in the Connector/Python library that might have been set differently than the default of 10 seconds. I'm assuming the connections were killed when the master server was forced to power off, but I don't know if that is true.

I also assumed that setting master-fail-retry=10 would mean that the daemon would run the failover check again after that delay expired.

Is there another option I can try? Is this an unhandled edge case? Is there a way I can see more information about the failover daemon or the connections it makes? Is there a MySQL setting that could have kept the connections alive/retrying much longer than I expected?

MySQL server version: 5.7.22-0ubuntu0.16.04.1-log (Ubuntu)

MySQL Utilities mysqlfailover version 1.6.5

mysqlfailover --master=/<redacted>/.my.cnf[clientprimary] --slaves=/<redacted>/.my.cnf[clientsecondary] --log=/var/log/mysql_failover.log --verbose --interval=5 --ping=1 --connection-timeout=5 --master-fail-retry=10 --exec-after=/<redacted>/call_failover_py.sh --daemon=start --force

2018-08-13 19:09:23 PM INFO host: <redacted>, port: 3306, role: MASTER, state: UP, gtid_mode: ON, health: OK, version: 5.7.22-0ubuntu0.16.04.1-log, master_log_file: mysql-bin.000001, master_log_pos: 102517, IO_Thread: , SQL_Thread: , Secs_Behind: , Remaining_Delay: , IO_Error_Num: , IO_Error: , SQL_Error_Num: , SQL_Error: , Trans_Behind:
2018-08-13 19:09:23 PM INFO host: <redacted>, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.7.22-0ubuntu0.16.04.1-log, master_log_file: mysql-bin.000001, master_log_pos: 108539, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0
2018-08-13 19:11:44 PM INFO Master may be down. Waiting for 3 seconds.
2018-08-13 19:11:59 PM INFO Failed to reconnect to the master after 3 attempts.
2018-08-13 19:11:59 PM CRITICAL Master is confirmed to be down or unreachable.
2018-08-13 19:11:59 PM INFO Failover starting in 'auto' mode...
2018-08-13 19:11:59 PM INFO Checking eligibility of slave <redacted>:3306 for candidate.
2018-08-13 19:11:59 PM INFO GTID_MODE=ON ... Ok
2018-08-13 19:11:59 PM INFO Replication user exists ... Ok
2018-08-13 19:11:59 PM INFO Candidate slave <redacted>:3306 will become the new master.
2018-08-13 19:11:59 PM INFO Checking slaves status (before failover).
2018-08-13 19:11:59 PM INFO Preparing candidate for failover.
2018-08-13 19:11:59 PM INFO Reading events in relay log for slave <redacted>:3306
2018-08-13 19:11:59 PM INFO Creating replication user if it does not exist.
2018-08-13 19:11:59 PM INFO Stopping slaves.
2018-08-13 19:11:59 PM INFO Performing STOP on all slaves.
2018-08-13 19:11:59 PM WARNING Executing stop on slave <redacted>:3306 WARN - slave is not configured with this master
2018-08-13 19:12:00 PM INFO Executing stop on slave <redacted>:3306 Ok
2018-08-13 19:12:00 PM INFO Switching slaves to new master.
2018-08-13 19:12:00 PM INFO Disconnecting new master as slave.
2018-08-13 19:12:00 PM INFO Execute on <redacted>:3306: RESET SLAVE ALL
2018-08-13 19:12:00 PM INFO Starting slaves.
2018-08-13 19:12:00 PM INFO Performing START on all slaves.
2018-08-13 19:12:00 PM INFO Spawning external script.
- (my exec-after script output)
2018-08-13 19:12:00 PM INFO Executing failover.py
2018-08-13 19:12:00 PM INFO roles master updated to clientsecondary
2018-08-13 19:12:00 PM INFO roles slave updated to clientprimary
2018-08-13 19:12:00 PM INFO Updating Hosts Files
-
2018-08-13 19:12:01 PM INFO Script completed Ok.
2018-08-13 19:12:01 PM INFO Checking slaves for errors.
2018-08-13 19:12:01 PM INFO Failover complete.

@dbpia

dbpia commented Apr 7, 2022

It took more than 10 minutes for both of the messages below to be printed. This is a serious problem. (MySQL 5.7.16 + MySQL Utilities 1.6.4)

Master may be down. Waiting for 3 seconds.
Failed to reconnect to the master after 3 attempts.

@dbpia

dbpia commented Apr 8, 2022

Upgraded to MySQL Utilities 1.6.5 and ran with these options:
--interval=5 --ping=1 --connection-timeout=5 --master-fail-retry=10 --timeout=10 -vvv --failover-mode=auto

It still fails. Sometimes failover is fast, but most of the time it takes a long time.

@dbpia

dbpia commented Apr 10, 2022

I checked the source code and found that the connection function does not apply the connection timeout option. Also, Connector/Python's default is not 10 seconds: with the server down, a connection attempt takes more than 2 minutes in the default state.

BUG#22932375 => https://fossies.org/windows/misc/mysql-utilities-1.6.5.zip/mysql-utilities-1.6.5/CHANGES.txt
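
For anyone who wants to check this outside mysqlfailover, here is a minimal sketch using only the Python standard library, not Connector/Python or the utility's own code. The helper name `probe_tcp` is hypothetical. It assumes that a powered-off host simply never answers the TCP handshake, so a connect call without an explicit timeout waits for the operating system's own limit (which may well be the multi-minute delay described above), while a connect call with a timeout returns after roughly that many seconds.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Try a TCP connect and return (reachable, seconds_elapsed).

    Hypothetical helper for illustration. The explicit timeout bounds the
    wait even when the host is powered off and never replies.
    """
    start = time.monotonic()
    try:
        # create_connection applies the timeout to the connect attempt.
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        reachable = False
    return reachable, time.monotonic() - start


# Example call (the address is a placeholder, not from this issue):
#   ok, elapsed = probe_tcp("192.0.2.10", 3306, timeout=5.0)
```

Running this against the master while it is powered off should show whether the wait is bounded by the timeout you pass in or by something much longer elsewhere in the stack.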

@dbpia

dbpia commented Apr 10, 2022

To work around the problem, manually add the connection_timeout value to the source (./common/server.py, line 1140), then build and install it again.

    try:
        parameters = {
            'user': self.user,
            'host': self.host,
            'port': self.port,
            'connection_timeout': 5  # added: hard-coded 5-second connect timeout
        }

python setup.py build
python setup.py install
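
A hard-coded value works, but the same idea can be written so the timeout comes from a caller-supplied setting. The following is only a sketch under assumptions: `build_connect_params` is a hypothetical helper, not a function in MySQL Utilities, and how the real `--connection-timeout` option would reach this point in server.py is not shown in this thread. The key name `connection_timeout` is the one used in the workaround above.

```python
def build_connect_params(user, host, port, connection_timeout=None):
    """Build keyword arguments for a Connector/Python-style connect call.

    Hypothetical helper for illustration; not part of MySQL Utilities.
    """
    params = {
        'user': user,
        'host': host,
        'port': port,
    }
    if connection_timeout is not None:
        timeout = int(connection_timeout)
        if timeout <= 0:
            raise ValueError("connection_timeout must be a positive integer")
        # Only set the key when a timeout was requested, so the
        # connector's own default applies otherwise.
        params['connection_timeout'] = timeout
    return params


# Example: parameters equivalent to the hard-coded workaround above.
#   build_connect_params('monitor', 'db-master', 3306, connection_timeout=5)
```

The resulting dictionary would then be passed to the connect call as keyword arguments, in the same way the patched `parameters` dictionary is used.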
