mysqlfailover daemon not doing anything during forced poweroff of master server

While testing `mysqlfailover` with a simple master-slave replication setup on two separate Rackspace VMs, I intentionally forced the master server to power off with `poweroff --force`. I hoped the `mysqlfailover` daemon would see that the master was unavailable and fail over to the slave server.

I thought the mysqlfailover utility would handle a hard shutdown or an immediate/forced loss of power. Instead, the utility, running as a daemon, seemed to do nothing while the master server was powered off. It only reacted once that server was brought back online without MySQL running (the LUKS LVM volume for MySQL was not opened and mounted, and the mysql service was in an error/failed state). At that point, the daemon recognized that MySQL wasn't running on the 'failed' master and failed over to the slave normally.

The last entry in the failover log was a Health Status INFO message timestamped a few seconds before I executed the poweroff command on the master server.

I expected that setting the `connection-timeout=5` option for the daemon would cause the connections to the master to time out after 5 seconds and override any setting in the Connector/Python library that might have been set differently than the default of 10 seconds. I'm assuming the connections were killed when the master server was forced to power off, but I don't know if that is true.

I also assumed that setting `master-fail-retry=10` would mean that the daemon would run the failover check again after that delay expired.

Is there another option I can try? Is this an unhandled edge case? Is there a way I can see more information about the failover daemon or the connections it makes? Is there a MySQL setting that could have kept the connections alive/retrying much longer than I expected?
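
To help narrow down the last two questions, here is a small standard-library Python sketch that could be run from the failover host while the master is powered off. It is not part of mysqlfailover and does not use Connector/Python; `probe_tcp` is a hypothetical helper name. The assumption behind it is that a host that is up with MySQL stopped usually refuses the connection immediately, while a powered-off host usually sends nothing back, so the attempt only ends when the timeout expires.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Classify one TCP connection attempt to host:port.

    Hypothetical diagnostic helper; it only tests TCP reachability and
    does not speak the MySQL protocol.
    """
    try:
        # create_connection applies the timeout to the connect attempt
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on the port
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (typical for a dead host)
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. "No route to host" from an intermediate router
        return "error: %s" % exc


if __name__ == "__main__":
    import sys

    # Usage: python probe.py <master-host> [port]
    if len(sys.argv) > 1:
        port = int(sys.argv[2]) if len(sys.argv) > 2 else 3306
        print(probe_tcp(sys.argv[1], port))
```

If the powered-off master shows `timeout` while the rebooted master without MySQL shows `refused`, that would at least match the behavior I saw, although it would not prove what the daemon itself is waiting on.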
```
MySQL server version 5.7.22-0ubuntu0.16.04.1-log (Ubuntu)

MySQL Utilities mysqlfailover version 1.6.5

mysqlfailover --master=/<redacted>/.my.cnf[clientprimary] --slaves=/<redacted>/.my.cnf[clientsecondary] --log=/var/log/mysql_failover.log --verbose --interval=5 --ping=1 --connection-timeout=5 --master-fail-retry=10 --exec-after=/<redacted>/call_failover_py.sh --daemon=start --force

2018-08-13 19:09:23 PM INFO host: <redacted>, port: 3306, role: MASTER, state: UP, gtid_mode: ON, health: OK, version: 5.7.22-0ubuntu0.16.04.1-log, master_log_file: mysql-bin.000001, master_log_pos: 102517, IO_Thread: , SQL_Thread: , Secs_Behind: , Remaining_Delay: , IO_Error_Num: , IO_Error: , SQL_Error_Num: , SQL_Error: , Trans_Behind:
2018-08-13 19:09:23 PM INFO host: <redacted>, port: 3306, role: SLAVE, state: UP, gtid_mode: ON, health: OK, version: 5.7.22-0ubuntu0.16.04.1-log, master_log_file: mysql-bin.000001, master_log_pos: 108539, IO_Thread: Yes, SQL_Thread: Yes, Secs_Behind: 0, Remaining_Delay: No, IO_Error_Num: 0, IO_Error: , SQL_Error_Num: 0, SQL_Error: , Trans_Behind: 0
2018-08-13 19:11:44 PM INFO Master may be down. Waiting for 3 seconds.
2018-08-13 19:11:59 PM INFO Failed to reconnect to the master after 3 attempts.
2018-08-13 19:11:59 PM CRITICAL Master is confirmed to be down or unreachable.
2018-08-13 19:11:59 PM INFO Failover starting in 'auto' mode...
2018-08-13 19:11:59 PM INFO Checking eligibility of slave <redacted>:3306 for candidate.
2018-08-13 19:11:59 PM INFO GTID_MODE=ON ... Ok
2018-08-13 19:11:59 PM INFO Replication user exists ... Ok
2018-08-13 19:11:59 PM INFO Candidate slave <redacted>:3306 will become the new master.
2018-08-13 19:11:59 PM INFO Checking slaves status (before failover).
2018-08-13 19:11:59 PM INFO Preparing candidate for failover.
2018-08-13 19:11:59 PM INFO Reading events in relay log for slave <redacted>:3306
2018-08-13 19:11:59 PM INFO Creating replication user if it does not exist.
2018-08-13 19:11:59 PM INFO Stopping slaves.
2018-08-13 19:11:59 PM INFO Performing STOP on all slaves.
2018-08-13 19:11:59 PM WARNING Executing stop on slave <redacted>:3306 WARN - slave is not configured with this master
2018-08-13 19:12:00 PM INFO Executing stop on slave <redacted>:3306 Ok
2018-08-13 19:12:00 PM INFO Switching slaves to new master.
2018-08-13 19:12:00 PM INFO Disconnecting new master as slave.
2018-08-13 19:12:00 PM INFO Execute on <redacted>:3306: RESET SLAVE ALL
2018-08-13 19:12:00 PM INFO Starting slaves.
2018-08-13 19:12:00 PM INFO Performing START on all slaves.
2018-08-13 19:12:00 PM INFO Spawning external script.
--- output from my --exec-after script ---
2018-08-13 19:12:00 PM INFO Executing failover.py
2018-08-13 19:12:00 PM INFO roles master updated to clientsecondary
2018-08-13 19:12:00 PM INFO roles slave updated to clientprimary
2018-08-13 19:12:00 PM INFO Updating Hosts Files
--- end of script output ---
2018-08-13 19:12:01 PM INFO Script completed Ok.
2018-08-13 19:12:01 PM INFO Checking slaves for errors.
2018-08-13 19:12:01 PM INFO Failover complete.

```
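
For reference, the silent gap can be measured from the log excerpt above. This is a minimal sketch, assuming the two lines below are the last health entry and the first "Master may be down" entry from that excerpt (the health line is shortened here); `seconds_between` is a hypothetical helper name.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def seconds_between(first_line, second_line):
    """Return the seconds elapsed between the timestamps that start two log lines."""
    # The first 19 characters hold 'YYYY-MM-DD HH:MM:SS'. The trailing 'PM'
    # is ignored because the hour is already written in 24-hour form.
    start = datetime.strptime(first_line[:19], FMT)
    end = datetime.strptime(second_line[:19], FMT)
    return (end - start).total_seconds()


last_health = "2018-08-13 19:09:23 PM INFO host: <redacted>, port: 3306, role: SLAVE"
first_detect = "2018-08-13 19:11:44 PM INFO Master may be down. Waiting for 3 seconds."
print(seconds_between(last_health, first_detect))  # 141.0
```

With `--interval=5` I would have expected a health entry roughly every 5 seconds, so a gap of 141 seconds with no log output is the stall I am asking about.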
